Linear regressions, and regression or inductive methods in general, differ from descriptive analyses mainly in their reference to an underlying model. Regressions always refer to such a model and try to verify whether the model assumptions hold or must be rejected, and/or to quantify the effects.
On the following pages, I do not want to go into the details of the individual methods or the calculation steps; instead, I would like to give you a graphical representation of the idea behind regression and explain some aspects to aid your understanding.
In most cases, and for a linear regression this is easiest to show, the aim is to fit a curve (the model) to an existing data set as well as possible. In simple linear regression this curve is a straight line, which you can also fit manually to a data cloud and then compare with the regression result and the true relation. In addition, we provide the true and estimated parameter values to visualize the contrast.
We start by presenting the true relation and the generation of the data. Of course, neither is known in real problems, and both are therefore hidden on the following pages.
In our example, the true relation is
$$ y_i = \alpha + \beta x_i + \varepsilon_i, $$
where the slope (linear relation) is $\beta$, the constant (intercept) is $\alpha$, and the measurement error or error term is $\varepsilon_i$. That is, for each data point (measured value), the y-value depends on the x-value: the x-value $x_i$ is multiplied by $\beta$, and both the constant $\alpha$ and a random error term $\varepsilon_i$ are added.
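The data-generating step described above can be sketched in Python. The parameter values, the range of the x-values, and the normal error distribution below are illustrative assumptions, not the applet's fixed defaults:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative true parameters (in the applet you choose these yourself)
alpha, beta = 2.0, 0.5   # intercept and slope
n = 200                  # number of data points

x = rng.uniform(0, 10, n)      # x-values drawn at random (assumed range)
eps = rng.normal(0, 1.0, n)    # additive random error term
y = alpha + beta * x + eps     # true relation: y_i = alpha + beta * x_i + eps_i
```

Each simulated point follows the true line exactly, except for the random error added on top.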
If you click the button "Generate and estimate data", data will be generated according to the parameters of your model and the linear regression estimation will be performed.
Here, you can set the parameters α and β.
For technical reasons, the number of data points n is limited to 2000. Higher numbers would require too much computational effort and should be handled with suitable statistical software. Here, for the didactic effect, 200 data points seem sufficient to us.
Number of data points:
In this graphic, data points are randomly generated around the given curve. For this purpose, x-values are determined randomly, the corresponding y-value is calculated according to the true model, and then a random error term is added. A straight line is then estimated that fits the data cloud as well as possible. Both the type of the true relation (here, a straight line) and the type of error (here, additive and independent of the x-value) could be chosen differently. The fitting is done here using the method of least squares, which we will discuss in detail on the corresponding page.
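For simple linear regression, the least-squares estimates have a closed form, which we can sketch in Python (the simulated data below uses assumed parameter values for illustration):

```python
import numpy as np

# Simulate data from an assumed true model y = alpha + beta * x + error
rng = np.random.default_rng(1)
alpha, beta, n = 2.0, 0.5, 200
x = rng.uniform(0, 10, n)
y = alpha + beta * x + rng.normal(0, 1.0, n)

# Least-squares estimates in closed form:
#   beta_hat  = cov(x, y) / var(x)
#   alpha_hat = mean(y) - beta_hat * mean(x)
beta_hat = np.cov(x, y, bias=True)[0, 1] / np.var(x)
alpha_hat = y.mean() - beta_hat * x.mean()
```

With 200 data points, the estimates typically land close to the true values, which is exactly the comparison the table below visualizes.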
| | true | estimated |
|---|---|---|
| α | | |
| β | | |
In our example, the relationship between the x- and y-values depends on two parameters: the y-axis intercept α (level) and the slope β. The table above compares the true values with the estimated values.
If you enter β = 0 as the true value, the true relation is that the y-value no longer depends on the x-value. The straight line runs flat (horizontal). The estimated value should then also be close to 0. In statistics you will learn to test whether the estimated value for β differs significantly from 0, i.e., due to the data situation you cannot say for sure whether the true β = 0 (y does not depend on x) or is slightly different from 0 (y depends on x). You can find out more about this in your statistics lecture.
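This β = 0 case can be sketched in Python: even though y does not depend on x at all, the estimated slope comes out close to, but not exactly, zero (the parameter values below are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta, n = 2.0, 0.0, 200   # true slope set to 0: y does not depend on x
x = rng.uniform(0, 10, n)
y = alpha + beta * x + rng.normal(0, 1.0, n)

# np.polyfit returns [slope, intercept] for a degree-1 fit
beta_hat, alpha_hat = np.polyfit(x, y, 1)
# beta_hat scatters randomly around 0; whether it differs *significantly*
# from 0 is a question for a statistical test, not for the point estimate
```

Rerunning this with different seeds gives slightly different nonzero estimates each time, which is exactly why a significance test is needed.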