In linear regression, the aim is to model the relationship between a dependent variable Y and one or more explanatory (independent) variables X1, X2, ..., Xk. We hypothesize that the relationship between the dependent and independent variables is linear.
Purposes of regression analysis:
- Identify the relationship: examine whether the independent variable X has an impact on the dependent variable Y;
- Measure the degree of the association: determine the direction and the strength of this effect;
- Measure the error.
THE MODEL, WITH AN EXAMPLE
A good start is to draw a scatter plot. The dependent variable Y is on the vertical axis, the independent variable X on the horizontal axis.
The model is:
Y = aX + b
a = (∑XY - n * Xmean * Ymean) / (∑X² - n * Xmean²)
b = Ymean - a * Xmean
Where,
Xmean = Mean of the independent variable
Ymean = Mean of the dependent variable
a = Slope of the least-squares regression line
b = Value of Y when X equals zero
n = Number of observations
For the example data, the fitted linear model is:
Y = 2.28X - 23.01
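The slope and intercept formulas can be sketched in a few lines of Python. This is only an illustration: the data points are hypothetical toy values (not the data behind the example model), and the function name fit_line is an assumption made for this sketch.

```python
# Hypothetical toy data, NOT the data set behind the example in the text.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_line(xs, ys):
    """Return (a, b) for Y = aX + b using the least-squares formulas."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # a = (sum(XY) - n * Xmean * Ymean) / (sum(X^2) - n * Xmean^2)
    a = (sum_xy - n * x_mean * y_mean) / (sum_x2 - n * x_mean ** 2)
    # b = Ymean - a * Xmean
    b = y_mean - a * x_mean
    return a, b


a, b = fit_line(xs, ys)
print(f"Y = {a:.2f}X + {b:.2f}")  # prints: Y = 0.60X + 2.20
```

For these toy points the sums are ∑XY = 66 and ∑X² = 55, with both means easy to check by hand (3 and 4), so the result can be verified on paper.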
STATISTICS
The standard error of estimate.
'How closely the actual data fits the regression line'
Syx = √( ∑(Y - Ypredicted)² / (n - k) )
Where:
Y = Observed value of the dependent variable
Ypredicted = Predicted value of the dependent variable Y
e = Errors = Y - Ypredicted
k = Number of estimated parameters (a and b), here 2
Syx = 5.030
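As a rough sketch, the standard error of estimate can be computed as follows. The data points are hypothetical toy values, a = 0.6 and b = 2.2 are the least-squares coefficients for those toy points (not the example model), and the function name is an assumption made for illustration.

```python
import math

# Hypothetical toy data, NOT the data set behind the example in the text.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def standard_error_of_estimate(xs, ys, a, b, k=2):
    """Syx = sqrt( sum((Y - Ypredicted)^2) / (n - k) )."""
    n = len(xs)
    # Sum of squared errors e = Y - Ypredicted, with Ypredicted = aX + b.
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - k))


# a and b are the least-squares fit for the toy data above.
s_yx = standard_error_of_estimate(xs, ys, a=0.6, b=2.2)
print(round(s_yx, 3))  # prints: 0.894
```

Here the squared errors sum to 2.4 and n - k = 3, so Syx = √0.8.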
Coefficient of estimate.
The significance of the standard error of estimate
Calculate the confidence interval of the prediction. It is built from the standard error of estimate Syx, multiplied by the number of standard deviations z that matches the desired confidence level.
Y = Ypredicted +/- z * Syx
Y = (aX + b) +/- z * Syx
For about 95 percent confidence, z is 2 (for about 68 percent confidence, z is 1).
For the example series:
Yconfidence = (2.28X - 23.01) +/- 2 * 5.03 = (2.28X - 23.01) +/- 10.06
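The band above can be evaluated for a given X with a small helper. This sketch takes the slope, intercept and Syx from the example in the text; the function name prediction_band and the input X = 20 are hypothetical, chosen only for illustration.

```python
# Values taken from the example in the text.
A = 2.28       # slope
B = -23.01     # intercept
S_YX = 5.03    # standard error of estimate


def prediction_band(x, multiplier=2.0):
    """Return (lower, predicted, upper) for Y = (aX + b) +/- multiplier * Syx."""
    y_pred = A * x + B
    margin = multiplier * S_YX
    return y_pred - margin, y_pred, y_pred + margin


# X = 20 is a hypothetical input, not a point from the example data.
low, mid, high = prediction_band(20.0)
print(f"{low:.2f} <= Y <= {high:.2f} (predicted {mid:.2f})")
# prints: 12.53 <= Y <= 32.65 (predicted 22.59)
```

This is the simple band used in the text: it applies the same margin at every X and does not account for the uncertainty in a and b themselves.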
Coefficient of determination (R-square).