
SUMMARY

- Linear regression

In linear regression, the aim is to model the relationship between a dependent variable Y and one or more explanatory variables X1, X2, ..., Xk. We hypothesize that the relationship between the dependent and independent variables is linear.

Purposes of regression analysis:

- Identify the relationship: examine whether the independent variable X has an impact on the dependent variable Y;
- Measure the degree of association: determine the direction and the strength of this effect;
- Measure the error.


THE MODEL, WITH AN EXAMPLE




A good start is to draw a scatter plot. The dependent variable Y is on the vertical axis, the independent variable X on the horizontal axis.



The model is:

$Y = aX + b$
$a = \dfrac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$
$b = \bar{Y} - a\bar{X}$

Where,
$\bar{X}$ = Mean of the independent variable
$\bar{Y}$ = Mean of the dependent variable
a = Slope of the least-squares regression line
b = Value of Y when X equals zero (the intercept)
n = Number of observations
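
The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `fit_line` and the four sample points are assumptions made for the example and are not the series used in this page.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = aX + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_mean = sum(xs) / n                      # mean of X
    y_mean = sum(ys) / n                      # mean of Y
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = sum_xx - n * x_mean ** 2
    if denom == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    a = (sum_xy - n * x_mean * y_mean) / denom
    b = y_mean - a * x_mean
    return a, b


# Hypothetical data lying exactly on Y = 2X + 1 (not the page's example series)
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 2.0 1.0
```

Because the sample points lie exactly on a line, the fit recovers the slope 2 and the intercept 1.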

The linear model is:
Y = 2.28X - 23.01


Calculation
$\bar{X}$ = (20 + 20 + ... + 45) / 12 = 372 / 12 = 31.00
$\bar{Y}$ = (27 + 23 + ... + 84) / 12 = 572 / 12 = 47.67

$\sum XY$ = 19205
(20 * 27 + 20 * 23 + ... + 45 * 84) = 19205

$\sum X^2$ = 12178
(20 * 20 + 20 * 20 + ... + 45 * 45) = 12178

$a = \dfrac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$
a = (19205 - 12 * 31 * 47.67) / (12178 - 12 * 31 * 31) = 2.28

$b = \bar{Y} - a\bar{X}$
b = 47.67 - 2.28 * 31 = -23.01
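
The arithmetic above can be checked from the totals alone. The short Python sketch below uses only the sums quoted in this calculation (n = 12, ∑X = 372, ∑Y = 572, ∑XY = 19205, ∑X2 = 12178); the variable names are chosen for this example.

```python
# Totals taken from the calculation above
n = 12
sum_x, sum_y = 372, 572
sum_xy, sum_xx = 19205, 12178

x_mean = sum_x / n          # 31.0
y_mean = sum_y / n          # about 47.67

a = (sum_xy - n * x_mean * y_mean) / (sum_xx - n * x_mean ** 2)
b = y_mean - a * x_mean

print(round(a, 2))  # 2.28
print(round(b, 2))  # -23.02 with the unrounded slope
```

The intercept comes out as about -23.02 when the slope is not rounded first; the -23.01 shown above results from plugging in the rounded values 47.67 and 2.28.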


Scatter chart with the linear model


Table and chart of the prediction



Chart of the X, Y and Y-predicted



STATISTICS



The standard error of estimate.
'How closely the actual data fits the regression line'

$S_{yx} = \sqrt{\dfrac{\sum (Y - Y_{predicted})^2}{n - k}}$

Where:
Y = Observed value of the dependent variable
Ypredicted = Predicted value of the dependent variable
e = Error (residual) = Y - Ypredicted
k = Number of estimated parameters (a and b), here 2

Syx = 5.030


Calculation
$S_{yx} = \sqrt{\dfrac{\sum (Y - Y_{predicted})^2}{n - k}} = \sqrt{\dfrac{\sum e^2}{n - k}}$

n = 12
k = 2
$\sum e^2$ = (19.492 + 0.172 + ... + 19.457) = 253.004

$S_{yx} = \sqrt{253.004 / 10} = 5.030$
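
As a rough sketch, the standard error of estimate can be computed in Python as the square root of the summed squared errors divided by n - k. The function name `standard_error_of_estimate` and the small demo lists are hypothetical; the final check uses the totals from this calculation (∑e2 = 253.004, n = 12, k = 2).

```python
import math


def standard_error_of_estimate(ys, ys_pred, k=2):
    """Square root of the sum of squared errors divided by (n - k)."""
    n = len(ys)
    if n != len(ys_pred):
        raise ValueError("observed and predicted lists must have equal length")
    if n <= k:
        raise ValueError("need more observations than estimated parameters")
    sse = sum((y - yp) ** 2 for y, yp in zip(ys, ys_pred))
    return math.sqrt(sse / (n - k))


# Hypothetical observed and predicted values: one error of 2, so SSE = 4
demo = standard_error_of_estimate([1, 2, 3, 4], [1, 2, 3, 6])

# Check with the totals quoted above
syx = math.sqrt(253.004 / (12 - 2))
print(round(syx, 3))  # 5.03
```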


Coefficient of estimate.



The significance of the standard error of estimate

Calculate the confidence interval of the prediction by adding and subtracting a multiple of the standard error of estimate Syx.

Y = Ypredicted +/- z * Syx
Y = (aX + b) +/- z * Syx

Here z is the number of standard deviations. For about 95 percent confidence, z is 2 (for about 68 percent, z is 1).
For the example series:

Yconfidence = (2.28X - 23.01) +/- 2 * 5.03 = (2.28X - 23.01) +/- 10.06
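
The band around the prediction can be sketched in Python as follows. The function name `prediction_band`, the parameter name `z` for the multiplier, and the input X = 30 are assumptions made for illustration; the defaults are the example values from this page (a = 2.28, b = -23.01, Syx = 5.03, multiplier 2).

```python
def prediction_band(x, a=2.28, b=-23.01, syx=5.03, z=2):
    """Return (lower, predicted, upper) for Y = aX + b +/- z * Syx."""
    y_pred = a * x + b
    margin = z * syx          # 2 * 5.03 = 10.06 with the defaults
    return y_pred - margin, y_pred, y_pred + margin


# Hypothetical input: X = 30
lower, predicted, upper = prediction_band(30)
print(round(lower, 2), round(predicted, 2), round(upper, 2))  # 35.33 45.39 55.45
```

The margin is the same at every X because this simple form uses a constant Syx along the whole line.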



Table and chart of the confidence




Chart of the prediction and confidence


Coefficient of determination (R-square).


$R^2 = \dfrac{\text{Explained variation}}{\text{Total variation}}$
$R^2 = 1 - \dfrac{\text{Unexplained variation}}{\text{Total variation}}$
$R^2 = 1 - \dfrac{S_{yx}^2}{S_y^2}$
$R^2 = 1 - \dfrac{5.03^2}{18.122^2} = 0.923$

R-square = 1 means that all points are on the regression line. More scatter around the line means that R-square moves toward 0.
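
The R-square figure above can be reproduced with a short Python check. It follows the variance-ratio form used in this section, with Syx = 5.03 and the value 18.122 for the standard deviation of Y taken from the formula above; the variable names are chosen for this example. Depending on how the standard deviation of Y is computed, this ratio can differ slightly from 1 - (sum of squared errors) / (total sum of squares).

```python
syx = 5.03      # standard error of estimate from the example
sy = 18.122     # standard deviation of Y as used in the formula above

r_square = 1 - syx ** 2 / sy ** 2
print(round(r_square, 3))  # 0.923
```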

© 2025 futurera
