Simple linear regression (SLR) is a type of linear regression. It is similar to Pearson correlation in that both use interval data and focus on the relationship between two variables. While in Pearson the goal is to find out whether or not two variables are correlated, in SLR one variable (the predictor variable, or variable A) is used to predict another variable (the criterion variable, or variable B). Normally, the values of both variables are visualised in a scatterplot with the predictor variable on the X axis and the criterion variable on the Y axis.

The regression line

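The regression line is defined by the regression equation: Y’ = A + BX. Here Y’ is the predicted value of the criterion variable and X is the value of the predictor variable. A is the intercept, i.e. the value of Y’ when X is 0, and B is the slope (these coefficients are not the same thing as the “variable A” and “variable B” labels used above). The regression line is called so because it represents the “effect” of the predictor values on the predicted values. The slope of the regression line indicates how much the predicted value changes for each one-unit change in the predictor; a slope of 0 means the predictor tells us nothing about the criterion.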
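
To make the equation concrete, here is a minimal sketch of how A and B could be computed with ordinary least squares, which is the usual way of fitting the line but is an assumption here, since the text does not name a fitting method. The data values and the function name fit_line are made up for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (A, B) for the least-squares line Y' = A + B*X."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope B: co-variation of X and Y divided by the variation of X.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept A: the line always passes through (mean of X, mean of Y).
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data: predictor values (X) and criterion values (Y).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
print(f"Y' = {a:.2f} + {b:.2f}X")
```

For this made-up data the intercept comes out at about 2.2 and the slope at about 0.6, so each one-unit increase in X raises the predicted Y’ by roughly 0.6.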

Residual error

The regression line can be thought of as an ideal state in which the predictor and predicted variables are perfectly linearly related. However, it is almost always the case that not all values fall on the regression line. The source of this variance is said to be individual differences. This variance or deviation is called residual error and is similar to ANOVA’s error variance. The deviation for every participant can be calculated as the difference between the actual Y value and the predicted Y’ value. This difference represents residual error.
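
As a rough illustration of residual error, the sketch below computes the deviation for each participant in a small made-up data set. The values of A and B are the least-squares coefficients for that data, and the sign convention (actual minus predicted) is an assumption; the opposite order only flips the signs.

```python
# Hypothetical data: one (X, Y) pair per participant.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares coefficients for this particular data set (Y' = A + B*X).
A, B = 2.2, 0.6

# Predicted value Y' for every participant.
predicted = [A + B * x for x in xs]

# Residual error: actual Y minus predicted Y' (assumed sign convention).
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

for x, y, y_hat, e in zip(xs, ys, predicted, residuals):
    print(f"X={x}  Y={y}  Y'={y_hat:.1f}  residual={e:+.1f}")

# Squaring stops positive and negative deviations from cancelling out.
ss_residual = sum(e ** 2 for e in residuals)
print(f"sum of squared residuals = {ss_residual:.2f}")
```

The residuals of a least-squares line sum to zero, which is why the squared residuals are the quantity that gets totalled as residual error.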

Rationale of ANOVA to test the linear regression

We need to know whether the regression line is a good (i.e. statistically significant) predictor of the predicted variable values. ANOVA (in the good ol’ form of predicted variance divided by error variance) is used to assess the significance of the predictions made from the predictor variable. The two sources of variance are predicted variance and residual error. If ANOVA gives a non-significant output, it means that there is too much residual error (i.e. too much deviation from the regression line) relative to the predicted variance. This is consistent with the null hypothesis that the pair-values in the scatterplot are scattered without any linear trend, in which case the regression line is of no use for prediction.
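
The ratio described above can be sketched as follows. This is a minimal illustration on made-up data, assuming a least-squares fit and the usual degrees of freedom for one predictor (1 for the predicted variance, N - 2 for the residual error); it computes the F ratio only and leaves the significance decision to an F table or a statistics package.

```python
from statistics import mean

# Hypothetical data: predictor values (X) and criterion values (Y).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

# Least-squares fit of Y' = a + b*X.
x_bar, y_bar = mean(xs), mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sxy / sxx
a = y_bar - b * x_bar
predicted = [a + b * x for x in xs]

# Predicted variation: how far the predictions move away from the mean of Y.
ss_predicted = sum((y_hat - y_bar) ** 2 for y_hat in predicted)
# Residual variation: how far the actual values fall from the predictions.
ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))

df_predicted = 1      # one predictor variable
df_residual = n - 2   # N minus the two estimated coefficients

# F ratio: predicted variance divided by error (residual) variance.
f_ratio = (ss_predicted / df_predicted) / (ss_residual / df_residual)
print(f"F({df_predicted}, {df_residual}) = {f_ratio:.2f}")
```

For this data the ratio works out to about 4.5 on 1 and 3 degrees of freedom. The tabled critical value at the .05 level for those degrees of freedom is about 10.13, so the prediction here would not count as significant: there is too much residual error relative to the predicted variance.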