Simple linear regression relies on several conditions about how the data were collected and about the model itself. We include R code for checking the assumptions of this test on the R-directions page.
- Linearity – The relationship between the two variables must be linear. In a scatterplot of the two variables, the data points should roughly follow a straight-line pattern.
- Zero Mean of Errors – The errors, or differences between the actual data points and the values predicted by the regression line, have a mean of 0. This condition is automatically satisfied by the least-squares method we use here, because a line fitted with an intercept always has residuals that average to 0. There is therefore no need to check it for this simple linear regression procedure.
- Constant Variance of Errors – The variability of the model errors stays the same across all x-values instead of changing with x. To check this condition, generate a residuals vs. fitted plot and look for a random scatter of points around the horizontal line at 0 (i.e., no pattern such as a funnel shape).
- Independence of Errors – The error for one data point (the difference between its actual value and the value predicted by the regression line) does not influence the errors for other data points. This condition is not checked with a graph. Generally we consider it satisfied if the data were collected in such a way that the data points are independent.
- Randomness – For an observational study, subjects must be selected randomly from the relevant population. For an experiment, there usually must also be some form of random assignment of subjects to treatments.
- Normality of Errors – The model errors follow a normal distribution. The residuals can be plotted in a histogram or density plot to see whether they roughly resemble a normal distribution, but this condition is more often checked with a normal quantile plot, in which the residuals should fall approximately along the reference line.
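The residual-based checks above can be sketched in a few lines. This page's own code is in R (on the R-directions page). What follows is only a hypothetical Python illustration that uses just the standard library. The example data and the helper names `fit_line` and `residual_diagnostics` are made up for illustration and do not come from the source. The sketch fits a least-squares line and computes the fitted values and residuals. It then builds the coordinates of a normal quantile plot, using the plotting positions (i - 0.5)/n, which is one common convention assumed here. Actual plotting is omitted so the sketch needs no extra packages.

```python
from statistics import NormalDist, fmean


def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residual_diagnostics(x, y):
    """Hypothetical helper: fitted values, residuals, and normal Q-Q coordinates."""
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    n = len(residuals)
    # Standard normal quantiles at plotting positions (i - 0.5)/n (an assumed convention)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    # Q-Q plot points: each theoretical quantile paired with the matching sorted residual
    qq_pairs = list(zip(theoretical, sorted(residuals)))
    return fitted, residuals, qq_pairs


# Hypothetical example data (not from any real study)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
fitted, residuals, qq_pairs = residual_diagnostics(x, y)

# Zero mean of errors: with an intercept, least-squares residuals average to 0
# (up to floating-point rounding), so this is a sanity check, not a real test.
print(f"mean residual: {fmean(residuals):.2e}")

# Constant variance: look for a pattern (e.g., a funnel) in residuals vs. fitted values
for f, r in zip(fitted, residuals):
    print(f"fitted={f:6.2f}  residual={r:+.3f}")

# Normality: points lying close to a straight line suggest approximately normal residuals
for q, r in qq_pairs:
    print(f"normal quantile={q:+.3f}  sorted residual={r:+.3f}")
```

The printed pairs are the data behind a residuals vs. fitted plot and a normal quantile plot. They could be passed to any plotting tool. Independence and randomness cannot be checked this way, because they depend on how the data were collected.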