Simple linear regression places several conditions on both the data collection process and the model.  We include R code for checking these assumptions on the R-directions page.

  1. Linearity – There must be a linear relationship between the two variables. In a scatterplot of the variables, the data points should follow a roughly straight-line pattern rather than a curve (see the scatterplot sketch after this list).
  2. Zero Mean of Errors – The errors, or differences between the predicted values (on the regression line) and the actual data points, have a mean of 0.  This condition is automatically satisfied by the least-squares fitting used here, which forces the residuals to average to zero, so there is no need to check it separately in this simple linear regression procedure.
  3. Constant Variance of Errors – The variability of the model errors remains the same across all x-values.  To check this condition, generate a residuals vs. fitted plot and look for a random scatter of points around the horizontal line at zero, with no funnel shape or other pattern (see the residual-plot sketch after this list).
  4. Independence of Errors – The error for one data point (the difference between its actual value and the value predicted by the regression line) does not influence the errors for other data points.  This condition is not checked with a graph; generally we consider it satisfied if the data are collected in such a way that the data points are independent of one another.
  5. Randomness – For an observational study, subjects must be selected randomly from the relevant population.  For an experiment, some form of random assignment is usually also required.
  6. Normality of Errors – The model errors follow a normal distribution.  While the errors can be plotted in a histogram or density plot to see whether they form something visually resembling a normal distribution, this condition is most often checked with a normal quantile plot, in which the errors should fall approximately along the reference (normal) line (see the quantile-plot sketch after this list).
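
The first sketch below illustrates the linearity check from condition 1.  It is a minimal example, not the code from the R-directions page: the data frame dat, the variable names x and y, and the simulated values are placeholders chosen only for illustration.

```r
# Hypothetical data: a data frame 'dat' with predictor x and response y.
# (Names and values are placeholders, not from the R-directions page.)
set.seed(1)
dat <- data.frame(x = runif(50, min = 0, max = 10))
dat$y <- 2 + 3 * dat$x + rnorm(50, sd = 2)

# Condition 1 (Linearity): the scatterplot should show a roughly
# straight-line pattern with no obvious curvature.
plot(y ~ x, data = dat,
     main = "Scatterplot of y vs. x",
     xlab = "x", ylab = "y")
abline(lm(y ~ x, data = dat))  # fitted regression line for reference
```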
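
For condition 3, a residuals vs. fitted plot can be produced as sketched below.  It reuses the same hypothetical dat object; the setup lines are repeated so the sketch runs on its own.

```r
# Same hypothetical data as in the previous sketch.
set.seed(1)
dat <- data.frame(x = runif(50, min = 0, max = 10))
dat$y <- 2 + 3 * dat$x + rnorm(50, sd = 2)
fit <- lm(y ~ x, data = dat)

# Condition 3 (Constant Variance of Errors): look for a random scatter
# around the horizontal line at zero, with no funnel or other pattern.
plot(fitted(fit), residuals(fit),
     main = "Residuals vs. Fitted",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)  # dashed reference line at zero
```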
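
Finally, for condition 6, the normality of the errors can be examined with a normal quantile plot of the residuals, along with a rougher histogram check.  Again, dat and fit are the same placeholder objects from the sketches above.

```r
# Same hypothetical data and fitted model as above.
set.seed(1)
dat <- data.frame(x = runif(50, min = 0, max = 10))
dat$y <- 2 + 3 * dat$x + rnorm(50, sd = 2)
fit <- lm(y ~ x, data = dat)

# Condition 6 (Normality of Errors): the residuals should fall
# approximately along the reference line in the normal quantile plot.
qqnorm(residuals(fit), main = "Normal Q-Q Plot of Residuals")
qqline(residuals(fit))

# A histogram gives a rougher visual check of the same condition.
hist(residuals(fit), main = "Histogram of Residuals", xlab = "Residuals")
```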