Here is what the same analysis run in R would look like.
Note that you can import a dataset from Excel instead of manually entering it as was done here. This is particularly useful for larger datasets.
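One way to do this is with the readxl package. This is only a sketch: the file name "lizards.xlsx" and the object name lizards are placeholders for your own workbook and data, not names from the original analysis.

```r
# Read the data from an Excel workbook instead of entering it by hand.
# "lizards.xlsx" and the object name are placeholders for your own file.
library(readxl)                       # install.packages("readxl") if needed
lizards <- read_excel("lizards.xlsx")
head(lizards)                         # quick look at the imported data
```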
Before using the model, we should verify that it meets the requirements of simple linear regression.
- Linearity (a linear relationship between variables) – We already checked that this was true using a scatterplot in the previous page.
- Zero Mean of Errors – Automatically satisfied by the least-squares procedure used to fit the model, so this condition never needs to be checked.
- Constant Variance of Errors – We can check this condition with the R code below:
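A sketch of what that check might look like, assuming placeholder names lizards, territory_area, and bite_force for the data frame and variables (mplot() comes from the mosaic package):

```r
library(mosaic)   # provides mplot() for model diagnostic plots

# The model must be fit with lm() before any diagnostic plots can be made
model <- lm(territory_area ~ bite_force, data = lizards)

# Residual plot: residuals vs. fitted values (which = 1 selects this plot)
mplot(model, which = 1)
```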
For this condition to be satisfied, the points must be randomly scattered above and below the 0 residual line (i.e. no clear pattern is visible). The graph shows no obvious pattern, so this condition appears to be satisfied.
- Independence of Errors – We will assume the data was collected in a way that assured independence.
- Random Sampling – We will assume the data was collected using a random process.
- Normality of Errors (errors follow a normal distribution) – We can check this condition with the R code below:
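A sketch of this check, again using mosaic's mplot() on a model fit with placeholder names (lizards, territory_area, bite_force):

```r
library(mosaic)

# Fit the model, then draw a normal quantile (Q-Q) plot of the
# standardized residuals (which = 2 selects this diagnostic)
model <- lm(territory_area ~ bite_force, data = lizards)
mplot(model, which = 2)
```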
Instead of plotting all the errors in a histogram or density plot, we will use a normal quantile plot (as is usual in regression) to check this assumption. We also visited this methodology on the normal distributions page. A normal quantile plot shows the standardized (i.e. scaled) residuals against a line representing “perfectly” normally distributed data with the same number of observations. Deviation from the line represents deviation from the normal distribution. There is little such deviation in this graph: all points fall within a reasonable distance of the line, except perhaps one (the point labeled with a red 4). Thus we would say the normality of errors condition is met.
By our analysis, all conditions of simple linear regression were met, so we can proceed to interpreting the model. If this had not been the case, we would not have been able to use the model, and perhaps would consider a nonparametric alternative instead. If the conditions were questionable, but not clearly violated, we could note that we would be proceeding “with caution”.
Also note that while we should not interpret the model before checking conditions, we must create the model in R with the lm() function before we can create the graphs that check the conditions, because the mplot() function used to create residual plots requires a fitted model as its argument. This requirement is reflected in the order of the R screenshots on this page. The model output, which we use to interpret the model, is shown again below.
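A minimal sketch of producing that output (the data-frame and variable names are placeholders, as before):

```r
# Fit the model and display its summary: coefficients with t statistics,
# the overall F statistic, and R-squared
model <- lm(territory_area ~ bite_force, data = lizards)
summary(model)
```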
The slope of 11.68 indicates that for a one unit increase in bite force, we expect territory area to increase by 11.68 units on average. With an intercept of -31.54, the equation of our regression line is territory area = 11.68(bite force) − 31.54.
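As a quick worked example (the bite force value of 10 here is hypothetical, chosen only to illustrate the equation):

```r
# Predicted territory area for a hypothetical bite force of 10 units
11.68 * 10 - 31.54    # = 85.26
# Equivalently, via the fitted model object:
# predict(model, newdata = data.frame(bite_force = 10))
```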
The F test statistic and P-value are also close to what we obtained with Excel. The F test is an overall model utility test. The t-test statistics are used to assess significance for individual predictors, and would be of more use in a multiple linear regression. In simple linear regression, the two test procedures are equivalent; note that the two P-values match here.
Based on either the F-test or the individual slope t-test statistic of 2.41 with associated P-value of 0.039, the predictor is significant for predicting the response (i.e. the slope is significantly different from zero).
We see the same R-squared value of 0.392, which means the linear relationship with bite force explains only 39.2% of the variation in territory area, so the model fits only moderately well.