One-way ANOVA

In a one-way ANOVA, we are interested in whether there is a difference in the means of a continuous response variable across independent groups (identified by a categorical predictor variable).  ANOVA stands for analysis of variance. It may seem odd that we analyze variance to establish if there is a difference in means, but the idea is that understanding the variation around the means for the populations we sampled will help us determine if there really are differences by group. Let’s try to understand why by looking at the side-by-side boxplots below.

In the first set of boxplots, we see that there is some difference in center (the medians, shown as dots, are slightly different), but there is a lot of variation within the groups too. The boxes are fairly large, they overlap somewhat, and the ranges of the groups overlap substantially. It is hard to tell here whether the means really differ between the groups, because the between-group variability is small relative to the within-group variability.

 

In the second set of boxplots, there is clearly a difference in center (medians, means, any measure of center) between the groups. This is clear because the variation between groups is large (the medians are very different) and the variation within groups is small (the boxes are small, the ranges are small, and the ranges don’t overlap at all). Here, honestly, you would not even need to run the ANOVA to figure this out. But in the first example, the ANOVA will help us determine whether the difference we see is statistically significant. Everything is done by comparing the between-group variance to the within-group variance. We will not go into detail about how, but that’s what happens!

  

Quick Test Highlights:

For an ANOVA, the null hypothesis is that there is no difference between the means for the different groups. The alternative is that there is at least one difference in means (at least one mean is different from another mean), or equivalently, not all population means are the same. The test statistic is an F statistic, and there is a corresponding P-value.
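As a rough illustration of how the between and within group variation combine into the F statistic, here is a minimal Python sketch. The function name `one_way_f` and both data sets are invented for this illustration (they are not the boxplot data or the data from any study mentioned here). The sketch uses only the standard library and stops at the F statistic; the P-value would come from the F distribution with the returned degrees of freedom, which a statistics package such as SciPy (`scipy.stats.f_oneway`) can compute.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one list per group."""
    k = len(groups)                              # number of groups
    n_total = sum(len(g) for g in groups)        # total number of observations
    grand_mean = mean(x for g in groups for x in g)

    # Between-group variation: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: how far each observation is from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Invented data: group means differ a little, lots of spread within each group.
overlapping = [[4, 6, 8, 10], [5, 7, 9, 11], [6, 8, 10, 12]]
# Invented data: group means differ a lot, very little spread within each group.
separated = [[4, 5, 5, 6], [9, 10, 10, 11], [14, 15, 15, 16]]

f_overlapping, df1, df2 = one_way_f(overlapping)
f_separated, _, _ = one_way_f(separated)
print(f"overlapping groups: F = {f_overlapping:.2f} on ({df1}, {df2}) df")
print(f"separated groups:   F = {f_separated:.2f} on ({df1}, {df2}) df")
```

With these made-up numbers, the overlapping groups give a small F (the between-group variation is small compared with the within-group variation), while the well-separated groups give a very large F.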

Example

Dispersal of seeds or offspring may dictate whether a species can survive in a complex landscape. In particular, fragmented (patchy) populations may have difficulty persisting if dispersal is impeded or prevented. Molofsky and Ferdy (2005) looked at how the level of isolation affected the number of generations that the plant Cardamine pensylvanica could persist. They had four levels for their predictor variable, level of isolation: isolated (completely blocked from adjacent populations), medium distance (separated by 23.2 cm), long distance (separated by 49.5 cm), and continuous (no separation). We’ll use their data, taken from the relevant example in Whitlock and Schluter (2009).

For this test:

H0: There is no difference in the mean number of generations of C. pensylvanica persisting by isolation distance.

HA: There is at least one difference in the mean number of generations of C. pensylvanica persisting by isolation distance.

Excel Directions

Video

Data

 

Two-way ANOVA

In a two-way ANOVA, we can test the effect of two categorical variables (factors) on a continuous dependent variable. Basically, we want to see if the means for the groups differ based on either categorical variable, or maybe, both at once. This leads to the concept of an interaction between the variables.

Incorporating interaction in a two-way ANOVA is an interesting aspect of the analysis, but we want to examine the concept before doing examples with it. Let’s consider a simple example. Suppose you are measuring the amount of a crop produced as the response variable, and that your categorical variables are levels of fertilizer and levels of a pesticide. Generally speaking, you might expect more fertilizer to increase crop production, and similarly, more pesticide to increase crop production. However, if you had a lot of fertilizer AND a lot of pesticide, that amount of chemicals might be toxic to the crops and hence actually result in a decrease in crop production. This is an interaction between the two factors. Interactions could also result in larger increases than expected (i.e., they do not need to be deleterious).

Understanding interaction is important because it modifies how we can talk about the results. For instance, in the example above, it would be inappropriate to talk about the effect of fertilizer without also referring to the level of pesticide. This is because the effect of fertilizer depends on the level of pesticide due to the interaction described.
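One way to make this concrete is to compare the effect of fertilizer at each level of pesticide. The short Python sketch below does this for a 2 x 2 table of mean yields. The numbers, the level labels, and the helper `fertilizer_effect` are all hypothetical, chosen only to mimic the toxic-combination scenario described above.

```python
# Hypothetical mean crop yields (arbitrary units) for each combination of levels.
# These values are invented purely to illustrate the idea of an interaction.
cell_means = {
    ("low fertilizer", "low pesticide"): 10.0,
    ("high fertilizer", "low pesticide"): 14.0,
    ("low fertilizer", "high pesticide"): 13.0,
    ("high fertilizer", "high pesticide"): 9.0,
}

def fertilizer_effect(pesticide_level):
    """Change in mean yield when going from low to high fertilizer at one pesticide level."""
    high = cell_means[("high fertilizer", pesticide_level)]
    low = cell_means[("low fertilizer", pesticide_level)]
    return high - low

effect_at_low = fertilizer_effect("low pesticide")
effect_at_high = fertilizer_effect("high pesticide")

# With no interaction, the fertilizer effect would be the same at both pesticide levels,
# so this difference of differences would be zero.
interaction_contrast = effect_at_high - effect_at_low

print("fertilizer effect at low pesticide: ", effect_at_low)
print("fertilizer effect at high pesticide:", effect_at_high)
print("difference of differences:          ", interaction_contrast)
```

In this invented table, fertilizer raises yield by 4 units when pesticide is low but lowers it by 4 units when pesticide is high, so a single statement about "the effect of fertilizer" would be misleading.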

There is an associated test in the two-way ANOVA that allows you to test for the presence of interaction. Sometimes, the presence of interaction is what you are interested in testing for!

In the example below we are interested in how soybean plants respond to stress (shaking) and levels of light.  We could easily see how level of light might influence the leaf area a plant can produce or how stress might influence leaf area.  However, it is possible that light may counteract stress, resulting in an interaction of the two factors.

Example

A scientist is interested in how light levels may interact with physical stress to produce different growth responses in soybean plants with respect to leaf area (Samuels et al. 2012). There are two factors in this experiment: light and stress. Plants either have low or moderate light and are either exposed to stress or not. There are 13 replicates in each of these four combinations. This is referred to as a two-way ANOVA layout with replication.
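For readers who want to see what such an analysis computes, the following Python sketch works through a balanced two-way layout with replication using only the standard library. It is a sketch under stated assumptions, not the analysis from the study: the function `two_way_f`, the level labels, and the leaf-area values are all invented, and it uses 3 replicates per combination instead of 13 to keep the example short. It returns the three F statistics (one per factor and one for the interaction); P-values would again come from the F distribution.

```python
from statistics import mean

def two_way_f(cells):
    """F statistics for a balanced two-way layout.

    cells maps (level_of_A, level_of_B) -> list of replicate values,
    with the same number of replicates in every cell.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))          # replicates per cell
    assert all(len(v) == n for v in cells.values()), "design must be balanced"

    grand = mean(x for v in cells.values() for x in v)
    a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    cell_mean = {key: mean(v) for key, v in cells.items()}

    # Sums of squares for each main effect, the interaction, and the error.
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    ss_error = sum((x - cell_mean[key]) ** 2 for key, v in cells.items() for x in v)

    df_a = len(a_levels) - 1
    df_b = len(b_levels) - 1
    df_ab = df_a * df_b
    df_error = len(a_levels) * len(b_levels) * (n - 1)
    ms_error = ss_error / df_error

    return {
        "A": (ss_a / df_a) / ms_error,
        "B": (ss_b / df_b) / ms_error,
        "A:B": (ss_ab / df_ab) / ms_error,
    }

# Invented leaf-area values; factor A is light, factor B is stress.
leaf_area = {
    ("low light", "control"): [10, 11, 12],
    ("low light", "stress"): [6, 7, 8],
    ("moderate light", "control"): [14, 15, 16],
    ("moderate light", "stress"): [13, 14, 15],
}

f_stats = two_way_f(leaf_area)
for effect, f_value in f_stats.items():
    print(f"{effect}: F = {f_value:.2f}")
```

Here "A" is the light effect, "B" is the stress effect, and "A:B" is the interaction. The formulas used for the sums of squares apply only to balanced designs, which is why the function checks that every cell has the same number of replicates.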

Excel Directions

Video

R directions

Data