Two-Sample KS Test R Example

Let’s first work with the example in the main KS Test page.  A two-sample t-test or a Mann-Whitney test would not detect the difference between these two distributions as their means and medians are both the same, and the distributions do not meet the conditions for these tests anyway.  Would a KS test pick up the notable visual difference between these two distributions, in spite of their superficial similarities in summary statistics?
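The data from the main KS Test page are not reproduced here, but the call has the same shape regardless. As a stand-in, here is a hedged sketch using simulated samples that mimic the setup described above: equal means and medians (and similar spreads) but different shapes. The data construction is hypothetical, so the p-value it produces will differ from the one reported for the original example.

```r
# Hypothetical stand-in for the main KS Test page's data: two samples
# with the same mean and median (0) but visibly different shapes.
set.seed(1)
sample1 <- rnorm(50, mean = 0, sd = 1)        # bell-shaped
sample2 <- runif(50, min = -1.7, max = 1.7)   # flat, similar spread

# The two-sample KS test compares the empirical CDFs of the two samples
ks.test(sample1, sample2)
```

The two arguments to `ks.test()` are simply the two numeric vectors of observations; no grouping variable is involved in this form.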

Not entirely.  The p-value is 0.382, which is not significant at a standard 0.05 significance/alpha level.  Still, this is a lower p-value than we would obtain using a two-sample t-test or Mann-Whitney test on the same data (because the means and medians of the two distributions are identical, those procedures would yield a p-value near 1).  We don’t have conclusive statistical evidence that the two distributions are shaped differently, but the KS test does appear to be picking up aspects of their shapes that the two-sample t-test and Mann-Whitney test would not.

Now, let’s try a real-world example.  Recall that the KS test can be used instead of the two-sample t-test or Mann-Whitney test in situations where the conditions for both fail, or when you want to test for differences in the distributions beyond their centers.  We will try the KS test on the same example we used with those tests, since it applies in a similar setting.  Recall the description of the dataset from the two-sample t-test page:

Professor Temeles studies plant pollinator interactions. Sometimes bees ‘rob’ flowers of their nectar by piercing the flower from the outside to obtain nectar rather than by entering the flower and picking up pollen which then gets transferred to other plants.  This means that the plant does not obtain the benefit of pollination, but the bee does obtain nectar.  Pierced flowers may suffer other consequences, with one potentially being a shorter life span (hours) of the flower.

Also recall that the data had some concerns with the normality and shift model conditions that we noticed previously, though we proceeded with caution in the two-sample t-test and Mann-Whitney test examples.

Suppose we were a little more concerned about the differing shapes of the distributions (notice that the pierced curve has a higher kurtosis than the unpierced curve, and the spreads appear to be different) and also wanted to test for these differences in shape in addition to the differences in center.  A KS test would be appropriate in this situation.  We will assume that the data was collected in a way that upheld the randomness and independence conditions.  Other than those, we have no additional conditions to check for the KS test.

The code to run this procedure will differ depending on how your data is stored. The code is somewhat complicated for long data (see long vs. wide data page), as the ks.test() command does not accept categorical variables:
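A sketch of the long-data version follows. The data frame and column names (`flowers`, `lifespan`, `status`) are assumptions for illustration; substitute your own. Because `ks.test()` takes two numeric vectors rather than a response and a grouping variable, we subset the lifespans by group ourselves:

```r
# Hypothetical long-format data: one row per flower, with a numeric
# lifespan column and a categorical status column ("pierced"/"unpierced").
flowers <- read.csv("flowers.csv")  # hypothetical file name

# ks.test() does not accept a categorical grouping variable, so pass
# each group's lifespans as a separate numeric vector:
ks.test(flowers$lifespan[flowers$status == "pierced"],
        flowers$lifespan[flowers$status == "unpierced"])
```

The subsetting is what makes the long-data version more verbose: each argument has to be filtered out of the single `lifespan` column by hand.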

With wide data, we get a simpler command, because we can use the *with* function to access both variables in the data set:
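A sketch of the wide-data version, again with assumed names (`flowers_wide`, with one column of lifespans per group):

```r
# Hypothetical wide-format data: flowers_wide$pierced and
# flowers_wide$unpierced each hold the lifespans for one group.
flowers_wide <- read.csv("flowers_wide.csv")  # hypothetical file name

# with() evaluates the ks.test() call inside the data frame, so both
# columns can be referenced directly by name:
with(flowers_wide, ks.test(pierced, unpierced))
```

Here no subsetting is needed, since each group already sits in its own column.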

With a p-value of 0.287, insignificant under a typical 0.05 alpha level, we fail to reject the null hypothesis that the two samples come from the same distribution.  Thus, we can’t say that the distributions of lifespan for pierced and unpierced flowers are significantly different.  This may feel like a surprising outcome, as we rejected the null hypothesis with the same data in both the two-sample t-test and Mann-Whitney test examples.  However, this may reflect the relatively low power of the KS test, which requires stronger differences than the other tests before it will reject the null hypothesis.