In my previous post, I discussed effect sizes and provided one conceptualization of them as measures of how easy it is to see your effect, unaided by measurement, in the population. The two measures I am going to discuss here are Cohen’s d, and eta/partial-eta squared.
According to a variety of internet sources, Cohen defined d’s of .2, .5, and .8 as delimiting small, medium, and large effect sizes, respectively (I cannot get access to his 1988 book, so I am not able to say these things with certainty). With respect to the proportion of variance accounted for by the effect (i.e., eta squared), he set .02, .13, and .26 as the delimiters. As with anything to do with statistics, there are a variety of arguments against these descriptors, and they are generally considered very rough guidelines at best. Because d represents the number of standard deviations apart a pair of means is, it also determines how much the two distributions overlap. A d of .2 corresponds to an overlap of about 85%, .5 to about 67%, and .8 to about 53%.
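The overlap figures can be checked with a few lines of Python. This is a minimal sketch, assuming the quoted percentages are one minus Cohen's U1 (the non-overlap measure) for two normal distributions with equal variance; that assumption reproduces the values above, but it is my reading rather than something stated in the sources. The function names are made up for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def overlap_from_d(d):
    """Overlap of two equal-variance normal distributions whose means are
    d standard deviations apart, taken as 1 - Cohen's U1 (an assumption)."""
    p = normal_cdf(abs(d) / 2.0)
    u1 = (2.0 * p - 1.0) / p  # proportion of the combined area not shared
    return 1.0 - u1

for d in (0.2, 0.5, 0.8):
    print(f"d = {d:.1f}: overlap = {overlap_from_d(d):.0%}")
```

Other definitions of overlap exist and give different percentages for the same d, so the numbers depend on which one you pick.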
I think it is hard to really visualize effect sizes. There is an excellent demonstration maintained by Kristoffer Magnusson (his entire site is excellent and recommended reading) that allows us to visualize what is meant by effect size, to a certain extent. The image below was copied from a screen shot of his work.
Here, we have two nicely smoothed distributions that are spaced about .9 standard deviations apart. That would be, by Cohen’s criteria, a large effect. By looking at these, it is relatively easy to see that they are two different distributions. But, in a way, the variability represented by the width of the curves is hidden by their smoothness and the coloring. Looking at two distributions is not quite the same as observing an effect of a variable in the absence of measurement.
What I tried to do was to construct a representation that allows the interaction of variance within and between groups to be better represented. I created a spreadsheet that generates 2-d scatterplots of points drawn from two different populations.
(When you go to the Google Drive to download the sheet, it will display the contents as if there were 174 different pages. Just look for the download link (usually an arrow pointing down at the top of the page) to download the file. The file will not open and function correctly in Google Sheets.)
The cells and scroll bars at the top left allow you to enter the size of your sample and your desired effect size as Cohen’s d. You can type in an effect size directly at cell F3, but you must enter one 10 times greater than what you want (i.e., enter 5 for an effect size of .5).
You can also modify the standard deviations of the points. The spreadsheet will then generate 2000 points (1000 per group) such that their standard deviation in each of X and Y equals what you set, and the effect size as measured between their centroids is more or less equal to the d you set. These two large samples are treated as populations and are graphed in the scatterplot on the right of the screen.
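For readers who would rather follow along in code than in a spreadsheet, here is a rough Python sketch of the same idea. It is not a port of the sheet. In particular, I am assuming that the two centroids are separated along X by d times the standard deviation, and the names (make_population, d_target, and so on) are invented for this example.

```python
import math
import random

def make_population(n, center, sd, rng):
    """Draw n points whose X and Y are independent normals around center."""
    return [(rng.gauss(center[0], sd), rng.gauss(center[1], sd)) for _ in range(n)]

def centroid(points):
    """Mean X and mean Y of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

rng = random.Random(1)  # fixed seed so the run is repeatable
d_target, sd, n_pop, n_sample = 0.5, 1.0, 1000, 30

# Assumption: the groups differ only along X, by d_target standard deviations.
pop_a = make_population(n_pop, (0.0, 0.0), sd, rng)
pop_b = make_population(n_pop, (d_target * sd, 0.0), sd, rng)

# The small samples are drawn from the large "populations".
sample_a = rng.sample(pop_a, n_sample)
sample_b = rng.sample(pop_b, n_sample)

pop_dist = math.dist(centroid(pop_a), centroid(pop_b))
sample_dist = math.dist(centroid(sample_a), centroid(sample_b))
print("population centroid distance:", round(pop_dist, 2))
print("sample centroid distance:", round(sample_dist, 2))
```

Changing the seed plays the same role as recalculating the sheet: the population distance barely moves, while the sample distance moves a lot more.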
On the left of that plot you will see a scatterplot of the two small samples drawn from the larger samples.
The spreadsheet reports the local univariate stats on the samples (i.e., means and standard deviations on X and Y separately), as well as a 2-d ANOVA on the samples.
All of that information is updated in real time with changes in the values of the Sample Size, desired d, and Standard Deviation.
Below, for example, is an ANOVA on two samples of size 30 drawn from populations with an effect size of d = .7. The ANOVA shows the means to be significantly different at p = .02, with d estimated at .59.
To the left of the ANOVA table are the stats on the two larger samples (populations).
To arrive at effect sizes for these 2-variable distributions, I calculated a sum of squares for the difference between the distributions (the squared distance between the two group centroids times the number of points per group). Within each group, the sum of squared distances of the points from their own centroid was computed to get the “Error” term, and finally the sum of squared distances of every point in both groups from the grand mean was calculated to get the Total. As in standard ANOVA, SS Total = SS Effect + SS Error. Eta squared is simply SS Effect / SS Total.
d was calculated by taking the distance between the group centroids and dividing it by the average within-group error.
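The calculations just described can be sketched in Python as follows. Two caveats. First, the between-group term here is the textbook one (each centroid's squared distance from the grand centroid, weighted by group size), chosen because it makes SS Total = SS Effect + SS Error hold exactly; the spreadsheet's own formula may be scaled differently. Second, I am assuming that the "average within-group error" is a pooled per-axis standard deviation, which is only one possible reading. The function names are made up for this example.

```python
import math

def centroid(points):
    """Mean X and mean Y of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def sq_dist(p, q):
    """Squared Euclidean distance between two 2-d points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def effect_sizes(group_a, group_b):
    ca, cb = centroid(group_a), centroid(group_b)
    grand = centroid(group_a + group_b)
    # Between groups: size-weighted squared distance of each centroid from the grand centroid.
    ss_effect = len(group_a) * sq_dist(ca, grand) + len(group_b) * sq_dist(cb, grand)
    # Within groups: squared distance of each point from its own centroid.
    ss_error = sum(sq_dist(p, ca) for p in group_a) + sum(sq_dist(p, cb) for p in group_b)
    # Total: squared distance of every point from the grand centroid.
    ss_total = sum(sq_dist(p, grand) for p in group_a + group_b)
    # Assumption: d divides the centroid distance by a pooled per-axis SD
    # (each point contributes two coordinates, hence the factor of 2).
    df_error = len(group_a) + len(group_b) - 2
    pooled_sd = math.sqrt(ss_error / (2 * df_error))
    return {
        "ss_effect": ss_effect,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "eta_squared": ss_effect / ss_total,
        "d": math.sqrt(sq_dist(ca, cb)) / pooled_sd,
    }

# Two tiny hand-made groups, two units apart along X.
group_a = [(0, 0), (1, 0), (0, 1), (1, 1)]
group_b = [(2, 0), (3, 0), (2, 1), (3, 1)]
result = effect_sizes(group_a, group_b)
print(result)
```

For these eight points the sums of squares come out to 8 (effect), 4 (error), and 12 (total), so eta squared is 8/12, or about .67.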
There are many fun things to do with the spreadsheet. First, fix a sample size and play with modifications of the effect size. Then you can get a feel for how big effect sizes need to be to really be able to see them in your samples and populations. To be readily apparent, they generally need to be bigger than even Cohen’s “Large” effect size.
You can also see the difference between “significance” and effect size. For example, set the sample size to 5 and the effect size to .5. Now press Ctrl+Alt+F9 (all three keys at the same time). That key combination should cause the sheet to generate new samples and update all cells. Keep doing that repeatedly.
As you press Ctrl+Alt+F9 repeatedly, there are many things to observe. First, observe that the global parameters based on N = 1000 change very little. The statistics for the smaller samples change considerably more. The red triangle and blue square, which mark the sample group means, should jump around quite a bit. That reflects the sample-to-sample variability you get with small N.
Notice also the p in the ANOVA table. Seldom will it appear < .05. It is only when the sample-to-sample variability produces a large effect size that p will be < .05. The two “populations” really are different. They were programmed to be different with an effect size of about .5. But, whenever the samples accurately estimate that effect size, no significant difference is found (a type II error is made).
With a small effect size in the population, and small samples, we only make the correct inference as to the populations being different when the samples incorrectly over-estimate the population effect size. For that reason, there are a number of “corrections” (e.g., Hedges’ g for d, omega squared for eta squared, etc.) that can be applied to effect sizes to compensate for that over-estimation. A fun exercise would be to head over to Wikipedia, grab the formulas, and enter them into the spreadsheet to see how well these compensations work.
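If you would rather try the corrections outside the spreadsheet first, here is a minimal Python sketch. It uses the commonly cited approximate correction factor for Hedges' g and the usual one-way ANOVA form of omega squared. Both are the standard univariate formulas, so applying them to the 2-d setup here is an assumption on my part, and the function names are my own.

```python
def hedges_g(d, n1, n2):
    """Approximate small-sample correction of Cohen's d (Hedges' g)."""
    df = n1 + n2 - 2
    return d * (1.0 - 3.0 / (4.0 * df - 1.0))

def omega_squared(ss_effect, ss_error, df_effect, df_error):
    """Omega squared for a one-way, between-subjects ANOVA."""
    ms_error = ss_error / df_error
    ss_total = ss_effect + ss_error
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)

# The d of .59 from the two samples of 30 in the example above.
print("g for d = .59, n = 30 per group:", round(hedges_g(0.59, 30, 30), 3))
# The same d would shrink much more with only 5 per group.
print("g for d = .59, n = 5 per group:", round(hedges_g(0.59, 5, 5), 3))
# Illustrative sums of squares (made up): eta squared would be 8 / 12.
print("omega squared:", round(omega_squared(8.0, 4.0, 1, 6), 3))
```

Both corrections pull the estimate toward zero, and the pull is stronger the smaller the sample.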
Now, keeping N set at 5, increase the effect size to something Cohen would think of as huge, like 1.5, and repeat your Ctrl+Alt+F9 cycling. What you should observe is that we find significant differences more often, and that the samples over-estimate the population effect size less in those cases. With small N, we cannot tell very well whether a significant difference reflects a big effect in the population or a sample that over-estimates a smaller one.
Next, investigate the effect of changing N. Set the effect size to .5 again, and increase N to 50. As you cycle with Ctrl+Alt+F9, you should observe considerably less variability in your group means (the red triangle and blue square). p should be < .05 more often, and our over-estimations of the effect size should be considerably reduced.
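The same experiment can be run a couple of thousand times in Python instead of by hand. The sketch below is a simplified one-dimensional analogue, not the spreadsheet's 2-d ANOVA: it draws two normal samples, runs an ordinary pooled two-sample t test, and records how often p < .05 and how big the estimated d is on those occasions. The critical t values are hard-coded for the two sample sizes used, and the function name is invented for this example.

```python
import math
import random
import statistics

# Two-tailed .05 critical t values for df = 2n - 2 (n = 5 and n = 50 per group).
T_CRIT = {5: 2.306, 50: 1.984}

def simulate(n, d_pop, reps, rng):
    """Return (proportion significant, mean |d| among significant runs)."""
    sig_ds = []
    for _ in range(reps):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(d_pop, 1.0) for _ in range(n)]
        pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2.0)
        d_hat = (statistics.fmean(b) - statistics.fmean(a)) / pooled_sd
        t = d_hat * math.sqrt(n / 2.0)  # equal-n pooled t statistic
        if abs(t) > T_CRIT[n]:
            sig_ds.append(abs(d_hat))
    power = len(sig_ds) / reps
    mean_sig_d = statistics.fmean(sig_ds) if sig_ds else float("nan")
    return power, mean_sig_d

rng = random.Random(0)  # fixed seed so the run is repeatable
for n, d_pop in [(5, 0.5), (5, 1.5), (50, 0.5)]:
    power, mean_sig_d = simulate(n, d_pop, 2000, rng)
    print(f"N = {n}, d = {d_pop}: significant {power:.0%} of the time, "
          f"mean estimated d when significant = {mean_sig_d:.2f}")
```

With N = 5, a result can only reach p < .05 when the estimated d is above roughly 1.46, so every significant result from a population d of .5 is a large over-estimate. With N = 50 that threshold drops to about .40.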
Creating this spreadsheet was a fun exercise, and it was very instructional for me. I hope that you find it at least somewhat useful in understanding effect sizes, significance, and their relationships to sample size.