Would you like me to measure yours first?

Hi there everyone…

21 to win
—Rob Zombie, “Dragula”

One of my smartest friends sent me a note saying that she didn’t understand the difference between effect size and significance. Even given my self-imposed 400-word limit, if I couldn’t make it clear to her (and she is way smarter than me), I didn’t do a good job.

So, let’s try again, shall we? Strictly speaking, I’m not going to savage someone else’s math here. I’m going to savage my own.

The textbook definition of “statistical significance” refers to how likely it is that two things would differ this much simply by bad luck, even if they were really the same.
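One way to make “differ simply by bad luck” concrete is to simulate it. The Python sketch below assumes two rulers that are, by construction, identical, with readings that wobble by the same random (Gaussian) amount. It repeats a made-up experiment of 50 readings per ruler thousands of times and counts how often a simple z-test calls the difference significant at the usual .05 level. The function name and all the numbers are illustrative assumptions, not anything from a real study.

```python
import math
import random


def false_alarm_rate(trials: int = 5_000, n: int = 50, seed: int = 1) -> float:
    """Fraction of simulated experiments in which two identical rulers look
    'significantly' different at the .05 level purely by chance."""
    rng = random.Random(seed)
    critical_z = 1.96  # two-sided 5% cutoff for a standard normal distribution
    hits = 0
    for _ in range(trials):
        # Both rulers read the same true length; each reading is that length
        # plus Gaussian noise with standard deviation 1, so only the noise is simulated.
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # z statistic for the difference in means, treating the noise SD (1) as known
        z = (sum(a) / n - sum(b) / n) / math.sqrt(2.0 / n)
        if abs(z) > critical_z:
            hits += 1
    return hits / trials


print(f"'Significant' by bad luck alone: {false_alarm_rate():.1%} of experiments")
```

Because the rulers are identical, every “significant” result here is a false alarm, and they should turn up in roughly 5% of experiments. That is exactly what the .05 threshold promises, and no more.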

// Side note: If I were political, I would suggest that a great way to study surprising statistical significance would be to examine the voter results from Ohio in the second election of Bush II. But I’m not that cynical, so let’s stay grounded, shall we? //

Before going further, let me emphasize that statistical significance is usually at least a little wrong, because reality rarely matches the assumptions the statistics are built on. In the real world, stuff is messy. In stats-land, all is lovely and pristine, smelling not of the putrefaction of modern politics but rather of the perfume of your favorite lover. In the real world, sadly, your statistical lover is oft the morn, not the eve.

Effect size captures how big a particular difference is, and therefore how much it matters. A big effect size means that the thing you are looking at REALLY makes a big difference in what you’re measuring. Effect size and significance are related: the bigger an effect size, the easier it is to find a significant difference. But a sort of converse is also true: although a small effect is hard to find, given enough cases in your data, even a teeny little effect will be statistically significant.
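To see the “given enough cases” point in numbers, here is a minimal Python sketch. It holds a tiny, made-up difference fixed (two hypothetical rulers that truly differ by 0.01 inch, with readings that scatter with a standard deviation of 0.05 inch) and watches the p-value fall as the number of readings per ruler grows. It uses a normal (z) approximation with a known standard deviation rather than a full t-test, which is a simplifying assumption; `two_sided_p` is an illustrative name.

```python
import math


def two_sided_p(diff: float, sd: float, n: int) -> float:
    """Approximate two-sided p-value for a difference in mean readings between
    two groups of n readings each, assuming both share a known standard
    deviation `sd` (a z-test, which a t-test approaches as n grows)."""
    standard_error = sd * math.sqrt(2.0 / n)
    z = abs(diff) / standard_error
    # math.erfc(z / sqrt(2)) equals P(|Z| >= z) for a standard normal Z
    return math.erfc(z / math.sqrt(2.0))


# Hypothetical rulers: they truly differ by a hair (0.01") and each reading
# scatters with a standard deviation of 0.05".
for n in (10, 100, 1_000, 10_000):
    print(f"n = {n:>6} readings per ruler: p = {two_sided_p(0.01, 0.05, n):.2g}")
```

The 0.01-inch difference never changes; only the number of readings does. At 10 or 100 readings per ruler it is nowhere near significant, while at 1,000 it clears the .05 bar easily.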

Let’s get practical, shall we?

I need someone to hold on to
—Nine Inch Nails, “Terrible Lie”

Imagine, if you will, that we are using 2 rulers to measure 3 pieces of 8.5x11” paper. We are wondering whether, in fact, the rulers (notionally 12” long) are the same as each other. The first ruler measures the 3 pieces of paper as {11.1”, 11.1”, 11”}. This sounds pretty good, yes? The second ruler measures the same pieces of paper as {10.9”, 10.9”, 11”}. Pretty close, yes?

According to a standard two-sample t-test, these two rulers are different from one another, with a “p-value” of less than .05 (the usual threshold). So they are significantly different, despite differing by less than 2%. They are different, but, really, who cares? The effect size is under 2%.
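For readers who want to check the arithmetic, here is a small Python sketch of a two-sample (pooled-variance) Student’s t-test on the ruler readings, using only the standard library. It assumes the two rulers’ readings are treated as independent samples, and because the standard library has no t distribution, it compares the t statistic with the tabulated two-sided 5% critical value for 4 degrees of freedom (about 2.776) instead of computing an exact p-value. The helper name `pooled_t` is illustrative.

```python
import math
import statistics

# Readings (inches) of the same three sheets, from the ruler example.
ruler_a = [11.1, 11.1, 11.0]
ruler_b = [10.9, 10.9, 11.0]

# Two-sided 5% critical value of Student's t with 4 degrees of freedom,
# taken from standard tables.
T_CRIT_DF4 = 2.776


def pooled_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / df
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error, df


t_stat, df = pooled_t(ruler_a, ruler_b)
mean_gap = statistics.mean(ruler_a) - statistics.mean(ruler_b)
pct_gap = 100 * mean_gap / 11.0  # relative to the sheet's nominal 11" length

print(f"t = {t_stat:.2f} on {df} df; significant at .05? {abs(t_stat) > T_CRIT_DF4}")
print(f"mean difference = {mean_gap:.2f} in, about {pct_gap:.1f}% of 11 in")
```

The statistic comes out at about 2.83, just past the cutoff (an exact t-distribution calculation puts p at about 0.047), while the average readings differ by only about 0.13 inch, roughly 1.2% of the sheet. One caveat: because both rulers measured the same three sheets, a paired t-test is also a natural choice, and on these six numbers it does not reach .05. Even the verdict of “significant” depends on which test you pick.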

A small effect, but a significant one. I don’t think anyone will get rich on that 2%, although most papers, journalists, and official reports will mention only the significance of the difference.

Check the effect size. Do you really care, at the end of the day, about 2% in rulers? Probably not.