Binoculars and Significance
When you read research studies that rely on statistics, it is important to realize that there is a connection between n and p. The first of these, n, is the sample size, while the second, p, is the computer's assessment of how probable it would be to get sample data like those observed (or data deviating even further from the null hypothesis) if the null hypothesis happened to be true. Of course, if p is smaller than the researcher's alpha level, the finding is said to be statistically significant.
Now, if the sample size is quite large, p will be small (thus "beating" the alpha level) . . . even if the sample data deviate just a little from whatever the null hypothesis says. For example, if the null hypothesis says that the average IQ of a population of males is equal to the average IQ of a population of females, p will turn out to be less than .05 even if the two sample means are very similar (such as 104.6 and 104.8) IF THE SAMPLE SIZES ARE GIGANTIC. Hence, it's possible for a researcher to claim correctly that his/her finding is "significant," but it might well be significant in only a statistical sense. If there's only a tiny difference between the sample data and Ho, there may well be nothing significant in any meaningful (or "clinical") sense . . . even though the result turns out to be "statistically significant."
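To see the large-n half of this with actual numbers, here is a minimal sketch. It assumes a simple two-sample z-test with a known population standard deviation of 15 (a conventional IQ scale, chosen here only for illustration; the essay doesn't name a particular test or SD) and equal group sizes. The function name `two_sample_z_p` is hypothetical. The sample means stay fixed at 104.6 and 104.8 while n grows.

```python
import math

def two_sample_z_p(mean1, mean2, n_per_group, sigma=15.0):
    """Two-sided p-value for the difference between two independent sample
    means, assuming a known population SD (sigma) and equal group sizes."""
    se = sigma * math.sqrt(2.0 / n_per_group)   # standard error of the difference
    z = abs(mean1 - mean2) / se
    return math.erfc(z / math.sqrt(2.0))        # same as 2 * (1 - Phi(z))

# Sample means from the large-n example: 104.6 vs. 104.8 (a 0.2-point gap).
for n in (100, 10_000, 50_000, 1_000_000):
    p = two_sample_z_p(104.6, 104.8, n)
    verdict = "reject Ho" if p < 0.05 else "fail to reject Ho"
    print(f"n per group = {n:>9,}: p = {p:.3g} -> {verdict}")
```

Under these assumptions, the 0.2-point gap is nowhere near significant with 100 people per group, yet it drops below .05 once each group has roughly 43,000 people. The difference between the means never changes; only n does.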
If the sample size is quite small, just the reverse can happen. For example, suppose the male and female mean IQ scores in the two populations being studied are 104.6 and 124.6, respectively. That difference is, most assuredly, a gigantic one with huge practical implications. However, if two tiny samples are used to evaluate a null hypothesis saying that males and females in these two populations have equal mean IQ scores, then even if the two sample means are quite DISsimilar (e.g., 103.2 and 125.1), the small n might well make p larger than alpha, thereby bringing about a fail-to-reject decision regarding the null hypothesis.
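The small-n half can be sketched the same way, under the same illustrative assumptions (a two-sample z-test, a known SD of 15, equal group sizes, and a hypothetical function name). With sample means of 103.2 and 125.1, three people per group is not enough to beat an alpha of .05, while thirty per group beats it easily.

```python
import math

def two_sample_z_p(mean1, mean2, n_per_group, sigma=15.0):
    """Two-sided p-value for the difference between two independent sample
    means, assuming a known population SD (sigma) and equal group sizes."""
    se = sigma * math.sqrt(2.0 / n_per_group)   # standard error of the difference
    z = abs(mean1 - mean2) / se
    return math.erfc(z / math.sqrt(2.0))        # same as 2 * (1 - Phi(z))

# Sample means from the small-n example: 103.2 vs. 125.1 (a 21.9-point gap).
for n in (3, 30):
    p = two_sample_z_p(103.2, 125.1, n)
    verdict = "reject Ho" if p < 0.05 else "fail to reject Ho"
    print(f"n per group = {n:>2}: p = {p:.3g} -> {verdict}")
```

A real study with unknown SDs would typically use a t-test, which is even less likely to reject with samples this small, so the z-test here, if anything, flatters the tiny samples.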
A pair of binoculars, I believe, metaphorically shows this connection between n and p. If you look through the wrong end of the binoculars, things that are really big will appear very small, and as a consequence you might think that two things are similar when they're really different. This is like comparing two sample means in a study where each sample's n is very small; even if the two means are quite different, the small n can make them look (statistically) similar.
Now, suppose that you look through the binoculars in the proper fashion . . . and also suppose that the binoculars have been built to have a very high "magnification power." Here, two things that you see through these powerful binoculars might seem quite different when they truly are not very different at all. Their dissimilarity, you might say, has been exaggerated by the high-power magnification. This is like comparing two sample means in a study where each sample's n is very large; even if the two means are very similar, the large n can make them look (statistically) different.
Allow me to summarize. Any researcher or consumer of the research literature who focuses exclusively on p-levels is likely to see things in a distorted fashion, just as binoculars can create the illusion of similarity or of difference depending on which end is held next to your eyes and how much magnification they have. To be more discerning (and fair) when interpreting decisions based on the hypothesis testing procedure, you'll need to do two things.
First, examine closely the study's means or correlations or percentages (or whatever it is that represents the study's "statistical focus") in its/their "raw" state. For example, if a study compares the mean IQ of men and women, look closely at the means. Do they seem, in YOUR opinion, to be close together or far apart? Second, consider the sample size. If n is tiny or if n is gigantic, then entertain the possibility that the study's reject/fail-to-reject decision has been unduly influenced by the sample size.
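One rough way to carry out the second step, offered as an illustrative sketch rather than as part of the advice above, is to ask how large each group would have to be before a given raw difference gets declared significant. Under a two-sided z-test with a known SD of 15 and equal group sizes (assumptions chosen for illustration), a difference d reaches significance once n per group is at least 2(z·σ/d)², where z is the critical value (about 1.96 for alpha = .05). The function name below is hypothetical.

```python
import math
from statistics import NormalDist

def n_needed_for_significance(mean_diff, sigma=15.0, alpha=0.05):
    """Smallest equal group size at which an observed difference of
    `mean_diff` reaches two-sided significance in a z-test with known sigma."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    return math.ceil(2 * (z_crit * sigma / mean_diff) ** 2)

print(n_needed_for_significance(0.2))    # tiny IQ difference (104.6 vs. 104.8)
print(n_needed_for_significance(21.9))   # large IQ difference (103.2 vs. 125.1)
```

Under these assumptions, the 0.2-point difference needs more than 43,000 people per group to become "significant," whereas the 21.9-point difference needs only 4. If a study's n is far beyond the figure for a difference you consider trivial, a reject decision may be telling you more about n than about the size of the difference.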
If, in your opinion, two means are far apart, then a decision to reject the null hypothesis permits you to think of the two means as being significantly different, both in a statistical sense AND in a practical sense. If, however, the two means seem to you to be close together, then judge the difference between them to be of trivial importance even if p<.001.
If, in your opinion, two means are close together, then don't let a researcher pull the wool over your eyes and persuade you to think that there is a meaningful difference between them; it's possible that a statistically significant difference (but NOT a difference of any practical import) exists simply because of giant sample sizes.
In a nutshell: a small p (e.g., p < .0001) may or may not indicate that something worthwhile has been detected; on the other hand, a large p (p > .05) may be the result of a small n rather than a null hypothesis that's really true (or even off by just a little).
If you're still reading, you deserve credit for staying with this "epistle" to its conclusion. I hope it has helped.
Copyright © 2012
Schuyler W. Huck