The 7-Step and 9-Step Versions of Hypothesis Testing

Dear Students,

In an attempt to help you with the 7-step and 9-step versions of hypothesis testing, I'd like to do two things in this email message. First, I'm going to type out a paragraph of material that appears in the 4-page journal article that I gave you at the end of class yesterday. Then, I'll try to draw a connection between the quoted paragraph and the two versions of hypothesis testing discussed in Ch. 8.

Here's the paragraph from the Olds/Abernethy article, "Post-Exercise Oxygen Consumption Following Heavy and Light Resistance Exercise":

"Statistical Analyses. Standard descriptive statistics were used in all analyses. One-way analyses of variance (ANOVA) was used to compare ROC following the three conditions. Fisher's PLSD test was used in post hoc analyses. Effect sizes were calculated between the resting and light, and [between] the resting and heavy, conditions. The effect size is the difference between the mean values of two samples divided by the standard deviation. Effect sizes are often used to give an idea of the magnitude of the difference between groups, and to circumvent difficulties arising when the sample size is small (failure to reach statistical significance) or large (where small differences may be statistically, but not practically, significant). Effect sizes of about 0.2, 0.5, and 0.8 are often categorized as small, moderate, and large, respectively."

OK. That's the paragraph from the Olds/Abernethy article. And here come a few comments from Sky:

Don't worry AT ALL about the 2nd or 3rd sentences; they are NOT germane to the material in Chapter 8. The most important part of this passage is the next-to-last sentence. The exceedingly important point made by the authors in that sentence is as follows:

In hypothesis testing, a sample size that's too small is likely to produce a fail-to-reject decision . . . even if there's a large and true difference between the means of the populations associated with the two samples being compared. In other words, a sample size that's too small will tend to bring forth a Type II error, thus causing the researcher to "miss" detecting something important that occurred.

In hypothesis testing, a sample size that's too large is likely to produce a reject decision . . . even if there's a small and trivial difference between the means of the populations associated with the two samples being compared. In other words, a sample size that's too big can create a situation wherein a "significant difference" is shown to exist between two sample means . . . when, in fact, the difference between the sample means is tiny and "significant" only in a statistical sense and NOT AT ALL in a practical sense!

The concept of "effect size" is at the heart of the 9-step version of hypothesis testing. The ES is simply a judgment by the researcher as to the line of demarcation between "small" (i.e., trivial) differences, on the one hand, and "large" (i.e., important) differences, on the other. The 9-step version of hypothesis testing requires a researcher to specify his/her opinion as to the ES . . . and then the appropriate sample size is determined (via a formula or a chart) so the researcher will have a decent chance of reaching a "reject" decision if there is a bigger-than-ES difference between the study's population means. A "decent chance" is usually considered to be about 80%, and this number, converted from a percentage to a decimal, is called the test's "power."
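To make the ES-plus-power-to-sample-size idea concrete, here is a minimal sketch (in Python, which is of course not part of Ch. 8). It uses the common normal-approximation formula for a two-group comparison; the formula choice, the function name, and the example numbers are my illustrative assumptions, not anything from the Olds/Abernethy article:

```python
import math
from statistics import NormalDist  # Python standard library

def required_n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate number of subjects needed PER GROUP for a two-sided,
    two-sample comparison, via the normal-approximation formula:

        n = 2 * ((z_{1 - alpha/2} + z_{power}) / ES) ** 2
    """
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = .05
    z_power = z.inv_cdf(power)          # about 0.84 when power = .80
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Specify ES = 0.5 (a "moderate" difference) and power = .80 (a "decent chance"):
print(required_n_per_group(0.5))  # about 63 per group under this approximation
```

Notice the tradeoff this makes explicit: a smaller ES (meaning you care about detecting even modest differences) drives the required sample size up, while a larger ES drives it down.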
In summary, the 9-step version of hypothesis testing allows a researcher to design his/her study so the sample size(s) won't be too small or too large. This is done by (a) asking the researcher to specify the ES, (b) asking the researcher to specify the power, and (c) asking the researcher to determine (via formula or chart) the "proper" number of subjects to have in the study's sample(s). These three steps are positioned as Steps #4, #5, and #6 of the 9-step version of hypothesis testing.

In the 7-step version of hypothesis testing, the researcher simply adds a step after rejecting or failing to reject the null hypothesis via the basic 6-step procedure (wherein there's no pre-planned ES, power specification, or sample-size determination). The 7th step involves using the sample data either to (1) assess, after the fact, whether the sample size was "too small" or "too large" or (2) look at the sample means and evaluate the observed difference as being either small-and-trivial or big-and-important. The first of these options asks the researcher to specify the ES (as in the 9-step version of hypothesis testing) and then determine, via formula or chart, how much power there was in light of the sample sizes that were used. The second of these options allows the researcher to compute a strength-of-association index, or to estimate the observed magnitude of effect.

In the Olds/Abernethy study, the 7-step version of hypothesis testing was used. After rejecting or failing to reject each null hypothesis, the researchers computed the "effect size" based upon the sample data. They deserve high marks for going past the 6 steps that form the most basic version of hypothesis testing. What they did in the 7th step, using my terminology, was to compute an estimate of the observed magnitude of effect.
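And here is the flip side: the 7th step's second option, computing the observed magnitude of effect from the data in hand. This little Python sketch divides the difference between the two sample means by a pooled standard deviation (the article only says "the standard deviation," so pooling the two sample SDs is my assumption, not necessarily what Olds and Abernethy did), and then maps the result onto the 0.2/0.5/0.8 benchmarks:

```python
from statistics import mean, stdev  # Python standard library

def observed_effect_size(sample_a, sample_b):
    """Observed magnitude of effect: difference between the two sample
    means divided by the pooled standard deviation (one common choice)."""
    na, nb = len(sample_a), len(sample_b)
    sa, sb = stdev(sample_a), stdev(sample_b)
    pooled_sd = (((na - 1) * sa**2 + (nb - 1) * sb**2) / (na + nb - 2)) ** 0.5
    return abs(mean(sample_a) - mean(sample_b)) / pooled_sd

def label(d):
    """Map a computed value onto the judgmental 0.2/0.5/0.8 benchmarks."""
    if d < 0.2:
        return "trivial"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "moderate"
    return "large"

d = observed_effect_size([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])  # made-up data
print(round(d, 2), label(d))
```

Remember that the benchmark labels are conventions (judgments), not facts; the computed number tells you "what they got," and the labels tell you how such numbers are often categorized.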
I prefer my longer term to their short phrase "effect size," because they computed their "effect sizes" from the data and are, in a very real sense, telling us "what they got" (rather than using the term "effect size" to state their OPINION as to the line of demarcation between small and large differences). Olds and Abernethy do, however, tie their computed "effect sizes" back to some judgmental "benchmarks," for they make reference to Cohen's standardized numerical values of 0.2 (a "small" difference), 0.5 (a "medium" difference), and 0.8 (a "large" difference).

Sky Huck