The Standard Deviation
The purpose of this e-mail message is to clarify the meaning of an important statistical concept. The concept I'd like to focus on is the standard deviation (which is abbreviated as SD). Because researchers regularly report the SD (along with the mean) when they summarize a group of scores, it's important that you know what the SD is and what it isn't.
Because of its name, many people think (incorrectly) that the standard deviation indicates the "standard" (i.e., average) distance that individual scores lie from the mean of the scores. This misconception would cause one to suspect that the SD is computed by first adding up all of the deviations (i.e., the distances that scores deviate from M) and then dividing that sum by the number of scores in the group. For example, if we had these 5 scores [2, 6, 8, 10, and 14], some people would think that the "standard" deviation comes about by averaging the "deviations" of the 5 scores from the mean (M = 8). This notion would lead one to expect that the SD, for those 5 scores, is equal to (6 + 2 + 0 + 2 + 6)/5 = 3.2.
Although the standard deviation is, in fact, based on deviation scores, the process of getting the SD is a bit more complicated than what's indicated above. To get the real SD, one must do 4 things: (1) determine how far each individual score deviates from the mean, (2) square these separate deviation scores, (3) take the arithmetic average of the squared deviation scores, and (4) take the square root of the mean squared deviation score. For the 5 scores presented in the previous paragraph, the squared deviation scores are 36, 4, 0, 4, and 36. The mean of these squared deviations is 16. And the square root of 16 is 4. Hence, the SD of those original 5 scores is 4.
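For readers who like to check such arithmetic by machine, the four steps can be sketched in a few lines of Python. This is only an illustration using the 5 scores from above; the variable names are my own choices, and the calculation divides by the number of scores (N), exactly as the steps describe.

```python
import math

scores = [2, 6, 8, 10, 14]

# Step 1: find how far each score deviates from the mean.
mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]

# Step 2: square each deviation score.
squared = [d ** 2 for d in deviations]

# Step 3: take the arithmetic average of the squared deviations
# (dividing by N, the number of scores, as in the text).
mean_squared = sum(squared) / len(squared)

# Step 4: take the square root of that mean squared deviation.
sd = math.sqrt(mean_squared)

print("mean:", mean)
print("squared deviations:", squared)
print("mean squared deviation:", mean_squared)
print("SD:", sd)
```

Running this reports a mean of 8, a mean squared deviation of 16, and an SD of 4, matching the hand calculation. Python's built-in statistics.pstdev divides by N in the same way; statistics.stdev divides by N - 1 instead and would give a somewhat larger value.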
In some older statistics books, the standard deviation was defined as the "root mean squared deviation." Read backwards, this 4-word definition names the four steps from the previous paragraph: compute each deviation, square it, take the mean, take the square root. In my opinion, this really isn't a very good definition, for it describes the steps one takes to get to the right answer, not the meaning of the final result.
If you asked me to come up with a better definition of the SD, I think I'd offer a four-part response. First, I'd point out that it's a single-number summary that measures the degree of "spread" among the scores in a group (and in that sense it's like the range, the SIQR, or any other measure of variability). Second, I'd point out that an SD has a lower limit of 0 but no upper limit (and this characteristic makes an SD similar to all other measures of variability). Next, I'd point out that the SD is based upon--that is, influenced by--each and every score in the group (a characteristic not associated with the range or the SIQR). Finally, I'd indicate that the process of squaring the deviation scores gives more "weight" to scores located far from the mean as compared with scores located close to the mean.
In case the last portion of my 4-part definition didn't make any sense, allow me to illustrate what I mean through an example. Those 5 scores we considered earlier were 2, 6, 8, 10, and 14. Those scores, we saw, had a mean of 8 and a standard deviation of 4. Now let's change two of the scores, increasing one score by 1 point and decreasing a different score by 1 point. To be more specific, let's change the 6 to a 7 and the 2 to a 1. These changes will not affect the mean; once again, M = 8. But the standard deviation is affected. For the new data [1, 7, 8, 10, 14], the squared deviation scores are 49, 1, 0, 4, and 36. The mean of these squared deviations is 90/5 = 18, thus causing the SD to be equal to the square root of 18, or about 4.24. Even though the mean of the deviation scores was unaffected by changing 6 to 7 and 2 to 1 (it is still 3.2), the change of 2 to 1 caused the new score to be weighted more heavily, as shown by the increase in SD from 4 to about 4.24.
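The comparison can also be checked with a short Python sketch. The two helper functions below are hypothetical names of my own, not part of any library: one computes the simple average of the distances from the mean (the "wrong" notion of the SD discussed earlier), and the other computes the SD by dividing by the number of scores.

```python
import math

def mean_abs_deviation(scores):
    """Average distance of the scores from their mean (NOT the SD)."""
    m = sum(scores) / len(scores)
    return sum(abs(x - m) for x in scores) / len(scores)

def sd(scores):
    """Root mean squared deviation, dividing by N as in the text."""
    m = sum(scores) / len(scores)
    return math.sqrt(sum((x - m) ** 2 for x in scores) / len(scores))

original = [2, 6, 8, 10, 14]
changed = [1, 7, 8, 10, 14]  # 6 became 7, 2 became 1

for label, data in [("original", original), ("changed", changed)]:
    print(label,
          "average distance:", round(mean_abs_deviation(data), 2),
          "SD:", round(sd(data), 2))
```

Both sets of scores have an average distance of 3.2 from the mean, yet the SD rises from 4 to about 4.24 (the square root of 18). The extra point of distance on the far-out score counts for more, once squared, than the point of distance removed from the score near the mean.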
Perhaps one last comment about the standard deviation will be helpful (especially to those of you who like pictures better than formulas). If a histogram is prepared for any set of data, it is possible to put a dot on the abscissa (i.e., the line on which the histogram sits) to indicate the numerical value of the mean score. The SD can be thought of as a "yardstick" that extends equally far above and below the mean. In a normal distribution, this yardstick will extend to points on the baseline directly below the places where the curved line (forming the "bell-shaped" curve) changes from being convex to concave. These two points on the curve, one on each side of the middle "hump," are technically called the "points of inflection." Drop down from either point of inflection until you cross the abscissa, and that's precisely where you'll find the end of the yardstick I've mentioned.
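For those who do prefer formulas, here is a brief sketch of why the points of inflection sit exactly one SD from the mean. The symbols are my own additions: I write the mean as \mu and the SD as \sigma, and f(x) is the usual formula for the normal curve. A point of inflection is a place where the second derivative of the curve equals zero and changes sign.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

f'(x) = -\frac{x-\mu}{\sigma^2}\, f(x)

f''(x) = \frac{f(x)}{\sigma^2}
         \left(\frac{(x-\mu)^2}{\sigma^2} - 1\right)

f''(x) = 0
\;\Longleftrightarrow\; (x-\mu)^2 = \sigma^2
\;\Longleftrightarrow\; x = \mu \pm \sigma
```

Because f(x) is never zero, the second derivative can vanish only where the term in parentheses does, and that happens at one SD below the mean and one SD above it.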
I know this e-mail message has been long and probably more detailed than you wanted it to be. It represents, however, my best effort at explaining what the standard deviation is. If nothing else, I hope you'll leave this message knowing that a standard deviation is NOT simply the average of the deviation scores.
Copyright © 2012
Schuyler W. Huck