Measures of Spread (Page 3 of 3)

Variance

The variance is another statistic for measuring how far a group of scores, such as those of our 100 students, deviates on average from the mean. However, rather than using the absolute values of the deviations, as we do when calculating the mean absolute deviation above, we square each deviation instead. Adding up these squared deviations gives us the sum of squares, which we then divide by the total number of scores in our group of data (in other words, 100, because there are 100 students) to find the variance: σ² = Σ(X − μ)² / N, where X is each score, μ is the mean and N is the number of scores. For our 100 students, the variance is 211.89.
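As a rough illustration of the steps just described, here is a minimal Python sketch. The five scores are made up for the example (they are not the 100 students' marks), and the function name population_variance is our own.

```python
import statistics

def population_variance(scores):
    """Variance of a complete group of scores: sum of squares divided by N."""
    n = len(scores)
    mean = sum(scores) / n
    # Square each deviation from the mean instead of taking its absolute value.
    squared_deviations = [(x - mean) ** 2 for x in scores]
    sum_of_squares = sum(squared_deviations)
    return sum_of_squares / n

# Hypothetical marks, invented purely for illustration.
scores = [35, 50, 60, 70, 85]
print(population_variance(scores))   # 290.0
print(statistics.pvariance(scores))  # standard-library equivalent
```

The mean of these scores is 60, the squared deviations are 625, 100, 0, 100 and 625, and their sum (1450) divided by 5 gives 290.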

It should be noted, however, that this formula is only appropriate when our group of data is the whole population we are interested in. In other words, we are only interested in the performance of our 100 students. Additional considerations would be required if we were using, for example, the performance of these 100 students to estimate how 500 students taking the same piece of coursework had performed.



As a measure of variability, the variance is useful. If the scores in our group of data are spread out, the variance will be a large number. Conversely, if the scores are clustered closely around the mean, the variance will be a smaller number. However, there are two potential problems with the variance. First, because the deviations of scores from the mean are squared, extreme scores are given more weight. If our data contain outliers (in other words, one or a small number of scores that are particularly far from the mean and perhaps do not represent our data as a whole well), this can give undue weight to these scores. Second, and more importantly, because the deviations are squared, the variance is expressed in squared units (marks squared rather than marks) and ends up being a large number that cannot be placed on our frequency distribution. Taking our 100 students' scores, for example, the frequency distribution runs from the lowest score, 35, to the highest score, 85. The figure of 211.89, our variance, falls well outside this range and is therefore difficult to interpret. Calculating the standard deviation (see below) rather than the variance rectifies this problem. Nonetheless, analysing variance is extremely important in some statistical analyses, discussed in other statistical guides.
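To make the first problem concrete, the sketch below compares how much a single extreme score contributes to the total absolute deviation and to the sum of squares. The scores and the helper name outlier_shares are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def outlier_shares(scores, outlier):
    """Return the outlier's share of the total absolute and squared deviation."""
    mean = sum(scores) / len(scores)
    abs_total = sum(abs(x - mean) for x in scores)
    sq_total = sum((x - mean) ** 2 for x in scores)
    abs_share = abs(outlier - mean) / abs_total
    sq_share = (outlier - mean) ** 2 / sq_total
    return abs_share, sq_share

# Hypothetical marks: four identical scores and one extreme score.
scores = [50, 50, 50, 50, 100]
abs_share, sq_share = outlier_shares(scores, 100)
print(abs_share)  # 0.5
print(sq_share)   # 0.8
```

With a mean of 60, the score of 100 accounts for half of the total absolute deviation but four-fifths of the sum of squares, which is the extra weight that squaring gives to extreme scores.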

Standard Deviation

The standard deviation is a useful and popular statistic for measuring the degree of spread around the mean. Like the absolute deviation and the variance, it takes into account all of the scores in a group of data. Unlike the variance, it gives us a value in the same units as the scores, which we can understand in terms of our frequency distribution. Typically, when scores are approximately normally distributed, around two-thirds of them lie within one standard deviation either side of the mean.
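The "two-thirds" rule of thumb can be checked on any group of scores by counting how many fall within one standard deviation of the mean. The sketch below does this for five made-up scores using Python's standard statistics module; the function name share_within_one_sd is our own, and with so few scores the share will only roughly match two-thirds.

```python
import statistics

def share_within_one_sd(scores):
    """Proportion of scores lying within one standard deviation of the mean."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # treats the scores as the whole group of interest
    within = [x for x in scores if mean - sd <= x <= mean + sd]
    return len(within) / len(scores)

# Hypothetical marks, invented purely for illustration.
scores = [35, 50, 60, 70, 85]
print(share_within_one_sd(scores))  # 0.6
```

Here the mean is 60 and the standard deviation is about 17, so the scores 50, 60 and 70 fall inside the interval and 35 and 85 fall outside it.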

To repair the defects of the variance, the standard deviation is simply the square root of the variance: σ = √(Σ(X − μ)² / N). Therefore, for our 100 students, the standard deviation is √211.89 = 14.56 marks. If you want to know how to calculate the standard deviation from your own data, you can also use the calculators at Laerd Statistics, which show you all the working out as well.
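The relationship between the two statistics is easy to verify. The short sketch below takes the variance of 211.89 quoted in this guide and recovers the standard deviation, then repeats the check on a few hypothetical scores (not the students' actual marks).

```python
import math
import statistics

# Variance reported in this guide for the 100 students.
variance = 211.89
standard_deviation = math.sqrt(variance)
print(round(standard_deviation, 2))  # 14.56

# Hypothetical marks, invented purely for illustration.
scores = [35, 50, 60, 70, 85]
sd_from_variance = math.sqrt(statistics.pvariance(scores))
sd_direct = statistics.pstdev(scores)
print(math.isclose(sd_from_variance, sd_direct))  # True
```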



It should be noted, however, that the above formula for standard deviation is only appropriate when our group of data is the whole population we are interested in (in other words, when we are only interested in the performance of our 100 students). A slightly different formula is required if we were using, for example, the performance of these 100 students (our sample) to estimate how 500 students (our population) taking the same piece of coursework had performed. This is discussed in the statistical guide, Sampling.
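This page does not state the "slightly different formula". A common choice, and our assumption in the sketch below, is the sample standard deviation, which divides the sum of squares by n − 1 rather than by N. The scores are again hypothetical.

```python
import math
import statistics

# Hypothetical marks, treated here as a sample from a larger group.
sample = [35, 50, 60, 70, 85]
n = len(sample)
mean = sum(sample) / n
sum_of_squares = sum((x - mean) ** 2 for x in sample)

# Population formula: divide the sum of squares by N.
population_sd = math.sqrt(sum_of_squares / n)
# Assumed sample formula: divide the sum of squares by n - 1.
sample_sd = math.sqrt(sum_of_squares / (n - 1))

print(math.isclose(population_sd, statistics.pstdev(sample)))  # True
print(math.isclose(sample_sd, statistics.stdev(sample)))       # True
print(sample_sd > population_sd)                               # True
```

Dividing by the smaller number makes the sample figure slightly larger, and the difference shrinks as the number of scores grows.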