4.4 – Calculating Variance
Variance is the average squared deviation. Since the Sum of Squares is the total of all the squared deviations, to calculate the average we would just divide it by the total number of scores. However, that gets a little complicated because the number by which we divide is slightly different depending on whether our group of scores is a population or a sample.
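Here are the variance formulas:
[latex]\text{Population Variance}=\sigma^2=\frac{SS}{N}[/latex]
[latex]\text{Sample Variance}=s^2=\frac{SS}{n-1}[/latex]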
First, you will likely notice that the two versions of variance use different symbols. Consistent with statistical convention, symbols for populations use Greek letters, and symbols for samples use Roman letters. Both symbols are a lowercase “s”: the Roman “s” for samples and its Greek counterpart, “σ” (sigma), for populations.
Second, you will see that each symbol carries a superscript 2, the “squared” symbol. This helps remind us that variance is the average “squared” deviation. Thus, population variance is written σ² and sample variance is written s².
Third, you may notice that the formulas are slightly different. While both use the Sum of Squares in the numerator, the denominators differ. In the population variance version, the denominator is the total number of scores in the population, “N,” but in the sample variance version, it is the total number of scores in the sample minus one, “n – 1.” There is a reason for that slight difference, but for now, it’s simply important to pay attention to whether we are calculating a variance for a population or for a sample so that we use the correct formula.
To illustrate this difference, let’s calculate the variance for the set of scores we used in the previous section:
X | X² |
8 | 8² = 64 |
5 | 5² = 25 |
5 | 5² = 25 |
4 | 4² = 16 |
3 | 3² = 9 |
[latex]\Sigma X = 25[/latex] | [latex]\Sigma X^2 = 139[/latex] |
Now we just plug ΣX, ΣX², and the number of scores (N = 5) into the Sum of Squares computational formula:
[latex]\text{Sum of Squares (Computational Formula)}=SS=\Sigma X^2-\frac{(\Sigma X)^2}{N}[/latex]
[latex]=139 - \frac{(25)^2}{5} = 139 - \frac{625}{5} = 139 - 125 = 14[/latex]
This group of scores has a Sum of Squares of 14. In other words, if we added up each score’s squared deviation, it would be 14.
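The Sum of Squares calculation above can be checked with a short Python sketch. It assumes the five scores from the table, and the variable names are illustrative only. It computes SS with the computational formula and then again by adding up each score's squared deviation from the mean, to show that both routes give the same result.

```python
# The five scores from the table above.
scores = [8, 5, 5, 4, 3]

n_scores = len(scores)                       # N = 5
sum_x = sum(scores)                          # ΣX = 25
sum_x_squared = sum(x ** 2 for x in scores)  # ΣX² = 139

# Computational formula: SS = ΣX² - (ΣX)² / N
ss_computational = sum_x_squared - sum_x ** 2 / n_scores

# Definitional approach: add up each score's squared deviation from the mean.
mean = sum_x / n_scores
ss_definitional = sum((x - mean) ** 2 for x in scores)

print(sum_x, sum_x_squared)               # 25 139
print(ss_computational, ss_definitional)  # 14.0 14.0
```

Both approaches give 14, which matches the hand calculation.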
From here, we can calculate the variance by taking that Sum of Squares (SS) result and plugging it into our variance formula. The key, however, is to determine whether we are working with a population of scores or a sample of scores. At a practical level, a homework or exam question will usually indicate which one it is. In a non-classroom environment, the designation depends on the research question. If the individuals in the group are the entire group on which the research question focuses, the scores are a population. If they are a subset of that group, the scores are a sample.
Calculating Variance for Populations
Let’s assume that the five scores in our table above are a population. In that case, we would use the population variance formula, using the Sum of Squares (SS) and the total number of scores in the population (N). We already calculated our Sum of Squares (SS) above, so we just need to determine the number of individuals in our population (N). To do that, we simply count the total number of scores, which is 5. Now we can plug everything into our formula:
[latex]\text{Population Variance}=\sigma^2=\frac{SS}{N}=\frac{14}{5}=2.8[/latex]
Thus, the variance for this set of scores is 2.8. In other words, the average squared deviation is 2.8. As a descriptive statistic, that number is not very meaningful on its own. Variance is an excellent measure of spread, where higher numbers mean more spread and lower numbers mean less spread, but most people find it hard to get their heads around the idea of an “average squared deviation.” That is why we will convert this value into a standard deviation in the next section, which is easier to interpret directly.
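As a minimal sketch, the population calculation can be written in Python as follows. The helper names `sum_of_squares` and `population_variance` are hypothetical and chosen only for this illustration. The result is compared against `statistics.pvariance` from Python's standard library, which computes a population variance.

```python
import statistics

# The five scores from the table above, treated here as a population.
scores = [8, 5, 5, 4, 3]


def sum_of_squares(values):
    """Hypothetical helper. Computational formula: SS = ΣX² - (ΣX)² / N."""
    n = len(values)
    return sum(x ** 2 for x in values) - sum(values) ** 2 / n


def population_variance(values):
    """Hypothetical helper. Population variance: SS divided by N."""
    return sum_of_squares(values) / len(values)


pop_var = population_variance(scores)
print(pop_var)                       # 2.8
print(statistics.pvariance(scores))  # standard-library check
```

The denominator here is the full count of scores, N = 5, because the scores are assumed to be the entire population.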
Calculating Variance for Samples
Let’s assume instead that the five scores in our table above are a sample. In that case, we would use the sample variance formula, using the Sum of Squares (SS) and the total number of scores in the sample minus one (n – 1). We already calculated our Sum of Squares (SS) above, so we just need to determine the number of individuals in our sample (n). To do that, we simply count the total number of scores, which is 5. Now we can plug everything into our formula:
[latex]\text{Sample Variance}=s^2=\frac{SS}{n-1}=\frac{14}{5-1}=\frac{14}{4}=3.5[/latex]
Thus, the variance for this set of scores is 3.5. In other words, the average squared deviation is 3.5.
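The sample version differs only in the denominator. The sketch below, again using hypothetical helper names, computes the sample variance for the same five scores and prints it next to the population result so the effect of dividing by n – 1 instead of N is visible. The comparison value comes from `statistics.variance` in Python's standard library, which computes a sample variance.

```python
import statistics

# The same five scores, now treated as a sample.
scores = [8, 5, 5, 4, 3]


def sum_of_squares(values):
    """Hypothetical helper. Computational formula: SS = ΣX² - (ΣX)² / N."""
    n = len(values)
    return sum(x ** 2 for x in values) - sum(values) ** 2 / n


def sample_variance(values):
    """Hypothetical helper. Sample variance: SS divided by n - 1."""
    return sum_of_squares(values) / (len(values) - 1)


sample_var = sample_variance(scores)
population_var = sum_of_squares(scores) / len(scores)

print(sample_var)                   # 3.5
print(population_var)               # 2.8
print(statistics.variance(scores))  # standard-library check
```

Because the same Sum of Squares is divided by a smaller number, the sample variance (3.5) comes out larger than the population variance (2.8) for the same scores.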
Variance: The average squared deviation. This is a common measure of variability or spread. The higher the number, the more variability. However, it is difficult to interpret as a descriptive statistic.
Population: The group of individuals or objects that are the focus of a research question.
Example: For the research question: "Does caffeine increase attention of people with Attention Deficit Disorder?" the population is "people with Attention Deficit Disorder."
Sample: A subset of the population that participates in a research study.