Data handling: Use variance and regression analysis to interpolate and extrapolate bivariate data
Unit 1: Calculate variance and standard deviation
Gill Scott

Unit outcomes
By the end of this unit you will be able to:
- Calculate variance for ungrouped data manually.
- Calculate standard deviation for ungrouped data manually.
- Interpret variance and standard deviation.
What you should know
Before you start this unit, make sure you can:
- Calculate measures of central tendency of a data set, such as the mean, median and mode, of both ungrouped and grouped data, and interpret what these tell you about a data set. To revise this, you can work through:
- Calculate the range of data.
Introduction
Measures of central tendency of data sets, the mean, median and mode, give a first impression of the characteristics of a data set. From the work that you have already done, you saw that although these measures can be useful, they can also be misleading. So, it is necessary to investigate how the data in any set is spread, scattered or dispersed in order to have a complete picture of the data set.
The range is a measure of dispersion, being the spread of data from smallest to largest. The interquartile range (IQR) is a better measure of dispersion than the range. It gives the spread of the middle 50% of the data set around the median. However, the mean is often a better measure of central tendency than the median is, and in this unit we will investigate how data is spread around the mean. The measures of dispersion around the mean are the variance and the standard deviation.
Variance
Suppose that two groups of nine learners wanted to see how long they could balance on a slackline, with each member recording how many seconds passed before they fell off.
The nine members of group A balanced for:
320 sec; 250 sec; 183 sec; 41 sec; 335 sec; 78 sec; 142 sec; 210 sec; 115 sec.
The nine members of group B balanced for:
185 sec; 188 sec; 183 sec; 191 sec; 185 sec; 179 sec; 192 sec; 184 sec; 187 sec.
The mean of each group was calculated:
Group A’s mean:
$$\bar{x}_A = \frac{320+250+183+41+335+78+142+210+115}{9} = \frac{1\,674}{9} = 186 \text{ sec}$$
Group B’s mean:
$$\bar{x}_B = \frac{185+188+183+191+185+179+192+184+187}{9} = \frac{1\,674}{9} = 186 \text{ sec}$$
The means of the two groups are the same but as you can see the recorded data values are very different. The mean does not provide enough information to make a useful comparison of the data sets.
The deviation of each value from the mean for the groups was tabulated:
Group A

| Time $x$ (sec) | Deviation from the mean $x-\bar{x}_A$ |
|---|---|
| 320 | 320 − 186 = 134 |
| 250 | 250 − 186 = 64 |
| 183 | 183 − 186 = −3 |
| 41 | 41 − 186 = −145 |
| 335 | 335 − 186 = 149 |
| 78 | 78 − 186 = −108 |
| 142 | 142 − 186 = −44 |
| 210 | 210 − 186 = 24 |
| 115 | 115 − 186 = −71 |

Group B

| Time $x$ (sec) | Deviation from the mean $x-\bar{x}_B$ |
|---|---|
| 185 | 185 − 186 = −1 |
| 188 | 188 − 186 = 2 |
| 183 | 183 − 186 = −3 |
| 191 | 191 − 186 = 5 |
| 185 | 185 − 186 = −1 |
| 179 | 179 − 186 = −7 |
| 192 | 192 − 186 = 6 |
| 184 | 184 − 186 = −2 |
| 187 | 187 − 186 = 1 |

The table shows that although the means for both groups were the same, the times for group A are much more widely dispersed about the mean than the times for group B. We need to investigate the dispersions. Suppose we find the total deviations from the mean for each group:
Sum of Group A’s deviations: $\sum (x-\bar{x}) = 134+64-3-145+149-108-44+24-71 = 0$
Sum of Group B’s deviations: $\sum (x-\bar{x}) = -1+2-3+5-1-7+6-2+1 = 0$
In each group, the negative values cancel out the positive values giving the total of 0 (this will happen for any group; can you see why?). However, the extent of dispersion of data around the mean gives a good idea of how representative the mean is of the data set. Squaring the distance from the mean for each data element gives a positive value for each, and so enables us to look at total spread about the mean, although this value is squared. Thus, the next step is to square each of the deviations from the mean, and to calculate the sum of the squared values:
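Group A:
$$\begin{aligned}\sum (x-\bar{x})^2 &= (134)^2+(64)^2+(-3)^2+(-145)^2+(149)^2+(-108)^2+(-44)^2+(24)^2+(-71)^2\\ &= 17\,956+4\,096+9+21\,025+22\,201+11\,664+1\,936+576+5\,041\\ &= 84\,504\end{aligned}$$
Group B:
$$\begin{aligned}\sum (x-\bar{x})^2 &= (-1)^2+(2)^2+(-3)^2+(5)^2+(-1)^2+(-7)^2+(6)^2+(-2)^2+(1)^2\\ &= 1+4+9+25+1+49+36+4+1\\ &= 130\end{aligned}$$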
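The two sums above can be checked with a few lines of Python. This is only a sketch for verifying the hand calculation; the function names `deviations` and `sum_of_squared_deviations` are illustrative choices and do not come from the unit.

```python
# Slackline times in seconds, copied from the two groups above.
group_a = [320, 250, 183, 41, 335, 78, 142, 210, 115]
group_b = [185, 188, 183, 191, 185, 179, 192, 184, 187]


def deviations(data):
    """Return the deviation of each value from the mean of the data."""
    mean = sum(data) / len(data)
    return [x - mean for x in data]


def sum_of_squared_deviations(data):
    """Return the sum of the squared deviations from the mean."""
    return sum(d ** 2 for d in deviations(data))


for name, data in (("A", group_a), ("B", group_b)):
    # The plain deviations cancel out; the squared deviations do not.
    print(f"Group {name}: sum of deviations = {sum(deviations(data))}, "
          f"sum of squared deviations = {sum_of_squared_deviations(data)}")
```

Squaring removes the signs, so the large deviations in group A can no longer cancel each other out and the difference between the two groups becomes visible.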
The variance, $\sigma^2$, is defined as the average, or mean, of the squared deviations, so the sum for each group must be divided by the number of data elements:
Variance for group A $= \frac{\sum (x-\bar{x})^2}{n} = \frac{84\,504}{9} = 9\,389.33$
Variance for group B $= \frac{\sum (x-\bar{x})^2}{n} = \frac{130}{9} = 14.44$
The variance of a data set is the average of the squared deviations of each of the $n$ elements $x$ of the set from the mean $\bar{x}$ of the set:
Variance $= \sigma^2 = \frac{\sum (x-\bar{x})^2}{n}$
Notice that the units of variance are squared units.
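As a check on the manual method, the definition of variance can be written as a short Python function. This is a sketch, not a replacement for working by hand; `population_variance` is a name chosen here for illustration. The standard library's `statistics.pvariance` also divides by n, whereas `statistics.variance` divides by n − 1 and so gives a different (sample) value.

```python
import statistics

# Slackline times in seconds for the two groups in this unit.
group_a = [320, 250, 183, 41, 335, 78, 142, 210, 115]
group_b = [185, 188, 183, 191, 185, 179, 192, 184, 187]


def population_variance(data):
    """Mean of the squared deviations from the mean (divide by n)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


for name, data in (("A", group_a), ("B", group_b)):
    by_hand = population_variance(data)
    library = statistics.pvariance(data)  # same definition, divides by n
    print(f"Group {name}: variance = {by_hand:.2f} (library: {library:.2f})")
```

Rounded to two decimal places, the results should agree with the values 9 389.33 and 14.44 calculated above.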
Standard deviation
From the definition above, you can see that the variance is a squared value, which is not a very useful measure as the data values given are not squared. The other measure of dispersion, the standard deviation, represented by the Greek letter $\sigma$ (lower case ‘sigma’), is the square root of the variance.
Continuing the example above:
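Standard deviation for group A $= \sqrt{\text{variance}} = \sqrt{\frac{\sum (x-\bar{x})^2}{n}} = \sqrt{9\,389.33} = 96.90$ sec
Standard deviation for group B $= \sqrt{\text{variance}} = \sqrt{\frac{\sum (x-\bar{x})^2}{n}} = \sqrt{14.44} = 3.80$ sec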
The much larger standard deviation for group A indicates that the data for group A is much more widely distributed around the mean than that for group B. There is greater dispersion in the distribution of data for group A than in that for group B. This shows that the mean for group A is much less representative of the data elements than that of group B. The more varied the data values are, the less reliable they are as a means of prediction.
Standard deviation may serve as a measure of uncertainty, or accuracy. It gives an idea of how much variation there is from the mean. The standard deviation is the square root of the average squared distance of the values in the data set from their mean. It is never negative, and is always measured in the same units as the data elements of the set.
Standard deviation $\sigma$ of $n$ elements $x$ of data in a set:
$$\sigma = \sqrt{\frac{\sum (x-\bar{x})^2}{n}}$$
Note that the standard deviation is never negative (it is zero only when every data element equals the mean), and the units of the standard deviation are the same as the units of the data elements.
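The formula for the standard deviation translates directly into code, which can be useful for checking a manual answer. The sketch below assumes the same divide-by-n definition used in this unit; `population_std_dev` is an illustrative name, and `statistics.pstdev` is the standard-library function that uses the same definition.

```python
import math
import statistics

# Slackline times in seconds for the two groups in this unit.
group_a = [320, 250, 183, 41, 335, 78, 142, 210, 115]
group_b = [185, 188, 183, 191, 185, 179, 192, 184, 187]


def population_std_dev(data):
    """Square root of the mean squared deviation from the mean."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    return math.sqrt(variance)


for name, data in (("A", group_a), ("B", group_b)):
    sigma = population_std_dev(data)
    # statistics.pstdev also divides by n, so the two values should agree.
    print(f"Group {name}: standard deviation = {sigma:.2f} sec "
          f"(library: {statistics.pstdev(data):.2f} sec)")
```

Because the square root undoes the squaring, the result is in seconds again, the same unit as the recorded times.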
Note
For other explanations of variance and standard deviation, watch “Variance of a population”, or read through “Describing Variability”.

Take note!
For a fairly normal distribution that is not too skewed by having some very large or very small values:
- about 68% of the elements of the data set will lie within one standard deviation of the mean
- about 95% of the elements of the data set will lie within two standard deviations of the mean.
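To see how a real data set compares with this rule of thumb, you can count how many values lie within a chosen number of standard deviations of the mean. The sketch below does this for the two slackline groups; `fraction_within` and its parameter `k` are names invented for this illustration. With only nine values per group, the percentages need not match the rule closely.

```python
import math

# Slackline times in seconds for the two groups in this unit.
group_a = [320, 250, 183, 41, 335, 78, 142, 210, 115]
group_b = [185, 188, 183, 191, 185, 179, 192, 184, 187]


def fraction_within(data, k=1):
    """Fraction of values lying within k standard deviations of the mean."""
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    lower, upper = mean - k * sigma, mean + k * sigma
    inside = [x for x in data if lower <= x <= upper]
    return len(inside) / n


for name, data in (("A", group_a), ("B", group_b)):
    for k in (1, 2):
        share = fraction_within(data, k) * 100
        print(f"Group {name}: {share:.1f}% within {k} standard deviation(s)")
```

The same steps (find the interval, count the values inside it, divide by n) are what you do by hand when a question asks for the percentage of values within one standard deviation of the mean.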

Example 1.1
Eight cupcakes from a batch were weighed and their masses recorded as follows:
23 g; 37 g; 25 g; 28 g; 33 g; 31 g; 29 g; 26 g.
- Find the range of the masses.
- Calculate the mean.
- Calculate the variance.
- Calculate the standard deviation.
Solutions
- Arrange the masses in order: 23 g; 25 g; 26 g; 28 g; 29 g; 31 g; 33 g; 37 g.
  Subtract the smallest mass from the largest: Range $= 37 - 23 = 14$ g
- Divide the sum of all the masses by the number of cupcakes:
  Mean: $\bar{x} = \frac{\sum x}{n} = \frac{232}{8} = 29$ g
- To find the variance, find the deviation of each mass from the mean, and square that.

  | Mass $x$ (g) | Deviation from the mean $x-\bar{x}$ | (Deviation)² $(x-\bar{x})^2$ |
  |---|---|---|
  | 23 | 23 − 29 = −6 | 36 |
  | 37 | 37 − 29 = 8 | 64 |
  | 25 | 25 − 29 = −4 | 16 |
  | 28 | 28 − 29 = −1 | 1 |
  | 33 | 33 − 29 = 4 | 16 |
  | 31 | 31 − 29 = 2 | 4 |
  | 29 | 29 − 29 = 0 | 0 |
  | 26 | 26 − 29 = −3 | 9 |

  Variance $= \frac{\sum (x-\bar{x})^2}{n} = \frac{36+64+16+1+16+4+0+9}{8} = \frac{146}{8} = 18.25$
- Standard deviation is the square root of the variance:
  $\sigma = \sqrt{\frac{\sum (x-\bar{x})^2}{n}} = \sqrt{\frac{146}{8}} = \sqrt{18.25} = 4.27$ g
You will notice that tabulating the data and the calculations simplifies the application of the formulae.
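The tabulated working of Example 1.1 can also be reproduced in Python, one row per cupcake. This is a sketch for checking the manual table; the variable names and the column layout are choices made here, not part of the example.

```python
import math

# Masses of the eight cupcakes in grams, from Example 1.1.
masses = [23, 37, 25, 28, 33, 31, 29, 26]

n = len(masses)
mean = sum(masses) / n

# Print one row per mass: value, deviation from the mean, squared deviation.
print(f"{'x':>4} {'x - mean':>9} {'(x - mean)^2':>13}")
total = 0.0
for x in masses:
    deviation = x - mean
    squared = deviation ** 2
    total += squared
    print(f"{x:>4} {deviation:>9.1f} {squared:>13.1f}")

variance = total / n           # mean of the squared deviations
std_dev = math.sqrt(variance)  # back in grams

print(f"mean = {mean} g, variance = {variance}, "
      f"standard deviation = {std_dev:.2f} g")
```

Keeping a running total of the last column mirrors the manual method: the variance is that total divided by the number of values.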

Activity 1.1: Working with temperatures
Time required: 12 minutes
What you need:
- a pen and paper
- a calculator
What to do:
The maximum daily temperatures in Johannesburg in the second week of April 2021 are recorded and tabulated below, alongside those of the second week of January of the same year.
| April | Temperature $x$ | Deviation from the mean $x-\bar{x}$ | (Deviation)² $(x-\bar{x})^2$ |
|---|---|---|---|
| 11th | 21° | | |
| 12th | 26° | | |
| 13th | 23° | | |
| 14th | 19° | | |
| 15th | 25° | | |
| 16th | 26° | | |
| 17th | 27° | | |

| January | Temperature $x$ | Deviation from the mean $x-\bar{x}$ | (Deviation)² $(x-\bar{x})^2$ |
|---|---|---|---|
| 10th | 25° | | |
| 11th | 27° | | |
| 12th | 27° | | |
| 13th | 25° | | |
| 14th | 24° | | |
| 15th | 26° | | |
| 16th | 28° | | |

- Work out:
  - The mean temperature for the week in April (correct to one decimal place).
  - The mean temperature for the week in January (correct to one decimal place).
- Copy and complete the table above for both months.
- Work out the variance for April and for January (correct to one decimal place).
- Work out the standard deviation for April and for January (correct to one decimal place).
- On what percentage of days in each of the months was the maximum temperature within one standard deviation of the mean?
- What do the two standard deviations and your calculations show about the spread of data around the respective means?
What did you find?
- Mean temperatures:
  - April: Mean $= \bar{x} = \frac{\sum x}{7} = \frac{167}{7} = 23.9^\circ$
  - January: Mean $= \bar{x} = \frac{\sum x}{7} = \frac{182}{7} = 26^\circ$
- Table for April:

  | April | Temperature $x$ | Deviation from the mean $x-\bar{x}$ | (Deviation)² $(x-\bar{x})^2$ |
  |---|---|---|---|
  | 11th | 21° | 21° − 23.9 = −2.9 | 8.4 |
  | 12th | 26° | 26° − 23.9 = 2.1 | 4.4 |
  | 13th | 23° | 23° − 23.9 = −0.9 | 0.8 |
  | 14th | 19° | 19° − 23.9 = −4.9 | 24.0 |
  | 15th | 25° | 25° − 23.9 = 1.1 | 1.2 |
  | 16th | 26° | 26° − 23.9 = 2.1 | 4.4 |
  | 17th | 27° | 27° − 23.9 = 3.1 | 9.6 |

  Table for January:

  | January | Temperature $x$ | Deviation from the mean $x-\bar{x}$ | (Deviation)² $(x-\bar{x})^2$ |
  |---|---|---|---|
  | 10th | 25° | 25° − 26 = −1 | 1 |
  | 11th | 27° | 27° − 26 = 1 | 1 |
  | 12th | 27° | 27° − 26 = 1 | 1 |
  | 13th | 25° | 25° − 26 = −1 | 1 |
  | 14th | 24° | 24° − 26 = −2 | 4 |
  | 15th | 26° | 26° − 26 = 0 | 0 |
  | 16th | 28° | 28° − 26 = 2 | 4 |

- April:
  Variance $= \frac{\sum (x-\bar{x})^2}{n} = \frac{8.4+4.4+0.8+24.0+1.2+4.4+9.6}{7} = \frac{52.8}{7} = 7.5$
  Notice that we leave out the units for variance: the ‘square’ of degrees is not helpful here.
  January:
  Variance $= \frac{\sum (x-\bar{x})^2}{n} = \frac{1+1+1+1+4+0+4}{7} = \frac{12}{7} = 1.7$
- Standard deviation for April:
  $\sigma = \sqrt{\frac{\sum (x-\bar{x})^2}{n}} = \sqrt{\frac{52.8}{7}} = \sqrt{7.5} = 2.74^\circ$
  Standard deviation for January:
  $\sigma = \sqrt{\frac{\sum (x-\bar{x})^2}{n}} = \sqrt{\frac{12}{7}} = \sqrt{1.7} = 1.3^\circ$
- April:
  One standard deviation from the mean $= \bar{x} \pm \sigma = 23.9 \pm 2.74$
  So the interval is $[23.9-2.74;\ 23.9+2.74] = [21.16^\circ;\ 26.64^\circ]$.
  The maximum temperatures on 12, 13, 15 and 16 April fall within this interval.
  $\frac{4}{7} \times 100\% = 57.14\%$
  So the maximum temperature on 57.14% of the days of the week in April falls within one standard deviation of the mean.
  January:
  One standard deviation from the mean $= \bar{x} \pm \sigma = 26 \pm 1.3$
  So the interval is $[26-1.3;\ 26+1.3] = [24.7^\circ;\ 27.3^\circ]$.
  The maximum temperatures on 10, 11, 12, 13 and 15 January fall within this interval.
  $\frac{5}{7} \times 100\% = 71.43\%$
  So the maximum temperature on 71.43% of the days of the week in January falls within one standard deviation of the mean.
- The temperatures were more consistent, with fewer fluctuations, in the week in January than the week in April.

Exercise 1.1
- World Health Organisation data for 2018 reported numbers of tuberculosis cases per 100 000 in the population for some countries in Southern and Eastern Africa as follows:

  | Country | Number per 100 000 |
  |---|---|
  | Angola | 355 |
  | Botswana | 275 |
  | Kenya | 292 |
  | Lesotho | 659 |
  | Malawi | 153 |
  | Mozambique | 361 |
  | Namibia | 524 |
  | South Africa | 677 |
  | Zimbabwe | 210 |
  | Uganda | 200 |
  | United Republic of Tanzania | 253 |
  | Zambia | 346 |

  - What is the range of tuberculosis incidence per 100 000 in the populations across these countries?
  - What is the mean for the entire region?
  - What is the standard deviation of numbers of reported tuberculosis cases per 100 000 for the entire region?
  - What percentage of countries’ tuberculosis incidence falls within one standard deviation from the mean?
- World Health Organisation estimated data for 2016 country death rates due to road traffic injuries per 100 000 population are as follows:

  | Country | Number per 100 000 |
  |---|---|
  | Angola | 23.6 |
  | Botswana | 23.8 |
  | Kenya | 27.8 |
  | Lesotho | 28.9 |
  | Malawi | 31 |
  | Mozambique | 30.1 |
  | Namibia | 30.4 |
  | South Africa | 25.9 |
  | Zimbabwe | 34.7 |
  | Eswatini | 26.9 |
  | United Republic of Tanzania | 29.2 |
  | Zambia | 20.9 |

  - What is the range of road traffic death rates per 100 000 in the populations across these countries?
  - What is the mean for the region?
  - What is the standard deviation of numbers of deaths per 100 000 for the region?
  - What percentage of countries’ road traffic death rates falls within one standard deviation from the mean?
- A manufacturer checks the width of a number of roller bearings from the production line in order to control quality. The following widths were measured, in micrometres (thousandths of a millimetre):
  15 015; 15 101; 15 089; 15 062; 15 111; 15 054; 15 028; 15 137; 15 009; 15 096
  - Calculate the range.
  - Calculate the mean.
  - Calculate the standard deviation.
  - What percentage of the measurements are within one standard deviation of the mean?
The full solutions are at the end of the unit.
Summary
In this unit you have learnt the following:
- How to calculate the variance of a data set.
- How to calculate the standard deviation of a data set.
- How to interpret the results of calculations of standard deviation of a data set.
Unit 1: Assessment
Suggested time to complete: 25 minutes
- World Health Organisation data for 2018 reported numbers of malaria cases per 1 000 in the population for some countries in Southern and Eastern Africa as follows:

  | Country | Number per 1 000 |
  |---|---|
  | Angola | 227.36 |
  | Botswana | 0.59 |
  | Kenya | 60.05 |
  | Malawi | 207.33 |
  | Mozambique | 314.66 |
  | Namibia | 31.68 |
  | South Africa | 1.65 |
  | Zimbabwe | 55.97 |
  | Eswatini | 0.97 |
  | Uganda | 262.69 |
  | Zambia | 157.5 |

  - What is the range of numbers of cases per 1 000 for these countries?
  - What is the mean incidence of malaria for the region?
  - What is the standard deviation?
- A potential car-buyer investigated the prices of eight cars on a car sales website, and wrote the following prices down (prices are in rands).
  103 125; 129 900; 87 900; 99 900; 85 000; 120 000; 95 000; 88 000
  - What is the range of prices?
  - What is the mean?
  - What is the standard deviation?
Question 3 adapted from the NC(V) level 4 Mathematics second paper of November 2017
- The following represents the marks (percentages) of the learners who wrote the examination in Pattern Maker’s Theory:

  | Scores | 66 | 59 | 43 | 72 | 57 | 47 | 81 | 54 | 92 | 61 |
  |---|---|---|---|---|---|---|---|---|---|---|

  Calculate the standard deviation of the marks of the 10 learners.
- The table below shows the number of minutes taken by each of seven mechanic apprentices to replace a control arm bushing on a vehicle:

  | Minutes | 84 | 150 | 54 | 126 | 78 | 102 | 108 |
  |---|---|---|---|---|---|---|---|

  Calculate the standard deviation of the time taken by the seven apprentices.
The full solutions are at the end of the unit.
Exercise 1.1
- Tuberculosis incidence 2018:

  | Country | Number per 100 000 | $x-\bar{x}$ | $(x-\bar{x})^2$ |
  |---|---|---|---|
  | Angola | 355 | 355 − 358.75 = −3.75 | 14.06 |
  | Botswana | 275 | 275 − 358.75 = −83.75 | 7 014.06 |
  | Kenya | 292 | 292 − 358.75 = −66.75 | 4 455.56 |
  | Lesotho | 659 | 659 − 358.75 = 300.25 | 90 150.06 |
  | Malawi | 153 | 153 − 358.75 = −205.75 | 42 333.06 |
  | Mozambique | 361 | 361 − 358.75 = 2.25 | 5.06 |
  | Namibia | 524 | 524 − 358.75 = 165.25 | 27 307.56 |
  | South Africa | 677 | 677 − 358.75 = 318.25 | 101 283.06 |
  | Zimbabwe | 210 | 210 − 358.75 = −148.75 | 22 126.56 |
  | Uganda | 200 | 200 − 358.75 = −158.75 | 25 201.56 |
  | United Republic of Tanzania | 253 | 253 − 358.75 = −105.75 | 11 183.06 |
  | Zambia | 346 | 346 − 358.75 = −12.75 | 162.56 |

  - Range $= 677 - 153 = 524$ cases per 100 000.
  - Mean:
    $\bar{x} = \frac{\sum x}{n} = \frac{355+275+292+659+153+361+524+677+210+200+253+346}{12} = \frac{4\,305}{12} = 358.75$
  - Standard deviation:
    $\sigma = \sqrt{\frac{\sum (x-\bar{x})^2}{n}} = \sqrt{\frac{331\,236.3}{12}} = \sqrt{27\,603.025} = 166.14$
  - Percentage of countries in the region falling within one standard deviation of the mean:
    One standard deviation from the mean $= \bar{x} \pm \sigma = 358.75 \pm 166.14$
    So the interval is $[358.75-166.14;\ 358.75+166.14] = [192.61;\ 524.89]$.
    Three countries (Malawi, Lesotho, South Africa) fall outside this interval, so $\frac{9}{12} \times 100\% = 75\%$ fall within one standard deviation of the mean.
- Road traffic death rates:

  | Country | Number per 100 000 | $x-\bar{x}$ | $(x-\bar{x})^2$ |
  |---|---|---|---|
  | Angola | 23.6 | 23.6 − 27.77 = −4.17 | 17.39 |
  | Botswana | 23.8 | 23.8 − 27.77 = −3.97 | 15.76 |
  | Kenya | 27.8 | 27.8 − 27.77 = 0.03 | 0.00 |
  | Lesotho | 28.9 | 28.9 − 27.77 = 1.13 | 1.28 |
  | Malawi | 31 | 31 − 27.77 = 3.23 | 10.43 |
  | Mozambique | 30.1 | 30.1 − 27.77 = 2.33 | 5.43 |
  | Namibia | 30.4 | 30.4 − 27.77 = 2.63 | 6.92 |
  | South Africa | 25.9 | 25.9 − 27.77 = −1.87 | 3.50 |
  | Zimbabwe | 34.7 | 34.7 − 27.77 = 6.93 | 48.02 |
  | Eswatini | 26.9 | 26.9 − 27.77 = −0.87 | 0.76 |
  | United Republic of Tanzania | 29.2 | 29.2 − 27.77 = 1.43 | 2.04 |
  | Zambia | 20.9 | 20.9 − 27.77 = −6.87 | 47.20 |

  - Range $= 34.7 - 20.9 = 13.8$ road traffic deaths per 100 000.
  - Mean:
    $\bar{x} = \frac{\sum x}{n} = \frac{23.6+23.8+27.8+28.9+31+30.1+30.4+25.9+34.7+26.9+29.2+20.9}{12} = \frac{333.2}{12} = 27.77$
  - Standard deviation:
    $\sigma = \sqrt{\frac{\sum (x-\bar{x})^2}{n}} = \sqrt{\frac{158.73}{12}} = \sqrt{13.2275} = 3.64$
  - Percentage of countries in the region falling within one standard deviation of the mean:
    One standard deviation from the mean $= \bar{x} \pm \sigma = 27.77 \pm 3.64$
    So the interval is $[27.77-3.64;\ 27.77+3.64] = [24.13;\ 31.41]$.
    Four countries (Angola, Botswana, Zimbabwe, Zambia) fall outside this interval, so $\frac{8}{12} \times 100\% = 66.67\%$ fall within one standard deviation of the mean.
- Roller bearing widths:

  | Width $x$ (μm) | Deviation from the mean $x-\bar{x}$ | (Deviation)² $(x-\bar{x})^2$ |
  |---|---|---|
  | 15 015 | −55.2 | 3 047.04 |
  | 15 101 | 30.8 | 948.64 |
  | 15 089 | 18.8 | 353.44 |
  | 15 062 | −8.2 | 67.24 |
  | 15 111 | 40.8 | 1 664.64 |
  | 15 054 | −16.2 | 262.44 |
  | 15 028 | −42.2 | 1 780.84 |
  | 15 137 | 66.8 | 4 462.24 |
  | 15 009 | −61.2 | 3 745.44 |
  | 15 096 | 25.8 | 665.64 |

  - Range $= 15\,137 - 15\,009 = 128$ μm
  - Mean:
    $\bar{x} = \frac{\sum x}{n} = \frac{150\,702}{10} = 15\,070.2$
  - Standard deviation:
    $\sigma = \sqrt{\frac{\sum (x-\bar{x})^2}{n}} = \sqrt{\frac{16\,997.6}{10}} = \sqrt{1\,699.76} = 41.228$
  - Percentage of bearings falling within one standard deviation of the mean:
    One standard deviation from the mean $= \bar{x} \pm \sigma = 15\,070.2 \pm 41.2$
    So the interval is $[15\,070.2-41.2;\ 15\,070.2+41.2] = [15\,029;\ 15\,111.4]$.
    Four bearings fall outside this interval (15 015; 15 028; 15 137; 15 009), so $\frac{6}{10} = 60\%$ fall within one standard deviation from the mean.
Unit 1: Assessment
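- Malaria incidence 2018:

  | Country | Number of malaria cases per 1 000 | Deviation from mean $x-\bar{x}$ | (Deviation)² $(x-\bar{x})^2$ |
  |---|---|---|---|
  | Angola | 227.36 | 107.32 | 11 517.39 |
  | Botswana | 0.59 | −119.45 | 14 268.52 |
  | Kenya | 60.05 | −59.99 | 3 598.91 |
  | Malawi | 207.33 | 87.29 | 7 619.39 |
  | Mozambique | 314.66 | 194.62 | 37 876.59 |
  | Namibia | 31.68 | −88.36 | 7 807.65 |
  | South Africa | 1.65 | −118.39 | 14 016.41 |
  | Zimbabwe | 55.97 | −64.07 | 4 105.08 |
  | Eswatini | 0.97 | −119.07 | 14 177.88 |
  | Uganda | 262.69 | 142.65 | 20 348.76 |
  | Zambia | 157.5 | 37.46 | 1 403.18 |

  - Range $= 314.66 - 0.59 = 314.07$
  - Mean:
    Mean $= \bar{x} = \frac{\sum x}{11} = \frac{1\,320.45}{11} = 120.04$
  - Standard deviation:
    $\sigma = \sqrt{\frac{\sum (x-\bar{x})^2}{n}} = \sqrt{\frac{136\,739.8}{11}} = \sqrt{12\,430.89} = 111.49$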
- Car prices:

  | Price in rands | Deviation from mean $x-\bar{x}$ | (Deviation)² $(x-\bar{x})^2$ |
  |---|---|---|
  | 103 125 | 2 021.88 | 4 087 978.52 |
  | 129 900 | 28 796.88 | 829 260 009.77 |
  | 87 900 | −13 203.13 | 174 322 509.77 |
  | 99 900 | −1 203.13 | 1 447 509.77 |
  | 85 000 | −16 103.13 | 259 310 634.77 |
  | 120 000 | 18 896.88 | 357 091 884.77 |
  | 95 000 | −6 103.13 | 37 248 134.77 |
  | 88 000 | −13 103.13 | 171 691 884.77 |

  - Range $= 129\,900 - 85\,000 =$ R44 900
  - Mean:
    Mean $= \bar{x} = \frac{\sum x}{8} = \frac{808\,825}{8} = 101\,103.13$
    The mean price is R101 103.13.
  - Standard deviation:
    $\sigma = \sqrt{\frac{\sum (x-\bar{x})^2}{n}} = \sqrt{\frac{1\,834\,460\,546.88}{8}} = \sqrt{229\,307\,568.36} = 15\,142.90$
    The standard deviation is R15 142.90.
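- Table of values:

  | Mark $x$ | Deviation from mean $x-\bar{x}$ | (Deviation)² $(x-\bar{x})^2$ |
  |---|---|---|
  | 66 | 2.8 | 7.84 |
  | 59 | −4.2 | 17.64 |
  | 43 | −20.2 | 408.04 |
  | 72 | 8.8 | 77.44 |
  | 57 | −6.2 | 38.44 |
  | 47 | −16.2 | 262.44 |
  | 81 | 17.8 | 316.84 |
  | 54 | −9.2 | 84.64 |
  | 92 | 28.8 | 829.44 |
  | 61 | −2.2 | 4.84 |

  Standard deviation of the marks of the 10 learners:
  $\sigma = \sqrt{\frac{\sum (x-\bar{x})^2}{n}} = \sqrt{\frac{2\,047.6}{10}} = \sqrt{204.76} = 14.31$
- Table of values:

  | Minutes taken $x$ | Deviation from mean $x-\bar{x}$ | (Deviation)² $(x-\bar{x})^2$ |
  |---|---|---|
  | 84 | −16.29 | 265.36 |
  | 150 | 49.71 | 2 471.08 |
  | 54 | −46.29 | 2 142.76 |
  | 126 | 25.71 | 661.00 |
  | 78 | −22.29 | 496.84 |
  | 102 | 1.71 | 2.92 |
  | 108 | 7.71 | 59.44 |

  Standard deviation of the times taken:
  $\sigma = \sqrt{\frac{\sum (x-\bar{x})^2}{n}} = \sqrt{\frac{6\,099.43}{7}} = \sqrt{871.35} = 29.52$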