Data handling: Use variance and regression analysis to interpolate and extrapolate bivariate data

# Unit 1: Calculate variance and standard deviation

Gill Scott

### Unit outcomes

By the end of this unit you will be able to:

• Calculate variance for ungrouped data manually.
• Calculate standard deviation for ungrouped data manually.
• Interpret variance and standard deviation.

## What you should know

Before you start this unit, make sure you can:

• Calculate measures of central tendency of a data set, such as the mean, median and mode, of both ungrouped and grouped data, and interpret what these tell you about a data set. To revise this, you can work through:
• Calculate the range of data.

## Introduction

Measures of central tendency of data sets (the mean, median and mode) give a first impression of the characteristics of a data set. From your earlier work, you will have seen that although these measures can be useful, they can also be misleading. So, to have a complete picture of a data set, it is necessary to investigate how its data is spread, scattered or dispersed.

The range is a measure of dispersion: it is the spread of the data from the smallest value to the largest. The interquartile range (IQR) is a better measure of dispersion than the range, because it is not affected by a few extreme values: it gives the spread of the middle $\scriptsize 50\%$ of the data set around the median. However, the mean is often a better measure of central tendency than the median, and in this unit we will investigate how data is spread around the mean. The measures of dispersion around the mean are the variance and the standard deviation.

## Variance

Suppose that two groups of nine learners wanted to see how long they could balance on a slackline, with each member recording how many seconds passed before they fell off.

The nine members of group A balanced for:

$\scriptsize 320$sec; $\scriptsize 250$sec; $\scriptsize 183$sec; $\scriptsize 41$sec; $\scriptsize 335$sec; $\scriptsize 78$sec; $\scriptsize 142$sec; $\scriptsize 210$sec; $\scriptsize 115$sec.

The nine members of group B balanced for:

$\scriptsize 185$sec; $\scriptsize 188$sec; $\scriptsize 183$sec; $\scriptsize 191$sec; $\scriptsize 185$sec; $\scriptsize 179$sec; $\scriptsize 192$sec; $\scriptsize 184$sec; $\scriptsize 187$sec.

The mean of each group was calculated:
Group A’s mean:
$\scriptsize {{\bar{x}}_{A}}=\displaystyle \frac{{320+250+183+41+335+78+142+210+115}}{9}=\displaystyle \frac{{1674}}{9}=186\text{ sec}$
Group B’s mean:
$\scriptsize {{\bar{x}}_{B}}=\displaystyle \frac{{185+188+183+191+185+179+192+184+187}}{9}=\displaystyle \frac{{1674}}{9}=186\text{ sec}$

The means of the two groups are the same, but, as you can see, the recorded data values are very different. The mean alone does not provide enough information to make a useful comparison of the data sets.

The deviation of each value from the mean for the groups was tabulated:

Group A:

| Time $\scriptsize x$ (sec) | Deviation from the mean $\scriptsize x-{{\bar{x}}_{A}}$ |
| --- | --- |
| $\scriptsize 320$ | $\scriptsize 320-186=134$ |
| $\scriptsize 250$ | $\scriptsize 250-186=64$ |
| $\scriptsize 183$ | $\scriptsize 183-186=-3$ |
| $\scriptsize 41$ | $\scriptsize 41-186=-145$ |
| $\scriptsize 335$ | $\scriptsize 335-186=149$ |
| $\scriptsize 78$ | $\scriptsize 78-186=-108$ |
| $\scriptsize 142$ | $\scriptsize 142-186=-44$ |
| $\scriptsize 210$ | $\scriptsize 210-186=24$ |
| $\scriptsize 115$ | $\scriptsize 115-186=-71$ |

Group B:

| Time $\scriptsize x$ (sec) | Deviation from the mean $\scriptsize x-{{\bar{x}}_{B}}$ |
| --- | --- |
| $\scriptsize 185$ | $\scriptsize 185-186=-1$ |
| $\scriptsize 188$ | $\scriptsize 188-186=2$ |
| $\scriptsize 183$ | $\scriptsize 183-186=-3$ |
| $\scriptsize 191$ | $\scriptsize 191-186=5$ |
| $\scriptsize 185$ | $\scriptsize 185-186=-1$ |
| $\scriptsize 179$ | $\scriptsize 179-186=-7$ |
| $\scriptsize 192$ | $\scriptsize 192-186=6$ |
| $\scriptsize 184$ | $\scriptsize 184-186=-2$ |
| $\scriptsize 187$ | $\scriptsize 187-186=1$ |

The tables show that although the means for both groups were the same, the times for group A are much more widely dispersed about the mean than the times for group B. To measure this dispersion, suppose we first find the total of the deviations from the mean for each group:
Sum of Group A’s deviations: $\scriptsize \sum{{\left( {x-\bar{x}} \right)}}=134+64-3-145+149-108-44+24-71=0$
Sum of Group B’s deviations: $\scriptsize \sum{{\left( {x-\bar{x}} \right)}}=-1+2-3+5-1-7+6-2+1=0$
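Both totals are $\scriptsize 0$. This can be checked numerically with a short Python sketch. It is only an illustration: the helper name `deviations` is our own choice, not part of any library. It computes each group's mean, lists the deviations and adds them up.

```python
# Slackline balance times (seconds) for the two groups above.
group_a = [320, 250, 183, 41, 335, 78, 142, 210, 115]
group_b = [185, 188, 183, 191, 185, 179, 192, 184, 187]


def deviations(data):
    """Return the deviation x - mean for every value x in data."""
    mean = sum(data) / len(data)
    return [x - mean for x in data]


for name, data in (("A", group_a), ("B", group_b)):
    devs = deviations(data)
    print(f"Group {name}: mean = {sum(data) / len(data)}, sum of deviations = {sum(devs)}")
```

Both sums print as `0.0`, whatever the spread of the data.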

In each group, the negative deviations cancel out the positive deviations, giving a total of $\scriptsize 0$ (this will happen for any data set; can you see why?), so the sum of the deviations tells us nothing about the spread. However, the extent to which the data is dispersed around the mean gives a good idea of how representative the mean is of the data set. Squaring each deviation from the mean gives a value that is never negative, so the squared deviations cannot cancel each other out, and their total measures the overall spread about the mean (although in squared units). Thus, the next step is to square each of the deviations from the mean, and to calculate the sum of the squared values:

Group A:
\scriptsize \begin{align*}\sum{{{\left( {x-\bar{x}} \right)}^{2}}}&={{(134)}^{2}}+{{(64)}^{2}}+{{(-3)}^{2}}+{{(-145)}^{2}}+{{(149)}^{2}}+{{(-108)}^{2}}+{{(-44)}^{2}}+{{(24)}^{2}}+{{(-71)}^{2}}\\&=17\text{ }956+4\text{ }096+9+21\text{ }025+22\text{ }201+11\text{ }664+1\text{ }936+576+5\text{ }041\\&=84\text{ }504\end{align*}

Group B:
\scriptsize \begin{align*}\sum{{{\left( {x-\bar{x}} \right)}^{2}}}&={{(-1)}^{2}}+{{(2)}^{2}}+{{(-3)}^{2}}+{{(5)}^{2}}+{{(-1)}^{2}}+{{(-7)}^{2}}+{{(6)}^{2}}+{{(-2)}^{2}}+{{(1)}^{2}}\\&=1+4+9+25+1+49+36+4+1\\&=130\end{align*}
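A similar Python sketch reproduces the two totals of squared deviations. Again, the function name `sum_of_squared_deviations` is a hypothetical helper chosen for illustration.

```python
group_a = [320, 250, 183, 41, 335, 78, 142, 210, 115]
group_b = [185, 188, 183, 191, 185, 179, 192, 184, 187]


def sum_of_squared_deviations(data):
    """Square each deviation from the mean and add the squares together."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)


print(sum_of_squared_deviations(group_a))  # 84504.0
print(sum_of_squared_deviations(group_b))  # 130.0
```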

The variance, $\scriptsize {{\sigma }^{2}}$, is defined as the average, or mean, of the squared deviations, so each sum must be divided by the number of data elements, $\scriptsize n=9$:
$\scriptsize \text{Variance for group A =}\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}=\displaystyle \frac{{84\text{ }504}}{9}=9\text{ }389.33$

$\scriptsize \text{Variance for group B =}\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}=\displaystyle \frac{{130}}{9}=14.44$

The variance $\scriptsize {{\sigma }^{2}}$ of a data set is the average of the squared deviations of each of the $\scriptsize n$ elements $\scriptsize x$ of the set from the mean $\scriptsize \bar{x}$ of the set:
$\scriptsize \text{Variance }={{\sigma }^{2}}=\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}$
Notice that the units of variance are squared units.
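Dividing the total by $\scriptsize n$ turns it into the variance. In the sketch below, the `variance` helper is our own; its result is compared with `statistics.pvariance` from Python's standard library, which uses the same divide-by-$\scriptsize n$ definition for a whole data set.

```python
import statistics

group_a = [320, 250, 183, 41, 335, 78, 142, 210, 115]
group_b = [185, 188, 183, 191, 185, 179, 192, 184, 187]


def variance(data):
    """Mean of the squared deviations from the mean (divide by n)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


for name, data in (("A", group_a), ("B", group_b)):
    # Both columns should agree, because pvariance also divides by n.
    print(name, round(variance(data), 2), round(statistics.pvariance(data), 2))
```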

## Standard deviation

From the definition above, you can see that the variance is measured in squared units, which makes it hard to interpret, because the data values themselves are not squared. The other measure of dispersion, the standard deviation, represented by the Greek letter $\scriptsize \sigma$ (lower case ‘sigma’), is the square root of the variance, so it is measured in the same units as the data.

Continuing the example above:
$\scriptsize \text{Standard deviation for group A =}\sqrt{{\text{variance}}}\text{=}\sqrt{{\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}}}=\sqrt{{9\text{ }389.33}}=96.90$
$\scriptsize \text{Standard deviation for group B =}\sqrt{{\text{variance}}}=\sqrt{{\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}}}=\sqrt{{14.44}}=3.80$

The much larger standard deviation for group A indicates that the data for group A is much more widely distributed around the mean than the data for group B. There is greater dispersion in the distribution of data for group A than in that for group B. This shows that the mean for group A is much less representative of its data elements than the mean for group B is. The more varied the data values are, the less reliable the mean is as a means of prediction.

Standard deviation can serve as a measure of uncertainty, or of precision: it gives an idea of how much the values typically vary from the mean. The standard deviation is the square root of the average squared distance of the values in the data set from their mean.

Standard deviation $\scriptsize \sigma$ of $\scriptsize n$ elements $\scriptsize x$ of data in a set:
$\scriptsize \sigma \text{=}\sqrt{{\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}}}$
Note that the standard deviation is never negative (it is $\scriptsize 0$ only when all the data values are equal), and the units of the standard deviation are the same as the units of the data elements.
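The formula translates directly into code. This sketch (the `standard_deviation` name is a hypothetical helper) takes the square root of the variance; `statistics.pstdev` computes the same quantity. Note that `statistics.stdev` divides by $\scriptsize n-1$ instead of $\scriptsize n$ (the sample standard deviation), so it gives a slightly larger value and does not match the formula used in this unit.

```python
import math
import statistics

group_a = [320, 250, 183, 41, 335, 78, 142, 210, 115]
group_b = [185, 188, 183, 191, 185, 179, 192, 184, 187]


def standard_deviation(data):
    """Square root of the mean squared deviation from the mean (divide by n)."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / n)


print(round(standard_deviation(group_a), 2), round(statistics.pstdev(group_a), 2))
print(round(standard_deviation(group_b), 2), round(statistics.pstdev(group_b), 2))
```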

### Note

For another explanation of variance and standard deviation, watch the video “Variance of a population” (Duration: 08.05).

### Take note!

For data that has a roughly normal (bell-shaped) distribution, not badly skewed by a few very large or very small values:

• about $\scriptsize 68\%$ of the elements of the data set will lie within one standard deviation of the mean
• about $\scriptsize 95\%$ of the elements of the data set will lie within two standard deviations of the mean.
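One way to see these percentages is to generate a large set of artificial, normally distributed values and count how many fall within one and two standard deviations of their mean. In this sketch the mean of $\scriptsize 186$ and standard deviation of $\scriptsize 30$ are made-up parameters chosen only for illustration, and the helper `fraction_within` is our own; the two printed fractions should come out close to the percentages quoted above.

```python
import random
import statistics


def fraction_within(data, k):
    """Fraction of the values lying within k standard deviations of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(1 for x in data if abs(x - mean) <= k * sd) / len(data)


random.seed(2021)  # fixed seed so that every run gives the same sample
# Hypothetical data: 20 000 values from a normal distribution (mean 186, sd 30).
sample = [random.gauss(186, 30) for _ in range(20_000)]

print(f"within 1 sd: {fraction_within(sample, 1):.3f}")
print(f"within 2 sd: {fraction_within(sample, 2):.3f}")
```

For an exactly normal distribution the proportions are about $\scriptsize 68.3\%$ and $\scriptsize 95.4\%$; real data sets, especially small or skewed ones, can differ noticeably.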

### Example 1.1

Eight cupcakes from a batch were weighed and their masses recorded as follows:
$\scriptsize 23\text{ g; }37\text{ g; }25\text{ g; }28\text{ g; }33\text{ g; }31\text{ g; }29\text{ g; 26 g}\text{.}$

1. Find the range of the masses.
2. Calculate the mean.
3. Calculate the variance.
4. Calculate the standard deviation.

Solutions

1. Arrange the masses in order: $\scriptsize 23\text{ g; }25\text{ g; 26 g; }28\text{ g; }29\text{ g; }31\text{ g; }33\text{ g; }37\text{ g}\text{.}$
Subtract the smallest mass from the largest: Range $\scriptsize =37-23=14\text{ g}$
2. Divide the sum of all the masses by the number of cupcakes:
Mean:
\scriptsize \begin{align*}\bar{x}&=\displaystyle \frac{{\sum{x}}}{n}\\&=\displaystyle \frac{{232}}{8}\\&=29\text{ g}\end{align*}
3. To find the variance, find the deviation of each mass from the mean, and square it.

| Mass $\scriptsize x$ (g) | Deviation from the mean $\scriptsize x-\bar{x}$ | Squared deviation $\scriptsize {{\left( {x-\bar{x}} \right)}^{2}}$ |
| --- | --- | --- |
| $\scriptsize 23$ | $\scriptsize 23-29=-6$ | $\scriptsize 36$ |
| $\scriptsize 37$ | $\scriptsize 37-29=8$ | $\scriptsize 64$ |
| $\scriptsize 25$ | $\scriptsize 25-29=-4$ | $\scriptsize 16$ |
| $\scriptsize 28$ | $\scriptsize 28-29=-1$ | $\scriptsize 1$ |
| $\scriptsize 33$ | $\scriptsize 33-29=4$ | $\scriptsize 16$ |
| $\scriptsize 31$ | $\scriptsize 31-29=2$ | $\scriptsize 4$ |
| $\scriptsize 29$ | $\scriptsize 29-29=0$ | $\scriptsize 0$ |
| $\scriptsize 26$ | $\scriptsize 26-29=-3$ | $\scriptsize 9$ |

\scriptsize \begin{align*}\text{Variance}&=\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}\\&=\displaystyle \frac{{36+64+16+1+16+4+0+9}}{8}\\&=\displaystyle \frac{{146}}{8}\\&=18.25\end{align*}

4. Standard deviation is the square root of the variance:
\scriptsize \begin{align*}\sigma &=\sqrt{{\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}}}\\&=\sqrt{{\displaystyle \frac{{146}}{8}}}\\&=\sqrt{{18.25}}\\&=4.27\text{ g}\end{align*}

You will notice that tabulating the data and the calculations simplifies the application of the formulae.
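The same tabulation can be produced by a short program. This sketch, assuming the masses are entered in grams as listed above, prints the three columns of the table followed by the range, mean, variance and standard deviation.

```python
masses = [23, 37, 25, 28, 33, 31, 29, 26]  # cupcake masses in grams

n = len(masses)
mean = sum(masses) / n

# Print the table: value, deviation from the mean, squared deviation.
print(f"{'x':>4} {'x - mean':>9} {'(x - mean)^2':>13}")
total = 0.0
for x in masses:
    dev = x - mean
    total += dev ** 2
    print(f"{x:>4} {dev:>9.1f} {dev ** 2:>13.1f}")

variance = total / n
sd = variance ** 0.5
print("Range:", max(masses) - min(masses), "g")
print("Mean:", mean, "g")
print("Variance:", variance)
print("Standard deviation:", round(sd, 2), "g")
```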

### Activity 1.1: Working with temperatures

Time required: 12 minutes

What you need:

• a pen and paper
• a calculator

What to do:

The maximum daily temperatures in Johannesburg in the second week of April 2021 are recorded and tabulated below, alongside those of the second week of January of the same year.

| Date (April) | Temperature $\scriptsize x$ | Deviation from the mean $\scriptsize x-\bar{x}$ | Squared deviation $\scriptsize {{\left( {x-\bar{x}} \right)}^{2}}$ |
| --- | --- | --- | --- |
| $\scriptsize {{11}^{\text{th}}}$ | $\scriptsize 21{}^\circ$ | | |
| $\scriptsize {{12}^{\text{th}}}$ | $\scriptsize 26{}^\circ$ | | |
| $\scriptsize {{13}^{\text{th}}}$ | $\scriptsize 23{}^\circ$ | | |
| $\scriptsize {{14}^{\text{th}}}$ | $\scriptsize 19{}^\circ$ | | |
| $\scriptsize {{15}^{\text{th}}}$ | $\scriptsize 25{}^\circ$ | | |
| $\scriptsize {{16}^{\text{th}}}$ | $\scriptsize 26{}^\circ$ | | |
| $\scriptsize {{17}^{\text{th}}}$ | $\scriptsize 27{}^\circ$ | | |

| Date (January) | Temperature $\scriptsize x$ | Deviation from the mean $\scriptsize x-\bar{x}$ | Squared deviation $\scriptsize {{\left( {x-\bar{x}} \right)}^{2}}$ |
| --- | --- | --- | --- |
| $\scriptsize {{10}^{\text{th}}}$ | $\scriptsize 25{}^\circ$ | | |
| $\scriptsize {{11}^{\text{th}}}$ | $\scriptsize 27{}^\circ$ | | |
| $\scriptsize {{12}^{\text{th}}}$ | $\scriptsize 27{}^\circ$ | | |
| $\scriptsize {{13}^{\text{th}}}$ | $\scriptsize 25{}^\circ$ | | |
| $\scriptsize {{14}^{\text{th}}}$ | $\scriptsize 24{}^\circ$ | | |
| $\scriptsize {{15}^{\text{th}}}$ | $\scriptsize 26{}^\circ$ | | |
| $\scriptsize {{16}^{\text{th}}}$ | $\scriptsize 28{}^\circ$ | | |
1. Work out:
1. The mean temperature for the week in April (correct to one decimal place).
2. The mean temperature for the week in January (correct to one decimal place).
2. Copy and complete the tables above for both months.
3. Work out the variance for April and for January (correct to one decimal place).
4. Work out the standard deviation for April and for January (correct to one decimal place).
5. On what percentage of the days in each week was the maximum temperature within one standard deviation of the mean?
6. What do the two standard deviations and your calculations show about the spread of data around the respective means?

What did you find?

1. Mean temperatures:
1. April $\scriptsize \text{Mean }=\bar{x}=\displaystyle \frac{{\sum{x}}}{7}=\displaystyle \frac{{167}}{7}=23.9{}^\circ$
2. January $\scriptsize \text{Mean }=\bar{x}=\displaystyle \frac{{\sum{x}}}{7}=\displaystyle \frac{{182}}{7}=26{}^\circ$
2. Table for April:

| Date (April) | Temperature $\scriptsize x$ | Deviation from the mean $\scriptsize x-\bar{x}$ | Squared deviation $\scriptsize {{\left( {x-\bar{x}} \right)}^{2}}$ |
| --- | --- | --- | --- |
| $\scriptsize {{11}^{\text{th}}}$ | $\scriptsize 21{}^\circ$ | $\scriptsize 21{}^\circ -23.9=-2.9$ | $\scriptsize 8.4$ |
| $\scriptsize {{12}^{\text{th}}}$ | $\scriptsize 26{}^\circ$ | $\scriptsize 26{}^\circ -23.9=2.1$ | $\scriptsize 4.4$ |
| $\scriptsize {{13}^{\text{th}}}$ | $\scriptsize 23{}^\circ$ | $\scriptsize 23{}^\circ -23.9=-0.9$ | $\scriptsize 0.8$ |
| $\scriptsize {{14}^{\text{th}}}$ | $\scriptsize 19{}^\circ$ | $\scriptsize 19{}^\circ -23.9=-4.9$ | $\scriptsize 24.0$ |
| $\scriptsize {{15}^{\text{th}}}$ | $\scriptsize 25{}^\circ$ | $\scriptsize 25{}^\circ -23.9=1.1$ | $\scriptsize 1.2$ |
| $\scriptsize {{16}^{\text{th}}}$ | $\scriptsize 26{}^\circ$ | $\scriptsize 26{}^\circ -23.9=2.1$ | $\scriptsize 4.4$ |
| $\scriptsize {{17}^{\text{th}}}$ | $\scriptsize 27{}^\circ$ | $\scriptsize 27{}^\circ -23.9=3.1$ | $\scriptsize 9.6$ |

Table for January

| Date (January) | Temperature $\scriptsize x$ | Deviation from the mean $\scriptsize x-\bar{x}$ | Squared deviation $\scriptsize {{\left( {x-\bar{x}} \right)}^{2}}$ |
| --- | --- | --- | --- |
| $\scriptsize {{10}^{\text{th}}}$ | $\scriptsize 25{}^\circ$ | $\scriptsize 25{}^\circ -26=-1$ | $\scriptsize 1$ |
| $\scriptsize {{11}^{\text{th}}}$ | $\scriptsize 27{}^\circ$ | $\scriptsize 27{}^\circ -26=1$ | $\scriptsize 1$ |
| $\scriptsize {{12}^{\text{th}}}$ | $\scriptsize 27{}^\circ$ | $\scriptsize 27{}^\circ -26=1$ | $\scriptsize 1$ |
| $\scriptsize {{13}^{\text{th}}}$ | $\scriptsize 25{}^\circ$ | $\scriptsize 25{}^\circ -26=-1$ | $\scriptsize 1$ |
| $\scriptsize {{14}^{\text{th}}}$ | $\scriptsize 24{}^\circ$ | $\scriptsize 24{}^\circ -26=-2$ | $\scriptsize 4$ |
| $\scriptsize {{15}^{\text{th}}}$ | $\scriptsize 26{}^\circ$ | $\scriptsize 26{}^\circ -26=0$ | $\scriptsize 0$ |
| $\scriptsize {{16}^{\text{th}}}$ | $\scriptsize 28{}^\circ$ | $\scriptsize 28{}^\circ -26=2$ | $\scriptsize 4$ |
3. April:
\scriptsize \begin{align*}\text{Variance}&=\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}\\&=\displaystyle \frac{{8.4+4.4+0.8+24.0+1.2+4.4+9.6}}{7}\\&=\displaystyle \frac{{52.8}}{7}\\&=7.5\end{align*}
Notice that we leave out the units for variance: the ‘square’ of degrees is not helpful here.
January:
\scriptsize \begin{align*}\text{Variance} &=\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}\\&=\displaystyle \frac{{1+1+1+1+4+0+4}}{7}\\&=\displaystyle \frac{{12}}{7}\\&=1.7\end{align*}
4. Standard deviation for April:
\scriptsize \begin{align*}\sigma &=\sqrt{{\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}}}\\&=\sqrt{{\displaystyle \frac{{52.8}}{7}}}\\&=\sqrt{{7.5}}\\&=2.74{}^\circ \end{align*}
Standard deviation for January:
\scriptsize \begin{align*}\sigma &=\sqrt{{\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}}}\\&=\sqrt{{\displaystyle \frac{{12}}{7}}}\\&=\sqrt{{1.7}}\\&=1.3{}^\circ \end{align*}
5. April:
\scriptsize \begin{align*}\text{One standard deviation from the mean} &= \bar{x}\pm \sigma \\&=23.9\pm 2.74\end{align*}
So the interval is $\scriptsize \left[ {23.9-2.74;23.9+2.74} \right]=\left[ {21.16{}^\circ ;26.64{}^\circ } \right]$.
The maximum temperatures on $\scriptsize \text{12}$, $\scriptsize \text{13}$, $\scriptsize \text{15}$ and $\scriptsize \text{16}$ April fall within this interval.
$\scriptsize \displaystyle \frac{4}{7}\times 100\%=57.14\%$
So the maximum temperature on $\scriptsize 57.14\%$ of the days of the week in April falls within one standard deviation of the mean.
January:
\scriptsize \begin{align*}\text{One standard deviation from the mean }&= \bar{x}\pm \sigma \\&=26\pm 1.3\end{align*}
So the interval is $\scriptsize \left[ {26-1.3;26+1.3} \right]=\left[ {24.7;27.3} \right]$.
The maximum temperatures on $\scriptsize \text{10}$, $\scriptsize \text{11}$, $\scriptsize \text{12}$, $\scriptsize \text{13}$ and $\scriptsize 15$ January fall within this interval.
$\scriptsize \displaystyle \frac{5}{7}\times 100\%=71.43\%$
So the maximum temperature on $\scriptsize 71.43\%$ of the days of the week in January falls within one standard deviation of the mean.
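6. The standard deviation for January ($\scriptsize 1.3{}^\circ$) is much smaller than that for April ($\scriptsize 2.74{}^\circ$), and a larger percentage of the January temperatures lie within one standard deviation of the mean. So the temperatures were more closely clustered around the mean, that is, more consistent with fewer fluctuations, in the week in January than in the week in April.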
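The activity's calculations can be checked with the sketch below, which uses Python's `statistics` module and keeps the mean unrounded throughout (the helper `summarise` is our own). For April this gives a variance of about $\scriptsize 7.55$ ($\scriptsize 7.6$ to one decimal place) and a standard deviation of about $\scriptsize 2.75{}^\circ$, slightly different from the worked answers, which round the mean to $\scriptsize 23.9$ and each squared deviation to one decimal place before adding. The percentages within one standard deviation are the same either way.

```python
import statistics

april = [21, 26, 23, 19, 25, 26, 27]    # 11-17 April 2021, degrees
january = [25, 27, 27, 25, 24, 26, 28]  # 10-16 January 2021, degrees


def summarise(temps):
    """Return the mean, variance, standard deviation and % of values within one sd."""
    mean = statistics.fmean(temps)
    sd = statistics.pstdev(temps)
    within = sum(1 for t in temps if mean - sd <= t <= mean + sd)
    return mean, statistics.pvariance(temps), sd, 100 * within / len(temps)


for name, temps in (("April", april), ("January", january)):
    mean, var, sd, pct = summarise(temps)
    print(f"{name}: mean {mean:.2f}, variance {var:.3f}, sd {sd:.3f}, within 1 sd {pct:.2f}%")
```

Carrying full precision until the final step is the safer habit; rounding early can shift the last reported digit.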

### Exercise 1.1

1. World Health Organisation data for 2018 reported numbers of tuberculosis cases per $\scriptsize 100\text{ }000$ in the population for some countries in Southern and Eastern Africa as follows:
| Country | Number per $\scriptsize 100\text{ }000$ |
| --- | --- |
| Angola | $\scriptsize 355$ |
| Botswana | $\scriptsize 275$ |
| Kenya | $\scriptsize 292$ |
| Lesotho | $\scriptsize 659$ |
| Malawi | $\scriptsize 153$ |
| Mozambique | $\scriptsize 361$ |
| Namibia | $\scriptsize 524$ |
| South Africa | $\scriptsize 677$ |
| Zimbabwe | $\scriptsize 210$ |
| Uganda | $\scriptsize 200$ |
| United Republic of Tanzania | $\scriptsize 253$ |
| Zambia | $\scriptsize 346$ |
1. What is the range of tuberculosis incidence per $\scriptsize 100\text{ }000$ in the populations across these countries?
2. What is the mean for the entire region?
3. What is the standard deviation of numbers of reported tuberculosis cases per $\scriptsize 100\text{ }000$ for the entire region?
4. What percentage of countries’ tuberculosis incidence falls within one standard deviation from the mean?
2. World Health Organisation estimates of death rates due to road traffic injuries per $\scriptsize 100\text{ }000$ population in 2016 for some countries were as follows:
| Country | Number per $\scriptsize 100\text{ }000$ |
| --- | --- |
| Angola | $\scriptsize 23.6$ |
| Botswana | $\scriptsize 23.8$ |
| Kenya | $\scriptsize 27.8$ |
| Lesotho | $\scriptsize 28.9$ |
| Malawi | $\scriptsize 31$ |
| Mozambique | $\scriptsize 30.1$ |
| Namibia | $\scriptsize 30.4$ |
| South Africa | $\scriptsize 25.9$ |
| Zimbabwe | $\scriptsize 34.7$ |
| Eswatini | $\scriptsize 26.9$ |
| United Republic of Tanzania | $\scriptsize 29.2$ |
| Zambia | $\scriptsize 20.9$ |
1. What is the range of road traffic death rates per $\scriptsize 100\text{ }000$ in the populations across these countries?
2. What is the mean for the region?
3. What is the standard deviation of numbers of deaths per $\scriptsize 100\text{ }000$ for the region?
4. What percentage of countries’ road traffic death rates falls within one standard deviation from the mean?
3. A manufacturer checks the widths of a number of roller bearings from the production line in order to control quality. The following widths were measured, in micrometres (thousandths of a millimetre):
$\scriptsize 15\text{ }015;\text{ 15 101; 15 089; 15 062; 15 111; 15 054; 15 028; 15 137; 15 009; 15 096}$

1. Calculate the range.
2. Calculate the mean.
3. Calculate the standard deviation.
4. What percentage of the measurements are within one standard deviation of the mean?

The full solutions are at the end of the unit.

## Summary

In this unit you have learnt the following:

• How to calculate the variance of a data set.
• How to calculate the standard deviation of a data set.
• How to interpret the results of calculations of standard deviation of a data set.

# Unit 1: Assessment

#### Suggested time to complete: 25 minutes

1. World Health Organisation data for 2018 reported numbers of malaria cases per $\scriptsize 1\text{ }000$ in the population for some countries in Southern and Eastern Africa as follows:
| Country | Number per $\scriptsize 1\text{ }000$ |
| --- | --- |
| Angola | $\scriptsize 227.36$ |
| Botswana | $\scriptsize 0.59$ |
| Kenya | $\scriptsize 60.05$ |
| Malawi | $\scriptsize 207.33$ |
| Mozambique | $\scriptsize 314.66$ |
| Namibia | $\scriptsize 31.68$ |
| South Africa | $\scriptsize 1.65$ |
| Zimbabwe | $\scriptsize 55.97$ |
| Eswatini | $\scriptsize 0.97$ |
| Uganda | $\scriptsize 262.69$ |
| Zambia | $\scriptsize 157.5$ |
1. What is the range of numbers of cases per $\scriptsize 1\text{ }000$ for these countries?
2. What is the mean incidence of malaria for the region?
3. What is the standard deviation?
2. A potential car-buyer investigated the prices of eight cars on a car sales website, and wrote the following prices down (prices are in rands).
$\scriptsize 103\text{ 1}25\text{; 129 900; 87 900; 99 900; 85 000; 120 000; 95 000; 88 000}$

1. What is the range of prices?
2. What is the mean?
3. What is the standard deviation?

Question 3 adapted from the NC(V) level 4 Mathematics second paper of November 2017

3. The marks (percentages) of the learners who wrote the Pattern Maker’s Theory examination were as follows:
Marks: $\scriptsize 66;\text{ }59;\text{ }43;\text{ }72;\text{ }57;\text{ }47;\text{ }81;\text{ }54;\text{ }92;\text{ }61$

Calculate the standard deviation of the marks of the $\scriptsize 10$ learners.

4. The numbers of minutes taken by each of seven mechanic apprentices to replace a control arm bushing on a vehicle were as follows:
Minutes: $\scriptsize 84;\text{ }150;\text{ }54;\text{ }126;\text{ }78;\text{ }102;\text{ }108$

Calculate the standard deviation of the time taken by the seven apprentices.

The full solutions are at the end of the unit.

# Unit 1: Solutions

### Exercise 1.1

1. Tuberculosis incidence 2018:
| Country | Number per $\scriptsize 100\text{ }000$ | $\scriptsize x-\bar{x}$ | $\scriptsize {{\left( {x-\bar{x}} \right)}^{2}}$ |
| --- | --- | --- | --- |
| Angola | $\scriptsize 355$ | $\scriptsize 355-358.75=-3.75$ | $\scriptsize 14.06$ |
| Botswana | $\scriptsize 275$ | $\scriptsize 275-358.75=-83.75$ | $\scriptsize 7\text{ }014.06$ |
| Kenya | $\scriptsize 292$ | $\scriptsize 292-358.75=-66.75$ | $\scriptsize 4\text{ }455.56$ |
| Lesotho | $\scriptsize 659$ | $\scriptsize 659-358.75=300.25$ | $\scriptsize 90\text{ }150.06$ |
| Malawi | $\scriptsize 153$ | $\scriptsize 153-358.75=-205.75$ | $\scriptsize 42\text{ }333.06$ |
| Mozambique | $\scriptsize 361$ | $\scriptsize 361-358.75=2.25$ | $\scriptsize 5.06$ |
| Namibia | $\scriptsize 524$ | $\scriptsize 524-358.75=165.25$ | $\scriptsize 27\text{ }307.56$ |
| South Africa | $\scriptsize 677$ | $\scriptsize 677-358.75=318.25$ | $\scriptsize 101\text{ }283.06$ |
| Zimbabwe | $\scriptsize 210$ | $\scriptsize 210-358.75=-148.75$ | $\scriptsize 22\text{ }126.56$ |
| Uganda | $\scriptsize 200$ | $\scriptsize 200-358.75=-158.75$ | $\scriptsize 25\text{ }201.56$ |
| United Republic of Tanzania | $\scriptsize 253$ | $\scriptsize 253-358.75=-105.75$ | $\scriptsize 11\text{ }183.06$ |
| Zambia | $\scriptsize 346$ | $\scriptsize 346-358.75=-12.75$ | $\scriptsize 162.56$ |
1. $\scriptsize \text{Range }=677-153=524$ cases per $\scriptsize 100\text{ }000$.
2. Mean:
\scriptsize \begin{align*}\bar{x}\text{=}\displaystyle \frac{{\sum{x}}}{n}&=\displaystyle \frac{{355+275+292+659+153+361+524+677+210+200+253+346}}{{12}}\\&=\displaystyle \frac{{4305}}{{12}}\\&=358.75\end{align*}
3. Standard deviation:
\scriptsize \begin{align*}\sigma &=\sqrt{{\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}}}\\&=\sqrt{{\displaystyle \frac{{331\text{ }236.25}}{{12}}}}\\&=\sqrt{{27\text{ }603.02}}\\&=166.14\end{align*}
4. Percentage of countries in the region falling within one standard deviation of the mean:
\scriptsize \begin{align*}\text{One standard deviation from the mean} &=\bar{x}\pm \sigma \\&=358.75\pm 166.14\end{align*}
So the interval is $\scriptsize \left[ {358.75-166.14;358.75+166.14} \right]=\left[ {192.61;524.89} \right]$.
Three countries (Malawi, Lesotho, South Africa) fall outside this interval, so $\scriptsize \displaystyle \frac{9}{{12}}\times 100\%=75\%$ fall within one standard deviation of the mean.
2. Road traffic death rates 2016:

| Country | Number per $\scriptsize 100\text{ }000$ | $\scriptsize x-\bar{x}$ | $\scriptsize {{\left( {x-\bar{x}} \right)}^{2}}$ |
| --- | --- | --- | --- |
| Angola | $\scriptsize 23.6$ | $\scriptsize 23.6-27.77=-4.17$ | $\scriptsize 17.39$ |
| Botswana | $\scriptsize 23.8$ | $\scriptsize 23.8-27.77=-3.97$ | $\scriptsize 15.76$ |
| Kenya | $\scriptsize 27.8$ | $\scriptsize 27.8-27.77=0.03$ | $\scriptsize 0.00$ |
| Lesotho | $\scriptsize 28.9$ | $\scriptsize 28.9-27.77=1.13$ | $\scriptsize 1.28$ |
| Malawi | $\scriptsize 31$ | $\scriptsize 31-27.77=3.23$ | $\scriptsize 10.43$ |
| Mozambique | $\scriptsize 30.1$ | $\scriptsize 30.1-27.77=2.33$ | $\scriptsize 5.43$ |
| Namibia | $\scriptsize 30.4$ | $\scriptsize 30.4-27.77=2.63$ | $\scriptsize 6.92$ |
| South Africa | $\scriptsize 25.9$ | $\scriptsize 25.9-27.77=-1.87$ | $\scriptsize 3.50$ |
| Zimbabwe | $\scriptsize 34.7$ | $\scriptsize 34.7-27.77=6.93$ | $\scriptsize 48.02$ |
| Eswatini | $\scriptsize 26.9$ | $\scriptsize 26.9-27.77=-0.87$ | $\scriptsize 0.76$ |
| United Republic of Tanzania | $\scriptsize 29.2$ | $\scriptsize 29.2-27.77=1.43$ | $\scriptsize 2.04$ |
| Zambia | $\scriptsize 20.9$ | $\scriptsize 20.9-27.77=-6.87$ | $\scriptsize 47.20$ |
1. $\scriptsize \text{Range }=34.7-20.9=13.8$ road traffic deaths per $\scriptsize 100\text{ }000$.
2. Mean:
\scriptsize \begin{align*}\bar{x}\text{=}\displaystyle \frac{{\sum{x}}}{n}&=\displaystyle \frac{{23.6+23.8+27.8+28.9+31+30.1+30.4+25.9+34.7+26.9+29.2+20.9}}{{12}}\\&=\displaystyle \frac{{333.2}}{{12}}\\&=27.77\end{align*}
3. Standard deviation:
\scriptsize \begin{align*}\sigma &=\sqrt{{\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}}}\\&=\sqrt{{\displaystyle \frac{{158.73}}{{12}}}}\\&=\sqrt{{13.2275}}\\&=3.64\end{align*}
4. Percentage of countries in the region falling within one standard deviation of the mean:
\scriptsize \begin{align*}\text{One standard deviation from the mean}&=\bar{x}\pm \sigma \\&=27.77\pm 3.64\end{align*}
So the interval is $\scriptsize \left[ {27.77-3.64;27.77+3.64} \right]=\left[ {24.13;31.41} \right]$.
Four countries (Angola, Botswana, Zimbabwe, Zambia) fall outside this interval, so $\scriptsize \displaystyle \frac{8}{{12}}\times 100\%=66.67\%$ fall within one standard deviation of the mean.
3. Roller bearing widths:

| Width $\scriptsize x$ ($\scriptsize \mu$m) | Deviation from the mean $\scriptsize x-\bar{x}$ | Squared deviation $\scriptsize {{\left( {x-\bar{x}} \right)}^{2}}$ |
| --- | --- | --- |
| $\scriptsize 15\text{ }015$ | $\scriptsize -55.2$ | $\scriptsize 3\text{ 047}\text{.04}$ |
| $\scriptsize 15\text{ }101$ | $\scriptsize 30.8$ | $\scriptsize 948.64$ |
| $\scriptsize 15\text{ }089$ | $\scriptsize 18.8$ | $\scriptsize 353.44$ |
| $\scriptsize 15\text{ }062$ | $\scriptsize -8.2$ | $\scriptsize 67.24$ |
| $\scriptsize 15\text{ }111$ | $\scriptsize 40.8$ | $\scriptsize 1\text{ 664}\text{.64}$ |
| $\scriptsize 15\text{ }054$ | $\scriptsize -16.2$ | $\scriptsize 262.44$ |
| $\scriptsize 15\text{ }028$ | $\scriptsize -42.2$ | $\scriptsize 1\text{ }780.84$ |
| $\scriptsize 15\text{ }137$ | $\scriptsize 66.8$ | $\scriptsize 4\text{ 462}\text{.24}$ |
| $\scriptsize 15\text{ }009$ | $\scriptsize -61.2$ | $\scriptsize 3\text{ 745}\text{.44}$ |
| $\scriptsize 15\text{ }096$ | $\scriptsize 25.8$ | $\scriptsize 665.64$ |
1. $\scriptsize \text{Range }=15\text{ }137-15\text{ }009=128\text{ }\mu \text{m}$
2. Mean:
\scriptsize \begin{align*}\bar{x}&=\displaystyle \frac{{\sum{x}}}{n}\\&=\displaystyle \frac{{150702}}{{10}}\\&=15\text{ 070}\text{.2}\end{align*}
3. Standard deviation:
\scriptsize \begin{align*}\sigma &=\sqrt{{\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}}}\\&=\sqrt{{\displaystyle \frac{{16\text{ }997.6}}{{10}}}}\\&=\sqrt{{1\text{ }699.76}}\\&=41.228\end{align*}
4. Percentage of bearings falling within one standard deviation of the mean:
\scriptsize \begin{align*}\text{One standard deviation from the mean }&= \bar{x}\pm \sigma \\&=15\text{ }070.2\pm 41.2\end{align*}
So the interval is $\scriptsize \left[ {15\text{ 070}\text{.2}-41.2;15\text{ 070}\text{.2}+41.2} \right]=\left[ {15\text{ 0}29;15\text{ 111}.4} \right]$.
Four bearings fall outside this interval ($\scriptsize 15\text{ }015;\text{ 15 028; 15 137; 15 009}$), so $\scriptsize \displaystyle \frac{6}{{10}}\times 100\%=60\%$ fall within one standard deviation of the mean.
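All three exercise answers can be checked with one helper. In this sketch (the `describe` name is our own), the standard deviation comes from `statistics.pstdev`, which divides by $\scriptsize n$ as in this unit. The last lines illustrate a useful shortcut for large, similar values such as the bearing widths: subtracting the same constant from every value (here $\scriptsize 15\text{ }000$) changes the mean but not the standard deviation.

```python
import statistics


def describe(data):
    """Return range, mean, standard deviation and % of values within one sd of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    within = sum(1 for x in data if mean - sd <= x <= mean + sd)
    return max(data) - min(data), mean, sd, 100 * within / len(data)


tb = [355, 275, 292, 659, 153, 361, 524, 677, 210, 200, 253, 346]
road = [23.6, 23.8, 27.8, 28.9, 31, 30.1, 30.4, 25.9, 34.7, 26.9, 29.2, 20.9]
widths = [15015, 15101, 15089, 15062, 15111, 15054, 15028, 15137, 15009, 15096]  # micrometres

for name, data in (("Tuberculosis", tb), ("Road deaths", road), ("Bearing widths", widths)):
    rng, mean, sd, pct = describe(data)
    print(f"{name}: range {rng:.2f}, mean {mean:.2f}, sd {sd:.2f}, within 1 sd {pct:.2f}%")

# Subtracting a constant shifts every value equally, so the spread is unchanged.
shifted = [w - 15000 for w in widths]
print(statistics.pstdev(shifted), statistics.pstdev(widths))
```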


### Unit 1: Assessment

1. Malaria incidence 2018:

| Country | Malaria cases per $\scriptsize 1\text{ }000$ | Deviation from mean $\scriptsize x-\bar{x}$ | Squared deviation $\scriptsize {{\left( {x-\bar{x}} \right)}^{2}}$ |
| --- | --- | --- | --- |
| Angola | $\scriptsize 227.36$ | $\scriptsize 107.32$ | $\scriptsize 11\text{ 517}\text{.39}$ |
| Botswana | $\scriptsize 0.59$ | $\scriptsize -119.45$ | $\scriptsize 14\text{ 268}\text{.52}$ |
| Kenya | $\scriptsize 60.05$ | $\scriptsize -59.99$ | $\scriptsize 3\text{ 598}\text{.91}$ |
| Malawi | $\scriptsize 207.33$ | $\scriptsize 87.29$ | $\scriptsize 7\text{ 619}\text{.39}$ |
| Mozambique | $\scriptsize 314.66$ | $\scriptsize 194.62$ | $\scriptsize 37\text{ 876}\text{.59}$ |
| Namibia | $\scriptsize 31.68$ | $\scriptsize -88.36$ | $\scriptsize 7\text{ 807}\text{.65}$ |
| South Africa | $\scriptsize 1.65$ | $\scriptsize -118.39$ | $\scriptsize 14\text{ 016}\text{.41}$ |
| Zimbabwe | $\scriptsize 55.97$ | $\scriptsize -64.07$ | $\scriptsize 4\text{ 105}\text{.08}$ |
| Eswatini | $\scriptsize 0.97$ | $\scriptsize -119.07$ | $\scriptsize 14\text{ 177}\text{.88}$ |
| Uganda | $\scriptsize 262.69$ | $\scriptsize 142.65$ | $\scriptsize 20\text{ 348}\text{.76}$ |
| Zambia | $\scriptsize 157.5$ | $\scriptsize 37.46$ | $\scriptsize 1\text{ 403}\text{.18}$ |
1. $\scriptsize \text{Range = }314.66-0.59=314.07$
2. Mean:
$\scriptsize \text{Mean }=\bar{x}=\displaystyle \frac{{\sum{x}}}{{11}}=\displaystyle \frac{{1320.45}}{{11}}=120.04$
3. Standard deviation:
\scriptsize \begin{align*}\sigma &=\sqrt{{\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}}}\\&=\sqrt{{\displaystyle \frac{{136\text{ 739}.8}}{{11}}}}\\&=\sqrt{{12\text{ 430}\text{.89}}}\\&=111.49\end{align*}
2. Car prices:

| Price (rands) | Deviation from mean $\scriptsize x-\bar{x}$ | Squared deviation $\scriptsize {{\left( {x-\bar{x}} \right)}^{2}}$ |
| --- | --- | --- |
| $\scriptsize 103\text{ }125$ | $\scriptsize 2\text{ 021}\text{.88}$ | $\scriptsize 4\text{ 087 978}\text{.52}$ |
| $\scriptsize \text{129 900}$ | $\scriptsize 28\text{ 796}\text{.88}$ | $\scriptsize 829\text{ 260 009}\text{.77}$ |
| $\scriptsize \text{87 900}$ | $\scriptsize -13\text{ 203}\text{.13}$ | $\scriptsize 174\text{ 322 509}\text{.77}$ |
| $\scriptsize \text{99 900}$ | $\scriptsize -1\text{ 203}\text{.13}$ | $\scriptsize 1\text{ 447 509}\text{.77}$ |
| $\scriptsize \text{85 000}$ | $\scriptsize -16\text{ 103}\text{.13}$ | $\scriptsize 259\text{ 310 634}\text{.77}$ |
| $\scriptsize \text{120 000}$ | $\scriptsize 18\text{ 896}\text{.88}$ | $\scriptsize 357\text{ 091 884}\text{.77}$ |
| $\scriptsize \text{95 000}$ | $\scriptsize -6\text{ 103}\text{.13}$ | $\scriptsize 37\text{ 248 134}\text{.77}$ |
| $\scriptsize \text{88 000}$ | $\scriptsize -13\text{ 103}\text{.13}$ | $\scriptsize 171\text{ 691 884}\text{.77}$ |
1. $\scriptsize \text{Range = }129\text{ 900}-\text{85 000}=\text{ R}44\text{ 900}$
2. Mean:
$\scriptsize \text{Mean }=\bar{x}=\displaystyle \frac{{\sum{x}}}{8}=\displaystyle \frac{{808\text{ 825}}}{8}=101\text{ 103}.13$
The mean price is $\scriptsize \text{R101 103}\text{.13}$.
3. Standard deviation:
\scriptsize \begin{align*}\sigma &=\sqrt{{\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}}}\\&=\sqrt{{\displaystyle \frac{{1\text{ 834 460 546}.88}}{8}}}\\&=\sqrt{{229\text{ 307 568}.36}}\\&=15\text{ 142}.90\end{align*}
The standard deviation is $\scriptsize \text{R15 142}\text{.90}$.
3. Table of values:
| Mark $\scriptsize x$ | Deviation from mean $\scriptsize x-\bar{x}$ | Squared deviation $\scriptsize {{\left( {x-\bar{x}} \right)}^{2}}$ |
| --- | --- | --- |
| $\scriptsize 66$ | $\scriptsize 2.8$ | $\scriptsize 7.84$ |
| $\scriptsize 59$ | $\scriptsize -4.2$ | $\scriptsize 17.64$ |
| $\scriptsize 43$ | $\scriptsize -20.2$ | $\scriptsize 408.04$ |
| $\scriptsize 72$ | $\scriptsize 8.8$ | $\scriptsize 77.44$ |
| $\scriptsize 57$ | $\scriptsize -6.2$ | $\scriptsize 38.44$ |
| $\scriptsize 47$ | $\scriptsize -16.2$ | $\scriptsize 262.44$ |
| $\scriptsize 81$ | $\scriptsize 17.8$ | $\scriptsize 316.84$ |
| $\scriptsize 54$ | $\scriptsize -9.2$ | $\scriptsize 84.64$ |
| $\scriptsize 92$ | $\scriptsize 28.8$ | $\scriptsize 829.44$ |
| $\scriptsize 61$ | $\scriptsize -2.2$ | $\scriptsize 4.84$ |

The mean mark is $\scriptsize \bar{x}=\displaystyle \frac{{632}}{{10}}=63.2$, so the standard deviation of the marks of the $\scriptsize 10$ learners is:
\scriptsize \begin{align*}\sigma &=\sqrt{{\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}}}\\&=\sqrt{{\displaystyle \frac{{2\text{ 047}\text{.6}}}{{10}}}}\\&=\sqrt{{204.76}}\\&=14.31\end{align*}

4. Table of values:
| Minutes taken $\scriptsize x$ | Deviation from mean $\scriptsize x-\bar{x}$ | Squared deviation $\scriptsize {{\left( {x-\bar{x}} \right)}^{2}}$ |
| --- | --- | --- |
| $\scriptsize 84$ | $\scriptsize -16.29$ | $\scriptsize 265.36$ |
| $\scriptsize 150$ | $\scriptsize 49.71$ | $\scriptsize 2471.08$ |
| $\scriptsize 54$ | $\scriptsize -46.29$ | $\scriptsize 2142.76$ |
| $\scriptsize 126$ | $\scriptsize 25.71$ | $\scriptsize 661.00$ |
| $\scriptsize 78$ | $\scriptsize -22.29$ | $\scriptsize 496.84$ |
| $\scriptsize 102$ | $\scriptsize 1.71$ | $\scriptsize 2.92$ |
| $\scriptsize 108$ | $\scriptsize 7.71$ | $\scriptsize 59.44$ |

The mean time is $\scriptsize \bar{x}=\displaystyle \frac{{702}}{7}=100.29$ minutes, so the standard deviation of the times taken is:
\scriptsize \begin{align*}\sigma &=\sqrt{{\displaystyle \frac{{\sum{{{{{\left( {x-\bar{x}} \right)}}^{2}}}}}}{n}}}\\&=\sqrt{{\displaystyle \frac{{6\text{ 099}\text{.43}}}{7}}}\\&=\sqrt{{871.35}}\\&=29.52\end{align*}
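The assessment answers can be checked in the same way. This sketch relies on Python's `statistics.fmean` and `statistics.pstdev` (which divides by $\scriptsize n$) and keeps the mean unrounded, so the last decimal place may occasionally differ slightly from a hand calculation that rounds the mean first.

```python
import statistics

data_sets = {
    "Malaria cases per 1 000": [227.36, 0.59, 60.05, 207.33, 314.66, 31.68,
                                1.65, 55.97, 0.97, 262.69, 157.5],
    "Car prices (R)": [103125, 129900, 87900, 99900, 85000, 120000, 95000, 88000],
    "Marks (%)": [66, 59, 43, 72, 57, 47, 81, 54, 92, 61],
    "Minutes": [84, 150, 54, 126, 78, 102, 108],
}

for name, data in data_sets.items():
    spread = max(data) - min(data)
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    print(f"{name}: range {spread:.2f}, mean {mean:.2f}, sd {sd:.2f}")
```

One detail to watch: the mean car price is exactly $\scriptsize 101\text{ }103.125$, and Python's formatting rounds such an exact tie to the even digit, so it prints `101103.12` rather than the `101 103.13` you would get by rounding half up.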
