Data handling: Use variance and regression analysis to interpolate and extrapolate bivariate data

# Unit 3: Linear regression analysis

Gill Scott

### Unit outcomes

By the end of this unit you will be able to:

• Determine the linear regression equation $\scriptsize \hat{y}=a+bx$.
• Use the regression line to predict the outcome of a given problem.

## What you should know

Before you start this unit, make sure you can:

• Calculate measures of central tendency of a data set, such as the mean, median and mode, and interpret what these tell you about a data set. To revise this, you can work through:
• Calculate and interpret the variance and standard deviation of a set of data. Refer to unit 1 of this subject outcome to revise this.
• Represent bivariate data as a scatter plot, and identify intuitively the best fit linear function to this data. You can refer to unit 2 of this subject outcome to revise this.

## Introduction

Unit 2 of this subject outcome focused on drawing scatter plots of given bivariate data sets, and intuitively drawing lines of best fit between the two components of the plotted points. The obvious next step in this process is to use a more reliable system than intuition to find the lines of best fit. This unit explains the ‘least squares regression’ method of determining a straight-line function that best fits a given set of bivariate data.

## Residuals

In example 2.1 from unit 2 of this subject outcome we drew an intuitive line of best fit of data measuring the number of sweets children ate per week against their average number of hours’ sleep per day. The data and the scatter plot with best fit line are repeated below:

| No. sweets per week | $\scriptsize 15$ | $\scriptsize 9$ | $\scriptsize 10$ | $\scriptsize 6$ | $\scriptsize 23$ | $\scriptsize 8$ | $\scriptsize 13$ | $\scriptsize 3$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Average hours’ sleep per day | $\scriptsize 4.5$ | $\scriptsize 5$ | $\scriptsize 6$ | $\scriptsize 7$ | $\scriptsize 3$ | $\scriptsize 5$ | $\scriptsize 4$ | $\scriptsize 8.5$ |

In this example, we intuitively found the line of best fit and we saw that the equation of the line would be different for different people, depending on which coordinates we chose to use. For mathematical accuracy and consistency, we need a method that will give one single equation of the line of best fit for a given dataset. This method is called linear regression.

The form of the linear regression equation preferred by statisticians for the best-fit straight line is $\scriptsize \hat{y}=a+bx$. We say ‘y hat’: statistics uses the ‘hat’ symbol ‘ˆ’ to indicate that a value is an estimate. The intercept on the y-axis is $\scriptsize a$, and $\scriptsize b$ is the slope of the line. This equation is a variant of the familiar $\scriptsize y=mx+c$ straight-line equation, with $\scriptsize \displaystyle a$ instead of $\scriptsize \displaystyle c$, and $\scriptsize \displaystyle b$ instead of $\scriptsize m$.

Where $\scriptsize x$ is the independent variable, the best-fit straight line is the one for which the sum of the squares of the vertical distances (the y-distances) from each given point to the line is a minimum. For each point, this vertical distance, $\scriptsize y-\hat{y}$, is called the residual. Squaring the residuals, the same strategy used in calculating standard deviation, solves the problem of negative residuals cancelling out those that are positive.

### Activity 3.1: Calculating the distances from data points to ŷ, the line of best fit

Time required: 20 minutes

What you need:

• a pen
• paper (preferably graph paper)
• a ruler

What to do:

1. With $\scriptsize \hat{y}$ being the line of best fit (the dotted line in the graph above):
    1. Read off the $\scriptsize \hat{y}$ coordinate for each $\scriptsize x$ value plotted from the given data set. Note it down in a copy of the table below.
    2. Measure as accurately as you can the vertical distance from each plotted data point to the best fit line. Note this in the $\scriptsize y-\hat{y}$ column with the appropriate sign, whether positive or negative.
2. Calculate the ‘residual’ $\scriptsize y-\hat{y}$ for each $\scriptsize x$ value, noting it in the table.
3. Calculate the squared error for each given data point and the best fit straight line. Note these in the appropriate column.
4. Calculate the sum of the residuals and note it in the table.
5. Write down what the sum of residuals says about how well $\scriptsize \hat{y}$ fits to the data.
6. Calculate the sum of the squared error values, and note this in the table.
7. Discuss what the value of the squared error says about the goodness of fit of $\scriptsize \hat{y}$ to the data.

| No. sweets per week $\scriptsize x$ | Average hours’ sleep per night $\scriptsize y$ | $\scriptsize \hat{y}$ | Residual $\scriptsize y- \hat{y}$ | Squared error $\scriptsize (y- \hat{y})^2$ |
| --- | --- | --- | --- | --- |
| $\scriptsize 15$ | $\scriptsize 4.5$ | | | |
| $\scriptsize 9$ | $\scriptsize 5$ | | | |
| $\scriptsize 10$ | $\scriptsize 6$ | | | |
| $\scriptsize 6$ | $\scriptsize 7$ | | | |
| $\scriptsize 23$ | $\scriptsize 3$ | | | |
| $\scriptsize 8$ | $\scriptsize 5$ | | | |
| $\scriptsize 13$ | $\scriptsize 4$ | | | |
| $\scriptsize 3$ | $\scriptsize 8.5$ | | | |
| Sums | | | | |

What did you find?

Your $\scriptsize \hat{y}$ values may differ slightly from those inserted in the table below. You will probably find that it is difficult to be completely accurate.

Answers to questions 1. to 4. and 6. are in the table below.

| No. sweets per week $\scriptsize x$ | Average hours’ sleep per night $\scriptsize y$ | $\scriptsize \hat{y}$ | Residual $\scriptsize y- \hat{y}$ | Squared error $\scriptsize (y- \hat{y})^2$ |
| --- | --- | --- | --- | --- |
| $\scriptsize 15$ | $\scriptsize 4.5$ | $\scriptsize 4.4$ | $\scriptsize 0.1$ | $\scriptsize 0.01$ |
| $\scriptsize 9$ | $\scriptsize 5$ | $\scriptsize 5.8$ | $\scriptsize -0.8$ | $\scriptsize 0.64$ |
| $\scriptsize 10$ | $\scriptsize 6$ | $\scriptsize 5.6$ | $\scriptsize 0.4$ | $\scriptsize 0.16$ |
| $\scriptsize 6$ | $\scriptsize 7$ | $\scriptsize 6.7$ | $\scriptsize 0.3$ | $\scriptsize 0.09$ |
| $\scriptsize 23$ | $\scriptsize 3$ | $\scriptsize 2.3$ | $\scriptsize 0.7$ | $\scriptsize 0.49$ |
| $\scriptsize 8$ | $\scriptsize 5$ | $\scriptsize 6.1$ | $\scriptsize -1.1$ | $\scriptsize 1.21$ |
| $\scriptsize 13$ | $\scriptsize 4$ | $\scriptsize 4.9$ | $\scriptsize -0.9$ | $\scriptsize 0.81$ |
| $\scriptsize 3$ | $\scriptsize 8.5$ | $\scriptsize 7.4$ | $\scriptsize 1.1$ | $\scriptsize 1.21$ |
| Sums | | | $\scriptsize -0.2$ | $\scriptsize 4.62$ |

5. The sum of residuals is not a good indicator of how well $\scriptsize \hat{y}$ fits to the data. Although the $\scriptsize -0.2$ sum is quite a small value, this could easily be the result of large positive and negative differences cancelling each other out.
7. The sum of squared error does relate to how well $\scriptsize \hat{y}$ fits to the data: the smaller this sum, the closer the line is to each of the data points. However, this value in itself does not seem to be helpful in the actual plotting of the line of best fit.
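
The arithmetic in the answer table can be checked with a short Python sketch. The variable names are illustrative only, and the $\scriptsize \hat{y}$ values are the ones read off the graph in the table above, so your own readings may give slightly different sums.

```python
# Values from the Activity 3.1 answer table (y-hat values were read off the graph).
sleep = [4.5, 5, 6, 7, 3, 5, 4, 8.5]                   # observed y
sleep_hat = [4.4, 5.8, 5.6, 6.7, 2.3, 6.1, 4.9, 7.4]   # estimated y-hat

# Residual = observed value minus estimated value.
residuals = [y - y_hat for y, y_hat in zip(sleep, sleep_hat)]

# Squaring removes the signs, so positive and negative residuals cannot cancel.
squared_errors = [r ** 2 for r in residuals]

sum_residuals = round(sum(residuals), 2)
sum_squared_errors = round(sum(squared_errors), 2)

print(sum_residuals)       # -0.2
print(sum_squared_errors)  # 4.62
```

The two printed values match the 'Sums' row of the table: the residuals nearly cancel, while the squared errors accumulate.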

## Finding the equation $\scriptsize \hat{y}=a+bx$ of a straight line of best fit

The formula for finding the gradient of the line of best fit of a set of bivariate data points uses the mean of all the $\scriptsize x \text{-values}$ and the mean of all the $\scriptsize y \text{-values}$, together with the deviation of each data value from its mean.

### Take note!

The derivation of the formulae used for finding the equation of the line is fairly complicated and is not required at this level.

### Note

If you would like to investigate the derivation of the formulae, when you have access to the internet, watch the series of videos starting at “Introduction to residuals and least squares regression”.

The equation $\scriptsize \hat{y}=a+bx$ of the straight line of best fit satisfies the following conditions:

• The formulae are based on the fact that $\scriptsize x$ is the independent variable, and $\scriptsize y$ is the dependent variable (this is an important requirement).
• The point $\scriptsize (\bar{x},\text{ }\bar{y})$ lies on the line of best fit, where $\scriptsize \bar{x}$ is the mean of all the given $\scriptsize x\text{-values}$ in the dataset, and $\scriptsize \bar{y}$ is the mean of all the given $\scriptsize y\text{-values}$.
• The gradient: $\scriptsize b=\displaystyle \frac{{\sum{{(x-\bar{x})(y-\bar{y})}}}}{{\sum{{{{{(x-\bar{x})}}^{2}}}}}}$
You will remember that the standard deviation also used the difference between data values and their means.
• The y-intercept: $\scriptsize a=\displaystyle \frac{{\sum{{y-b\sum{x}}}}}{n}=\bar{y}-b\bar{x}$ where $\scriptsize b$ is the gradient of the best-fit line.
• The sum of residuals is $\scriptsize 0$.
• The mean of residuals is $\scriptsize 0$.
• The sum of squares of residuals is a minimum.

The first bullet point above is a requirement, and the next three bullets are the tools we use to find the equation of the line of best fit.
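
As a minimal sketch of how the formulae for $\scriptsize b$ and $\scriptsize a$ fit together, the Python function below computes both values and then checks two of the listed conditions on the sweets and sleep data from earlier in this unit. The function name `least_squares_line` and the variable names are assumptions made for this illustration; they are not part of the unit.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x using the formulae listed above."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the gradient formula.
    sum_xy_dev = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx_dev = sum((x - x_bar) ** 2 for x in xs)
    b = sum_xy_dev / sum_xx_dev
    a = y_bar - b * x_bar
    return a, b


# Sweets and sleep data from the start of this unit.
sweets = [15, 9, 10, 6, 23, 8, 13, 3]
sleep = [4.5, 5, 6, 7, 3, 5, 4, 8.5]

a, b = least_squares_line(sweets, sleep)

# Condition: the sum of the residuals is 0 (up to floating-point error).
residuals = [y - (a + b * x) for x, y in zip(sweets, sleep)]
print(abs(sum(residuals)) < 1e-9)  # True

# Condition: the point (x-bar, y-bar) lies on the line.
x_bar = sum(sweets) / len(sweets)
y_bar = sum(sleep) / len(sleep)
print(abs((a + b * x_bar) - y_bar) < 1e-9)  # True
```

Note that the function does not round any intermediate values, so its results can differ slightly in the last decimal place from a table worked by hand with rounded entries.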

### Example 3.1

This example further develops question 2 from exercise 2.1 of unit 2 in this subject outcome.

Dr Dandara is a scientist trying to find a cure for a disease which has an $\scriptsize 80\%$ mortality rate. That means that $\scriptsize 80\%$ of people who get the disease will die. He knows of a plant which is used in traditional medicine to treat the disease. He extracts the active ingredient from the plant and tests different dosages (measured in milligrams) on different groups of patients. Examine the data below and complete the questions that follow.

| Dosage (mg) | $\scriptsize 0$ | $\scriptsize 25$ | $\scriptsize 50$ | $\scriptsize 75$ | $\scriptsize 100$ | $\scriptsize 125$ | $\scriptsize 150$ | $\scriptsize 175$ | $\scriptsize 200$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mortality rate ($\scriptsize \%$) | $\scriptsize 80$ | $\scriptsize 73$ | $\scriptsize 63$ | $\scriptsize 49$ | $\scriptsize 42$ | $\scriptsize 32$ | $\scriptsize 25$ | $\scriptsize 11$ | $\scriptsize 5$ |

1. Draw the scatter plot of the data on graph paper.
2. Use the formulae for $\scriptsize b$ and $\scriptsize a$ to find the regression equation for the line of best fit.
3. Draw the line of best fit on the scatter plot. Compare this line to the best fit line you drew in unit 2.
4. Use the equation of the line of best fit to estimate the dosage required for a $\scriptsize 0\%$ mortality rate.

Solution

1. [Scatter plot of mortality rate ($\scriptsize \%$) against dosage (mg).]
2. It is best to tabulate the data in these questions to be able to check your answers. Also, the Assessment Guidelines require that these values should be calculated ‘manually’ – that is without using your calculator’s statistical function. Examination questions also sometimes stipulate that calculator statistical functions should not be used. For this reason, this approach has been adopted in this unit.

| Dosage (mg) $\scriptsize x$ | Mortality rate ($\scriptsize \%$) $\scriptsize y$ | $\scriptsize x-\bar{x}$ | $\scriptsize y-\bar{y}$ | $\scriptsize (x-\bar{x})(y-\bar{y})$ | $\scriptsize (x-\bar{x})^2$ |
| --- | --- | --- | --- | --- | --- |
| $\scriptsize 0$ | $\scriptsize 80$ | $\scriptsize -100$ | $\scriptsize 37.8$ | $\scriptsize -3~777.8$ | $\scriptsize 10~000$ |
| $\scriptsize 25$ | $\scriptsize 73$ | $\scriptsize -75$ | $\scriptsize 30.8$ | $\scriptsize -2~308.3$ | $\scriptsize 5~625$ |
| $\scriptsize 50$ | $\scriptsize 63$ | $\scriptsize -50$ | $\scriptsize 20.8$ | $\scriptsize -1~038.9$ | $\scriptsize 2~500$ |
| $\scriptsize 75$ | $\scriptsize 49$ | $\scriptsize -25$ | $\scriptsize 6.8$ | $\scriptsize -169.4$ | $\scriptsize 625$ |
| $\scriptsize 100$ | $\scriptsize 42$ | $\scriptsize 0$ | $\scriptsize -0.2$ | $\scriptsize 0$ | $\scriptsize 0$ |
| $\scriptsize 125$ | $\scriptsize 32$ | $\scriptsize 25$ | $\scriptsize -10.2$ | $\scriptsize -255.6$ | $\scriptsize 625$ |
| $\scriptsize 150$ | $\scriptsize 25$ | $\scriptsize 50$ | $\scriptsize -17.2$ | $\scriptsize -861.1$ | $\scriptsize 2~500$ |
| $\scriptsize 175$ | $\scriptsize 11$ | $\scriptsize 75$ | $\scriptsize -31.2$ | $\scriptsize -2~341.7$ | $\scriptsize 5~625$ |
| $\scriptsize 200$ | $\scriptsize 5$ | $\scriptsize 100$ | $\scriptsize -37.2$ | $\scriptsize -3~722.2$ | $\scriptsize 10~000$ |
| Sums | $\scriptsize 900$ | $\scriptsize 380$ | | $\scriptsize -14~475$ | $\scriptsize 37~500$ |
| Mean | $\scriptsize 100$ | $\scriptsize 42.2$ | | | |

\scriptsize \begin{align*}b&=\displaystyle \frac{{\sum{{(x-\bar{x})(y-\bar{y})}}}}{{\sum{{{{{(x-\bar{x})}}^{2}}}}}}\\&=\displaystyle \frac{{-14\text{ }475}}{{37\text{ }500}}\\&=-0.386\end{align*}
y-intercept:
\scriptsize \begin{align*}a&=\displaystyle \frac{{\sum{{y-b\sum{x}}}}}{n}=\bar{y}-b\bar{x}\\&=42.2-(-0.386)(100)\\&=42.2+38.6\\&=80.8\end{align*}
Regression equation: substituting for $\scriptsize b$ and $\scriptsize a$ in $\scriptsize \hat{y}=a+bx$:
\scriptsize \begin{align*}\hat{y}&=a+bx\\\hat{y}&=80.8-0.386x\end{align*}
Check that the point $\scriptsize (\bar{x},\text{ }\bar{y})=(100,42.2)$ lies on the line.
Substituting $\scriptsize x=100$ into the regression equation gives $\scriptsize \hat{y}=80.8-0.386x=80.8-38.6=42.2$, so this point lies on the line.

3. [Scatter plot with the line of best fit $\scriptsize \hat{y}=80.8-0.386x$ drawn in.]
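4. To find the dosage for a $\scriptsize 0\%$ mortality rate, substitute $\scriptsize \hat{y}=0$ in the regression equation, therefore:
\scriptsize \begin{align*}\hat{y}&=80.8-0.386x\\0.386x&=80.8\\x&=209.33\end{align*}
According to the regression equation, a dosage of approximately $\scriptsize 209\text{ mg}$ would be needed for a $\scriptsize 0\%$ mortality rate. This is an extrapolation, because this dosage lies just outside the range of dosages that were tested.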
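
The calculations in Example 3.1 can be reproduced with the short Python sketch below. It is only a check on the manual working, not a replacement for the table method that the Assessment Guidelines require. The variable names are illustrative, and the rounding of $\scriptsize b$ to three decimal places and $\scriptsize a$ to one decimal place is chosen to match the rounding used in the worked solution.

```python
# Data from Example 3.1.
dosage = [0, 25, 50, 75, 100, 125, 150, 175, 200]    # x, in mg
mortality = [80, 73, 63, 49, 42, 32, 25, 11, 5]      # y, in %

n = len(dosage)
x_bar = sum(dosage) / n
y_bar = sum(mortality) / n

# Gradient: sum of products of deviations over sum of squared x-deviations.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(dosage, mortality))
     / sum((x - x_bar) ** 2 for x in dosage))
# y-intercept, using the fact that (x_bar, y_bar) lies on the line.
a = y_bar - b * x_bar

b_rounded = round(b, 3)
a_rounded = round(a, 1)
print(b_rounded)  # -0.386
print(a_rounded)  # 80.8

# Dosage at which the predicted mortality rate is 0%, using the rounded values.
zero_mortality_dosage = round(-a_rounded / b_rounded, 2)
print(zero_mortality_dosage)  # 209.33
```

If the unrounded values of $\scriptsize a$ and $\scriptsize b$ are used instead, the final dosage differs slightly in the decimal places, which is why it is worth stating the rounding you have used.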

### Exercise 3.1

Question 1 follows on from exercise 2.1 number 3 of unit 2 in this subject outcome.

1. The enrolment of learners for NC(V) programmes at TVET colleges was reported as follows:

| Year | $\scriptsize 2010$ | $\scriptsize 2011$ | $\scriptsize 2012$ | $\scriptsize 2013$ | $\scriptsize 2014$ | $\scriptsize 2015$ | $\scriptsize 2016$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NC(V) enrolment (in thousands) | $\scriptsize 130$ | $\scriptsize 124$ | $\scriptsize 140$ | $\scriptsize 154$ | $\scriptsize 166$ | $\scriptsize 165$ | $\scriptsize 177$ |

1. Draw a scatter plot of this data with the year on the horizontal axis and the enrolment on the vertical axis.
2. Use the formulae for $\scriptsize b$ and $\scriptsize a$ to find the regression equation for the line of best fit.
3. Draw the line of best fit on the scatter plot. Compare this line to the best fit line you drew in unit 2, exercise 2.1 question 3.
4. According to the equation, what enrolment could be expected in the year $\scriptsize 2022$?

Question 2 taken from Siyavula Grade 12 Mathematics Exercise 9-3

2. For each of the following data sets:

Data set 1:

| $\scriptsize x$ | $\scriptsize 10$ | $\scriptsize 4$ | $\scriptsize 9$ | $\scriptsize 11$ | $\scriptsize 11$ | $\scriptsize 6$ | $\scriptsize 8$ | $\scriptsize 18$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $\scriptsize y$ | $\scriptsize 1$ | $\scriptsize 0$ | $\scriptsize 6$ | $\scriptsize 3$ | $\scriptsize 9$ | $\scriptsize 5$ | $\scriptsize 9$ | $\scriptsize 8$ |

Data set 2:

| $\scriptsize x$ | $\scriptsize 8$ | $\scriptsize 12$ | $\scriptsize 12$ | $\scriptsize 7$ | $\scriptsize 6$ | $\scriptsize 14$ | $\scriptsize 8$ | $\scriptsize 14$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $\scriptsize y$ | $\scriptsize -5$ | $\scriptsize 4$ | $\scriptsize 3$ | $\scriptsize -3$ | $\scriptsize -5$ | $\scriptsize -6$ | $\scriptsize -2$ | $\scriptsize 0$ |

Data set 3:

| $\scriptsize x$ | $\scriptsize 1.9$ | $\scriptsize 1.1$ | $\scriptsize -1.5$ | $\scriptsize 1.3$ | $\scriptsize 0.95$ | $\scriptsize 8.25$ | $\scriptsize 10.6$ | $\scriptsize 6.2$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $\scriptsize y$ | $\scriptsize 7$ | $\scriptsize 8.45$ | $\scriptsize 0.9$ | $\scriptsize 0.1$ | $\scriptsize 2.45$ | $\scriptsize 4.35$ | $\scriptsize 2.2$ | $\scriptsize 1.4$ |

1. Draw a scatter plot of the data.
2. Use a table to determine the values of $\scriptsize b$ and $\scriptsize a$ in order to find the least squares regression equation for each line of best fit. Round $\scriptsize b$ and $\scriptsize a$ off to two decimal places where necessary.
3. Draw the line of best fit on the scatter plot.
4. Use your equation in each case to predict the value of $\scriptsize y$ when $\scriptsize x=25$.

Question 3 follows on from question 3 of unit 2 assessment in this subject outcome

3. A college assists learners to complete their national diplomas by negotiating with employers in the region with the aim of placing the learners for work experience. Over recent years they have tracked their engagements with employers against numbers of learners placed in work experience as follows:

| No. employers engaged | $\scriptsize 15$ | $\scriptsize 45$ | $\scriptsize 65$ | $\scriptsize 35$ | $\scriptsize 38$ | $\scriptsize 25$ | $\scriptsize 40$ | $\scriptsize 30$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| No. learners placed | $\scriptsize 40$ | $\scriptsize 90$ | $\scriptsize 128$ | $\scriptsize 90$ | $\scriptsize 95$ | $\scriptsize 60$ | $\scriptsize 140$ | $\scriptsize 75$ |

1. Draw a scatter plot of the data.
2. Use a table to determine the values of $\scriptsize b$ and $\scriptsize a$ in order to find the least squares regression equation for the line of best fit. Round $\scriptsize b$ and $\scriptsize a$ off to two decimal places where necessary.
3. Draw the line of best fit on the scatter plot. Compare this line to the best fit line you drew in unit 2.
4. Use the equation of the line of best fit to estimate the number of employers that would need to be engaged in order to place $\scriptsize 175$ learners in work placements.

The full solutions are at the end of the unit.

## Summary

In this unit you have learnt the following:

• How to determine the linear regression equation $\scriptsize \hat{y}=a+bx$ for a set of bivariate data.
• How to use the regression line to predict the outcome of a given problem.

# Unit 3: Assessment

#### Suggested time to complete: 55 minutes

See question 1 of unit 2 assessment; question adapted from NC(V) Mathematics Level 4 examination, November 2017

1. A study was done to compare electricity usage of geysers that are inside or outside the house. The table below shows the electricity usage (in kilowatt hours) for equivalent water consumption for matched households that have geysers inside the house, and those with geysers outside the house. Nine houses of each type were considered in the study.

| Inside the house (kWh) | $\scriptsize 29$ | $\scriptsize 31$ | $\scriptsize 20$ | $\scriptsize 40$ | $\scriptsize 26$ | $\scriptsize 39$ | $\scriptsize 32$ | $\scriptsize 34$ | $\scriptsize 35$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Outside the house (kWh) | $\scriptsize 19$ | $\scriptsize 23$ | $\scriptsize 13$ | $\scriptsize 32$ | $\scriptsize 17$ | $\scriptsize 28$ | $\scriptsize 25$ | $\scriptsize 24$ | $\scriptsize 28$ |

1. Draw a scatter plot of the data.
2. Using the information above, find the sample regression equation using the method of least squares.
3. If the geyser fitted outside the house uses $\scriptsize 40\text{kWh}$, what will the usage be with the geyser inside the house?

Question 2 taken from NC(V) Mathematics Level 4 examination, November 2015

2. ATA Consultants is a company that offers tuition for learners from grade 8 to grade 12. For the past $\scriptsize 5$ years they have distributed flyers to learners and have enrolled numbers of learners according to the table given below.

| Number of flyers distributed $\scriptsize (x)$ | Number of learners enrolled $\scriptsize (y)$ |
| --- | --- |
| $\scriptsize 50$ | $\scriptsize 15$ |
| $\scriptsize 250$ | $\scriptsize 45$ |
| $\scriptsize 200$ | $\scriptsize 40$ |
| $\scriptsize 350$ | $\scriptsize 65$ |
| $\scriptsize 150$ | $\scriptsize 35$ |

1. Draw a scatter plot showing on the x-axis the number of flyers distributed and on the y-axis the number of learners enrolled.
2. Using the information above find the simple regression equation by the method of least squares.
3. Use the regression equation to determine the number of learners that would be enrolled if $\scriptsize 500$ flyers were sent out.

Question 3 taken from NC(V) Mathematics Level 4 examination, November 2019

3. The data below shows the mathematics marks of ten learners at a college for the internal examinations and the external examinations.

| Internal examinations $\scriptsize (x)$ | $\scriptsize 80$ | $\scriptsize 68$ | $\scriptsize 94$ | $\scriptsize 72$ | $\scriptsize 74$ | $\scriptsize 83$ | $\scriptsize 56$ | $\scriptsize 68$ | $\scriptsize 65$ | $\scriptsize 75$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| External examinations $\scriptsize (y)$ | $\scriptsize 72$ | $\scriptsize 71$ | $\scriptsize 96$ | $\scriptsize 77$ | $\scriptsize 82$ | $\scriptsize 72$ | $\scriptsize 58$ | $\scriptsize 83$ | $\scriptsize 78$ | $\scriptsize 80$ |

1. Make a scatter plot of the marks in the above table on an x-y plane, with each axis showing values from $\scriptsize 50$ to $\scriptsize 96$.
2. Calculate the equation of the least squares regression line for the data. No marks will be awarded if answers are taken directly from a calculator. Complete all the calculations in a table that shows
EITHER (x) (y) $\scriptsize xy$ $\scriptsize x^2$
OR (x) (y) $\scriptsize (x-\bar{x})(y-\bar{y})$ $\scriptsize {{(x-\bar{x})}^{2}}$
3. Draw the least squares regression line on the x-y plane.
4. Calculate the predicted final examination mark for a learner who scores $\scriptsize 70$ in the internal examination.

See unit 2 assessment question 2

4. Tobacco smoking is still one of the world’s largest health problems, although prevalence of smoking is generally decreasing. The table below shows numbers of deaths (in thousands) in South Africa from smoking, in recent years.

| Year | $\scriptsize 2010$ | $\scriptsize 2011$ | $\scriptsize 2012$ | $\scriptsize 2013$ | $\scriptsize 2014$ | $\scriptsize 2015$ | $\scriptsize 2016$ | $\scriptsize 2017$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Deaths $\scriptsize ('000)$ | $\scriptsize 38.0$ | $\scriptsize 35.8$ | $\scriptsize 34.1$ | $\scriptsize 32.6$ | $\scriptsize 31.8$ | $\scriptsize 31.5$ | $\scriptsize 31.3$ | $\scriptsize 29.9$ |

1. Draw a scatter plot of the data.
2. Calculate the equation of the least squares regression line for the data.
3. Draw the least squares regression line into the graph of the scatter plot.
4. Draw the line of best fit on the scatter plot. Compare this line to the best fit line you drew in unit 2.
5. Use the regression equation to calculate the number of deaths from smoking predicted for $\scriptsize 2022$.

The full solutions are at the end of the unit.

# Unit 3: Solutions

### Exercise 3.1

1. .
1. .
2. .

| Year $\scriptsize x$ | Enrolment $\scriptsize ('000)$ $\scriptsize y$ | $\scriptsize x-\bar{x}$ | $\scriptsize y-\bar{y}$ | $\scriptsize (x-\bar{x})(y-\bar{y})$ | $\scriptsize (x-\bar{x})^2$ |
| --- | --- | --- | --- | --- | --- |
| $\scriptsize 2010$ | $\scriptsize 130$ | $\scriptsize -3$ | $\scriptsize -20.86$ | $\scriptsize 62.58$ | $\scriptsize 9$ |
| $\scriptsize 2011$ | $\scriptsize 124$ | $\scriptsize -2$ | $\scriptsize -26.86$ | $\scriptsize 53.72$ | $\scriptsize 4$ |
| $\scriptsize 2012$ | $\scriptsize 140$ | $\scriptsize -1$ | $\scriptsize -10.86$ | $\scriptsize 10.86$ | $\scriptsize 1$ |
| $\scriptsize 2013$ | $\scriptsize 154$ | $\scriptsize 0$ | $\scriptsize 3.14$ | $\scriptsize 0$ | $\scriptsize 0$ |
| $\scriptsize 2014$ | $\scriptsize 166$ | $\scriptsize 1$ | $\scriptsize 15.14$ | $\scriptsize 15.14$ | $\scriptsize 1$ |
| $\scriptsize 2015$ | $\scriptsize 165$ | $\scriptsize 2$ | $\scriptsize 14.14$ | $\scriptsize 28.28$ | $\scriptsize 4$ |
| $\scriptsize 2016$ | $\scriptsize 177$ | $\scriptsize 3$ | $\scriptsize 26.14$ | $\scriptsize 78.42$ | $\scriptsize 9$ |
| Sums | $\scriptsize 14~091$ | $\scriptsize 1~056$ | | $\scriptsize 249$ | $\scriptsize 28$ |
| Mean | $\scriptsize 2013$ | $\scriptsize 150.86$ | | | |

\scriptsize \begin{align*}b&=\displaystyle \frac{{\sum{{(x-\bar{x})(y-\bar{y})}}}}{{\sum{{{{{(x-\bar{x})}}^{2}}}}}}\\&=\displaystyle \frac{{249}}{{28}}\\&=8.89\end{align*}
y-intercept:
\scriptsize \begin{align*}a&=\bar{y}-b\bar{x}\\&=150.86-(8.89)(2013)\\&=150.86-17\text{ }895.57\\&=-17\text{ 744}.71\end{align*}
Regression equation: substituting for $\scriptsize b$ and $\scriptsize a$ in $\scriptsize \hat{y}=a+bx$:
\scriptsize \begin{align*}\hat{y}&=a+bx\\\hat{y}&=-17\text{ 744}.71+8.89x\end{align*}

3. .
4. To find enrolment expected in $\scriptsize 2022$, substitute this value into the regression equation:
\scriptsize \begin{align*}\hat{y}&=-17\text{ 744}.71+8.89x\\\hat{y}&=-17\text{ 744}.71+8.89(2022)\\&=-17\text{ 744}\text{.71}+17\text{ }975.58\\&=230.87\end{align*}
So, enrolment in $\scriptsize 2022$ is expected to be $\scriptsize 230.87$ thousands, or $\scriptsize 230\text{ 870}$ learners.
2. .
1. .

| $\scriptsize x$ | $\scriptsize 10$ | $\scriptsize 4$ | $\scriptsize 9$ | $\scriptsize 11$ | $\scriptsize 11$ | $\scriptsize 6$ | $\scriptsize 8$ | $\scriptsize 18$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $\scriptsize y$ | $\scriptsize 1$ | $\scriptsize 0$ | $\scriptsize 6$ | $\scriptsize 3$ | $\scriptsize 9$ | $\scriptsize 5$ | $\scriptsize 9$ | $\scriptsize 8$ |

1. .
2. .

| $\scriptsize x$ | $\scriptsize y$ | $\scriptsize x-\bar{x}$ | $\scriptsize y-\bar{y}$ | $\scriptsize (x-\bar{x})(y-\bar{y})$ | $\scriptsize (x-\bar{x})^2$ |
| --- | --- | --- | --- | --- | --- |
| $\scriptsize 10$ | $\scriptsize 1$ | $\scriptsize 0.37$ | $\scriptsize -4.13$ | $\scriptsize -1.53$ | $\scriptsize 0.14$ |
| $\scriptsize 4$ | $\scriptsize 0$ | $\scriptsize -5.63$ | $\scriptsize -5.13$ | $\scriptsize 28.88$ | $\scriptsize 31.70$ |
| $\scriptsize 9$ | $\scriptsize 6$ | $\scriptsize -0.63$ | $\scriptsize 0.87$ | $\scriptsize -0.55$ | $\scriptsize 0.40$ |
| $\scriptsize 11$ | $\scriptsize 3$ | $\scriptsize 1.37$ | $\scriptsize -2.13$ | $\scriptsize -2.92$ | $\scriptsize 1.88$ |
| $\scriptsize 11$ | $\scriptsize 9$ | $\scriptsize 1.37$ | $\scriptsize 3.87$ | $\scriptsize 5.30$ | $\scriptsize 1.88$ |
| $\scriptsize 6$ | $\scriptsize 5$ | $\scriptsize -3.63$ | $\scriptsize -0.13$ | $\scriptsize 0.47$ | $\scriptsize 13.18$ |
| $\scriptsize 8$ | $\scriptsize 9$ | $\scriptsize -1.63$ | $\scriptsize 3.87$ | $\scriptsize -6.31$ | $\scriptsize 2.66$ |
| $\scriptsize 18$ | $\scriptsize 8$ | $\scriptsize 8.37$ | $\scriptsize 2.87$ | $\scriptsize 24.02$ | $\scriptsize 70.06$ |
| Sums | $\scriptsize 77$ | $\scriptsize 41$ | | $\scriptsize 47.36$ | $\scriptsize 121.9$ |
| Mean | $\scriptsize 9.63$ | $\scriptsize 5.13$ | | | |

\scriptsize \begin{align*}b&=\displaystyle \frac{{\sum{{(x-\bar{x})(y-\bar{y})}}}}{{\sum{{{{{(x-\bar{x})}}^{2}}}}}}\\&=\displaystyle \frac{{47.36}}{{121.9}}\\&=0.39\end{align*}
y-intercept:
\scriptsize \begin{align*}a&=\bar{y}-b\bar{x}\\&=5.13-(0.39)(9.63)\\&=5.13-3.76\\&=1.37\end{align*}
Regression equation: substituting for $\scriptsize b$ and $\scriptsize a$ in $\scriptsize \hat{y}=a+bx$:
\scriptsize \begin{align*}\hat{y}&=a+bx\\\hat{y}&=1.37+0.39x\end{align*}

3. .
4. To find $\scriptsize y\text{-value}$ when $\scriptsize x=25$, substitute this value into the regression equation:
\scriptsize \begin{align*}\hat{y}&=1.37+0.39x\\\hat{y}&=1.37+0.39(25)\\&=1.37+9.75\\&=11.12\end{align*}
2. .

| $\scriptsize x$ | $\scriptsize 8$ | $\scriptsize 12$ | $\scriptsize 12$ | $\scriptsize 7$ | $\scriptsize 6$ | $\scriptsize 14$ | $\scriptsize 8$ | $\scriptsize 14$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $\scriptsize y$ | $\scriptsize -5$ | $\scriptsize 4$ | $\scriptsize 3$ | $\scriptsize -3$ | $\scriptsize -5$ | $\scriptsize -6$ | $\scriptsize -2$ | $\scriptsize 0$ |

1. .
2. .

| $\scriptsize x$ | $\scriptsize y$ | $\scriptsize x-\bar{x}$ | $\scriptsize y-\bar{y}$ | $\scriptsize (x-\bar{x})(y-\bar{y})$ | $\scriptsize (x-\bar{x})^2$ |
| --- | --- | --- | --- | --- | --- |
| $\scriptsize 8$ | $\scriptsize -5$ | $\scriptsize -2.13$ | $\scriptsize -3.25$ | $\scriptsize 6.92$ | $\scriptsize 4.54$ |
| $\scriptsize 12$ | $\scriptsize 4$ | $\scriptsize 1.87$ | $\scriptsize 5.75$ | $\scriptsize 10.75$ | $\scriptsize 3.50$ |
| $\scriptsize 12$ | $\scriptsize 3$ | $\scriptsize 1.87$ | $\scriptsize 4.75$ | $\scriptsize 8.88$ | $\scriptsize 3.50$ |
| $\scriptsize 7$ | $\scriptsize -3$ | $\scriptsize -3.13$ | $\scriptsize -1.25$ | $\scriptsize 3.91$ | $\scriptsize 9.80$ |
| $\scriptsize 6$ | $\scriptsize -5$ | $\scriptsize -4.13$ | $\scriptsize -3.25$ | $\scriptsize 13.42$ | $\scriptsize 17.06$ |
| $\scriptsize 14$ | $\scriptsize -6$ | $\scriptsize 3.87$ | $\scriptsize -4.25$ | $\scriptsize -16.45$ | $\scriptsize 14.98$ |
| $\scriptsize 8$ | $\scriptsize -2$ | $\scriptsize -2.13$ | $\scriptsize -0.25$ | $\scriptsize 0.53$ | $\scriptsize 4.54$ |
| $\scriptsize 14$ | $\scriptsize 0$ | $\scriptsize 3.87$ | $\scriptsize 1.75$ | $\scriptsize 6.77$ | $\scriptsize 14.98$ |
| Sums | $\scriptsize 81$ | $\scriptsize -14$ | | $\scriptsize 34.73$ | $\scriptsize 72.9$ |
| Mean | $\scriptsize 10.13$ | $\scriptsize -1.75$ | | | |

\scriptsize \begin{align*}b&=\displaystyle \frac{{\sum{{(x-\bar{x})(y-\bar{y})}}}}{{\sum{{{{{(x-\bar{x})}}^{2}}}}}}\\&=\displaystyle \frac{{34.73}}{{72.9}}\\&=0.48\end{align*}
y-intercept:
\scriptsize \begin{align*}a&=\bar{y}-b\bar{x}\\&=-1.75-(0.48)(10.13)\\&=-1.75-4.86\\&=-6.61\end{align*}
Regression equation: substituting for $\scriptsize b$ and $\scriptsize a$ in $\scriptsize \hat{y}=a+bx$:
\scriptsize \begin{align*}\hat{y}&=a+bx\\\hat{y}&=-6.61+0.48x\end{align*}

3. .
4. To find $\scriptsize y\text{-value}$ when $\scriptsize x=25$, substitute this value into the regression equation:
\scriptsize \begin{align*}\hat{y}&=-6.61+0.48x\\\hat{y}&=-6.61+0.48(25)\\&=-6.61+12\\&=5.39\end{align*}
3. .

| $\scriptsize x$ | $\scriptsize 1.9$ | $\scriptsize 1.1$ | $\scriptsize -1.5$ | $\scriptsize 1.3$ | $\scriptsize 0.95$ | $\scriptsize 8.25$ | $\scriptsize 10.6$ | $\scriptsize 6.2$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $\scriptsize y$ | $\scriptsize 7$ | $\scriptsize 8.45$ | $\scriptsize 0.9$ | $\scriptsize 0.1$ | $\scriptsize 2.45$ | $\scriptsize 4.35$ | $\scriptsize 2.2$ | $\scriptsize 1.4$ |

1. .
2. .

| $\scriptsize x$ | $\scriptsize y$ | $\scriptsize x-\bar{x}$ | $\scriptsize y-\bar{y}$ | $\scriptsize (x-\bar{x})(y-\bar{y})$ | $\scriptsize (x-\bar{x})^2$ |
| --- | --- | --- | --- | --- | --- |
| $\scriptsize 1.9$ | $\scriptsize 7$ | $\scriptsize -1.7$ | $\scriptsize 3.64$ | $\scriptsize -6.19$ | $\scriptsize 2.89$ |
| $\scriptsize 1.1$ | $\scriptsize 8.45$ | $\scriptsize -2.5$ | $\scriptsize 5.09$ | $\scriptsize -12.73$ | $\scriptsize 6.25$ |
| $\scriptsize -1.5$ | $\scriptsize 0.9$ | $\scriptsize -5.1$ | $\scriptsize -2.46$ | $\scriptsize 12.55$ | $\scriptsize 26.01$ |
| $\scriptsize 1.3$ | $\scriptsize 0.1$ | $\scriptsize -2.3$ | $\scriptsize -3.26$ | $\scriptsize 7.50$ | $\scriptsize 5.29$ |
| $\scriptsize 0.95$ | $\scriptsize 2.45$ | $\scriptsize -2.65$ | $\scriptsize -0.91$ | $\scriptsize 2.41$ | $\scriptsize 7.02$ |
| $\scriptsize 8.25$ | $\scriptsize 4.35$ | $\scriptsize 4.65$ | $\scriptsize 0.99$ | $\scriptsize 4.60$ | $\scriptsize 21.62$ |
| $\scriptsize 10.6$ | $\scriptsize 2.2$ | $\scriptsize 7$ | $\scriptsize -1.16$ | $\scriptsize -8.12$ | $\scriptsize 49$ |
| $\scriptsize 6.2$ | $\scriptsize 1.4$ | $\scriptsize 2.6$ | $\scriptsize -1.96$ | $\scriptsize -5.10$ | $\scriptsize 6.76$ |
| Sums | $\scriptsize 28.8$ | $\scriptsize 26.85$ | | $\scriptsize -5.08$ | $\scriptsize 124.84$ |
| Mean | $\scriptsize 3.6$ | $\scriptsize 3.36$ | | | |

\scriptsize \begin{align*}b&=\displaystyle \frac{{\sum{{(x-\bar{x})(y-\bar{y})}}}}{{\sum{{{{{(x-\bar{x})}}^{2}}}}}}\\&=\displaystyle \frac{{-5.08}}{{124.84}}\\&=-0.04\end{align*}
y-intercept:
\scriptsize \begin{align*}a&=\bar{y}-b\bar{x}\\&=3.36-(-0.04)(3.6)\\&=3.36+0.144\\&=3.50\end{align*}
Regression equation: substituting for $\scriptsize b$ and $\scriptsize a$ in $\scriptsize \hat{y}=a+bx$:
\scriptsize \begin{align*}\hat{y}&=a+bx\\\hat{y}&=3.5-0.04x\end{align*}

3. .
4. To find $\scriptsize y\text{-value}$ when $\scriptsize x=25$, substitute this value into the regression equation:
\scriptsize \begin{align*}\hat{y}&=3.5-0.04x\\\hat{y}&=3.5-0.04(25)\\&=3.5-1\\&=2.5\end{align*}
3. .

| No. employers engaged | $\scriptsize 15$ | $\scriptsize 45$ | $\scriptsize 65$ | $\scriptsize 35$ | $\scriptsize 38$ | $\scriptsize 25$ | $\scriptsize 40$ | $\scriptsize 30$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| No. learners placed | $\scriptsize 40$ | $\scriptsize 90$ | $\scriptsize 128$ | $\scriptsize 90$ | $\scriptsize 95$ | $\scriptsize 60$ | $\scriptsize 140$ | $\scriptsize 75$ |

1. .
2. .

| No. employers engaged $\scriptsize x$ | No. learners placed $\scriptsize y$ | $\scriptsize x-\bar{x}$ | $\scriptsize y-\bar{y}$ | $\scriptsize (x-\bar{x})(y-\bar{y})$ | $\scriptsize (x-\bar{x})^2$ |
| --- | --- | --- | --- | --- | --- |
| $\scriptsize 15$ | $\scriptsize 40$ | $\scriptsize -21.63$ | $\scriptsize -49.75$ | $\scriptsize 1~076.09$ | $\scriptsize 467.86$ |
| $\scriptsize 45$ | $\scriptsize 90$ | $\scriptsize 8.37$ | $\scriptsize 0.25$ | $\scriptsize 2.09$ | $\scriptsize 70.06$ |
| $\scriptsize 65$ | $\scriptsize 128$ | $\scriptsize 28.37$ | $\scriptsize 38.25$ | $\scriptsize 1~085.15$ | $\scriptsize 804.86$ |
| $\scriptsize 35$ | $\scriptsize 90$ | $\scriptsize -1.63$ | $\scriptsize 0.25$ | $\scriptsize -0.41$ | $\scriptsize 2.66$ |
| $\scriptsize 38$ | $\scriptsize 95$ | $\scriptsize 1.37$ | $\scriptsize 5.25$ | $\scriptsize 7.19$ | $\scriptsize 1.88$ |
| $\scriptsize 25$ | $\scriptsize 60$ | $\scriptsize -11.63$ | $\scriptsize -29.75$ | $\scriptsize 345.99$ | $\scriptsize 135.26$ |
| $\scriptsize 40$ | $\scriptsize 140$ | $\scriptsize 3.37$ | $\scriptsize 50.25$ | $\scriptsize 169.34$ | $\scriptsize 11.36$ |
| $\scriptsize 30$ | $\scriptsize 75$ | $\scriptsize -6.63$ | $\scriptsize -14.75$ | $\scriptsize 97.79$ | $\scriptsize 43.96$ |
| Sums | $\scriptsize 293$ | $\scriptsize 718$ | | $\scriptsize 2~783.23$ | $\scriptsize 1~537.9$ |
| Mean | $\scriptsize 36.63$ | $\scriptsize 89.75$ | | | |

\scriptsize \begin{align*}b&=\displaystyle \frac{{\sum{{(x-\bar{x})(y-\bar{y})}}}}{{\sum{{{{{(x-\bar{x})}}^{2}}}}}}\\&=\displaystyle \frac{{2\text{ }783.23}}{{1\text{ }537.9}}\\&=1.81\end{align*}
y-intercept:
\scriptsize \begin{align*}a&=\bar{y}-b\bar{x}\\&=89.75-(1.81)(36.63)\\&=89.75-66.30\\&=23.45\end{align*}
Regression equation: substituting for $\scriptsize b$ and $\scriptsize a$ in $\scriptsize \hat{y}=a+bx$:
\scriptsize \begin{align*}\hat{y}&=a+bx\\\hat{y}&=23.45+1.81x\end{align*}

3. .
4. To find $\scriptsize x\text{-value}$ when $\scriptsize y=175$, substitute this value into the regression equation:
\scriptsize \begin{align*}\hat{y}&=23.45+1.81x\\175&=23.45+1.81x\\1.81x&=175-23.45\\x&=\displaystyle \frac{{151.55}}{{1.81}}\\&=83.73\end{align*}
At the current rate, approximately $\scriptsize 84$ employers would need to be engaged in order to find work placements for $\scriptsize 175$ learners.
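Solving the regression equation for $\scriptsize x$ can be sketched in code as follows. This is an illustrative example only: the helper name `x_for_target` is an assumption, and the values of $\scriptsize a$ and $\scriptsize b$ are the rounded ones from the hand calculation. Since a fraction of an employer is not meaningful, the result is rounded up.

```python
import math


def x_for_target(a, b, y_target):
    """Rearrange y-hat = a + b*x to give the x that produces y_target."""
    if b == 0:
        raise ValueError("A horizontal line cannot be solved for x")
    return (y_target - a) / b


# Rounded coefficients from the worked solution: y-hat = 23.45 + 1.81x
x_needed = x_for_target(23.45, 1.81, 175)
employers_needed = math.ceil(x_needed)  # round up to a whole employer

print(round(x_needed, 2), employers_needed)
```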


### Unit 3: Assessment

1. .
| Inside the house (kWh) | $\scriptsize 29$ | $\scriptsize 31$ | $\scriptsize 20$ | $\scriptsize 40$ | $\scriptsize 26$ | $\scriptsize 39$ | $\scriptsize 32$ | $\scriptsize 34$ | $\scriptsize 35$ |
|---|---|---|---|---|---|---|---|---|---|
| Outside the house (kWh) | $\scriptsize 19$ | $\scriptsize 23$ | $\scriptsize 13$ | $\scriptsize 32$ | $\scriptsize 17$ | $\scriptsize 28$ | $\scriptsize 25$ | $\scriptsize 24$ | $\scriptsize 28$ |
1. .
2. .
| Inside the house $\scriptsize x$ | Outside the house $\scriptsize y$ | $\scriptsize x-\bar{x}$ | $\scriptsize y-\bar{y}$ | $\scriptsize (x-\bar{x})(y-\bar{y})$ | $\scriptsize (x-\bar{x})^2$ |
|---|---|---|---|---|---|
| $\scriptsize 29$ | $\scriptsize 19$ | $\scriptsize -2.78$ | $\scriptsize -4.22$ | $\scriptsize 11.73$ | $\scriptsize 7.73$ |
| $\scriptsize 31$ | $\scriptsize 23$ | $\scriptsize -0.78$ | $\scriptsize -0.22$ | $\scriptsize 0.17$ | $\scriptsize 0.61$ |
| $\scriptsize 20$ | $\scriptsize 13$ | $\scriptsize -11.78$ | $\scriptsize -10.22$ | $\scriptsize 120.39$ | $\scriptsize 138.77$ |
| $\scriptsize 40$ | $\scriptsize 32$ | $\scriptsize 8.22$ | $\scriptsize 8.78$ | $\scriptsize 72.17$ | $\scriptsize 67.57$ |
| $\scriptsize 26$ | $\scriptsize 17$ | $\scriptsize -5.78$ | $\scriptsize -6.22$ | $\scriptsize 35.95$ | $\scriptsize 33.41$ |
| $\scriptsize 39$ | $\scriptsize 28$ | $\scriptsize 7.22$ | $\scriptsize 4.78$ | $\scriptsize 34.51$ | $\scriptsize 52.13$ |
| $\scriptsize 32$ | $\scriptsize 25$ | $\scriptsize 0.22$ | $\scriptsize 1.78$ | $\scriptsize 0.39$ | $\scriptsize 0.05$ |
| $\scriptsize 34$ | $\scriptsize 24$ | $\scriptsize 2.22$ | $\scriptsize 0.78$ | $\scriptsize 1.73$ | $\scriptsize 4.93$ |
| $\scriptsize 35$ | $\scriptsize 28$ | $\scriptsize 3.22$ | $\scriptsize 4.78$ | $\scriptsize 15.39$ | $\scriptsize 10.37$ |
| Sums: $\scriptsize 286$ | $\scriptsize 209$ | $\scriptsize -0.02$ | $\scriptsize 0.02$ | $\scriptsize 292.43$ | $\scriptsize 315.57$ |
| Mean: $\scriptsize 31.78$ | $\scriptsize 23.22$ | | | | |

\scriptsize \begin{align*}b&=\displaystyle \frac{{\sum{{(x-\bar{x})(y-\bar{y})}}}}{{\sum{{{{{(x-\bar{x})}}^{2}}}}}}\\&=\displaystyle \frac{{292.43}}{{315.57}}\\&=0.93\end{align*}
y-intercept:
\scriptsize \begin{align*}a&=\bar{y}-b\bar{x}\\&=23.22-(0.93)(31.78)\\&=23.22-29.56\\&=-6.34\end{align*}
Regression equation: substituting for $\scriptsize b$ and $\scriptsize a$ in $\scriptsize \hat{y}=a+bx$:
\scriptsize \begin{align*}\hat{y}&=a+bx\\\hat{y}&=-6.34+0.93x\end{align*}
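The values $\scriptsize -0.02$ and $\scriptsize 0.02$ in the Sums row of the deviation columns are rounding residue: they arise because the means were rounded to two decimal places before subtracting. The sketch below, which is an illustrative check rather than part of the original solution, uses Python's `fractions.Fraction` to keep the means exact and shows that the deviations then sum to exactly zero. The variable names are assumptions.

```python
from fractions import Fraction

# Geyser electricity usage from the table (kWh)
inside = [29, 31, 20, 40, 26, 39, 32, 34, 35]    # x
outside = [19, 23, 13, 32, 17, 28, 25, 24, 28]   # y

n = len(inside)
x_bar = Fraction(sum(inside), n)    # exact mean, 286/9
y_bar = Fraction(sum(outside), n)   # exact mean, 209/9

# With exact means, deviations from the mean always sum to zero
dev_x_sum = sum(x - x_bar for x in inside)
dev_y_sum = sum(y - y_bar for y in outside)

# Gradient from the same deviation formula, still exact
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(inside, outside))
s_xx = sum((x - x_bar) ** 2 for x in inside)
b = s_xy / s_xx

print(dev_x_sum, dev_y_sum, round(float(b), 2))
```

The gradient still rounds to $\scriptsize 0.93$, so the rounding in the table does not affect the slope at this precision.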

3. To find the electricity usage inside the house when the outside usage is $\scriptsize 40\text{ kWh}$, substitute $\scriptsize \hat{y}=40$ into the equation and solve for $\scriptsize x$:
\scriptsize \begin{align*}\hat{y}&=-6.34+0.93x\\0.93x&=6.34+\hat{y}\\x&=\displaystyle \frac{{6.34+40}}{{0.93}}\\x&=49.83\end{align*}
According to the regression equation, if the geyser outside the house uses $\scriptsize 40\text{ kWh}$, the geyser inside the house is predicted to use approximately $\scriptsize 49.83\text{ kWh}$.
2. .
| Number of flyers distributed $\scriptsize (x)$ | Number of learners enrolled $\scriptsize (y)$ |
|---|---|
| $\scriptsize 50$ | $\scriptsize 15$ |
| $\scriptsize 250$ | $\scriptsize 45$ |
| $\scriptsize 200$ | $\scriptsize 40$ |
| $\scriptsize 350$ | $\scriptsize 65$ |
| $\scriptsize 150$ | $\scriptsize 35$ |
1. .
2. .
| Number of flyers distributed $\scriptsize x$ | Number of learners enrolled $\scriptsize y$ | $\scriptsize x-\bar{x}$ | $\scriptsize y-\bar{y}$ | $\scriptsize (x-\bar{x})(y-\bar{y})$ | $\scriptsize (x-\bar{x})^2$ |
|---|---|---|---|---|---|
| $\scriptsize 50$ | $\scriptsize 15$ | $\scriptsize -150$ | $\scriptsize -25$ | $\scriptsize 3~750$ | $\scriptsize 22~500$ |
| $\scriptsize 250$ | $\scriptsize 45$ | $\scriptsize 50$ | $\scriptsize 5$ | $\scriptsize 250$ | $\scriptsize 2~500$ |
| $\scriptsize 200$ | $\scriptsize 40$ | $\scriptsize 0$ | $\scriptsize 0$ | $\scriptsize 0$ | $\scriptsize 0$ |
| $\scriptsize 350$ | $\scriptsize 65$ | $\scriptsize 150$ | $\scriptsize 25$ | $\scriptsize 3~750$ | $\scriptsize 22~500$ |
| $\scriptsize 150$ | $\scriptsize 35$ | $\scriptsize -50$ | $\scriptsize -5$ | $\scriptsize 250$ | $\scriptsize 2~500$ |
| Sums: $\scriptsize 1~000$ | $\scriptsize 200$ | $\scriptsize 0$ | | $\scriptsize 8~000$ | $\scriptsize 50~000$ |
| Mean: $\scriptsize 200$ | $\scriptsize 40$ | | | | |

\scriptsize \begin{align*}b&=\displaystyle \frac{{\sum{{(x-\bar{x})(y-\bar{y})}}}}{{\sum{{{{{(x-\bar{x})}}^{2}}}}}}\\&=\displaystyle \frac{{8\text{ }000}}{{50\text{ }000}}\\&=0.16\end{align*}
y-intercept:
\scriptsize \begin{align*}a&=\bar{y}-b\bar{x}\\&=40-(0.16)(200)\\&=40-32\\&=8\end{align*}
Regression equation: substituting for $\scriptsize b$ and $\scriptsize a$ in $\scriptsize \hat{y}=a+bx$:
$\scriptsize \hat{y}=8+0.16x$

3. According to the equation, the number of learners that would be enrolled if $\scriptsize 500$ flyers were sent out would be:
\scriptsize \begin{align*}\hat{y}&=8+0.16x\\\hat{y}&=8+0.16(500)\\&=8+80\\&=88\end{align*}
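The prediction for $\scriptsize 500$ flyers lies outside the range of the observed data ($\scriptsize 50$ to $\scriptsize 350$ flyers), so it is an extrapolation and should be treated with more caution than a prediction inside that range. The sketch below shows one way to make this distinction explicit. The function name `predict` and its return format are illustrative assumptions.

```python
def predict(a, b, x, x_min, x_max):
    """Return (y-hat, kind), where kind says whether x lies inside the observed range."""
    y_hat = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y_hat, kind


# Observed x-values (flyers distributed) and the regression equation y-hat = 8 + 0.16x
flyers = [50, 250, 200, 350, 150]
y_hat, kind = predict(8, 0.16, 500, min(flyers), max(flyers))

print(f"{y_hat:.0f} learners ({kind})")
```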
3. .
| Internal examinations $\scriptsize (x)$ | $\scriptsize 80$ | $\scriptsize 68$ | $\scriptsize 94$ | $\scriptsize 72$ | $\scriptsize 74$ | $\scriptsize 83$ | $\scriptsize 56$ | $\scriptsize 68$ | $\scriptsize 65$ | $\scriptsize 75$ |
|---|---|---|---|---|---|---|---|---|---|---|
| External examinations $\scriptsize (y)$ | $\scriptsize 72$ | $\scriptsize 71$ | $\scriptsize 96$ | $\scriptsize 77$ | $\scriptsize 82$ | $\scriptsize 72$ | $\scriptsize 58$ | $\scriptsize 83$ | $\scriptsize 78$ | $\scriptsize 80$ |
1. .
2. .
| Internal examinations $\scriptsize \%$ $\scriptsize x$ | External examinations $\scriptsize \%$ $\scriptsize y$ | $\scriptsize x-\bar{x}$ | $\scriptsize y-\bar{y}$ | $\scriptsize (x-\bar{x})(y-\bar{y})$ | $\scriptsize (x-\bar{x})^2$ |
|---|---|---|---|---|---|
| $\scriptsize 80$ | $\scriptsize 72$ | $\scriptsize 6.5$ | $\scriptsize -4.9$ | $\scriptsize -31.85$ | $\scriptsize 42.25$ |
| $\scriptsize 68$ | $\scriptsize 71$ | $\scriptsize -5.5$ | $\scriptsize -5.9$ | $\scriptsize 32.45$ | $\scriptsize 30.25$ |
| $\scriptsize 94$ | $\scriptsize 96$ | $\scriptsize 20.5$ | $\scriptsize 19.1$ | $\scriptsize 391.55$ | $\scriptsize 420.25$ |
| $\scriptsize 72$ | $\scriptsize 77$ | $\scriptsize -1.5$ | $\scriptsize 0.1$ | $\scriptsize -0.15$ | $\scriptsize 2.25$ |
| $\scriptsize 74$ | $\scriptsize 82$ | $\scriptsize 0.5$ | $\scriptsize 5.1$ | $\scriptsize 2.55$ | $\scriptsize 0.25$ |
| $\scriptsize 83$ | $\scriptsize 72$ | $\scriptsize 9.5$ | $\scriptsize -4.9$ | $\scriptsize -46.55$ | $\scriptsize 90.25$ |
| $\scriptsize 56$ | $\scriptsize 58$ | $\scriptsize -17.5$ | $\scriptsize -18.9$ | $\scriptsize 330.75$ | $\scriptsize 306.25$ |
| $\scriptsize 68$ | $\scriptsize 83$ | $\scriptsize -5.5$ | $\scriptsize 6.1$ | $\scriptsize -33.55$ | $\scriptsize 30.25$ |
| $\scriptsize 65$ | $\scriptsize 78$ | $\scriptsize -8.5$ | $\scriptsize 1.1$ | $\scriptsize -9.35$ | $\scriptsize 72.25$ |
| $\scriptsize 75$ | $\scriptsize 80$ | $\scriptsize 1.5$ | $\scriptsize 3.1$ | $\scriptsize 4.65$ | $\scriptsize 2.25$ |
| Sums: $\scriptsize 735$ | $\scriptsize 769$ | | | $\scriptsize 640.5$ | $\scriptsize 996.5$ |
| Mean: $\scriptsize 73.5$ | $\scriptsize 76.9$ | | | | |

\scriptsize \begin{align*}b&=\displaystyle \frac{{\sum{{(x-\bar{x})(y-\bar{y})}}}}{{\sum{{{{{(x-\bar{x})}}^{2}}}}}}\\&=\displaystyle \frac{{640.5}}{{996.5}}\\&=0.64\end{align*}
y-intercept:
\scriptsize \begin{align*}a&=\bar{y}-b\bar{x}\\&=76.9-(0.64)(73.5)\\&=76.9-47.04\\&=29.86\end{align*}
Regression equation: substituting for $\scriptsize b$ and $\scriptsize a$ in $\scriptsize \hat{y}=a+bx$:
$\scriptsize \hat{y}=29.86+0.64x$
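When building the deviation table by hand, signs are easy to get wrong: a product $\scriptsize (x-\bar{x})(y-\bar{y})$ is negative whenever the two deviations have opposite signs, as for the first learner ($\scriptsize 6.5$ and $\scriptsize -4.9$). The following sketch rebuilds each row of the table so the columns and their sums can be checked. It is an illustrative aid with assumed variable names, not part of the original solution.

```python
# Examination marks (%) from the table
internal = [80, 68, 94, 72, 74, 83, 56, 68, 65, 75]   # x
external = [72, 71, 96, 77, 82, 72, 58, 83, 78, 80]   # y

x_bar = sum(internal) / len(internal)
y_bar = sum(external) / len(external)

# One tuple per table row: x, y, x - x_bar, y - y_bar, product, squared x-deviation
rows = [
    (x, y, x - x_bar, y - y_bar, (x - x_bar) * (y - y_bar), (x - x_bar) ** 2)
    for x, y in zip(internal, external)
]

for x, y, dx, dy, prod, sq in rows:
    print(f"{x:>3} {y:>3} {dx:>6.1f} {dy:>6.1f} {prod:>8.2f} {sq:>8.2f}")

s_xy = sum(row[4] for row in rows)   # sum of the products column
s_xx = sum(row[5] for row in rows)   # sum of the squares column
print(f"Sums: {s_xy:.2f} {s_xx:.2f}")
```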

3. .
4. The regression equation predicts that the final examination mark of a learner who scores $\scriptsize 70$ in the internal examination will be:
\scriptsize \begin{align*}\hat{y}&=29.86+0.64x\\&=29.86+0.64(70)\\&=29.86+44.8\\&=74.66\end{align*}

See question 2 in unit 2 assessment.

4. .
| Year | $\scriptsize 2010$ | $\scriptsize 2011$ | $\scriptsize 2012$ | $\scriptsize 2013$ | $\scriptsize 2014$ | $\scriptsize 2015$ | $\scriptsize 2016$ | $\scriptsize 2017$ |
|---|---|---|---|---|---|---|---|---|
| Deaths $\scriptsize ('000)$ | $\scriptsize 38.0$ | $\scriptsize 35.8$ | $\scriptsize 34.1$ | $\scriptsize 32.6$ | $\scriptsize 31.8$ | $\scriptsize 31.5$ | $\scriptsize 31.3$ | $\scriptsize 29.9$ |
1. .
2. .
| Year $\scriptsize x$ | Deaths from smoking $\scriptsize ('000)$ $\scriptsize y$ | $\scriptsize x-\bar{x}$ | $\scriptsize y-\bar{y}$ | $\scriptsize (x-\bar{x})(y-\bar{y})$ | $\scriptsize (x-\bar{x})^2$ |
|---|---|---|---|---|---|
| $\scriptsize 2010$ | $\scriptsize 38$ | $\scriptsize -3.5$ | $\scriptsize 4.87$ | $\scriptsize -17.05$ | $\scriptsize 12.25$ |
| $\scriptsize 2011$ | $\scriptsize 35.8$ | $\scriptsize -2.5$ | $\scriptsize 2.67$ | $\scriptsize -6.68$ | $\scriptsize 6.25$ |
| $\scriptsize 2012$ | $\scriptsize 34.1$ | $\scriptsize -1.5$ | $\scriptsize 0.97$ | $\scriptsize -1.46$ | $\scriptsize 2.25$ |
| $\scriptsize 2013$ | $\scriptsize 32.6$ | $\scriptsize -0.5$ | $\scriptsize -0.53$ | $\scriptsize 0.27$ | $\scriptsize 0.25$ |
| $\scriptsize 2014$ | $\scriptsize 31.8$ | $\scriptsize 0.5$ | $\scriptsize -1.33$ | $\scriptsize -0.67$ | $\scriptsize 0.25$ |
| $\scriptsize 2015$ | $\scriptsize 31.5$ | $\scriptsize 1.5$ | $\scriptsize -1.63$ | $\scriptsize -2.45$ | $\scriptsize 2.25$ |
| $\scriptsize 2016$ | $\scriptsize 31.3$ | $\scriptsize 2.5$ | $\scriptsize -1.83$ | $\scriptsize -4.58$ | $\scriptsize 6.25$ |
| $\scriptsize 2017$ | $\scriptsize 29.9$ | $\scriptsize 3.5$ | $\scriptsize -3.23$ | $\scriptsize -11.31$ | $\scriptsize 12.25$ |
| Sums: $\scriptsize 16~108$ | $\scriptsize 265$ | | | $\scriptsize -43.93$ | $\scriptsize 42$ |
| Mean: $\scriptsize 2~013.5$ | $\scriptsize 33.13$ | | | | |

\scriptsize \begin{align*}b&=\displaystyle \frac{{\sum{{(x-\bar{x})(y-\bar{y})}}}}{{\sum{{{{{(x-\bar{x})}}^{2}}}}}}\\&=\displaystyle \frac{{-43.93}}{{42}}\\&=-1.05\end{align*}
y-intercept:
\scriptsize \begin{align*}a&=\bar{y}-b\bar{x}\\&=33.13-(-1.05)(2013.5)\\&=33.13+2114.18\\&=2147.31\end{align*}
Regression equation: substituting for $\scriptsize b$ and $\scriptsize a$ in $\scriptsize \hat{y}=a+bx$:
$\scriptsize \hat{y}=2147.31-1.05x$

3. .
4. The regression equation predicts that the number of deaths from smoking (in thousands) in $\scriptsize 2022$ will be:
\scriptsize \begin{align*}\hat{y}&=2147.31-1.05x\\&=2147.31-1.05(2022)\\&=2147.31-2123.1\\&=24.21\end{align*}
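When the $\scriptsize x\text{-values}$ are large, as with calendar years, the intercept $\scriptsize a=\bar{y}-b\bar{x}$ is very sensitive to how $\scriptsize b$ is rounded, because any rounding error in $\scriptsize b$ is multiplied by $\scriptsize \bar{x}=2~013.5$. The sketch below is an illustrative comparison, not part of the original solution. It evaluates the equivalent centred form $\scriptsize \hat{y}=\bar{y}+b(x-\bar{x})$ with unrounded values and sets it beside the hand-calculated equation. All variable names are assumptions.

```python
years = list(range(2010, 2018))                            # x
deaths = [38.0, 35.8, 34.1, 32.6, 31.8, 31.5, 31.3, 29.9]  # y, in thousands

n = len(years)
x_bar = sum(years) / n
y_bar = sum(deaths) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, deaths))
s_xx = sum((x - x_bar) ** 2 for x in years)
b = s_xy / s_xx                  # unrounded gradient
a_unrounded = y_bar - b * x_bar  # unrounded intercept

# Centred form: algebraically the same line, written around the means
pred_unrounded = y_bar + b * (2022 - x_bar)

# Hand-calculated equation from the worked solution (rounded coefficients)
a_hand, b_hand = 2147.31, -1.05
pred_hand = a_hand + b_hand * 2022

print(f"intercepts:  {a_unrounded:.2f} vs {a_hand:.2f}")
print(f"predictions: {pred_unrounded:.2f} vs {pred_hand:.2f}")
```

The two intercepts differ by roughly $\scriptsize 10$, yet the two predictions for $\scriptsize 2022$ differ by only about $\scriptsize 0.03$ (thousand), because each intercept is consistent with its own gradient.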
