**Simple Regression & Correlation Analysis: Scattergram**

**Problem Description and Data**

This example examines the relationship between annual family Food Expenditure and
family Income. The maintained hypothesis is that **family food
expenditure rises as family income rises, and falls as it falls, ceteris paribus
(that is, holding other factors such as family size constant)**. Thus, the implied
causal relationship is direct, or positive.

The following data were obtained from 20 families in a metropolitan area in 1993
(Source: Hamburg and Young, 1994, p. 507):

| Family | Annual Food Expenditure ($000) | Annual Income ($000) | Family Size (number in family) |
|---|---|---|---|
| 1 | 5.2 | 28 | 3 |
| 2 | 5.1 | 26 | 3 |
| 3 | 5.6 | 32 | 2 |
| 4 | 4.6 | 24 | 1 |
| 5 | 11.3 | 54 | 4 |
| 6 | 8.1 | 59 | 2 |
| 7 | 7.8 | 44 | 3 |
| 8 | 5.8 | 30 | 2 |
| 9 | 5.1 | 40 | 1 |
| 10 | 18.0 | 82 | 6 |
| 11 | 4.9 | 42 | 3 |
| 12 | 11.8 | 58 | 4 |
| 13 | 5.2 | 28 | 1 |
| 14 | 4.8 | 20 | 5 |
| 15 | 7.9 | 42 | 3 |
| 16 | 6.4 | 47 | 1 |
| 17 | 20.0 | 112 | 6 |
| 18 | 13.7 | 85 | 5 |
| 19 | 5.1 | 31 | 2 |
| 20 | 2.9 | 26 | 2 |
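
The observations that follow refer to the dots of a scattergram of these data. As a minimal sketch, assuming Python with the third-party matplotlib package is available, such a plot could be drawn as shown here. The function name `plot_scattergram`, the output file name, and the axis labels are illustrative choices, not part of the original example.

```python
# Data transcribed from the table above (both series in thousands of dollars).
INCOME = [28, 26, 32, 24, 54, 59, 44, 30, 40, 82,
          42, 58, 28, 20, 42, 47, 112, 85, 31, 26]
FOOD = [5.2, 5.1, 5.6, 4.6, 11.3, 8.1, 7.8, 5.8, 5.1, 18.0,
        4.9, 11.8, 5.2, 4.8, 7.9, 6.4, 20.0, 13.7, 5.1, 2.9]


def plot_scattergram(path="scattergram.png"):
    """Save a scattergram of Food Expenditure against Income (hypothetical helper)."""
    # matplotlib is imported inside the function so that the data lists
    # above remain usable even where the package is not installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display is needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # Independent variable (Income) on the horizontal axis,
    # dependent variable (Food Expenditure) on the vertical axis.
    ax.scatter(INCOME, FOOD)
    ax.set_xlabel("Annual Income (thousands of dollars)")
    ax.set_ylabel("Annual Food Expenditure (thousands of dollars)")
    ax.set_title("Food Expenditure vs. Income, 20 families")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `plot_scattergram()` writes the image to `scattergram.png` in the current directory. Income goes on the horizontal axis because it is treated as the independent variable.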

**1**. The general pattern of the dots in the scattergram is upward rising; hence it can be
concluded that the true causal relationship between family
annual **Food Expenditure (Dependent Variable)** and **Income
(Independent Variable)** is indeed **Direct or Positive**.

**2**. Because a **straight line** can be drawn through the
dots such that some lie above it while others lie below it, the true relationship can be
described as **Linear** in a mathematical sense.
Hence it can be estimated using a **linear equation** of the form $Y = a + bX$, where
$Y$ is Food Expenditure and $X$ is Income. This is an
instance of a **deterministic
relationship**, because it assumes that no uncertain factors other than Income influence
family Food Expenditure.

**3**. Because the dots appear to cluster about such a straight line, the
true **Causal Relationship** between the two variables is possibly **strong**
(remember, only 20 data points are used in this example).

**4**. By fitting a straight line through the data using a linear equation
of the type stated above and allowing for a random disturbance term, the true relationship
can be estimated or quantified using the classical *Least Squares Method*
of estimation. The equation so estimated is called the

Knowing the type of causal relationship is only one aspect of assessing the relationship between two or more variables; this is what Regression Analysis accomplishes. Measuring the degree or strength of the relationship is accomplished by Correlation Analysis, which is a separate statistical technique altogether. However, it is often performed in conjunction with regression analysis so that both the type and the strength of the relationship can be assessed at the same time. Hence, most introductory statistics texts present both techniques together.
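
To make the two techniques concrete, the following minimal sketch applies the standard textbook formulas to the 20 observations: the least-squares slope and intercept for the regression side, and the Pearson correlation coefficient for the strength of the linear relationship. The function names `least_squares` and `pearson_r` are illustrative assumptions, and the source may have carried out the calculation differently.

```python
from math import sqrt

# Data transcribed from the table above (both series in thousands of dollars).
INCOME = [28, 26, 32, 24, 54, 59, 44, 30, 40, 82,
          42, 58, 28, 20, 42, 47, 112, 85, 31, 26]
FOOD = [5.2, 5.1, 5.6, 4.6, 11.3, 8.1, 7.8, 5.8, 5.1, 18.0,
        4.9, 11.8, 5.2, 4.8, 7.9, 6.4, 20.0, 13.7, 5.1, 2.9]


def _deviation_sums(x, y):
    """Return (S_xx, S_yy, S_xy): sums of squared and cross deviations from the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return s_xx, s_yy, s_xy


def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    s_xx, _, s_xy = _deviation_sums(x, y)
    slope = s_xy / s_xx
    intercept = sum(y) / len(y) - slope * sum(x) / len(x)
    return intercept, slope


def pearson_r(x, y):
    """Return the Pearson correlation coefficient between x and y."""
    s_xx, s_yy, s_xy = _deviation_sums(x, y)
    return s_xy / sqrt(s_xx * s_yy)


intercept, slope = least_squares(INCOME, FOOD)
r = pearson_r(INCOME, FOOD)
print(f"estimated line: food = {intercept:.3f} + {slope:.3f} * income")
print(f"correlation coefficient: r = {r:.3f}")
```

For these data the slope should come out at roughly 0.18 and the correlation coefficient at roughly 0.95. The positive slope matches the direct relationship seen in the scattergram, and an r close to 1 matches the tight clustering of the dots about a straight line.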

**Copyright© 1996, Ebenge Usip, all rights reserved.
Last revised: Wednesday, July 10, 2013.**