Simple Regression & Correlation Analysis: Scattergram
Problem Description and Data
This example examines the relationship between annual family Food Expenditure and family Income. The maintained hypothesis is that family food expenditure increases as family income increases (and decreases as income decreases), ceteris paribus (holding other factors, including family size, constant). Thus, the implied causal relationship is direct or positive.
The following data were obtained from 20 families in a metropolitan area in 1993 (Source: Hamburg and Young, 1994, p. 507):
Family | Annual Food Expenditure ($000) | Annual Income ($000) | Family Size (number in family)
1 | 5.2 | 28 | 3
2 | 5.1 | 26 | 3
3 | 5.6 | 32 | 2
4 | 4.6 | 24 | 1
5 | 11.3 | 54 | 4
6 | 8.1 | 59 | 2
7 | 7.8 | 44 | 3
8 | 5.8 | 30 | 2
9 | 5.1 | 40 | 1
10 | 18.0 | 82 | 6
11 | 4.9 | 42 | 3
12 | 11.8 | 58 | 4
13 | 5.2 | 28 | 1
14 | 4.8 | 20 | 5
15 | 7.9 | 42 | 3
16 | 6.4 | 47 | 1
17 | 20.0 | 112 | 6
18 | 13.7 | 85 | 5
19 | 5.1 | 31 | 2
20 | 2.9 | 26 | 2
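The scattergram discussed in this example plots Food Expenditure (vertical axis) against Income (horizontal axis) for these 20 families. As a rough stand-in, the Python sketch below stores the table as lists and prints a character-based scattergram. The helper name `text_scattergram` and its grid size are illustrative assumptions, not part of the original example; a plotting library would give a proper chart.

```python
# Data from the table above (Hamburg and Young, 1994, p. 507).
# Food expenditure and income are in $000; family size is number of persons.
FOOD = [5.2, 5.1, 5.6, 4.6, 11.3, 8.1, 7.8, 5.8, 5.1, 18.0,
        4.9, 11.8, 5.2, 4.8, 7.9, 6.4, 20.0, 13.7, 5.1, 2.9]
INCOME = [28, 26, 32, 24, 54, 59, 44, 30, 40, 82,
          42, 58, 28, 20, 42, 47, 112, 85, 31, 26]
SIZE = [3, 3, 2, 1, 4, 2, 3, 2, 1, 6,
        3, 4, 1, 5, 3, 1, 6, 5, 2, 2]


def text_scattergram(x, y, width=50, height=15):
    """Return a crude character-grid scatter plot of y against x.

    Each point is mapped to one grid cell. A '*' marks a single point;
    a digit marks how many points share a cell (capped at 9).
    """
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    grid = [[0] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        col = round((xi - x_min) / (x_max - x_min) * (width - 1))
        row = round((yi - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] += 1  # row 0 is the top of the plot
    lines = []
    for counts in grid:
        cells = "".join(
            " " if c == 0 else "*" if c == 1 else str(min(c, 9)) for c in counts
        )
        lines.append("|" + cells)  # left border stands in for the Y axis
    lines.append("+" + "-" * width)  # bottom border stands in for the X axis
    return "\n".join(lines)


if __name__ == "__main__":
    print("Food Expenditure ($000) vs. Income ($000)")
    print(text_scattergram(INCOME, FOOD))
```

Families 1 and 13 have identical food expenditure and income, so they share a cell and print as a "2".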
1. The general pattern of the dots is upward rising; hence the data support the hypothesis that the causal relationship between family annual Food Expenditure (Dependent Variable) and Income (Independent Variable) is Direct or Positive.
2. Because a straight line can be drawn through the dots such that some lie above it while others lie below it, the true relationship can be described as approximately Linear in a mathematical sense. Hence it can be estimated using a linear equation of the form $Y = a + bX$, where $Y$ is family Food Expenditure and $X$ is Income. Such an equation describes a deterministic relationship: it assumes that no uncertain factors other than Income influence family Food Expenditure.
3. Because the dots appear to cluster about such a straight line, the true Causal Relationship between the two variables is possibly strong (remember, only 20 data points are used in this example).
4. By fitting a straight line through the data using a linear equation of the type stated above, augmented with a random disturbance term, the true relationship can be estimated or quantified using the classical Least Squares Method of estimation. The equation so estimated is called the Sample Regression Line (SRL), in contrast to the Population Regression Line (PRL). In simple regression analysis, the PRL and SRL together constitute the core of the Classical Linear Regression Model (often denoted CLRM, or simply LRM; I prefer the latter). Note that when the context is clear, the terms regression line and regression equation are often used for sample regression line and sample regression equation, respectively.
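The least squares step in item 4 can be sketched with the standard closed-form formulas for a simple regression: slope $b = S_{xy}/S_{xx}$ and intercept $a = \bar{Y} - b\bar{X}$, where $S_{xy}$ and $S_{xx}$ are sums of cross-products and squares of deviations from the means. The Python below is a minimal illustration under that assumption, not the author's own computation; `least_squares_line` is a hypothetical helper name.

```python
# Least squares fit of the sample regression line Y-hat = a + b*X,
# with Y = annual food expenditure ($000) and X = annual income ($000).
FOOD = [5.2, 5.1, 5.6, 4.6, 11.3, 8.1, 7.8, 5.8, 5.1, 18.0,
        4.9, 11.8, 5.2, 4.8, 7.9, 6.4, 20.0, 13.7, 5.1, 2.9]
INCOME = [28, 26, 32, 24, 54, 59, 44, 30, 40, 82,
          42, 58, 28, 20, 42, 47, 112, 85, 31, 26]


def least_squares_line(x, y):
    """Return (a, b) minimising the sum of squared vertical deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares of deviations from the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope: change in Y ($000) per $1,000 of income
    a = y_bar - b * x_bar    # intercept: line passes through (x_bar, y_bar)
    return a, b


if __name__ == "__main__":
    a, b = least_squares_line(INCOME, FOOD)
    residuals = [yi - (a + b * xi) for xi, yi in zip(INCOME, FOOD)]
    print(f"SRL: Y-hat = {a:.3f} + {b:.4f} X")
    # Least squares residuals sum to zero (up to rounding) by construction
    print(f"sum of residuals = {sum(residuals):.2e}")
```

By my own arithmetic on the table, the slope comes out at roughly 0.184 (about $184 of additional annual food spending per extra $1,000 of income) and the intercept at roughly -0.41; the positive slope is consistent with the direct relationship hypothesized above.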
Knowing the type of causal relationship is only one aspect of assessing the relationship between two or more variables; this is what Regression Analysis accomplishes. Measuring the degree or strength of the relationship is accomplished by Correlation Analysis, which is a separate statistical technique altogether. However, it is often performed in conjunction with regression analysis to assess both the type and the strength of the relationship at the same time. Hence, most introductory statistics texts present both techniques together.
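The strength measure used in Correlation Analysis can be sketched with the sample (Pearson) correlation coefficient $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$. The Python below assumes that measure; the text above does not say which correlation statistic or software it uses, and `pearson_r` is a hypothetical helper name.

```python
import math

# Annual food expenditure and income ($000) for the 20 families in the table.
FOOD = [5.2, 5.1, 5.6, 4.6, 11.3, 8.1, 7.8, 5.8, 5.1, 18.0,
        4.9, 11.8, 5.2, 4.8, 7.9, 6.4, 20.0, 13.7, 5.1, 2.9]
INCOME = [28, 26, 32, 24, 54, 59, 44, 30, 40, 82,
          42, 58, 28, 20, 42, 47, 112, 85, 31, 26]


def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient between x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


if __name__ == "__main__":
    r = pearson_r(INCOME, FOOD)
    print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

By my arithmetic, r is about 0.946 (r squared about 0.89) for these data. Unlike the regression slope, r is unit-free and symmetric in the two variables, which reflects why it measures strength rather than the direction of a causal relationship.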
Copyright© 1996, Ebenge Usip, all rights reserved.
Last revised: Wednesday, July 10, 2013.