Simple Regression & Correlation Analysis: Scattergram

Problem Description and Data
This example examines the relationship between annual family Food Expenditure and family Income. The maintained hypothesis is that family food expenditure increases as family income increases, and decreases as family income decreases, ceteris paribus (that is, holding other factors such as family size constant). Thus, the implied causal relationship is direct or positive.

The following data were obtained from 20 families in a metropolitan area in 1993 (Source: Hamburg and Young, 1994, p. 507):

Family   Annual Food Expenditure ($000)   Annual Income ($000)   Family Size (number in family)
1        5.2                              28                     3
2        5.1                              26                     3
3        5.6                              32                     2
4        4.6                              24                     1
5        11.3                             54                     4
6        8.1                              59                     2
7        7.8                              44                     3
8        5.8                              30                     2
9        5.1                              40                     1
10       18.0                             82                     6
11       4.9                              42                     3
12       11.8                             58                     4
13       5.2                              28                     1
14       4.8                              20                     5
15       7.9                              42                     3
16       6.4                              47                     1
17       20.0                             112                    6
18       13.7                             85                     5
19       5.1                              31                     2
20       2.9                              26                     2
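The scattergram plots annual Food Expenditure (vertical axis) against annual Income (horizontal axis), one dot per family. A rough character-based version can be sketched in Python as follows. The function name text_scattergram and the 50 x 15 character grid are illustrative choices, not part of the original analysis, and families whose values fall in the same character cell (for example, families 1 and 13, which are identical) share a single "*".

```python
# Data from the table above: annual food expenditure and annual income ($000)
# for the 20 families, listed in family order 1-20.
food = [5.2, 5.1, 5.6, 4.6, 11.3, 8.1, 7.8, 5.8, 5.1, 18.0,
        4.9, 11.8, 5.2, 4.8, 7.9, 6.4, 20.0, 13.7, 5.1, 2.9]
income = [28, 26, 32, 24, 54, 59, 44, 30, 40, 82,
          42, 58, 28, 20, 42, 47, 112, 85, 31, 26]


def text_scattergram(x, y, width=50, height=15):
    """Return the rows of a character-based scatter plot of y (vertical) against x (horizontal)."""
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        # Scale each point into the grid: low x on the left, high y at the top.
        col = round((xi - x_min) / (x_max - x_min) * (width - 1))
        row = round((y_max - yi) / (y_max - y_min) * (height - 1))
        grid[row][col] = "*"
    # Prefix each row with a vertical axis and finish with a horizontal axis.
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


for line in text_scattergram(income, food):
    print(line)
```

Reading the printed plot from left to right, the stars drift upward, which is the pattern the conclusions that follow are based on.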


The scattergram (a plot of annual Food Expenditure against annual Income for the 20 families) suggests the following conclusions:

1. The general pattern of the dots is upward rising (families with higher incomes tend to spend more on food); hence the scattergram supports the maintained hypothesis that the causal relationship between family annual Food Expenditure (Dependent Variable) and Income (Independent Variable) is Direct or Positive.

2. Because the dots are scattered about a straight line, with some lying above it and others below it, and show no obvious curvature, the true relationship can be described as Linear in a mathematical sense. Hence it can be estimated using a linear equation of the form $Y_i = A + BX_i$. On its own, this equation expresses a deterministic relationship, since it assumes that no factor other than Income influences family Food Expenditure.

3. Because the dots appear to cluster fairly closely about such a straight line, the true Causal Relationship between the two variables is possibly strong (remember, only 20 data points are used in this example, so this impression is tentative).

4. By fitting a straight line through the data using a linear equation of the type stated above, augmented with a random disturbance term to allow for the influence of other factors, the true relationship can be estimated or quantified using the classical Least Squares Method of estimation. The equation so estimated is called the Sample Regression Line (SRL), in contrast to the Population Regression Line (PRL). In simple regression analysis, the PRL and SRL together constitute the core of the Classical Linear Regression Model (often abbreviated CLRM, or simply LRM; I prefer the latter). Note that when the context is clear, the terms regression line and regression equation are often used for sample regression line and sample regression equation, respectively.
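The least squares estimates of A and B have a standard closed form: the slope estimate is b = Sxy / Sxx, where Sxy is the sum of cross-products of the deviations of X and Y from their means and Sxx is the sum of squared deviations of X, and the intercept estimate is a = Ybar - b*Xbar. A minimal Python sketch applying these formulas to the table's data follows; the function name least_squares_fit is illustrative, and lower-case a and b denote the sample estimates of A and B.

```python
# Data from the table: Y = annual food expenditure, X = annual income ($000).
food = [5.2, 5.1, 5.6, 4.6, 11.3, 8.1, 7.8, 5.8, 5.1, 18.0,
        4.9, 11.8, 5.2, 4.8, 7.9, 6.4, 20.0, 13.7, 5.1, 2.9]
income = [28, 26, 32, 24, 54, 59, 44, 30, 40, 82,
          42, 58, 28, 20, 42, 47, 112, 85, 31, 26]


def least_squares_fit(x, y):
    """Return (a, b) minimizing the sum of squared residuals of y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope: estimated change in Y per unit change in X
    a = y_bar - b * x_bar    # intercept: the fitted line passes through (x_bar, y_bar)
    return a, b


a, b = least_squares_fit(income, food)
print(f"Sample regression line: Y = {a:.3f} + {b:.4f} X")
```

For these 20 families the sketch gives a slope of about 0.184 and an intercept of about -0.41, so each additional $1,000 of income is associated with roughly $184 more in annual food expenditure. The positive slope agrees with the upward-rising pattern of the dots.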

Knowing the type of causal relationship is only one aspect of assessing the relationship between two or more variables; this is what Regression Analysis accomplishes. Knowing or measuring the degree or strength of the relationship is accomplished by undertaking Correlation Analysis, which is a separate statistical technique altogether. However, correlation analysis is often performed in conjunction with regression analysis so that both the type and the strength of the relationship can be assessed at the same time. Hence, most introductory statistics texts present both techniques together.
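The strength of a linear relationship is commonly measured by the sample (Pearson) correlation coefficient r = Sxy / sqrt(Sxx * Syy), which lies between -1 and +1; in simple regression its square, the coefficient of determination r^2, is the proportion of the variation in Y accounted for by the fitted line. The Python sketch below computes both for the table's data using only the standard library; pearson_r is an illustrative name.

```python
from math import sqrt

# Data from the table: Y = annual food expenditure, X = annual income ($000).
food = [5.2, 5.1, 5.6, 4.6, 11.3, 8.1, 7.8, 5.8, 5.1, 18.0,
        4.9, 11.8, 5.2, 4.8, 7.9, 6.4, 20.0, 13.7, 5.1, 2.9]
income = [28, 26, 32, 24, 54, 59, 44, 30, 40, 82,
          42, 58, 28, 20, 42, 47, 112, 85, 31, 26]


def pearson_r(x, y):
    """Return the sample (Pearson) correlation coefficient between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Cross-products and squared deviations from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(income, food)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

For these 20 families r comes out at roughly 0.95 (r^2 of about 0.89), consistent with the impression of a strong, direct relationship. The sign of r always matches the sign of the least squares slope, since both have Sxy as their numerator and a positive denominator.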




Copyright© 1996, Ebenge Usip, all rights reserved.
Last revised: Wednesday, July 10, 2013.