Scatter Matrix of family Food Expenditure, Income & Size, Dr. Usip, Economics

Scatterplot Matrix of family Food Expenditure, Income and Size

Motivation:
Often it is not realistic to conclude that only one factor, or independent variable (IV), influences the behavior of the dependent variable (DV). In such situations, a researcher needs to carefully identify the possible factors and explicitly include them in the Linear Regression Model (LRM). Both existing theory and common sense should form the basis for selecting the IVs, and where data on a theoretical variable are not readily available, a proxy should be chosen carefully. The type and structure of correlation among the variables can be assessed graphically with the scatterplot matrix, a graphical device consisting of a scatterplot for each pair of variables in the model.

Problem Description and Data
The maintained hypothesis is essentially similar to the Simple Regression & Correlation case, where annual family Income was considered the only determinant of annual family Food Expenditure. The influence of other factors, such as family Size, was assumed away as one of the ceteris paribus factors. That assumption is relaxed here by hypothesizing that family Size (X2) also has a positive influence on annual family Food Expenditure (Y), in addition to annual family Income (X1). The Multiple Regression & Correlation Analysis attempts to measure and isolate the separate effects of X1 and X2 on Y, and to determine whether any relationship between X1 and X2 might blur their separate effects on Y.
The additional data on family size is as follows (Source: Hamburg et al., 1994, p. 507):

X2 (number in family)

 3 3 2 1 4 2 3 2 1 6 3 4 1 5 3 1 6 5 2 2
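As a quick numerical check on the family-size data listed above, the short Python sketch below computes the number of families, the mean, the sample standard deviation and the frequency of each family size. The variable names are illustrative only; since Y and X1 are not repeated in this section, only X2 is used.

```python
from collections import Counter
from statistics import mean, stdev

# Family size X2 (number in family), as listed above (Hamburg et al., 1994, p. 507)
X2 = [3, 3, 2, 1, 4, 2, 3, 2, 1, 6, 3, 4, 1, 5, 3, 1, 6, 5, 2, 2]

n = len(X2)                              # number of families
avg = mean(X2)                           # arithmetic mean family size
sd = stdev(X2)                           # sample standard deviation (n - 1 divisor)
freq = dict(sorted(Counter(X2).items())) # how many families have each size

print(f"n = {n}, mean = {avg:.2f}, s = {sd:.2f}")
print("frequency by family size:", freq)
```

The frequency table is a simple way to spot unusually large or small families before looking at the scatterplots themselves.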

The Scatterplot matrix is an important graphical tool for screening the data to visually identify the following possibilities:

1. Type of relationship between the variables (a pair at a time) - Direct or Indirect

2. Form of relationship between the DV and the IVs - Linear or Nonlinear

3. Degree of relationship between any two variables - from Perfectly Strong and Direct (r = +1) to Perfectly Strong and Indirect (r = -1). An r of 0 indicates no linear relationship, although a nonlinear relationship may still be present.

4. Presence/Detection of Outliers in the data set.
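The degree of relationship in item 3 can also be put in numbers: each off-diagonal panel of a scatterplot matrix corresponds to one Pearson correlation coefficient r. The Python sketch below computes r for every pair of variables. The data are hypothetical toy values, not the Hamburg et al. figures, chosen so that the results are known in advance: A and B are perfectly direct (r = +1), A and C perfectly indirect (r = -1), and D depends exactly on A through a U-shaped (nonlinear) curve yet has r = 0 with it, illustrating why r = 0 means "no linear relationship" rather than "no relationship at all".

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pairwise_r(data):
    """r for every pair of variables: one value per off-diagonal panel of a scatterplot matrix."""
    return {(u, v): pearson_r(data[u], data[v]) for u, v in combinations(data, 2)}

# Hypothetical toy data for illustration only (NOT the Food Expenditure data)
toy = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],  # exact direct linear relation with A
    "C": [10, 8, 6, 4, 2],  # exact indirect linear relation with A
    "D": [4, 1, 0, 1, 4],   # D = (A - 3)**2: exact but nonlinear relation with A
}

results = pairwise_r(toy)
for pair, r in results.items():
    print(pair, round(r, 3))
```

With the actual Y, X1 and X2 series in place of the toy lists, the same function gives the three correlations that the scatterplot matrix displays visually.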

The scatterplot matrix above suggests the following conclusions:

1. The relationship between annual family Food Expenditure and Size is Direct, Linear, and relatively Strong with possibly one OUTLIER.

2. The relationship between annual family Food Expenditure and Income is Direct, Linear, and relatively Strong with no apparent OUTLIER.

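3. The relationship between family Size and annual Income is Direct, Linear, and Weak with one visible OUTLIER. Because the two IVs are correlated, some collinearity should be expected in the regression, although the weakness of the correlation suggests it may not be severe.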
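With only two IVs, one common way to gauge how much a correlation r12 between Income (X1) and Size (X2) inflates the variances of their estimated coefficients is the variance inflation factor, VIF = 1 / (1 - r12^2). The Python sketch below evaluates it for a few hypothetical values of r12; these are not estimates from the data in this section.

```python
def vif_two_predictors(r12):
    """Variance inflation factor shared by X1 and X2 when they are the only two IVs.

    With exactly two IVs, regressing X1 on X2 gives R^2 = r12**2,
    so VIF = 1 / (1 - r12**2) applies to both coefficients.
    """
    if not -1 < r12 < 1:
        raise ValueError("r12 must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r12 ** 2)

# Hypothetical correlations between Income (X1) and Size (X2), for illustration only
for r12 in (0.2, 0.5, 0.9):
    print(f"r12 = {r12:.1f}  ->  VIF = {vif_two_predictors(r12):.2f}")
```

A weak correlation keeps the VIF close to 1, while values above 5 or 10 are commonly cited rules of thumb for serious collinearity. With three or more IVs, each VIF instead uses the R^2 from regressing that IV on all the other IVs.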

A quantitative assessment of both the type and the structure of correlation among the variables is taken up in the multiple regression and correlation analysis.