**Motivation:** It is often unrealistic to conclude that only one factor, or independent variable (IV), influences the behavior of the dependent variable (DV). In such situations, a researcher needs to
carefully identify the other possible factors and explicitly include them in the Linear
Regression Model (LRM). Existing economic theory or common sense should constitute the basis
for selecting the IVs; where data on a theoretically construed variable are not readily
available, a proxy should be carefully chosen.

This tutorial will illustrate the key steps involved in
using multiple regression and correlation to solve real-world problems. The example will
consider a multiple LRM, which typically has the form:

$$Y_i = A + B_1 X_{i,1} + B_2 X_{i,2} + \dots + B_K X_{i,K} + E_i$$

where the $X_j$'s are the IVs; $A$ and $B_j$ ($j = 1, 2, \dots, K$) are the regression parameters or coefficients, with each $B_j$ reflecting
the partial effect of the associated IV, holding the effects of all other IVs constant; $K$
is the number of IVs in the model; and $E_i$ is the random error term.
Again, note that in **regression analysis**, all of the
underlying **classical assumptions**
essentially apply to this random error term. In multiple
regression the three most crucial ones are the assumptions of no **multicollinearity** among the
IVs, of no **heteroskedasticity**
in the error variances, and of no **autocorrelation**
in the errors for all i.

**Step 1: Formulate the LRM and State the Expected Signs of the Regression Parameters**

When specifying an LRM, theory or common sense should be your guide in stipulating, a priori, the expected signs of the regression parameters.

Let us return to the family food expenditure example that we introduced in the simple regression tutorial. In that tutorial, the only factor that was explicitly identified as the predictor of annual family food expenditure (**Y**) was annual family income.

**Step 2: Explore the Data with a Scatterplot Matrix**

It is always advisable to do some exploratory analysis of the data to uncover inherent patterns in the type and strength of the relationships among the variables, as well as the presence of outliers in the data. The scatterplot matrix is a useful graphical device for doing so. While a strong linear association between the DV and each of the IVs is highly desirable, a strong linear association between (or among) the IVs is highly undesirable since it is indicative of the presence of multicollinearity.

For this example, the data set
for the simple regression analysis has been augmented to include data on **X_{2}** (family size).
The results of the preliminary analysis of the data are discussed separately in the
scatterplot matrix component.
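One way to put a number on the multicollinearity concern is the variance inflation factor (VIF). The sketch below is in Python rather than SPSS and is not part of the original tutorial's workflow. It uses the correlation of .676 between income and family size that appears in the Correlations output later in this tutorial. The function name `vif_two_predictors` is only illustrative, and the formula shown applies to the special case of exactly two IVs.

```python
# Correlation between the two IVs (annual income and family size),
# taken from the Correlations table reported later in this tutorial.
r_income_size = 0.676


def vif_two_predictors(r12: float) -> float:
    """VIF for either IV in a model with exactly two IVs: 1 / (1 - r12^2)."""
    return 1.0 / (1.0 - r12 ** 2)


vif = vif_two_predictors(r_income_size)
# Tolerance is the share of one IV's variance not explained by the other IV.
tolerance = 1.0 / vif

print(f"VIF = {vif:.3f}, tolerance = {tolerance:.3f}")
```

The VIF here is about 1.84. A commonly cited rule of thumb (not a formal test) treats values above roughly 10 as a warning sign, so by that yardstick the association between the two IVs in this example would not be alarming.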

After studying the results for reasonable inferences, the next phase of the data analysis is to estimate the LRM. Estimating the embedded parameters of the population regression plane (PRP) is accomplished by fitting the sample regression plane (SRP) to a sample of data on all the variables of the model.

**Step 3: Estimate the SRP**

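Again, the estimation method is the classical ordinary least squares (OLS) method, which fits an SRP of the form

$$\hat{y}_i = a + b_1 x_{i,1} + b_2 x_{i,2}$$

Note that $\hat{y}_i$ is the fitted value of the DV, and $a$, $b_1$, and $b_2$ are the sample estimates of $A$, $B_1$, and $B_2$.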

The OLS method is programmed into the SPSS/win statistical package. Using the command sequence presented earlier automatically implements this method. The following outputs contain the necessary results, which are based on selected options that are accessible via the 'Statistics...' button.

**Descriptive Statistics**

| | Mean | Std. Deviation | N |
|---|---|---|---|
| Annual Food Expenditure ($000) | 7.965 | 4.664 | 20 |
| Annual Income ($000) | 45.50 | 23.96 | 20 |
| Family Size | 2.95 | 1.61 | 20 |

**Correlations**

| | | Annual Food Expenditure ($000) | Annual Income ($000) | Family Size |
|---|---|---|---|---|
| Pearson Correlation | Annual Food Expenditure ($000) | 1.000 | .946 | .787 |
| | Annual Income ($000) | .946 | 1.000 | .676 |
| | Family Size | .787 | .676 | 1.000 |
| Sig. (1-tailed) | Annual Food Expenditure ($000) | . | .000 | .000 |
| | Annual Income ($000) | .000 | . | .001 |
| | Family Size | .000 | .001 | . |
| N | Annual Food Expenditure ($000) | 20 | 20 | 20 |
| | Annual Income ($000) | 20 | 20 | 20 |
| | Family Size | 20 | 20 | 20 |

**Model Summary (b)**

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | Durbin-Watson |
|---|---|---|---|---|---|
| 1 | .967(a) | .935 | .927 | 1.261 | 2.616 |

a Predictors: (Constant), Family Size, Annual Income ($000)

b Dependent Variable: Annual Food Expenditure ($000)

**ANOVA (b)**

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 386.313 | 2 | 193.156 | 121.470 | .000(a) |
| | Residual | 27.033 | 17 | 1.590 | | |
| | Total | 413.346 | 19 | | | |

a Predictors: (Constant), Family Size, Annual Income ($000)

b Dependent Variable: Annual Food Expenditure ($000)

**Coefficients (a)**

| Model | | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | -1.118 | .655 | | -1.708 | .106 |
| | Annual Income ($000) | .148 | .016 | .761 | 9.049 | .000 |
| | Family Size | .793 | .244 | .273 | 3.245 | .005 |

a Dependent Variable: Annual Food Expenditure ($000)

**Residuals Statistics (a)**

| | Minimum | Maximum | Mean | Std. Deviation | N |
|---|---|---|---|---|---|
| Predicted Value | 3.232 | 20.240 | 7.965 | 4.509 | 20 |
| Residual | -2.586 | 2.206 | 1.110E-16 | 1.193 | 20 |
| Std. Predicted Value | -1.050 | 2.722 | .000 | 1.000 | 20 |
| Std. Residual | -2.051 | 1.750 | .000 | .946 | 20 |

a Dependent Variable: Annual Food Expenditure ($000)
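The raw data for the 20 families are not reproduced here, but for a model with two IVs the OLS estimates can be approximately recovered from the means, standard deviations, and correlations alone. The following Python sketch (an illustration, not the SPSS procedure itself) does that with the rounded figures from the Descriptive Statistics and Correlations tables. All variable names are assumptions made for this example.

```python
# Summary statistics copied from the Descriptive Statistics and
# Correlations tables (rounded as printed by SPSS).
mean_y, sd_y = 7.965, 4.664      # annual food expenditure ($000)
mean_x1, sd_x1 = 45.50, 23.96    # annual income ($000)
mean_x2, sd_x2 = 2.95, 1.61      # family size
r_y1, r_y2, r_12 = 0.946, 0.787, 0.676

# Standardized (beta) coefficients for a two-IV model.
denominator = 1.0 - r_12 ** 2
beta1 = (r_y1 - r_y2 * r_12) / denominator
beta2 = (r_y2 - r_y1 * r_12) / denominator

# Unstandardized slopes: rescale each beta by sd(Y) / sd(X_j).
b1 = beta1 * sd_y / sd_x1
b2 = beta2 * sd_y / sd_x2

# The fitted plane passes through the point of means, which gives the intercept.
a = mean_y - b1 * mean_x1 - b2 * mean_x2

# R-squared as the sum of beta_j times the DV-IV correlation.
r_squared = beta1 * r_y1 + beta2 * r_y2

print(f"beta1 = {beta1:.3f}, beta2 = {beta2:.3f}")
print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}, R^2 = {r_squared:.3f}")
```

Because the inputs are rounded to two or three decimals, the results only approximate the Coefficients table: the income slope and R² agree to three decimals, while the family size slope and the intercept come out roughly 0.01 away from the .793 and -1.118 that SPSS reports.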

**Step 4: Discuss the Results and Summarize your Findings**

Similar to the presentation in the simple regression tutorial, I will discuss the
results in the order in which SPSS/win generates the outputs, beginning with the
descriptive statistics tables. This approach permits a critical analysis of all the
results and their implications.

1.

a)

3.

a) The sample mean of 2.95 means that the average family comprised about 3 persons during the year.

b) The sample standard deviation of 1.61 (rounded to 2) means that, if family size is approximately normally distributed, about 68.3% of the families had between 1 and 5 members (3 ± 2) during the year.
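The one-standard-deviation reading in (b) can be checked with a short calculation. The sketch below, written in Python and not part of the SPSS workflow, takes the means and standard deviations from the Descriptive Statistics table and computes the mean ± 1 SD range for each variable. The helper name `one_sd_range` is an assumption for this example, and reading the range as covering roughly 68% of families presumes an approximately normal distribution, which the tutorial does not verify.

```python
# Means and standard deviations copied from the Descriptive Statistics table.
descriptives = {
    "Annual Food Expenditure ($000)": (7.965, 4.664),
    "Annual Income ($000)": (45.50, 23.96),
    "Family Size": (2.95, 1.61),
}


def one_sd_range(mean, sd):
    """Return the (lower, upper) bounds of mean +/- one standard deviation."""
    return mean - sd, mean + sd


ranges = {name: one_sd_range(m, s) for name, (m, s) in descriptives.items()}

for name, (low, high) in ranges.items():
    print(f"{name}: {low:.2f} to {high:.2f}")
```

For family size the unrounded range is 1.34 to 4.56. Rounding the mean to 3 and the standard deviation to 2 gives the range of 1 to 5 quoted above.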

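**II. Correlations Analysis**

This table contains the Pearson correlation coefficients for each pair of variables, their one-tailed significance levels, and the number of observations (N) on which each is based.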
**III. Model Summary and Evaluation with S_{e}, R, R^{2}, and DW Statistics**

From the 'Coefficients' table, the OLS method produces the following
estimated SRP:

$$\hat{Y}_i = -1.118 + 0.148 X_{i,1} + 0.793 X_{i,2}$$

From the 'Model Summary' table

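**IV. ANOVA Table: Testing the Significance of the Model**

The summary measures reported here are used in partitioning the
total variation in the DV according to the identity relation

$$\text{TSS} = \text{ESS} + \text{RSS}$$

where TSS is the total sum of squares, ESS is the explained sum of squares, and RSS is the residual sum of squares.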

**Note**: Some authors use RSS (regression sum of squares) instead of ESS (explained sum of squares), and ESS (error sum of squares) instead of RSS (residual sum of squares), so that the identity is stated as TSS = RSS + ESS. So pay attention to how these acronyms are defined.
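The summary statistics in the 'Model Summary' and ANOVA tables all follow from the sums of squares. As a rough cross-check, the Python sketch below (an illustration alongside the SPSS output, not a replacement for it) recomputes them from the ANOVA table, using ESS for the explained and RSS for the residual sum of squares as defined in this tutorial. The variable names are assumptions for this example.

```python
import math

# Values copied from the ANOVA table.
ess = 386.313   # explained (regression) sum of squares
rss = 27.033    # residual sum of squares
n = 20          # number of families in the sample
k = 2           # number of IVs (income and family size)

# Partition of the total variation: TSS = ESS + RSS.
tss = ess + rss

# Degrees of freedom and mean squares.
df_regression = k
df_residual = n - k - 1
msr = ess / df_regression
mse = rss / df_residual

# Overall F statistic for the significance of the model.
f_stat = msr / mse

# Goodness-of-fit measures reported in the Model Summary table.
r_squared = ess / tss
adj_r_squared = 1.0 - (1.0 - r_squared) * (n - 1) / df_residual
multiple_r = math.sqrt(r_squared)
std_error_estimate = math.sqrt(mse)

print(f"TSS = {tss:.3f}, F = {f_stat:.2f}")
print(f"R = {multiple_r:.3f}, R^2 = {r_squared:.3f}, "
      f"adjusted R^2 = {adj_r_squared:.3f}, S_e = {std_error_estimate:.3f}")
```

These calculations reproduce the reported values to the printed precision: TSS of 413.346, F of about 121.47, R of about .967, R² of about .935, adjusted R² of about .927, and a standard error of the estimate of about 1.261.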

The null hypothesis (

From the ANOVA table, under the df column,

**V. Coefficients Table: T-Test of the Significance of the Regression Coefficients**

This table contains the estimated regression coefficients (a = -1.118, b_{1} =
.148, and b_{2} = .793); hence, the estimated SRP/equation can be written as
$\hat{Y}_i = -1.118 + 0.148 X_{i,1} + 0.793 X_{i,2}$. The estimated coefficients
have the following interpretations:

1. **a = -1.118** has no interpretable meaning
because the average level of family Food Expenditure could not be negative even when no
member of the family is gainfully employed. Moreover, it is unrealistic to think of the existence
of a family that has no income and no members and yet incurs expenditure on food.
Nonetheless, this value should not be discarded; it plays an important role when using the
estimated regression line/equation for prediction.

2. **b_{1} = .148** represents the
partial effect of annual family Income on Food Expenditure, holding family Size constant.
The estimated positive sign implies that this effect is positive, while the absolute value
implies that Food Expenditure would increase, on average, by $148 for every $1,000 increase in Income.

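3. **b_{2} = .793** represents the
partial effect of family Size on Food Expenditure, holding annual family Income constant.
The estimated positive sign implies that this effect is positive, while the absolute value
implies that Food Expenditure would increase, on average, by $793 for every additional family member.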

4.

5. S
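The interpretations of the intercept and the partial slopes can be made concrete by evaluating the estimated equation. The Python sketch below uses the coefficients from the 'Coefficients' table. The function name `predict_food_expenditure` and the example family (income of $50,000 and four members) are hypothetical and chosen only for illustration.

```python
# Estimated coefficients from the Coefficients table.
A_HAT = -1.118   # intercept
B1_HAT = 0.148   # annual income ($000)
B2_HAT = 0.793   # family size


def predict_food_expenditure(income_thousands, family_size):
    """Predicted annual food expenditure ($000) from the estimated SRP."""
    return A_HAT + B1_HAT * income_thousands + B2_HAT * family_size


# Hypothetical family: $50,000 annual income, 4 members.
base = predict_food_expenditure(50, 4)

# Partial effects: change one IV by one unit while holding the other constant.
income_effect = predict_food_expenditure(51, 4) - base
size_effect = predict_food_expenditure(50, 5) - base

print(f"Predicted expenditure: {base:.3f} ($000)")
print(f"Effect of +$1,000 income: {income_effect:.3f} ($000)")
print(f"Effect of +1 family member: {size_effect:.3f} ($000)")
```

For this hypothetical family the prediction is 9.454, or about $9,454 per year. The two differences equal b_{1} and b_{2}, which is what "holding the other IV constant" means in practice. The intercept has no meaningful interpretation on its own, but it is still needed to obtain the prediction.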

The standardized coefficients are useful for determining the relative importance of the IVs in the model. In effect, the importance of the IVs can be ranked according to the size (i.e., the absolute value) of the beta coefficients. In this example, the beta coefficient for income (.761) is larger than that for family size (.273), so income is the more important predictor of food expenditure.

Suppose we had included a third IV (

As part of investigating the accuracy of the fitted SRP, it is often useful to verify both the statistical significance and the sign (i.e., economic significance) of the regression parameters/coefficients (B

With respect to income, the null is

An interesting variation of the t-test is to verify the economic significance of the parameter with respect to the direction of causality of the associated IV. In this case, the null is phrased as

Consider, for example, family size where the sign of

Note that in the test for economic significance of a parameter the alpha value is not divided by two since this is always a one-tailed test; whereas, it is divided by 2 in the test for statistical significance since this is always a two-tailed test.
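The difference between the two-tailed and one-tailed versions of the t-test can be sketched as follows. This Python example is only an illustration. It assumes a 5% significance level (the alpha value is not stated in the surviving text) and assumes that both slopes are expected to be positive. The critical values for 17 degrees of freedom are hard-coded from a standard t-table rather than computed.

```python
# Coefficients and standard errors from the Coefficients table: (b, SE).
coefficients = {
    "(Constant)": (-1.118, 0.655),
    "Annual Income ($000)": (0.148, 0.016),
    "Family Size": (0.793, 0.244),
}

DF = 20 - 2 - 1            # n - K - 1 = 17 residual degrees of freedom
ALPHA = 0.05               # assumed significance level
T_CRIT_TWO_TAILED = 2.110  # t-table value for alpha/2 = 0.025, df = 17
T_CRIT_ONE_TAILED = 1.740  # t-table value for alpha = 0.05, df = 17

# t statistic for H0: B_j = 0 is the estimate divided by its standard error.
t_stats = {name: b / se for name, (b, se) in coefficients.items()}

# Statistical significance (two-tailed): reject H0 if |t| exceeds the critical value.
two_tailed_reject = {name: abs(t) > T_CRIT_TWO_TAILED for name, t in t_stats.items()}

# Economic significance (one-tailed, expected positive sign): reject H0 if t is
# above the one-tailed critical value. Applied to the two slopes only.
one_tailed_reject = {
    name: t_stats[name] > T_CRIT_ONE_TAILED
    for name in ("Annual Income ($000)", "Family Size")
}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}, two-tailed reject H0: {two_tailed_reject[name]}")
for name, reject in one_tailed_reject.items():
    print(f"{name}: one-tailed reject H0: {reject}")
```

With these rounded inputs the income ratio comes out near 9.25 rather than the 9.049 shown in the table, presumably because SPSS divides the unrounded coefficient by the unrounded standard error. The conclusions are the same as those implied by the 'Sig.' column: both slopes are significant at the 5% level, while the intercept is not.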

Suppose a typical or


**Copyright© 1996, Ebenge Usip, all rights reserved.
Last revised:
Wednesday, July 10, 2013.**