Problem Description and Data
This example addresses the question: Does a meaningful relationship exist between the education level and the salary of the employees? Both are categorical variables: Education is inherently categorical with the nominal values HS, BA, and MBA, and salary by category (salcat) is logically created with the values high and low: high if an employee makes over $35,000, and low if she/he makes $35,000 or less (see the employees.sav data file). To research this question, we begin with the maintained premise (H0) that a person's salary is not related to his/her education level, in which case the two categorical variables are statistically independent. This null is to be tested against the alternative statement (HA) that a person's salary is indeed related to her/his education level, in which case the two variables are not statistically independent.
The crosstab table below summarizes the relationship between Salary and Education level for the 20 Neku employees (see page 42 of the SPSS/win manual, 5th edition, for details). The test is to be done at the significance level of alpha = .05.
Discussion of the Outputs/Results and Testing Procedure
The complete SPSS/win outputs are as follows:
|Gender * Education Level||20||100.0%||0||
|Salary Group|| ||Employees with HS||Employees with BA||Employees with MBA||Total|
|high: salary > $35000||Count||0||5||5||10|
|low: salary less than or equal to $35000||Count||8||2||0||10|
| ||Value||df||Asymp. Sig. (2-sided)|
|Pearson Chi-Square||14.286(a)||2||.001|
|N of Valid Cases||20|
|a 6 cells (100.0%) have expected count less than 5. The minimum expected count is 2.50.|
The first table (Case Processing Summary) contains both the valid and the total number and percent of cases used in the study. The middle table is the 2 by 3 crosstabulation table reported earlier in chapter 2 (figure 19, page 44). The only difference now is that each of the six cells (i,j) [for i = 1, 2 and j = 1, 2, 3] contains the expected frequency (fe,i,j) in addition to the observed frequency (fo,i,j). For instance, in cell (1,1): fo,1,1 = 0 (i.e., no employee in the high salary group is observed to hold a high school diploma), and fe,1,1 = 4 (i.e., 4 employees with a high school diploma would be expected in the high salary group if no consistent consideration were given to the salary and education level of the applicants during the hiring process, in which case the two variables are not related). Similarly, in cell (2,2): fo,2,2 = 2 (i.e., 2 employees with a BA diploma are actually observed to be in the low salary group), and fe,2,2 = 3.5 (i.e., 3.5 employees with a BA diploma would be expected in the low salary group if, again, the two variables are not related).
The expected frequencies are derived from the rule fe,i,j = (ri x cj)/n, where ri is the sum of all the observed frequencies in the ith row (e.g., r1 = 10), cj is the sum of all the observed frequencies in the jth column (e.g., c1 = 8), and n is the total number of cases (n = 20). Thus, the expected frequency in cell (1,1) is given as fe,1,1 = (r1 x c1)/n = (10 x 8)/20 = 4; similarly, the expected frequency in cell (2,2) is given as fe,2,2 = (r2 x c2)/n = (10 x 7)/20 = 3.5.
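The rule above can be checked with a few lines of code. The following is a minimal sketch in plain Python (not SPSS syntax); the variable names are illustrative assumptions, and the observed counts are the ones shown in the crosstab.

```python
# Observed counts from the crosstab.
# Rows: salary group (high, low); columns: education level (HS, BA, MBA).
observed = [
    [0, 5, 5],  # high: salary > $35,000
    [8, 2, 0],  # low: salary <= $35,000
]

# Marginal totals: r_i for each row, c_j for each column, and n overall.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency in cell (i, j) under independence: (r_i x c_j) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(("high", "low"), expected):
    print(label, row)
```

Because both salary groups contain 10 employees, the two rows of expected frequencies are identical: 4, 3.5, and 2.5.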
Note: The above computational rule, fe,i,j = (ri x cj)/n, is derived from the joint and marginal probabilities by positing that P(Ri ∩ Cj) = P(Ri) x P(Cj) if the null (H0) is indeed true. Ri denotes the marginal event that a randomly selected employee belongs to the ith salary group, Cj denotes the marginal event that a randomly selected employee holds the jth type of diploma, and (Ri ∩ Cj) denotes the joint event that a randomly selected employee is in the ith salary group and holds the jth type of diploma. So for the cell (1,1), statistical independence of R1 and C1 under H0 implies that
fe,1,1 = n[P(R1 ∩ C1)] = n[P(R1) x P(C1)] = n[(r1/n) x (c1/n)] = (r1 x c1)/n = (10 x 8)/20 = 4.
Both the marginal and the joint probabilities reported earlier in figure 2 of chapter 4 (see page 67) can also be used to verify this result. Since P(R1) = P(high salary) = .50, and P(C1) = P(HS) = .40, it follows that: fe,1,1 = n[P(R1) x P(C1)] = 20[.50 x .40] = 4.
The third table (Chi-Square Tests) contains the results for completing the test: (1) the computed/observed X2 value, X2ov = 14.286 (Pearson Chi-Square), and (2) the corresponding degrees of freedom, v = 2, reported under the df column. The value of v is given by the rule v = (R-1)(C-1) = (2-1)(3-1) = 2, where R and C denote, respectively, the total number of rows and columns that define the size of the contingency table.
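As a cross-check on the reported value, the statistic can be recomputed by hand. The sketch below assumes the standard Pearson formula, X2 = sum over all cells of (fo - fe)^2 / fe, which is not spelled out in the text above; it is written in plain Python with illustrative variable names.

```python
# Observed counts: rows = salary group (high, low), columns = HS, BA, MBA.
observed = [
    [0, 5, 5],
    [8, 2, 0],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequencies under independence: (r_i x c_j) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (fo - fe)^2 / fe over all six cells.
chi_square = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

# Degrees of freedom: (R - 1)(C - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_square, 3), df)  # 14.286 2
```

The six cell contributions are 4, 0.643, and 2.5 for the high salary row and the same again for the low salary row, which sum to 14.286.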
At the significance level of 5%, the critical X2 value (X2cv) from the Chi-Square table, for 2 degrees of freedom, is given as X2cv = X2.05,2 = 5.991. Since X2ov = 14.286 is greater than X2cv = 5.991, the valid statistical conclusion is to reject H0 in favor of HA. Thus, the observed association between the two categorical variables is unlikely to have occurred by chance alone; it is statistically significant. As observed earlier in chapter 2, the nature of the association is such that high (low) salary is directly related to high (low) education level. These conclusions are further supported by the reported p-value of .001, which indicates that H0 can be rejected even at the 1% level of significance. Note, however, that all six expected counts are below 5 (see footnote a of the Chi-Square Tests output), so the Chi-Square approximation should be interpreted with some caution.
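The critical value and the p-value can also be reproduced without a printed table. The sketch below relies on a shortcut that holds only for 2 degrees of freedom, where the upper-tail probability of the Chi-Square distribution is exp(-x/2); for other degrees of freedom a table or a statistics library would be needed. The variable names are illustrative assumptions.

```python
import math

chi_square = 14.286  # Pearson Chi-Square reported in the SPSS output
alpha = 0.05         # significance level of the test

# Shortcut valid for df = 2 only: P(X2 > x) = exp(-x / 2).
# Solving exp(-x / 2) = alpha for x gives the critical value.
critical_value = -2 * math.log(alpha)

# Upper-tail probability of the observed statistic (the p-value).
p_value = math.exp(-chi_square / 2)

# Decision rule: reject H0 when the observed value exceeds the critical value.
reject_h0 = chi_square > critical_value

print(round(critical_value, 3), round(p_value, 3), reject_h0)  # 5.991 0.001 True
```

The unrounded p-value is about .0008, which is why the decision also holds at the 1% level.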
The policy implication of the study is that, regardless of gender (and, possibly, other attributes), people with higher levels of education tend to be rewarded with higher salaries for the investment that they make to develop their human capital through education.