Correlation & Linear Regression in SPSS
Types of dependence
- Association: between two nominal variables
- Mixed: between a nominal and a ratio variable
- Correlation: between ratio variables
Exercise 1 - Correlation
File / Open / Employee data.sav
Is there any relationship between current salary and beginning salary? (CORRELATION)
Analyze / Correlate / Bivariate
r (Pearson correlation coefficient): shows direction and strength
  0 < |r| < 0.3: weak dependence
  0.3 < |r| < 0.7: medium-strong dependence
  0.7 < |r| < 1: strong dependence
C (covariance): shows just the direction (+ or -)
Output

                    Mean          Std. Deviation   N
Current Salary      $34,419.57    $17,075.661      474
Beginning Salary    $17,016.09    $7,870.638       474

                                                        Current Salary      Beginning Salary
Current Salary      Pearson Correlation                 1                   .880(**)
                    Sig. (2-tailed)                                         .000
                    Sum of Squares and Cross-products   137916495436.340    55948605047.73
                    Covariance                          291578214.45        118284577.27
                    N                                   474                 474
Beginning Salary    Pearson Correlation                 .880(**)            1
                    Sig. (2-tailed)                     .000
                    Sum of Squares and Cross-products   55948605047.73      29300904965.45
                    Covariance                          118284577.27        61946944.96
                    N                                   474                 474
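As a cross-check on this output, the Pearson coefficient and the covariance can be recomputed from the printed sums of squares and cross-products. The sketch below uses Python rather than SPSS. The variable names are illustrative, and it assumes that the covariance is computed with the n - 1 divisor.

```python
import math

# Values copied from the SPSS output above (Employee data.sav, N = 474).
n = 474
ss_current = 137916495436.340    # sum of squares, Current Salary
ss_beginning = 29300904965.45    # sum of squares, Beginning Salary
sp_cross = 55948605047.73        # sum of cross-products of the two variables

# Pearson r = cross-products / sqrt(product of the two sums of squares)
r = sp_cross / math.sqrt(ss_current * ss_beginning)

# Sample covariance (assumption: n - 1 divisor)
covariance = sp_cross / (n - 1)

print(f"r = {r:.3f}")
print(f"covariance = {covariance:.2f}")
```

Both results should agree with the Pearson Correlation and Covariance rows of the output up to rounding.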
Exercise 2 - Multiple Correlation
Is there any relationship between the current salary and
- previous experience (months)
- months since hire
- beginning salary?
(MULTIPLE CORRELATION)
Analyze / Correlate / Bivariate
r (Pearson correlation coefficient): shows direction and strength
  0 < |r| < 0.3: weak dependence
  0.3 < |r| < 0.7: medium-strong dependence
  0.7 < |r| < 1: strong dependence
C (covariance): shows just the direction (+ or -)
Output View

Correlations (r = Pearson Correlation, C = Covariance)

                                                          Previous
                                      Current             Experience          Months              Beginning
                                      Salary              (months)            since Hire          Salary
Current Salary
  Pearson Correlation                 1                   -.097*              .084                .880**
  Sig. (2-tailed)                                         .034                .067                .000
  Sum of Squares and Cross-products   1.379E+011          -82332343.5         6833347.5           5.59E+010
  Covariance                          291578214.5         -174064.151         14446.823           118284577
  N                                   474                 474                 474                 474
Previous Experience (months)
  Pearson Correlation                 -.097*              1                   .003                .045
  Sig. (2-tailed)                     .034                                    .948                .327
  Sum of Squares and Cross-products   -82332343.54        5173806.810         1482.241            17573777
  Covariance                          -174064.151         10938.281           3.134               37153.862
  N                                   474                 474                 474                 474
Months since Hire
  Pearson Correlation                 .084                .003                1                   -.020
  Sig. (2-tailed)                     .067                .948                                    .668
  Sum of Squares and Cross-products   6833347.489         1482.241            47878.295           -739866.50
  Covariance                          14446.823           3.134               101.223             -1564.200
  N                                   474                 474                 474                 474
Beginning Salary
  Pearson Correlation                 .880**              .045                -.020               1
  Sig. (2-tailed)                     .000                .327                .668
  Sum of Squares and Cross-products   55948605048         17573776.7          -739866.5           2.93E+010
  Covariance                          118284577.3         37153.862           -1564.200           61946945
  N                                   474                 474                 474                 474

*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).

Current Salary and Previous Experience (r = -.097, negative covariance): inverse relationship & weak dependence
Current Salary and Beginning Salary (r = .880, positive covariance): direct relationship & strong dependence
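The reading rules for r can be written as a small helper. This is a minimal Python sketch, not part of SPSS; the function name `describe` is hypothetical, and because the slides leave the boundary values open, it assumes that exactly 0.3 counts as medium-strong and exactly 0.7 as strong.

```python
def describe(r):
    """Return (direction, strength) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")

    # The sign gives the direction of the relationship.
    if r > 0:
        direction = "direct"
    elif r < 0:
        direction = "inverse"
    else:
        direction = "none"

    # The absolute value gives the strength (boundary handling is an assumption).
    size = abs(r)
    if size < 0.3:
        strength = "weak"
    elif size < 0.7:
        strength = "medium-strong"
    else:
        strength = "strong"
    return direction, strength


# Correlations with Current Salary, copied from the matrix above.
correlations = {
    "Previous Experience (months)": -0.097,
    "Months since Hire": 0.084,
    "Beginning Salary": 0.880,
}
for name, r in correlations.items():
    direction, strength = describe(r)
    print(f"{name}: r = {r:+.3f} -> {direction} relationship, {strength} dependence")
```

The thresholds describe strength only, not significance: Months since Hire (r = .084, Sig. = .067) is not significant at the 0.05 level.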
Assumptions of Pearson's Correlation Coefficient
- Variables should be measured at the interval or ratio level
- There needs to be a linear relationship between the two variables
- There should be no significant outliers
- Variables should be approximately normally distributed
Rank correlation
The Spearman rank-order correlation coefficient is a nonparametric measure of the strength and direction of the relationship between two variables measured on at least an ordinal scale.
Exercise 3 - Rank Correlation
Ten students were ranked by their mathematical and musical ability:

Student       A    B    C    D    E    F    G    H    I    J
Mathematics   1    2    3    4    5    6    7    8    9    10
Music         3    4    1    2    5    7    10   6    8    9
Analyze / Correlate / Bivariate
\[
\rho = 1 - \frac{6 \sum d_i^2}{n (n^2 - 1)} = 1 - \frac{6 \cdot 32}{10 \cdot (10^2 - 1)} = 0.806
\]
Strong relationship.
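The hand calculation can be reproduced in a few lines of Python. This is a minimal sketch using the ranks from Exercise 3; the variable names are illustrative, and the shortcut formula is only valid when there are no tied ranks, as is the case here.

```python
# Ranks of students A..J, copied from Exercise 3.
mathematics = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
music = [3, 4, 1, 2, 5, 7, 10, 6, 8, 9]

n = len(mathematics)

# Sum of squared rank differences d_i = mathematics rank - music rank
sum_d2 = sum((m - s) ** 2 for m, s in zip(mathematics, music))

# Spearman's rho via the shortcut formula (no ties)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(sum_d2)         # 32
print(round(rho, 3))  # 0.806
```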
Linear regression
ŷ = b0 + b1 x
b1: for every 1-unit increase in x, we expect y to change by b1 units
b0: when x = 0, y = b0
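To make the roles of b0 and b1 concrete, here is a minimal least-squares sketch in Python. The function name `fit_line` and the toy data are hypothetical and are not taken from Employee data.sav; the points are chosen to lie exactly on y = 2 + 3x so that the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of x and sum of cross-products of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x must vary")
    b1 = sxy / sxx               # slope: change in y per 1-unit change in x
    b0 = mean_y - b1 * mean_x    # intercept: value of y when x = 0
    return b0, b1


# Hypothetical toy data lying exactly on y = 2 + 3x
xs = [0, 1, 2, 3, 4]
ys = [2, 5, 8, 11, 14]
b0, b1 = fit_line(xs, ys)
print(b0, b1)  # 2.0 3.0
```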
Exercise 4 - Linear Regression
File / Open / Employee data.sav
Determine the linear relationship between the salary and the age of the employees.
First create a new variable for age.
Create a new variable: age = this year - year of birth
Transform / Compute Variable
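The same computation, sketched in Python for illustration. The function name `age_in_years`, the birth years and the reference year 2024 are all hypothetical; in SPSS the calculation is done in the Compute Variable dialog.

```python
from datetime import date


def age_in_years(birth_year, this_year=None):
    """Age as the difference between the current year and the year of birth."""
    if this_year is None:
        this_year = date.today().year
    return this_year - birth_year


# Hypothetical birth years, with a fixed reference year so the result is reproducible
birth_years = [1952, 1958, 1929]
ages = [age_in_years(year, this_year=2024) for year in birth_years]
print(ages)  # [72, 66, 95]
```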
Analyze / Regression / Linear Regression
Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .146(a)    .021       .019                $16,928.804
a. Predictors: (Constant), age

Multiple correlation coefficient (R)
\[
R = \sqrt{\frac{r_{y1}^2 + r_{y2}^2 - 2 r_{y1} r_{y2} r_{12}}{1 - r_{12}^2}}
\]
It expresses the combined effect of all the independent variables acting on the dependent variable. Here R = .146: weak dependence.

Multiple determination coefficient (R^2)
It shows what percentage of the variation of the dependent variable is explained by the variation of all the independent variables. Here the regression model explains 2.1% of the variation of the dependent variable (current salary).

Adjusted multiple determination coefficient
\[
\bar{R}^2 = 1 - (1 - R^2) \frac{n - 1}{n - p - 1}
\]
It makes it possible to compare the multiple determination coefficient across populations / samples of different sizes and with different numbers of independent variables, as it controls for the sample / population size (n) and the number of independent variables (p).
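The two formulas can be checked against the Model Summary with a short Python sketch. The function names are hypothetical. The Model Summary does not print n, so n = 474 is an assumption taken from the correlation output; p = 1 because age is the only predictor.

```python
import math


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of y with two predictors, from pairwise correlations."""
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    numerator = r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12
    return math.sqrt(numerator / (1 - r_12 ** 2))


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n cases and p independent variables."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


r = 0.146                            # R from the Model Summary
r2 = r ** 2                          # R Square
adj = adjusted_r2(r2, n=474, p=1)    # n = 474 is an assumption (see lead-in)

print(round(r2, 3))   # 0.021
print(round(adj, 3))  # 0.019
```

With uncorrelated predictors (r_12 = 0), `multiple_r` reduces to the square root of the sum of the two squared correlations.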
F-test: for model testing
The F ratio (in the Analysis of Variance table) is 10.241 and is significant at p = .001, so the model can be accepted at every conventional significance level. This provides evidence of a linear relationship between the variables.
Coefficients(a)

                     Unstandardized Coefficients    Standardized Coefficients
Model 1              B             Std. Error       Beta                         t         Sig.
(Constant)   b0      41543.805     2358.686                                      17.613    .000
age          b1      -211.609      66.124           -.146                        -3.200    .001
a. Dependent Variable: Current Salary

The regression line: ŷ = b0 + b1 x = 41543.805 - 211.609 x
b0: the value of y when x is 0. A 0-year-old employee would earn $41,543.805 (this has no practical meaning).
b1: the change in y when x increases by 1 unit. An employee who is 1 year older earns $211.609 less.
Both parameters can be accepted at every conventional significance level.
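The t values in the coefficients table are each coefficient divided by its standard error, and with a single predictor the model F ratio equals the squared t value of the slope. A minimal Python sketch using the printed values; the function name `predict_salary` and the example age of 40 are hypothetical.

```python
# Values copied from the coefficients table.
b0, se_b0 = 41543.805, 2358.686   # (Constant)
b1, se_b1 = -211.609, 66.124      # age

# t = coefficient / standard error
t_b0 = b0 / se_b0
t_b1 = b1 / se_b1

# With one predictor, the model F ratio is the square of the slope's t value.
f_ratio = t_b1 ** 2


def predict_salary(age):
    """Predicted current salary from the fitted line y-hat = b0 + b1 * age."""
    return b0 + b1 * age


print(round(t_b0, 3), round(t_b1, 3))   # 17.613 -3.2
print(round(f_ratio, 3))                # 10.241
print(round(predict_salary(40), 3))     # 33079.445
```

The computed F ratio matches the 10.241 reported for the F-test.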
Exercise 5 - Multiple Regression
Determine how the current salary depends on age, education level, beginning salary, months since hire and previous experience.
y = current salary
x1 = age
x2 = education level (years)
x3 = beginning salary
x4 = months since hire
x5 = previous experience
Analyze / Regression / Linear
(y: Dependent, x: Independent(s))
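Output View
Fitted model: ŷ = b0 + b1 x1 + b2 x2 + b3 x3 + b4 x4 + b5 x5
Coefficient values as printed (signs not preserved): b0: 13462.743, b1: 103.049, b2: 631.920, b3: 1.771, b4: 166.444, b5: 8.301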
Thanks for your attention