Correlation & Linear Regression in SPSS Petra Petrovics PhD Student
Types of dependence
- Association: between two nominal variables
- Mixed: between a nominal and a ratio variable
- Correlation: among ratio variables
Exercise 1 - Correlation
File / Open / Employee data.sav
Is there any relation between current salary and beginning salary?
Method: correlation
Analyze / Correlate / Bivariate
Correlation coefficient r: shows direction and strength
  0 < |r| < 0.3     weak dependence
  0.3 < |r| < 0.7   medium-strong dependence
  0.7 < |r| < 1     strong dependence
Covariance C: shows direction only (sign + or -)
Output

Descriptive Statistics
                    Mean          Std. Deviation   N
Current Salary      $34,419.57    $17,075.661      474
Beginning Salary    $17,016.09    $7,870.638       474

Correlations
                                                        Current Salary      Beginning Salary
Current Salary     Pearson Correlation                  1                   .880**
                   Sig. (2-tailed)                                          .000
                   Sum of Squares and Cross-products    137916495436.340    55948605047.73
                   Covariance                           291578214.45        118284577.27
                   N                                    474                 474
Beginning Salary   Pearson Correlation                  .880**              1
                   Sig. (2-tailed)                      .000
                   Sum of Squares and Cross-products    55948605047.73      29300904965.45
                   Covariance                           118284577.27        61946944.96
                   N                                    474                 474
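As a cross-check, the Pearson coefficient can be recomputed from the covariance and the two standard deviations printed in the output, since r = C / (s_x * s_y). The sketch below is only an illustration: the function names are made up for this example, and assigning the boundary values 0.3 and 0.7 to the stronger class is an assumption, because the scale on the slide uses strict inequalities.

```python
# Values copied from the SPSS output (Employee data.sav).
cov_xy = 118284577.27     # covariance of current and beginning salary
sd_current = 17075.661    # std. deviation of current salary
sd_beginning = 7870.638   # std. deviation of beginning salary


def pearson_from_cov(cov, sd_x, sd_y):
    """Pearson r is the covariance scaled by both standard deviations."""
    return cov / (sd_x * sd_y)


def strength(r):
    """Classify |r| on the weak / medium-strong / strong scale.

    Assumption: the boundary values 0.3 and 0.7 go to the stronger class.
    """
    a = abs(r)
    if a < 0.3:
        return "weak"
    if a < 0.7:
        return "medium-strong"
    return "strong"


r = pearson_from_cov(cov_xy, sd_current, sd_beginning)
direction = "direct" if r > 0 else "inverse"
print(f"r = {r:.3f}: {direction} relationship, {strength(r)} dependence")
```

The recomputed value agrees with the .880 reported by SPSS, which shows that the covariance carries the direction while dividing by the standard deviations adds the strength.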
Exercise 2 - Multiple Correlation
Is there any relation between current salary, previous experience (months), months since hire and beginning salary?
Method: multiple correlation
Analyze / Correlate / Bivariate
Correlation coefficient r: shows direction and strength
  0 < |r| < 0.3     weak dependence
  0.3 < |r| < 0.7   medium-strong dependence
  0.7 < |r| < 1     strong dependence
Covariance C: shows direction only (sign + or -)
Output View

Correlations Matrix
                                                        Current        Previous Experience   Months        Beginning
                                                        Salary         (months)              since Hire    Salary
Current Salary     Pearson Correlation                  1              -.097*                .084          .880**
                   Sig. (2-tailed)                                     .034                  .067          .000
                   Sum of Squares and Cross-products    1.379E+011     -82332343.5           6833347.5     5.59E+010
                   Covariance                           291578214.5    -174064.151           14446.823     118284577
                   N                                    474            474                   474           474
Previous           Pearson Correlation                  -.097*         1                     .003          .045
Experience         Sig. (2-tailed)                      .034                                 .948          .327
(months)           Sum of Squares and Cross-products    -82332343.54   5173806.810           1482.241      17573777
                   Covariance                           -174064.151    10938.281             3.134         37153.862
                   N                                    474            474                   474           474
Months since Hire  Pearson Correlation                  .084           .003                  1             -.020
                   Sig. (2-tailed)                      .067           .948                                .668
                   Sum of Squares and Cross-products    6833347.489    1482.241              47878.295     -739866.50
                   Covariance                           14446.823      3.134                 101.223       -1564.200
                   N                                    474            474                   474           474
Beginning Salary   Pearson Correlation                  .880**         .045                  -.020         1
                   Sig. (2-tailed)                      .000           .327                  .668
                   Sum of Squares and Cross-products    55948605048    17573776.7            -739866.5     2.93E+010
                   Covariance                           118284577.3    37153.862             -1564.200     61946945
                   N                                    474            474                   474           474
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).

Reading the matrix:
- r (current salary, previous experience) = -.097: inverse relationship and weak dependence
- r (current salary, beginning salary) = .880: direct relationship and strong dependence
- C (covariance): a negative sign means an inverse relationship, a positive sign a direct relationship
Exercise 3 - Rank Correlation
Ten students were ranked by their mathematical and musical ability:

Student       A   B   C   D   E   F   G   H   I   J
Mathematics   1   2   3   4   5   6   7   8   9   10
Music         3   4   1   2   5   7   10  6   8   9
Analyze / Correlate / Bivariate
\[ \rho = 1 - \frac{6 \sum d_i^2}{n (n^2 - 1)} = 1 - \frac{6 \cdot 32}{10 \, (10^2 - 1)} = 0.806 \]
Strong relationship.
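The rank correlation above can be reproduced by hand from the ten pairs of ranks. The following is a minimal sketch, assuming the ranks contain no ties (true for this exercise); the function name `spearman_rho` is chosen for this illustration and is not an SPSS term.

```python
# Ranks of students A..J, copied from the exercise.
math_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
music_rank = [3, 4, 1, 2, 5, 7, 10, 6, 8, 9]


def spearman_rho(rank_x, rank_y):
    """Spearman's rho for untied ranks: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Sum of squared rank differences, the 32 that appears in the formula.
sum_d2 = sum((a - b) ** 2 for a, b in zip(math_rank, music_rank))
rho = spearman_rho(math_rank, music_rank)
print(sum_d2, round(rho, 3))
```

With tied ranks the shortcut formula is no longer exact, so a rank-based Pearson correlation would be the safer route in that case.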
Linear regression
\[ \hat{y} = b_0 + b_1 x \]
b_1: for every 1 unit increase in x we expect y to change by b_1 units
b_0: when x = 0, y = b_0
(Figure: regression line in the x-y plane.)
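To make the roles of b_0 and b_1 concrete, here is a minimal least-squares sketch in plain Python. The data points are hypothetical and chosen so that the result is easy to verify by eye; the function name `fit_line` is likewise only illustrative and is not what SPSS does internally.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor.

    b1 = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    b0 = mean_y - b1 * mean_x
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

b0, b1 = fit_line(xs, ys)
print(f"y_hat = {b0} + {b1} * x")
```

Here every 1 unit increase in x raises the fitted y by b1 = 2 units, and the line crosses the y axis at b0 = 1.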
Exercise 4 - Linear Regression
File / Open / Employee data.sav
Determine the linear relationship between the current salary and the age of the employees.
Age is not in the data file, so first create a new variable.
Create a new variable: age = this year - year of birth
Transform / Compute Variable
Regression Analyze / Regression / Linear
Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .146a    .021       .019                $16,928.804
a. Predictors: (Constant), age

Multiple correlation coefficient (R)
It expresses the combined effect of all the independent variables acting on the dependent variable. For two independent variables:
\[ R = \sqrt{\frac{r_{y1}^2 + r_{y2}^2 - 2 \, r_{y1} \, r_{y2} \, r_{12}}{1 - r_{12}^2}} \]
With a single independent variable, R = |r|. Here R = .146: weak dependence.

Multiple determination coefficient (R^2)
It shows what percentage of the variation of the dependent variable can be explained by the variation of all the independent variables. Here the regression model explains 2.1% of the variation of the dependent variable (current salary).

Adjusted multiple determination coefficient
\[ \bar{R}^2 = 1 - \frac{n - 1}{n - p - 1} \left( 1 - R^2 \right) \]
It makes it possible to compare the multiple determination coefficient among populations / samples of different size and with a different number of independent variables, because it controls for the sample / population size (n) and the number of independent variables (p).
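The figures in the Model Summary can be checked against each other. The sketch below squares R to get the determination coefficient and then applies the adjustment for sample size and number of predictors. Note the assumption: n = 474 is taken from the N reported in the correlation output, and the regression may have used fewer cases if any birth date is missing; the function name is illustrative only.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (n - 1) / (n - p - 1) * (1 - R^2)."""
    return 1 - (n - 1) / (n - p - 1) * (1 - r2)


R = 0.146   # multiple correlation coefficient from the Model Summary
n = 474     # assumed number of cases (N in the correlation output)
p = 1       # one independent variable: age

r2 = R ** 2
adj = adjusted_r2(r2, n, p)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj:.3f}")
```

Because n is large and p is small, the adjustment barely changes the value; with small samples or many predictors the gap between the two coefficients grows.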
F-test: testing the model
The F ratio (in the Analysis of Variance table) is 10.241 and significant at p = .001, so we can accept the model at every conventional significance level. This provides evidence of a linear relationship between the variables.
Coefficients(a)
                    Unstandardized Coefficients    Standardized Coefficients
Model 1             B             Std. Error       Beta                         t         Sig.
b_0  (Constant)     41543.805     2358.686                                      17.613    .000
b_1  age            -211.609      66.124           -.146                        -3.200    .001
a. Dependent Variable: Current Salary

The regression line:
\[ \hat{y} = b_0 + b_1 x = 41543.805 - 211.609 \, x \]
Both parameters are significant (Sig. of .001 or below), so we can accept them at every conventional significance level.
b_0: the value of y when x is 0. A 0-year-old employee would earn $41,543.805, which has no practical meaning.
b_1: the change in y when x increases by 1 unit. An employee who is 1 year older is expected to earn $211.609 less.
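The coefficient table can be put to work in a few lines. The sketch below plugs an age into the fitted line and recomputes the t statistics as B divided by its standard error. The age of 40 is a hypothetical example, and the names `predict_salary`, `t_b0` and `t_b1` are chosen for this illustration.

```python
# Unstandardized coefficients and standard errors from the SPSS table.
b0, se_b0 = 41543.805, 2358.686   # (Constant)
b1, se_b1 = -211.609, 66.124      # age


def predict_salary(age):
    """Expected current salary on the fitted line y_hat = b0 + b1 * age."""
    return b0 + b1 * age


# t statistic of each parameter: estimate divided by its standard error.
t_b0 = b0 / se_b0
t_b1 = b1 / se_b1

example_age = 40  # hypothetical employee
print(f"predicted salary at age {example_age}: {predict_salary(example_age):.2f}")
print(f"t(b0) = {t_b0:.3f}, t(b1) = {t_b1:.3f}, t(b1)^2 = {t_b1 ** 2:.3f}")
```

With a single predictor the squared t of the slope equals the F ratio of the model, so the last printed value can be compared with the F-test result.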
Exercise 5 - Multiple Regression
Determine how the current salary depends on age, education level, beginning salary, months since hire and previous experience.
y   = current salary
x_1 = age
x_2 = education level (years)
x_3 = beginning salary
x_4 = months since hire
x_5 = previous experience
Analyze / Regression / Linear
y: Dependent
x: Independent(s)
Output View
\[ \hat{y} = 13462.743 - 103.049 \, x_1 + 631.920 \, x_2 + 1.771 \, x_3 + 166.444 \, x_4 - 8.301 \, x_5 \]
Thank you for your attention!