Petra Petrovics
Correlation & Linear Regression in SPSS
4th seminar
Types of dependence
- association: between two nominal variables
- mixed: between a nominal and a ratio variable
- correlation: among ratio variables
Correlation describes the strength of a relationship: the degree to which one variable is linearly related to another.
Regression shows us how to determine the nature of a relationship between two or more variables.
- X (or X1, X2, ..., Xp): known variable(s) / independent variable(s) / predictor(s)
- Y: unknown variable / dependent variable
- causal relationship: X causes Y to change
Correlation Measures
1. Covariance
2. Coefficient of correlation
3. Coefficient of determination
4. Coefficient of rank correlation
1. Covariance
- A measure of the joint variation of the two variables.
- The average value of the product of the deviations of the observations on two random variables from their sample means:
\[ C_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]
- It ranges from -∞ to +∞.
- C = 0 when X and Y are uncorrelated.
- Its sign shows the direction of the correlation.
- It doesn't measure the degree of the relationship, because its magnitude depends on the units of X and Y!
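A minimal sketch of the covariance definition above, in plain Python. The function name `sample_covariance` and the two short data series are made up for illustration; they are not taken from Employee data.sav.

```python
def sample_covariance(xs, ys):
    """Sum of the products of deviations from the means, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Made-up illustration data (hypothetical salaries in thousands)
begin = [10.0, 12.0, 15.0, 20.0]
current = [20.0, 25.0, 28.0, 41.0]

c_xy = sample_covariance(begin, current)

# Same data with x expressed in different units: the covariance is rescaled too,
# which is why its size says nothing about the strength of the relationship.
c_xy_rescaled = sample_covariance([x * 1000 for x in begin], current)

print(c_xy, c_xy_rescaled)
```

The sign is positive in both calls (direct relationship), but the magnitude changes with the units.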
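2. Coefficient of correlation (Pearson correlation)
\[ r = \frac{C_{xy}}{s_x s_y} = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \, \sum d_y^2}} \]
where d_x and d_y are the deviations of x and y from their means.
- A measure of how closely related two data series are.
- Its sign shows the direction of the correlation.
- Its absolute value measures the strength of the correlation:
  - 0 < |r| < 1: statistical dependence
  - r = 0: X and Y are uncorrelated
  - r = -1: perfect negative linear relationship
  - r = 1: perfect positive linear relationship
- You can use it only in the case of a linear relationship!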
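The deviation form of the formula can be sketched in a few lines of plain Python. The name `pearson_r` and the sample series are illustrative assumptions only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from deviations: sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

# Made-up series: an exact increasing line and an exact decreasing line
xs = [1.0, 2.0, 3.0, 4.0]
r_up = pearson_r(xs, [3.0, 5.0, 7.0, 9.0])      # y = 2x + 1
r_down = pearson_r(xs, [-3.0, -6.0, -9.0, -12.0])  # y = -3x

# Unlike the covariance, r does not change when x is expressed in other units
r_rescaled = pearson_r([x * 1000 for x in xs], [3.0, 5.0, 7.0, 9.0])

print(r_up, r_down, r_rescaled)
```

The two exact lines give values at the ends of the range, +1 and -1 (up to floating-point rounding).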
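3. Coefficient of determination (r²)
- The square of the sample correlation coefficient between the outcomes and their predicted values.
- Measures the degree of correlation as a percentage (%).
- It provides a measure of how well future outcomes are likely to be predicted by the model.
- It ranges from 0 to 1.
\[ r^2 = \frac{S_{\hat{y}}}{S_y} = 1 - \frac{S_e}{S_y} \]
where S_y is the total variation of y, S_ŷ the variation explained by the model, and S_e the residual (unexplained) variation.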
Exercise 1 - Correlation
File / Open / Employee data.sav
Is there any relation between
- current salary &
- beginning salary?
CORRELATION
Analyze / Correlate / Bivariate
- 0 < |r| < 0.3: weak dependence
- 0.3 < |r| < 0.7: medium-strong dependence
- 0.7 < |r| < 1: strong dependence
r: shows direction and strength
C: just direction! (+ / -)
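The rule of thumb above can be written as a small helper. This is only a sketch: the function name `describe_correlation` is made up, and assigning the boundary values 0.3 and 0.7 to the upper band is an assumption, since the slide does not say where they belong.

```python
def describe_correlation(r):
    """Return (direction, strength) for a Pearson r, using the 0.3 / 0.7 thresholds."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size < 0.3:
        strength = "weak"
    elif size < 0.7:          # assumption: exactly 0.3 counts as medium-strong
        strength = "medium-strong"
    else:                     # assumption: exactly 0.7 counts as strong
        strength = "strong"
    if r > 0:
        direction = "direct"
    elif r < 0:
        direction = "inverse"
    else:
        direction = "none"
    return direction, strength

# Two coefficients that appear in the SPSS outputs of the exercises
print(describe_correlation(0.880))
print(describe_correlation(-0.097))
```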
Output

Descriptive Statistics
                     Mean          Std. Deviation   N
Current Salary       $34,419.57    $17,075.661      474
Beginning Salary     $17,016.09    $7,870.638       474

Correlations
                                                         Current Salary      Beginning Salary
Current Salary     Pearson Correlation                   1                   .880(**)
                   Sig. (2-tailed)                                           .000
                   Sum of Squares and Cross-products     137916495436.340    55948605047.73
                   Covariance                            291578214.45        118284577.27
                   N                                     474                 474
Beginning Salary   Pearson Correlation                   .880(**)            1
                   Sig. (2-tailed)                       .000
                   Sum of Squares and Cross-products     55948605047.73      29300904965.45
                   Covariance                            118284577.27        61946944.96
                   N                                     474                 474
**. Correlation is significant at the 0.01 level (2-tailed).
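As a cross-check, the figures in this output are tied together by the formulas from the earlier slides. The sketch below only redoes that arithmetic with the printed (rounded) values; the variable names are made up, and small differences from the SPSS figures are rounding.

```python
# Values copied from the SPSS output (rounded as printed there)
n = 474
sscp_xy = 55948605047.73      # sum of cross-products, current x beginning salary
cov_xy = 118284577.27         # covariance
sd_current = 17075.661        # std. deviation of current salary
sd_begin = 7870.638           # std. deviation of beginning salary

# Covariance = sum of cross-products / (n - 1)
cov_from_sscp = sscp_xy / (n - 1)

# Pearson r = covariance / (s_x * s_y)
r_from_cov = cov_xy / (sd_current * sd_begin)

# Coefficient of determination for the two salary variables
r_squared = r_from_cov ** 2

print(round(cov_from_sscp, 2), round(r_from_cov, 3), round(r_squared, 2))
```

The recomputed r rounds to the .880 shown in the table, and its square is roughly 0.77.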
Exercise 2 - Multiple Correlation
Is there any relation between
- the current salary
- previous experience (months)
- months since hire
- beginning salary?
MULTIPLE CORRELATION
Analyze / Correlate / Bivariate
The same thresholds for |r| apply as in Exercise 1: r shows direction and strength, C just the direction (+ / -).
Output View

Correlations (matrix)
                                                           Current         Previous Experience   Months          Beginning
                                                           Salary          (months)              since Hire      Salary
Current Salary        Pearson Correlation                  1               -.097*                .084            .880**
                      Sig. (2-tailed)                                      .034                  .067            .000
                      Sum of Squares and Cross-products    1.379E+011      -82332343.5           6833347.5       5.59E+010
                      Covariance                           291578214.5     -174064.151           14446.823       118284577
                      N                                    474             474                   474             474
Previous Experience   Pearson Correlation                  -.097*          1                     .003            .045
(months)              Sig. (2-tailed)                      .034                                  .948            .327
                      Sum of Squares and Cross-products    -82332343.54    5173806.810           1482.241        17573777
                      Covariance                           -174064.151     10938.281             3.134           37153.862
                      N                                    474             474                   474             474
Months since Hire     Pearson Correlation                  .084            .003                  1               -.020
                      Sig. (2-tailed)                      .067            .948                                  .668
                      Sum of Squares and Cross-products    6833347.489     1482.241              47878.295       -739866.50
                      Covariance                           14446.823       3.134                 101.223         -1564.200
                      N                                    474             474                   474             474
Beginning Salary      Pearson Correlation                  .880**          .045                  -.020           1
                      Sig. (2-tailed)                      .000            .327                  .668
                      Sum of Squares and Cross-products    55948605048     17573776.7            -739866.5       2.93E+010
                      Covariance                           118284577.3     37153.862             -1564.200       61946945
                      N                                    474             474                   474             474
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).

Reading the matrix:
- r (Pearson Correlation): current salary & previous experience, r = -.097: inverse relationship & weak dependence.
- r (Pearson Correlation): current salary & beginning salary, r = .880: direct relationship & strong dependence.
- C (Covariance): a negative sign means an inverse relationship, a positive sign a direct relationship.
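Linear regression
\[ \hat{y} = b_0 + b_1 x \]
- b1: for every 1-unit increase in x we expect y to change by b1 units on average.
- b0: when x = 0, ŷ = b0.
(Chart: the regression line, with x on the horizontal axis and y on the vertical axis.)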
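To make b0 and b1 concrete, here is a minimal least-squares sketch in plain Python. It uses the standard formulas b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1·x̄, which the slide does not spell out; the function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (b0, b1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x must not be constant")
    b1 = s_xy / s_xx              # slope: expected change in y per unit of x
    b0 = mean_y - b1 * mean_x     # intercept: predicted y at x = 0
    return b0, b1

# Made-up data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_line(xs, ys)
y_hat_at_6 = b0 + b1 * 6.0        # prediction from the fitted line

print(b0, b1, y_hat_at_6)
```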
Exercise 3 - Linear Regression
File / Open / Employee data.sav
Determine a linear relationship between the salary and the age of the employees!
Create a new variable!
Transform / Compute Variable
Create a new variable:
age = this year - year of birth (taken from the date of birth)
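Outside SPSS, the same computed variable can be sketched as follows. The function name `age_in_years`, the reference year and the birth dates are all hypothetical; they only illustrate the "this year minus year of birth" rule.

```python
from datetime import date

def age_in_years(birth_date, reference_year):
    """Age as used on the slide: reference year minus the year of birth."""
    return reference_year - birth_date.year

# Hypothetical birth dates and a hypothetical reference year
birth_dates = [date(1952, 2, 3), date(1958, 5, 23), date(1929, 7, 26)]
reference_year = 2010

ages = [age_in_years(d, reference_year) for d in birth_dates]
print(ages)
```

This ignores whether the birthday has already passed in the reference year, exactly as the formula on the slide does.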
Regression Analyze / Regression / Linear
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .146(a)    .021       .019                $16,928.804
a. Predictors: (Constant), age

R: multiple correlation coefficient
- It expresses the combined effect of all the independent variables acting on the dependent variable.
- With two independent variables:
\[ R = \sqrt{\frac{r_{y1}^2 + r_{y2}^2 - 2\, r_{y1} r_{y2} r_{12}}{1 - r_{12}^2}} \]
- Here R = .146: weak dependence.

R²: multiple determination coefficient
- The percentage of the variation of the dependent variable that can be explained by the variation of all the independent variables.
- Here 2.1% of the variation of the dependent variable (current salary) is explained by the regression model.

Adjusted R²: adjusted multiple determination coefficient
\[ \bar{R}^2 = 1 - \frac{n - 1}{n - p - 1}\,(1 - R^2) \]
- It makes the multiple determination coefficient comparable among populations / samples of different size and with a different number of independent variables, because it controls for the sample / population size (n) and the number of independent variables (p).
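The two formulas on this slide can be checked numerically. In the sketch below the function names are made up. The first call uses R² = .021, n = 474 and p = 1 from the Model Summary. The second call plugs in the pairwise correlations from the Exercise 2 matrix (.880, -.097, .045) purely as an illustration of the two-predictor formula, not as a model fitted in the seminar.

```python
import math

def adjusted_r_squared(r2, n, p):
    """1 - (n - 1) / (n - p - 1) * (1 - R^2), with n cases and p independent variables."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1 - (n - 1) / (n - p - 1) * (1 - r2)

def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation of y with two predictors, from the pairwise correlations."""
    if abs(r_12) >= 1:
        raise ValueError("the two predictors must not be perfectly correlated")
    return math.sqrt((r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2))

# Model Summary values: R Square = .021, n = 474 employees, p = 1 predictor (age)
adj = adjusted_r_squared(0.021, 474, 1)

# Illustration only: current salary with beginning salary and previous experience
r_multi = multiple_r_two_predictors(0.880, -0.097, 0.045)

print(round(adj, 3), round(r_multi, 3))
```

The adjusted value rounds to the .019 printed by SPSS. With uncorrelated predictors (r12 = 0) the multiple R reduces to the square root of the sum of the two squared correlations.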
F-test: for model testing
The F ratio (in the Analysis of Variance table) is 10.241 and is significant at p = .001. This provides evidence of a linear relationship between the variables, so we can accept the model at every conventional significance level.
Coefficients(a)
                         Unstandardized Coefficients     Standardized Coefficients
Model 1                  B            Std. Error         Beta                         t         Sig.
b0   (Constant)          41543.805    2358.686                                        17.613    .000
b1   age                 -211.609     66.124             -.146                        -3.200    .001
a. Dependent Variable: Current Salary

The regression line: ŷ = b0 + b1 x
- b0: if the x variable is 0, how much is y? If the employees are 0 years old, they earn $41,543.805. (It doesn't mean anything.)
- b1: if x increases by 1 unit, what is the difference in y? If the employees are 1 year older, they earn $211.609 less on average.
- We can accept the parameters at every significance level.
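A short sketch of how the coefficients table is used. The numbers are copied from the table; the function name `predict_salary` and the chosen ages are made up for illustration. The t values are recomputed as B / Std. Error, and with a single predictor the square of the slope's t should match the F ratio of the model test.

```python
# Coefficients copied from the SPSS table
b0, se_b0 = 41543.805, 2358.686     # (Constant)
b1, se_b1 = -211.609, 66.124        # age

def predict_salary(age):
    """Predicted current salary from the fitted line y-hat = b0 + b1 * age."""
    return b0 + b1 * age

# t statistic of each parameter: coefficient divided by its standard error
t_b0 = b0 / se_b0
t_b1 = b1 / se_b1

# With one predictor, F in the ANOVA table equals the squared t of the slope
f_from_t = t_b1 ** 2

# One more year of age changes the prediction by exactly b1
difference_per_year = predict_salary(41) - predict_salary(40)

print(round(t_b0, 3), round(t_b1, 3), round(f_from_t, 3), round(predict_salary(40), 2))
```

Up to rounding, the recomputed values agree with the t column and with the F ratio of 10.241 reported for the model.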
Exercise 4 - Curve Estimation
File / Open / Employee data.sav
Determine the relationship between the salary and the age of the employees!
Which regression model fits best?
Analyze / Regression / Curve Estimation
Models to select:
- Linear
- Compound
- Power
The dialog also has an option to get a chart.
Output View

Model Summary (the independent variable is age)
Model      R      R Square   Adjusted R Square   Std. Error of the Estimate
Linear     .146   .021       .019                16928.804
Compound   .215   .046       .044                .389
Power      .156   .024       .022                .393

The Compound model has the highest R².
Also in the Output View
Faculty of Economics
Weak dependence. Age explains 4.6% of the variation in the current salary.
The model is significant.
\[ \hat{y} = a \cdot b^{x} = 40482.362 \cdot 0.993^{x} \]
- The parameters are significant.
- a: not interpreted.
- b: when an employee is 1 year older, the current salary is 0.993 times as high on average, i.e. about 0.7% lower.
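The interpretation of b can be sketched numerically. The parameters are copied from the slide; the function name `predict_compound` and the ages used are made up for illustration.

```python
# Parameters of the compound model, copied from the slide
a = 40482.362
b = 0.993

def predict_compound(age):
    """Predicted current salary from the compound model y-hat = a * b ** age."""
    return a * b ** age

# Ratio of the predictions for two employees one year apart: always b
ratio_one_year = predict_compound(41) / predict_compound(40)

# The same effect as a percentage change per year of age
percent_change_per_year = (b - 1) * 100

print(round(ratio_one_year, 3), round(percent_change_per_year, 1))
```

Because b is below 1, each additional year multiplies the predicted salary by a factor smaller than 1, so the fitted curve decreases with age.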
Thank You for Your Attention stgpren@uni-miskolc.hu