Statistical Dependence Petra Petrovics
Statistical Dependence
Definition: Statistical dependence exists when the value of one variable depends on, or is affected by, the value of another variable.
Possible relationships between two variables: independence, statistical (stochastic) dependence, functional relation.
Types of Dependence
- association: between two nominal variables; measures: Yule (Y), Chuprov (T), Cramér (C or V)
- mixed: between a nominal and a ratio-scale variable; measures: $H^2$, $H$ (or $\eta^2$, $\eta$)
- correlation: among ratio-scale variables
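I. Association
a) Yule measure

| | B (1) | B (0) | Total |
|---|---|---|---|
| A (1) | $f_{11}$ | $f_{10}$ | $f_{1\cdot}$ |
| A (0) | $f_{01}$ | $f_{00}$ | $f_{0\cdot}$ |
| Total | $f_{\cdot 1}$ | $f_{\cdot 0}$ | $n$ |

Where:
- $f_{11}, f_{10}, f_{01}, f_{00}$: the observed frequencies
- $f_{1\cdot}, f_{0\cdot}, f_{\cdot 1}, f_{\cdot 0}$: the marginal frequencies

$$Y = \frac{f_{11} f_{00} - f_{10} f_{01}}{f_{11} f_{00} + f_{10} f_{01}}$$

Applicable only when both variables have exactly two categories!
- $Y = 0$: the variables are independent
- $0 < |Y| < 1$: statistical dependence
- $|Y| = 1$: functional relation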
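In case of statistical dependence:
$$\frac{f_{11}}{f_{01}} \neq \frac{f_{10}}{f_{00}} \quad\Longrightarrow\quad f_{11} f_{00} \neq f_{10} f_{01}$$
If the variables are independent:
$$\frac{f_{11}}{f_{01}} = \frac{f_{10}}{f_{00}} \quad\Longrightarrow\quad f_{11} f_{00} = f_{10} f_{01} \quad\Longrightarrow\quad f_{11} f_{00} - f_{10} f_{01} = 0$$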
Example: Suppose that a certain subject is offered to first-year and second-year students on a pass-fail basis only. An advisor is interested in determining whether there is a relationship between a student's grade and year. Data for the test were obtained from last semester's classes:

| Grade | First year (1) | Second year (0) | Total |
|---|---|---|---|
| Pass (1) | 8 | 12 | 20 |
| Fail (0) | 10 | 70 | 80 |
| Total | 18 | 82 | 100 |

$$Y = \frac{8 \cdot 70 - 12 \cdot 10}{8 \cdot 70 + 12 \cdot 10} = \frac{440}{680} = 0.65$$

Medium-strong dependence.
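The calculation above can be checked with a few lines of Python. This is a minimal sketch: the function name `yule_y` is a hypothetical helper, and the cell counts are those of the pass/fail table (8, 12, 10, 70). Note that the same formula is often called Yule's Q in other textbooks.

```python
def yule_y(f11, f10, f01, f00):
    """Yule measure for a 2x2 table, as defined on the slide.

    f11, f10, f01, f00 are the observed cell frequencies.
    """
    product_main = f11 * f00   # cells on the main diagonal
    product_off = f10 * f01    # cells on the off diagonal
    return (product_main - product_off) / (product_main + product_off)


# Pass/fail example: rows = grade (pass, fail), columns = year (first, second)
y = yule_y(f11=8, f10=12, f01=10, f00=70)
print(round(y, 2))  # 0.65
```

A value of 0 would mean the cross products are equal, which is exactly the independence condition stated earlier.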
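b) Contingency table
- the row variable has $s$ categories: $A_1, A_2, \dots, A_s$
- the column variable has $t$ categories: $B_1, B_2, \dots, B_t$
- where $s \le t$

| | $B_1$ | $B_2$ | ... | $B_j$ | ... | $B_t$ | Total |
|---|---|---|---|---|---|---|---|
| $A_1$ | $f_{11}$ | $f_{12}$ | ... | $f_{1j}$ | ... | $f_{1t}$ | $f_{1\cdot}$ |
| $A_2$ | $f_{21}$ | $f_{22}$ | ... | $f_{2j}$ | ... | $f_{2t}$ | $f_{2\cdot}$ |
| ... | ... | ... | ... | ... | ... | ... | ... |
| $A_i$ | $f_{i1}$ | $f_{i2}$ | ... | $f_{ij}$ | ... | $f_{it}$ | $f_{i\cdot}$ |
| ... | ... | ... | ... | ... | ... | ... | ... |
| $A_s$ | $f_{s1}$ | $f_{s2}$ | ... | $f_{sj}$ | ... | $f_{st}$ | $f_{s\cdot}$ |
| Total | $f_{\cdot 1}$ | $f_{\cdot 2}$ | ... | $f_{\cdot j}$ | ... | $f_{\cdot t}$ | $n$ |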
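The measure of statistical dependence in case of a contingency table

T measure, when $s = t$:
$$T = \sqrt{\frac{\chi^2}{n\sqrt{(s-1)(t-1)}}} \qquad\text{where}\qquad \chi^2 = \sum_{i=1}^{s}\sum_{j=1}^{t}\frac{\left(f_{ij} - f^{*}_{ij}\right)^2}{f^{*}_{ij}}$$
and $f^{*}_{ij}$ are the frequencies expected under independence.

C measure, when $s < t$:
$$C = \frac{T}{T_{\max}} \qquad\text{where}\qquad T_{\max} = \sqrt[4]{\frac{s-1}{t-1}}$$

- $0 < C < 0.3$: weak dependence
- $0.3 \le C < 0.7$: medium-strong dependence
- $0.7 \le C < 1$: strong dependence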
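A short derivation, sketched under the assumption that $T_{\max}$ is the fourth root of $(s-1)/(t-1)$ as reconstructed from the slide, shows that the C measure reduces to the usual closed form of Cramér's V:

```latex
\begin{align*}
T^2 &= \frac{\chi^2}{n\sqrt{(s-1)(t-1)}}, \qquad
T_{\max}^2 = \sqrt{\frac{s-1}{t-1}} \\
C^2 &= \frac{T^2}{T_{\max}^2}
     = \frac{\chi^2}{n\sqrt{(s-1)(t-1)}} \cdot \sqrt{\frac{t-1}{s-1}}
     = \frac{\chi^2}{n\,(s-1)} \\
C &= \sqrt{\frac{\chi^2}{n\,(s-1)}}, \qquad s \le t
\end{align*}
```

When $s = t$ the ratio under the root equals 1, so $T_{\max} = 1$ and $C = T$.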
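The variables are independent when
$$\frac{f_{ij}}{f_{i\cdot}} = \frac{f_{\cdot j}}{n} \qquad\text{or}\qquad \frac{f_{ij}}{n} = \frac{f_{i\cdot}}{n}\cdot\frac{f_{\cdot j}}{n},$$
i.e. the expected frequencies in case of independence are
$$f^{*}_{ij} = \frac{f_{i\cdot}\, f_{\cdot j}}{n}$$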
Example
A manufacturer of printed circuit boards has determined that boards classified as nonconforming nearly always have one of three defects: a component on the board is either missing, damaged or raised (installed improperly). The boards are produced on three machines (A, B and C). To determine whether there is a relationship between the type of nonconformity and the machine, a sample of 500 nonconforming boards was obtained:
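| Machine | Missing | Damaged | Raised | Total |
|---|---|---|---|---|
| A | 50 | 80 | 120 | 250 |
| B | 60 | 55 | 10 | 125 |
| C | 65 | 45 | 15 | 125 |
| Total | 175 | 180 | 145 | 500 |

(Columns: type of nonconformity.)

Question: Is the type of nonconformity related to the machine used for production?
$s = 3$, $t = 3$, so the T measure applies.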
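Solution

| Type of nonconformity and machine | $f_{ij}$ | $f^{*}_{ij}$ | $(f_{ij} - f^{*}_{ij})^2 / f^{*}_{ij}$ |
|---|---|---|---|
| Missing, A | 50 | 87.50 | 16.071 |
| Missing, B | 60 | 43.75 | 6.036 |
| Missing, C | 65 | 43.75 | 10.321 |
| Damaged, A | 80 | 90.00 | 1.111 |
| Damaged, B | 55 | 45.00 | 2.222 |
| Damaged, C | 45 | 45.00 | 0.000 |
| Raised, A | 120 | 72.50 | 31.121 |
| Raised, B | 10 | 36.25 | 19.009 |
| Raised, C | 15 | 36.25 | 12.457 |
| Total | 500 | 500.00 | $\chi^2 = 98.35$ |

$$T = \sqrt{\frac{98.35}{500\sqrt{(3-1)(3-1)}}} = 0.3136$$

Medium-strong dependence.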
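The solution table can be reproduced programmatically. The following is a minimal sketch in plain Python; the function names `chi_square`, `chuprov_t` and `cramer_c` are hypothetical helpers, and the observed counts are those of the machine by nonconformity table (rows A, B, C; columns missing, damaged, raised).

```python
from math import sqrt


def chi_square(table):
    """Return (chi2, n) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence: f_i. * f_.j / n
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def chuprov_t(table):
    """T measure: sqrt(chi2 / (n * sqrt((s - 1) * (t - 1))))."""
    chi2, n = chi_square(table)
    s, t = len(table), len(table[0])
    return sqrt(chi2 / (n * sqrt((s - 1) * (t - 1))))


def cramer_c(table):
    """C measure in closed form: sqrt(chi2 / (n * (min(s, t) - 1)))."""
    chi2, n = chi_square(table)
    s, t = len(table), len(table[0])
    return sqrt(chi2 / (n * (min(s, t) - 1)))


observed = [
    [50, 80, 120],  # machine A
    [60, 55, 10],   # machine B
    [65, 45, 15],   # machine C
]
chi2, n = chi_square(observed)
print(round(chi2, 2))                 # 98.35
print(round(chuprov_t(observed), 4))  # 0.3136
```

For a square table such as this one, `cramer_c` returns the same value as `chuprov_t`, because the two measures coincide when $s = t$.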
Exercise
Is there any relationship between:
- gender & employment category (association);
- gender & current salary (mixed)?
Association
File / Open / Data / Employee data
Analyze / Descriptive Statistics / Crosstabs: Gender, Employment category
Cell contents: $f_{ij}$ (observed) and $f^{*}_{ij}$ (expected)
Output View

Symmetric Measures

| | | Value | Approx. Sig. |
|---|---|---|---|
| Nominal by Nominal | Phi | ,409 | ,000 |
| | Cramer's V | ,409 | ,000 |
| N of Valid Cases | | 474 | |

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

- $0 < C < 0.3$: weak dependence
- $0.3 \le C < 0.7$: medium-strong dependence
- $0.7 \le C < 1$: strong dependence

There is a medium-strong dependence between gender & employment category. We can accept that statement at every significance level.
Output View
- 4.6% of women are managers.
- 33.1% of all employees are male clerical workers.
- All custodial employees are men.
- The cell counts give the number of people (cases).
Mixed dependence: Analysis of Variance
One-way analysis of variance is a technique for comparing the means of two or more samples. It applies when one variable is qualitative and the other is quantitative.
Differences and variances
- $d_{ij}$, total difference: the difference between an employee's production and the grand mean, $d_{ij} = x_{ij} - \bar{x}$
- $W_{ij}$, within-column difference: the difference between an employee's production and his group's mean, $W_{ij} = x_{ij} - \bar{x}_j$
- $B_j$, between-column difference: the difference between the group's mean and the grand mean, $B_j = \bar{x}_j - \bar{x}$
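$$d_{ij} = W_{ij} + B_j: \qquad x_{ij} - \bar{x} = \left(x_{ij} - \bar{x}_j\right) + \left(\bar{x}_j - \bar{x}\right)$$
$$SS = SS_W + SS_B \qquad\qquad \sigma^2 = \sigma_W^2 + \sigma_B^2$$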
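Measures of mixed dependence
$$H^2 = \frac{SS_B}{SS} = \frac{\sigma_B^2}{\sigma^2} \qquad\text{or}\qquad H = \sqrt{\frac{SS_B}{SS}} = \sqrt{\frac{\sigma_B^2}{\sigma^2}}$$
Where $0 \le H^2 \le 1$ and $0 \le H \le 1$:
- $H = H^2 = 0$: the variables are independent
- $0 < H < 1$: statistical dependence
  - $0 < H < 0.3$: weak dependence
  - $0.3 \le H < 0.7$: medium-strong dependence
  - $0.7 \le H < 1$: strong dependence
- $H = H^2 = 1$: functional relation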
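Example

| Marks | Faculty I. | Faculty II. | Faculty III. | Total |
|---|---|---|---|---|
| Excellent (5) | 20 | 20 | 20 | 60 |
| Good (4) | 30 | 50 | 40 | 120 |
| Medium (3) | 25 | 35 | 55 | 115 |
| Satisfactory (2) | 20 | 35 | 80 | 135 |
| Fail (1) | 0 | 5 | 20 | 25 |
| Total | 95 | 145 | 215 | 455 |

Is there any dependence between the average marks and the faculties?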
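$$\sigma^2 = \sigma_W^2 + \sigma_B^2$$

| Faculties | $n_j$ | $\bar{x}_j$ | $\sigma_j^2$ |
|---|---|---|---|
| Faculty I. | 95 | 3.53 | 1.09 |
| Faculty II. | 145 | 3.31 | 1.18 |
| Faculty III. | 215 | 2.81 | 1.27 |
| Total | 455 | 3.12 | 1.29 |

$$\bar{x} = \frac{\sum n_j \bar{x}_j}{n} = \frac{95 \cdot 3.53 + 145 \cdot 3.31 + 215 \cdot 2.81}{455} = 3.12$$
$$\sigma_W^2 = \frac{\sum n_j \sigma_j^2}{n} = \frac{95 \cdot 1.09 + 145 \cdot 1.18 + 215 \cdot 1.27}{455} = 1.2$$
$$\sigma_B^2 = \sigma^2 - \sigma_W^2 = 1.29 - 1.2 = 0.09$$
or
$$\sigma_B^2 = \frac{\sum n_j \left(\bar{x}_j - \bar{x}\right)^2}{n} = \frac{95\,(3.53 - 3.12)^2 + 145\,(3.31 - 3.12)^2 + 215\,(2.81 - 3.12)^2}{455} = 0.09$$
$$H = \sqrt{\frac{\sigma_B^2}{\sigma^2}} = \sqrt{\frac{0.09}{1.29}} = 0.2641 \qquad H^2 = 6.81\%$$

Since $H < 0.3$, the dependence between faculty and marks is weak. (The 6.81% is the figure reported on the slide; the rounded values 0.09 and 1.29 give about 6.98%.)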
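The variance decomposition can also be computed directly from the frequency table rather than from rounded group statistics. The sketch below is a minimal illustration; `variance_ratio` is a hypothetical helper, and the frequencies are the marks table as reconstructed from its row and column totals (95, 145, 215; 455 in all), which is an assumption about the original slide.

```python
from math import sqrt

marks = [5, 4, 3, 2, 1]
# Frequencies per faculty, in the same order as `marks`
# (assumed reconstruction of the slide's table).
counts = {
    "I": [20, 30, 25, 20, 0],
    "II": [20, 50, 35, 35, 5],
    "III": [20, 40, 55, 80, 20],
}


def variance_ratio(marks, counts):
    """Return H^2 = SS_B / SS for grouped frequency data."""
    n = sum(sum(freqs) for freqs in counts.values())
    grand_mean = sum(
        m * f for freqs in counts.values() for m, f in zip(marks, freqs)
    ) / n
    # Total sum of squares around the grand mean
    ss_total = sum(
        f * (m - grand_mean) ** 2
        for freqs in counts.values()
        for m, f in zip(marks, freqs)
    )
    # Between-group sum of squares: n_j * (group mean - grand mean)^2
    ss_between = 0.0
    for freqs in counts.values():
        n_j = sum(freqs)
        mean_j = sum(m * f for m, f in zip(marks, freqs)) / n_j
        ss_between += n_j * (mean_j - grand_mean) ** 2
    return ss_between / ss_total


h_squared = variance_ratio(marks, counts)
h = sqrt(h_squared)
print(round(h_squared, 3), round(h, 2))  # 0.07 0.26
```

Working from unrounded values gives $H^2$ of roughly 7% and $H$ of roughly 0.26, which leads to the same conclusion of weak dependence.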
Mixed dependence Analyze / Compare Means / Means
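Output View

Report (Current Salary)

| Gender | Mean | N | Std. Deviation |
|---|---|---|---|
| Female | $26,031.92 | 216 | $7,558.021 |
| Male | $41,441.78 | 258 | $19,499.214 |
| Total | $34,419.57 | 474 | $17,075.661 |

This table shows the central tendency and dispersion of the dependent variable (current salary) grouped by the independent variable (gender).

ANOVA Table (Current Salary * Gender)

| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups (Combined) ($S_K$) | 2,8E+010 | 1 | 2,79E+010 | 119,798 | ,000 |
| Within Groups ($S_B$) | 1,1E+011 | 472 | 233046530,5 | | |
| Total ($S$) | 1,4E+011 | 473 | | | |

Note that in this output's labels $S_K$ is the between-groups sum of squares ($SS_B$ in the earlier notation) and $S_B$ is the within-groups sum of squares ($SS_W$).

Measures of Association (Current Salary * Gender)

| | Eta ($H$) | Eta Squared ($H^2$) |
|---|---|---|
| Current Salary * Gender | ,450 | ,202 |

$$H^2 = \frac{S_K}{S} = 20.2\%$$
This is the proportion of variance in the dependent variable explained by differences among groups.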
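The Eta values in the output can be cross-checked from the Report table alone, since group sizes, means and standard deviations determine both sums of squares. This is a minimal sketch; `eta_from_summary` is a hypothetical helper, the summary statistics are the Report values as reconstructed here, and the standard deviations are assumed to be sample standard deviations (divisor $n_j - 1$).

```python
from math import sqrt

# (n_j, mean_j, std_j) per group, taken from the Report table
groups = {
    "Female": (216, 26031.92, 7558.021),
    "Male": (258, 41441.78, 19499.214),
}


def eta_from_summary(groups):
    """Return (eta_squared, f_statistic) from per-group summary statistics."""
    n = sum(n_j for n_j, _, _ in groups.values())
    grand_mean = sum(n_j * mean_j for n_j, mean_j, _ in groups.values()) / n
    # Between-groups sum of squares
    ss_between = sum(
        n_j * (mean_j - grand_mean) ** 2 for n_j, mean_j, _ in groups.values()
    )
    # Within-groups sum of squares, recovered from sample standard deviations
    ss_within = sum((n_j - 1) * std_j ** 2 for n_j, _, std_j in groups.values())
    df_between = len(groups) - 1
    df_within = n - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return ss_between / (ss_between + ss_within), f_statistic


eta_squared, f_statistic = eta_from_summary(groups)
print(round(eta_squared, 3), round(sqrt(eta_squared), 3))  # 0.202 0.45
```

The F statistic returned by the same function comes out close to the 119,798 shown in the ANOVA table, which supports the reconstructed figures.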
Thanks for your attention! strolsz@uni-miskolc.hu