University of Miskolc (Miskolci Egyetem), Faculty of Economics, Institute of Business Information Management and Methodology
Factor Analysis

Factor Analysis
Factor analysis is a multivariate statistical method that analyzes the correlations among variables. It is used for data reduction, dimension reduction, and exploring the structure of the data.

Aim of Factor Analysis
The aim is to group variables into so-called factors in order to make interpretation easier, to avoid multicollinearity, or to analyze the relationships among variables.

Stages of Factor Analysis
1. General Purpose
2. Assumptions - DESCRIPTIVES
3. Factor Method - EXTRACTION
4. Number of Factors - EXTRACTION
5. Rotation - ROTATION
6. Validity Tests
7. Name and Characterization of Factors

General Purpose
- to reduce the number of variables when an analysis contains too many of them, for easier interpretation;
- to select a group of variables according to their relation to the principal component;
- to explore the structure of the data and learn the relationships among variables;
- to identify groups of cases and/or outliers (in Q-type factor analysis, see also later).

Assumptions 1. Measurement of Variables
Factor analysis requires metric variables, although dummy variables (coded 0 and 1) can also be used. Variables measured on an interval or ratio scale are easier to interpret, whereas nominal variables weaken the validity and interpretability of the results.

Assumptions 2. Relations among Variables
The correlation matrix must be examined, because without correlation among the variables it is not possible to find variables with similar characteristics and group them into a single factor. As a rule of thumb, if the correlation coefficients are mostly below 0.3 in absolute value, the assumption is violated. The significance levels of the correlations and the partial correlation coefficients should also be checked.
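A minimal Python sketch of this screening step, using the correlation values reported in Output 1. The 0.3 cutoff is the one named on the slide; the function name `screen_correlations` and the short variable labels are illustrative assumptions, not part of the original material.

```python
# Correlation matrix copied from Output 1 (short labels are illustrative).
LABELS = ["months_since_hire", "gender1", "prev_experience", "employment_cat",
          "education", "beginning_salary", "minority"]
R = [
    [1.000, 0.066, 0.003, 0.005, 0.047, -0.020, 0.050],
    [0.066, 1.000, 0.165, 0.378, 0.356, 0.457, 0.076],
    [0.003, 0.165, 1.000, 0.063, -0.252, 0.045, 0.145],
    [0.005, 0.378, 0.063, 1.000, 0.514, 0.755, -0.144],
    [0.047, 0.356, -0.252, 0.514, 1.000, 0.633, -0.133],
    [-0.020, 0.457, 0.045, 0.755, 0.633, 1.000, -0.158],
    [0.050, 0.076, 0.145, -0.144, -0.133, -0.158, 1.000],
]


def screen_correlations(r, labels, cutoff=0.3):
    """For each variable, count the other variables with |r| >= cutoff."""
    counts = {}
    for i, name in enumerate(labels):
        counts[name] = sum(
            1 for j in range(len(labels)) if j != i and abs(r[i][j]) >= cutoff
        )
    return counts


counts = screen_correlations(R, LABELS)
# Variables that never reach the cutoff are candidates for closer inspection.
weak = [name for name, c in counts.items() if c == 0]
print(counts)
print("No correlation at or above 0.3:", weak)
```

A variable that appears in `weak` is not necessarily dropped; it may simply end up forming a factor of its own.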

Assumptions 3. Sample Size
The larger the sample, the more reliable the analysis. The ratio of cases to variables also matters: the number of cases per variable should be as high as possible.

Assumptions 4. General Multivariate Assumptions
Besides the correlation coefficients and their significance levels, the assumptions of multiple regression analysis should be checked as preconditions for factor analysis: normally distributed variables, homoskedasticity, and linearity all matter, because violating them distorts the correlations among the variables.

Output 1. Correlation Matrix

                                       (1)     (2)     (3)     (4)     (5)     (6)     (7)
(1) Months since Hire                1.000    .066    .003    .005    .047   -.020    .050
(2) gender1                           .066   1.000    .165    .378    .356    .457    .076
(3) Previous Experience (months)      .003    .165   1.000    .063   -.252    .045    .145
(4) Employment Category               .005    .378    .063   1.000    .514    .755   -.144
(5) Educational Level (years)         .047    .356   -.252    .514   1.000    .633   -.133
(6) Beginning Salary                 -.020    .457    .045    .755    .633   1.000   -.158
(7) Minority Classification           .050    .076    .145   -.144   -.133   -.158   1.000

Output 2. KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy      .686
Bartlett's Test of Sphericity   Approx. Chi-Square   887.501
                                df                   21
                                Sig.                 .000
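Bartlett's test of sphericity checks whether the correlation matrix differs from an identity matrix. A minimal pure-Python sketch, assuming the usual approximation chi-square = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom. The function names are illustrative, and the sample size `n_cases` is a placeholder because the slides do not state it.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for c in range(col, n):
                a[row][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(r, n_cases):
    """Return (chi_square, df) for a p x p correlation matrix r."""
    p = len(r)
    chi_square = -(n_cases - 1 - (2 * p + 5) / 6) * math.log(determinant(r))
    df = p * (p - 1) // 2
    return chi_square, df


# Uncorrelated variables (identity matrix): the statistic is zero.
identity = [[1.0 if i == j else 0.0 for j in range(7)] for i in range(7)]
chi_sq, df = bartlett_sphericity(identity, n_cases=100)  # n_cases is a placeholder
print(chi_sq, df)
```

For seven variables the degrees of freedom come out as 21, which matches the df shown in Output 2.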

Kaiser-Meyer-Olkin (KMO) Measure
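The KMO measure compares the squared correlations with the squared anti-image (negative partial) correlations: KMO = sum of r_ij^2 / (sum of r_ij^2 + sum of q_ij^2), taken over all pairs with i different from j. Values below 0.5 are usually considered unacceptable. The sketch below is a pure-Python illustration under that definition, applied to the correlation values from Output 1; the helper names `invert` and `kmo` are assumptions made for this example.

```python
import math

# Correlation matrix copied from Output 1.
R = [
    [1.000, 0.066, 0.003, 0.005, 0.047, -0.020, 0.050],
    [0.066, 1.000, 0.165, 0.378, 0.356, 0.457, 0.076],
    [0.003, 0.165, 1.000, 0.063, -0.252, 0.045, 0.145],
    [0.005, 0.378, 0.063, 1.000, 0.514, 0.755, -0.144],
    [0.047, 0.356, -0.252, 0.514, 1.000, 0.633, -0.133],
    [-0.020, 0.457, 0.045, 0.755, 0.633, 1.000, -0.158],
    [0.050, 0.076, 0.145, -0.144, -0.133, -0.158, 1.000],
]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        pivot_value = a[col][col]
        a[col] = [x / pivot_value for x in a[col]]
        for row in range(n):
            if row != col:
                factor = a[row][col]
                a[row] = [x - factor * y for x, y in zip(a[row], a[col])]
    return [row[n:] for row in a]


def kmo(r):
    """Return (overall KMO, list of per-variable MSA values)."""
    n = len(r)
    inv = invert(r)
    # Anti-image correlations: scaled off-diagonal elements of the inverse.
    q = [[inv[i][j] / math.sqrt(inv[i][i] * inv[j][j]) for j in range(n)]
         for i in range(n)]
    r2 = [sum(r[i][j] ** 2 for j in range(n) if j != i) for i in range(n)]
    q2 = [sum(q[i][j] ** 2 for j in range(n) if j != i) for i in range(n)]
    overall = sum(r2) / (sum(r2) + sum(q2))
    msa = [r2[i] / (r2[i] + q2[i]) for i in range(n)]
    return overall, msa


overall, msa = kmo(R)
print(round(overall, 3), [round(m, 3) for m in msa])
```

Because the inputs are rounded to three decimals, the result should land near the value SPSS reports in Output 2 rather than reproduce it exactly.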

Output 3. Anti-image Matrices

Anti-image Covariance
                                       (1)     (2)     (3)     (4)     (5)     (6)     (7)
(1) Months since Hire                 .985   -.055   -.011   -.015   -.049    .044   -.035
(2) gender1                          -.055    .728   -.144   -.023   -.096   -.099   -.120
(3) Previous Experience (months)     -.011   -.144    .812   -.042    .245   -.067   -.103
(4) Employment Category              -.015   -.023   -.042    .424   -.039   -.220    .034
(5) Educational Level (years)        -.049   -.096    .245   -.039    .499   -.166    .006
(6) Beginning Salary                  .044   -.099   -.067   -.220   -.166    .322    .055
(7) Minority Classification          -.035   -.120   -.103    .034    .006    .055    .928

Anti-image Correlation
                                       (1)     (2)     (3)     (4)     (5)     (6)     (7)
(1) Months since Hire                .356a   -.065   -.012   -.022   -.070    .078   -.037
(2) gender1                          -.065   .799a   -.187   -.041   -.160   -.204   -.146
(3) Previous Experience (months)     -.012   -.187   .348a   -.071    .386   -.132   -.119
(4) Employment Category              -.022   -.041   -.071   .729a   -.084   -.595    .054
(5) Educational Level (years)        -.070   -.160    .386   -.084   .709a   -.416    .009
(6) Beginning Salary                  .078   -.204   -.132   -.595   -.416   .667a    .100

Output 4. Total Variance Explained

             Initial Eigenvalues            Extraction Sums of             Rotation Sums of
                                            Squared Loadings               Squared Loadings
Component    Total   % of Var.   Cum. %     Total   % of Var.   Cum. %     Total   % of Var.   Cum. %
1            2.601    37.163     37.163     2.601    37.163     37.163     2.589    36.979     36.979
2            1.292    18.451     55.614     1.292    18.451     55.614     1.294    18.489     55.468
3            1.028    14.688     70.302     1.028    14.688     70.302     1.038    14.833     70.302
4             .876    12.511     82.812
5             .602     8.605     91.418
6             .385     5.507     96.925
7             .215     3.075    100.000

Extraction Method: Principal Component Analysis.

Number of Factors Retained
- A priori criterion
- Kaiser criterion
- Total variance explained method
- Scree plot
- Maximum-likelihood factor analysis
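Two of these criteria can be sketched in a few lines of Python using the initial eigenvalues from Output 4. The Kaiser criterion keeps components with an eigenvalue above 1; the variance-explained method keeps components until a chosen cumulative share is reached. The 60% target below is only an illustrative assumption, as are the function names.

```python
# Initial eigenvalues copied from Output 4.
EIGENVALUES = [2.601, 1.292, 1.028, 0.876, 0.602, 0.385, 0.215]


def kaiser_count(eigenvalues):
    """Number of components with an eigenvalue greater than 1."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def cumulative_variance(eigenvalues):
    """Cumulative % of variance; total variance equals the number of variables."""
    total = len(eigenvalues)
    running, result = 0.0, []
    for ev in eigenvalues:
        running += ev
        result.append(100.0 * running / total)
    return result


def count_for_variance(eigenvalues, target_pct):
    """Smallest number of components whose cumulative % reaches target_pct."""
    for k, pct in enumerate(cumulative_variance(eigenvalues), start=1):
        if pct >= target_pct:
            return k
    return len(eigenvalues)


print("Kaiser criterion:", kaiser_count(EIGENVALUES))
print("Cumulative %:", [round(p, 1) for p in cumulative_variance(EIGENVALUES)])
print("Components for 60%:", count_for_variance(EIGENVALUES, 60.0))  # assumed target
```

Both rules point to three components here, which agrees with the three components extracted in Output 4.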

Output 5. Scree Plot
[Figure: scree plot. x-axis: Component Number (1 to 7); y-axis: Eigenvalue (0.0 to 3.0)]

Output 6. Goodness-of-fit Test

Chi-Square   df   Sig.
1.016        3    .797
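The significance value in Output 6 is the upper-tail probability of the chi-square distribution, presumably from the maximum-likelihood extraction listed among the criteria. The null hypothesis is that the chosen number of factors is sufficient, so a large p-value means the model is not rejected. The sketch below recomputes it with the standard library only; `chi2_sf` is an illustrative helper that uses the closed-form tail for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: complementary error function plus a finite correction sum.
    terms = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


p_value = chi2_sf(1.016, 3)  # chi-square and df from Output 6
print(round(p_value, 3))
fits = p_value > 0.05  # conventional 5% level, an assumption
print("Number of factors not rejected:", fits)
```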

Output 7. Rotated Component Matrix (a)

                                    Component
                                    1       2       3
Beginning Salary                   .909   -.044   -.092
Employment Category                .849   -.014   -.097
Educational Level (years)          .759   -.367    .127
gender1                            .663    .350    .152
Previous Experience (months)       .063    .841   -.177
Minority Classification           -.146    .573    .350
Months since Hire                  .035   -.014    .910

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 4 iterations.
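To name and characterize the factors, each variable is usually assigned to the component on which it loads most strongly. The sketch below does this for the loadings in Output 7 and also recomputes the sums of squared loadings per component, which should be close to the rotation sums in Output 4. The function names are illustrative assumptions.

```python
# Rotated loadings copied from Output 7 (components 1, 2, 3).
LOADINGS = {
    "Beginning Salary": (0.909, -0.044, -0.092),
    "Employment Category": (0.849, -0.014, -0.097),
    "Educational Level (years)": (0.759, -0.367, 0.127),
    "gender1": (0.663, 0.350, 0.152),
    "Previous Experience (months)": (0.063, 0.841, -0.177),
    "Minority Classification": (-0.146, 0.573, 0.350),
    "Months since Hire": (0.035, -0.014, 0.910),
}


def assign_components(loadings):
    """Map each variable to the 1-based component with the largest |loading|."""
    return {
        name: max(range(len(row)), key=lambda k: abs(row[k])) + 1
        for name, row in loadings.items()
    }


def sums_of_squared_loadings(loadings):
    """Column-wise sum of squared loadings (variance carried by each component)."""
    n_components = len(next(iter(loadings.values())))
    return [sum(row[k] ** 2 for row in loadings.values())
            for k in range(n_components)]


def communalities(loadings):
    """Row-wise sum of squared loadings (variance of each variable explained)."""
    return {name: sum(v ** 2 for v in row) for name, row in loadings.items()}


assignment = assign_components(LOADINGS)
ssl = sums_of_squared_loadings(LOADINGS)
print(assignment)
print([round(s, 3) for s in ssl])
print({k: round(v, 3) for k, v in communalities(LOADINGS).items()})
```

Under this rule, component 1 groups salary, employment category, education, and gender; component 2 groups previous experience and minority classification; component 3 carries months since hire alone.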

Readings:
- Quantitative Information Forming Methods: http://elearning.infotec.hu/ilias.php?baseclass=ilsahspresentationgui&ref_id=2774
- Naresh K. Malhotra: Marketingkutatás. Budapest, 2005.
- Sajtos-Mitev: SPSS adatelemzési és kutatási kézikönyv

Thank you for your attention! email: strolsz@uni-miskolc.hu