Miskolci Egyetem Gazdaságtudományi Kar Üzleti Információgazdálkodási és Módszertani Intézet. Cluster analysis in SPSS


Cluster analysis in SPSS

Cluster Analysis
Cluster analysis is one of the methods of classification. It aims to show that the cases form groups: the within-group distance is minimal, since the cases in a group are more similar to each other than to members of other groups, while the between-group distance is high. The result is a set of distinct, independent, homogeneous clusters. The aim is to identify groups and explore the structure of the data.

Cluster analysis in practice
- Market segmentation
  1. Definition of the relevant market
  2. Definition of the segmentation bases/variables
  3. Segmentation (factor analysis, cluster analysis)
  4. Characterization of the consumers in each group
- Market structure analysis (substitutability of competing brands)
- Identification of new product opportunities
- Test market selection
- Data reduction

Stages of cluster analysis
1. General purpose
2. Main cluster method
3. Variable selection
4. Examination of the conditions of cluster analysis
5. Similarity and distance measures
6. Further cluster methods
7. Number of clusters
8. Validity tests
9. Naming and characterization of clusters

Exercise
A company producing soup powder surveyed its consumers. Variables:
- Name: string
- Cooking: how often the respondent cooks, on a scale from 1 to 7
- Domesticated: how domestic the respondent is, on a scale from 1 to 7
- Gender: 1: male, 2: female
- Dwelling place: 1: Budapest, 2: county town, 3: other

1. General Purpose
Aim of the analysis: grouping the soup powder customers based on statistical criteria.
Observations:
- Population: e.g. soup powder customers in Hungary
- Determine the sample size and the sample design
- In this case: n = 16 persons (not representative)

2. Main Cluster Method
Hierarchical method
- Preferred if we do not know in advance how many clusters we want to create
- Disadvantage: sensitive to outliers
Non-hierarchical method
- Preferred if the number of sampling units is high
- Less dependent on outliers
- Less dependent on the measure of distance
- Less dependent on whether an irrelevant variable has been included in the analysis
- Disadvantages: the number of clusters must be predetermined; the cluster centers must be selected; the result depends on the order of the observations
Combined use:
1. Hierarchical method: find the ideal number of clusters
2. Filter out the outliers
3. Non-hierarchical classification

3. Variable Selection
Check the strength of correlation between the variables (multicollinearity).
Analyze / Regression / Linear

4. Examination of the conditions of cluster analysis I.
Is the sample representative? Here it is NOT, so we cannot draw conclusions about the population.
Managing outliers. An outlier is either
- an abnormal observation that is not typical of the population, or
- a member of a group whose size in the population is underrepresented in the sample.
Analyze / Classify / Hierarchical Cluster / Method: Nearest neighbour

4. Examination of the conditions of cluster analysis II.
Scales
- Data measured on similar scales are comparable.
- Recommended: the same unit of measurement (reason: a variable with a larger deviation has a bigger influence).
- E.g. cooking and the domestic aspect are measured on different intervals, or income is compared with cooking, etc.
If the scales differ: standardization! Standardize if:
- the relative importance of the responses compared to each other is relevant,
- we are looking for similar profiles,
- we are not concerned with the respondent's style effect.
Standardized value: $z_i = \frac{x_i - \bar{x}}{\sigma_x}$, which gives comparable data with mean 0 and standard deviation 1.
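The standardization step can be sketched outside SPSS as follows. This is a minimal illustration on the Cooking and Domesticated scores of the exercise; the function name `standardize` and the use of the population standard deviation are assumptions of the sketch (the sample standard deviation, `statistics.stdev`, is an equally common choice).

```python
from statistics import mean, pstdev

# Cooking and Domesticated scores of the 16 respondents in the exercise.
cooking = [1, 2, 5, 2, 4, 2, 2, 3, 2, 6, 3, 6, 7, 6, 5, 1]
domesticated = [3, 3, 5, 4, 4, 7, 6, 4, 2, 5, 3, 6, 1, 7, 7, 6]


def standardize(values):
    """Return z-scores: subtract the mean, divide by the standard deviation."""
    m = mean(values)
    s = pstdev(values)  # assumption: population SD; swap in stdev() if preferred
    return [(x - m) / s for x in values]


z_cooking = standardize(cooking)
z_domesticated = standardize(domesticated)

# After standardization both variables have mean 0 and standard deviation 1,
# so neither dominates a distance calculation because of its scale.
print(round(pstdev(z_cooking), 6), round(pstdev(z_domesticated), 6))
```

In this exercise both variables already share the 1 to 7 scale, so the step matters mainly when a variable such as income enters the distance calculation.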

Analyze / Classify / Hierarchical Cluster / Method

5. Determination of the measure of similarity and distance
Binary variables
- Measures of distance: Euclidean distance, squared Euclidean distance, variance
- Measures of similarity: Russell and Rao, simple matching, Jaccard, Yule
Metric variables
- Measures of distance: Euclidean distance, squared Euclidean distance, city block, Chebychev
- Measures of similarity: Pearson correlation
Analyze / Classify / Hierarchical Cluster / Method
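To make the four metric distance measures concrete, here is a small sketch that computes them for two respondents of the exercise, using (Cooking, Domesticated) as coordinates. The function names are chosen for this illustration only and are not SPSS identifiers.

```python
import math


def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a, b):
    """Euclidean distance without the square root; stresses large gaps."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def city_block(a, b):
    """Sum of the absolute differences (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebychev(a, b):
    """Largest absolute difference on any single variable."""
    return max(abs(x - y) for x, y in zip(a, b))


geza = (7, 1)  # (Cooking, Domesticated)
robi = (6, 5)

print(squared_euclidean(geza, robi))    # 17
print(city_block(geza, robi))           # 5
print(chebychev(geza, robi))            # 4
print(round(euclidean(geza, robi), 3))  # 4.123
```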

6. Cluster Methods
- Hierarchical
  - Agglomerative
    - Linkage methods: single, complete, average
    - Variance methods: Ward
    - Centroid methods
  - Divisive
- Non-hierarchical

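Output: agglomeration schedule
- Stage: the steps of merging. Rita and Vera are merged first.
- Cluster Combined: the clusters merged at the given step; the new common cluster is registered under the lower case number.
- Coefficients: the distance on which the merging of the clusters was based. A sudden large increase marks a step that is too big.
- Stage Cluster First Appears: the step in which the merged cluster first appeared.
- Next Stage: the step in which the new common cluster appears next.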

Vertical Icicle
- With a large number of items it is difficult to handle.
- We start the interpretation from the bottom: where is the longest bar between two names? Between Vera and Rita, so they form the first cluster.
- Géza ~ outlier

Dendrogram
- Clusters are merged based on the minimum distance.
- Handling of outliers: Géza ~ outlier. Is he abnormal? Should he be excluded?
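The nearest-neighbour (single linkage) merging behind such a dendrogram can be sketched in a few lines. The sketch assumes unstandardized Euclidean distance on Cooking and Domesticated only; the slides do not state which settings produced the SPSS output, so treat these choices and the function name `single_linkage` as assumptions.

```python
import math

# (Cooking, Domesticated) for the 16 respondents of the exercise.
cases = {
    "Béla": (1, 3), "Jenő": (2, 3), "Bea": (5, 5), "Marci": (2, 4),
    "Ubul": (4, 4), "Zsuzsa": (2, 7), "Rita": (2, 6), "Zoli": (3, 4),
    "Dávid": (2, 2), "Robi": (6, 5), "Kriszti": (3, 3), "Zsófi": (6, 6),
    "Géza": (7, 1), "Éva": (6, 7), "Dóra": (5, 7), "Vera": (1, 6),
}


def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns (clusters, heights): the remaining clusters as sets of names and
    the distance at which each merge happened, in merge order.
    """
    clusters = [{name} for name in points]
    heights = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] |= clusters[j]  # j > i, so deleting j keeps i valid
        del clusters[j]
        heights.append(d)
    return clusters, heights


clusters, heights = single_linkage(cases, 4)
for members in clusters:
    print(sorted(members))
```

Under these assumptions, stopping at four clusters leaves Géza alone in his own cluster, and he joins the rest only in the very last merge at a distance of about 4.12. This is the pattern that marks him as an outlier candidate.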

Analyze / Classify / Hierarchical Cluster / Method: Ward
Conditions of Ward's method:
- Metric variables
- No outliers
- No correlation between the variables
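Ward's method merges, at each step, the pair of clusters whose union increases the within-cluster sum of squares the least. The sketch below computes that increase in two ways, directly and with the usual centroid shortcut, for a small example taken from the exercise data. The function names are illustrative, and the pairing of respondents is chosen only to show the arithmetic.

```python
def centroid(cluster):
    """Mean point of a list of equally long tuples."""
    n = len(cluster)
    return tuple(sum(p[k] for p in cluster) / n for k in range(len(cluster[0])))


def within_ss(cluster):
    """Sum of squared distances of the members from their centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)


def ward_cost(a, b):
    """Increase in the within-cluster sum of squares if a and b are merged."""
    return within_ss(a + b) - within_ss(a) - within_ss(b)


def ward_cost_shortcut(a, b):
    """Same quantity: n_a * n_b / (n_a + n_b) * squared centroid distance."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2


rita_vera = [(2, 6), (1, 6)]  # (Cooking, Domesticated)
zsuzsa = [(2, 7)]

print(round(ward_cost(rita_vera, zsuzsa), 4))           # 0.8333
print(round(ward_cost_shortcut(rita_vera, zsuzsa), 4))  # 0.8333
```

Because the cost grows with the squared distance between centroids, a single extreme case can distort the merges, which is consistent with the "no outliers" condition above.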

7. Determine the number of clusters
a. Researcher experience
b. Distances
c. Scree plot
d. Relative size of the clusters

b) Distance (dendrogram)
Cut where the value of the coefficient increases suddenly. But: try to keep the number of clusters around 5. Here: 2 or 3 clusters.
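The "sudden increase" rule can be written down as a small helper. This is only a sketch: the function name is made up for the illustration, and the coefficient list is hypothetical rather than taken from the SPSS output of the exercise.

```python
def clusters_before_biggest_jump(coefficients):
    """Suggest a cluster count from the coefficients of an agglomeration schedule.

    coefficients[i] belongs to merge step i + 1. With n cases there are n - 1
    steps, and stopping after step s leaves n - s clusters.
    """
    n_cases = len(coefficients) + 1
    jumps = [later - earlier
             for earlier, later in zip(coefficients, coefficients[1:])]
    last_good_step = jumps.index(max(jumps)) + 1  # step just before the jump
    return n_cases - last_good_step


# Hypothetical coefficients for 8 cases (7 merge steps), NOT real SPSS output.
example = [0.5, 0.8, 1.0, 1.4, 1.9, 6.0, 9.5]
print(clusters_before_biggest_jump(example))  # 3
```

The helper only automates reading the schedule; the researcher's experience and the relative size of the clusters still have to confirm the suggested number.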

c) Scree plot
Create a line graph of the coefficients.

3 clusters (n-1) cases

Graphs / Scatter/Dot

9. Explanation, characterization of clusters
Cluster centroids and standard deviations
Quantitative (cooking, domesticated) + qualitative (cluster) variables: mixed dependence
Analyze / Compare Means / Means

Demographic analysis (gender (nem), residency (lakhely))
Qualitative (gender, residency) + qualitative (cluster) variables: association
Analyze / Descriptive Statistics / Crosstabs

Quantitative (income) + qualitative (cluster) variables: mixed dependence (ANOVA)
Analyze / Compare Means / Means

9. Characterization of clusters, labeling

                  1. cluster          2. cluster            3. cluster
Variables involved in the cluster analysis
Cooking a lot     No                  Yes                   No
Domesticated      No                  Yes                   No
Variables involved only in the characterization
Gender            Predominantly men   Predominantly women   Women
Residency         ?                   Big cities            County towns
Income            Low (3000)          Low (2200)            High (7667)
Labels            Careless            Housewives            Businesswomen

Graphs / Pie

8. Verification of the validity of cluster analysis
- Different measure of distance
- Different method of cluster analysis
- Leave out variables
- Divide the sample into 2 parts
- Change the order of the cases
- Non-hierarchical cluster analysis

Non-hierarchical cluster analysis in SPSS

Hierarchical and non-hierarchical methods in SPSS

Hierarchical method
Benefits:
- Helps to determine the number of clusters
- Changing the number of clusters does not change the contents of the clusters made earlier
- Many measures of distance
- Standardization of variables
- Dendrogram
Disadvantages:
- Sensitive to outliers
- It takes long to find the ideal combination
- Nominal and metric variables cannot be combined

Non-hierarchical method: K-Means
Benefits:
- Works with a high number of sample units
- Less dependent on outliers
- Less dependent on the measure of distance
- Less dependent on whether an irrelevant variable has been included in the analysis
Disadvantages:
- The number of clusters must be predetermined
- The cluster centers must be selected
- Depends on the order of the observations
- Changing the number of clusters changes the contents of the clusters

Non-hierarchical method: Two Step
Benefits:
- Fastest
- Nominal and metric variables can be combined
- Suggests the ideal number of clusters
- Filters the outliers
- Standardization by default

No.  Name     Cooking  Domesticated  Gender  Place  Income
1    Béla     1        3             1       3      3000
2    Jenő     2        3             1       1      1500
3    Bea      5        5             2       2      2000
4    Marci    2        4             1       3      1000
5    Ubul     4        4             1       1      7000
6    Zsuzsa   2        7             2       1      8000
7    Rita     2        6             2       2      7000
8    Zoli     3        4             1       3      1500
9    Dávid    2        2             1       1      5000
10   Robi     6        5             1       3      1000
11   Kriszti  3        3             2       3      2000
12   Zsófi    6        6             2       2      4000
13   Géza     7        1             1       2      8000
14   Éva      6        7             2       1      1000
15   Dóra     5        7             2       1      3000
16   Vera     1        6             2       2      6000
Source: textbook, p. 286 (Sajtos-Mitev)

8. Verification of the validity of cluster analysis
K-Means method
Analyze / Classify / K-Means Cluster
Determination of the initial cluster centers

Output: 3 clusters, 3 cluster centers
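The K-Means idea (assign every case to the nearest center, recompute the centers, repeat) can be sketched as below. Several things here are assumptions of the sketch and not settings read from the SPSS output: Géza is left out as the filtered outlier, only Cooking and Domesticated are used, and the three initial centers are hypothetical starting points.

```python
import math

# (Cooking, Domesticated) of the respondents, with the outlier Géza removed.
cases = {
    "Béla": (1, 3), "Jenő": (2, 3), "Bea": (5, 5), "Marci": (2, 4),
    "Ubul": (4, 4), "Zsuzsa": (2, 7), "Rita": (2, 6), "Zoli": (3, 4),
    "Dávid": (2, 2), "Robi": (6, 5), "Kriszti": (3, 3), "Zsófi": (6, 6),
    "Éva": (6, 7), "Dóra": (5, 7), "Vera": (1, 6),
}


def k_means(points, initial_centers, max_iter=100):
    """Lloyd's algorithm: assign to the nearest center, update, repeat."""
    centers = [tuple(c) for c in initial_centers]
    assignment = {}
    for _ in range(max_iter):
        new_assignment = {
            name: min(range(len(centers)),
                      key=lambda k: math.dist(p, centers[k]))
            for name, p in points.items()
        }
        if new_assignment == assignment:  # nothing moved: converged
            break
        assignment = new_assignment
        for k in range(len(centers)):
            members = [points[n] for n, c in assignment.items() if c == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return assignment, centers


# Hypothetical initial centers (assumption, not the centers SPSS would pick).
assignment, centers = k_means(cases, [(2, 3), (6, 6), (2, 6)])
for k, center in enumerate(centers):
    members = sorted(n for n, c in assignment.items() if c == k)
    print(k + 1, [round(v, 2) for v in center], members)
```

With other initial centers or another case order the result may differ, which is the dependence on cluster center selection and observation sequence listed among the disadvantages of the method.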

8. Comparison
Hierarchical method = non-hierarchical method (K-Means): if the two methods give the same clusters, the solution is reliable.

Exercise
Classification of consumers based on shopping attitudes. Evaluate the statements on a scale from 1 to 7:
V1: Shopping is fun.
V2: Shopping is not good for the wallet.
V3: I often combine shopping with visiting a restaurant.
V4: When shopping, I try to get the best buys.
V5: I don't care about shopping.
V6: A lot of money can be saved by comparing prices.
Source: Malhotra [2005]: Marketingkutatás, p. 703.

Number  V1  V2  V3  V4  V5  V6
1       6   4   7   3   2   3
2       2   3   1   4   5   4
3       7   2   6   4   1   3
4       4   6   4   5   3   6
5       1   3   2   2   6   4
6       6   4   6   3   3   4
7       5   3   6   3   3   4
8       7   3   7   4   1   4
9       2   4   3   3   6   3
10      3   5   3   6   4   6
11      1   3   2   3   5   3
12      5   4   5   4   2   4
13      2   2   1   5   4   4
14      4   6   4   6   4   7
15      6   5   4   2   1   4
16      3   5   4   6   4   7
17      4   4   7   2   2   5
18      3   7   2   6   4   3
19      4   6   3   7   2   7
20      2   3   2   4   7   2

Output

The clusters:
1. Entertainment-loving, interested customers
2. Apathetic customers
3. Careful customers
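Labels like these are read off the cluster centroids. The sketch below computes the mean of V1 to V6 per cluster for the shopping data. The cluster membership is an assumption of the sketch: it is the partition that a K-Means run seeded with cases 1, 2 and 4 settles on for this table, and it is not copied from the SPSS output, which the slides do not reproduce.

```python
# Shopping attitude scores (V1..V6) of the 20 respondents.
data = {
    1: (6, 4, 7, 3, 2, 3),  2: (2, 3, 1, 4, 5, 4),  3: (7, 2, 6, 4, 1, 3),
    4: (4, 6, 4, 5, 3, 6),  5: (1, 3, 2, 2, 6, 4),  6: (6, 4, 6, 3, 3, 4),
    7: (5, 3, 6, 3, 3, 4),  8: (7, 3, 7, 4, 1, 4),  9: (2, 4, 3, 3, 6, 3),
    10: (3, 5, 3, 6, 4, 6), 11: (1, 3, 2, 3, 5, 3), 12: (5, 4, 5, 4, 2, 4),
    13: (2, 2, 1, 5, 4, 4), 14: (4, 6, 4, 6, 4, 7), 15: (6, 5, 4, 2, 1, 4),
    16: (3, 5, 4, 6, 4, 7), 17: (4, 4, 7, 2, 2, 5), 18: (3, 7, 2, 6, 4, 3),
    19: (4, 6, 3, 7, 2, 7), 20: (2, 3, 2, 4, 7, 2),
}

# Assumed three-cluster membership (see the note above the code).
membership = {
    1: [1, 3, 6, 7, 8, 12, 15, 17],
    2: [2, 5, 9, 11, 13, 20],
    3: [4, 10, 14, 16, 18, 19],
}


def centroid(case_ids):
    """Mean of each variable over the given cases."""
    rows = [data[i] for i in case_ids]
    return [sum(col) / len(rows) for col in zip(*rows)]


for cluster, ids in membership.items():
    means = centroid(ids)
    print(cluster, [round(m, 2) for m in means])
```

Under this assumed membership, cluster 1 scores highest on V1 and V3 (fun, eating out), cluster 2 on V5 (does not care about shopping), and cluster 3 on V2, V4 and V6 (budget, best buys, price comparison), which agrees with the three labels.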