Descriptive Statistics
Petra Petrovics

DESCRIPTIVE STATISTICS
Definition: Descriptive statistics is concerned only with collecting and describing data.
Methods:
- statistical tables and graphs
- descriptive measures
Descriptive measure: a single number that provides information about a set of data.

Definition of a Population
I. Central Tendency
- mean (obtained by calculation)
- mode, median (obtained by location)
II. Percentiles, Quartiles
III. Dispersion
IV. Shape

I.1. Means
- Arithmetic mean (average)
- Geometric mean: used when the ratio of any two consecutive numbers is constant, e.g. compound interest rate
- Harmonic mean: used when the units of measurement differ between the numerator and the denominator, e.g. miles per hour
- Quadratic mean: e.g. the form of the standard deviation

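Arithmetic Mean
Typically referred to as the mean. It is the most common measure of central tendency, and the only common measure in which all the values play an equal role.
Symbol: $\bar{x}$, called X-bar
Raw Data Expressions:
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$
Frequency Distribution Expressions:
$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$
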
Properties of Mean
  $x$       $d = x - \bar{x}$
  100       -100
  150        -50
  210        +10
  240        +40
  300       +100
  Σ 1000       0
  $\bar{x} = 200$
1. The sum of the differences from the mean is 0: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
2. $\sum_{i=1}^{n} (x_i - a)^2$ is minimal if $a = \bar{x}$

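Properties of Mean (continued)
  $x$       $x + 50$    $1.1x = y$    $z = x + y$
  100       150         110           210
  150       200         165           315
  210       260         231           441
  240       290         264           504
  300       350         330           630
  Σ 1000    1250        1100          2100
  Mean 200  250         220           420
3. If you add a constant $a$ to every $x_i$, the mean will be $a + \bar{x}$.
4. If you multiply every $x_i$ by a constant $b$, the mean will be $b \cdot \bar{x}$.
5. For two series $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_n$, the mean of the sums $x_1 + y_1; \dots; x_n + y_n$ is $\bar{x} + \bar{y}$.
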
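The five properties can be checked numerically on the slide's own values. The sketch below is only an illustration: the helper name `sum_sq` and the trial values 190 and 210 are assumptions chosen for the check, not part of the deck.

```python
from statistics import mean

x = [100, 150, 210, 240, 300]        # values from the slide
x_bar = mean(x)

# Property 1: deviations from the mean sum to zero.
sum_dev = sum(xi - x_bar for xi in x)

# Property 2: the sum of squared differences from a is smallest at a = mean.
def sum_sq(a):
    """Hypothetical helper: sum of squared differences of x from a."""
    return sum((xi - a) ** 2 for xi in x)

# Properties 3-5: shifting, scaling, and adding two series.
shifted = [xi + 50 for xi in x]                    # x + 50
scaled = [1.1 * xi for xi in x]                    # y = 1.1x
combined = [xi + yi for xi, yi in zip(x, scaled)]  # z = x + y

print("sum of deviations:", sum_dev)
print("sum of squares at 190, mean, 210:", sum_sq(190), sum_sq(x_bar), sum_sq(210))
print("means:", mean(shifted), mean(scaled), mean(combined))
```

The scaled and combined means come out as floating-point values, so they may differ from 220 and 420 in the last decimal places.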
Advantages of Mean
- Easy to calculate and easy to understand.
- It always exists.
- The mean uses every value in the data and hence is a good representative of the data. The irony is that, most of the time, this value never appears in the raw data.
- Repeated samples drawn from the same population tend to have similar means.
- It is not necessary to know the value of every single observation; a summary can be enough.

Disadvantages of Mean
- It is sensitive to extreme values/outliers, especially when the sample size is small. Therefore, it is not an appropriate measure of central tendency for skewed distributions.
- The mean cannot be calculated for nominal or non-numerical ordinal data.
- Even though the mean can be calculated for numerical ordinal data, it often does not give a meaningful value, e.g. stage of cancer.

Weighted Means
$x_i$: observed values
$f_i$: weights
The value of the weighted mean depends on:
- the absolute values of the observations,
- the ratios of the weights; the weights could also be the relative frequencies $g_i = f_i / n$.

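That only the ratios of the weights matter can be seen in a short sketch. It assumes the water-consumption frequencies used elsewhere in the deck, with class midpoints 10, 20, ..., 50 as the observed values; `weighted_mean` is a hypothetical helper name.

```python
def weighted_mean(values, weights):
    """Hypothetical helper: sum(w * x) / sum(w)."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

midpoints = [10, 20, 30, 40, 50]   # assumed class midpoints (m^3)
f = [8, 19, 17, 9, 7]              # absolute frequencies (number of houses)
n = sum(f)
g = [fi / n for fi in f]           # relative frequencies g_i = f_i / n

mean_f = weighted_mean(midpoints, f)                         # weights f_i
mean_g = weighted_mean(midpoints, g)                         # weights g_i
mean_2f = weighted_mean(midpoints, [2 * fi for fi in f])     # doubled weights

print(mean_f, mean_g, mean_2f)
```

All three results agree (up to floating-point rounding), because rescaling every weight by the same factor leaves the weight ratios unchanged.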
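Geometric Mean
Used for the average rate of change of a variable over time.
The $n$th root of the product of $n$ values.
Raw Data Expressions:
$\bar{x}_g = \sqrt[n]{\prod_{i=1}^{n} x_i}$
Frequency Distribution Expressions:
$\bar{x}_g = \sqrt[n]{\prod_{i=1}^{k} x_i^{f_i}}$, where $n = \sum_{i=1}^{k} f_i$
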
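GDP of Hungary
  Period      Previous quarter = 100%
  2008 Q1     100.9
  2008 Q2      99.8
  2008 Q3      99.0
  2008 Q4      98.1
Source: HCSO
Average growth rate:
$\bar{x}_g = \sqrt[4]{1.009 \cdot 0.998 \cdot 0.990 \cdot 0.981} = \sqrt[4]{0.978} = 0.994 \rightarrow 99.4\%$
That is, GDP fell by about 0.6% per quarter on average.
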
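The GDP calculation can be reproduced in a few lines. This is a minimal sketch using the four quarterly chain indices from the slide; the variable names are illustrative.

```python
import math

# Quarterly GDP indices (previous quarter = 1.0), 2008 Q1-Q4, from the slide.
chain_indices = [1.009, 0.998, 0.990, 0.981]

product = math.prod(chain_indices)              # overall change across the year
g_mean = product ** (1 / len(chain_indices))    # 4th root = average quarterly index

print(round(product, 3), round(g_mean, 3))
```

From Python 3.8 onwards, `statistics.geometric_mean(chain_indices)` should give the same average index directly.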
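Harmonic Mean
The harmonic mean of a set of numbers is found by adding up the reciprocals of the numbers, and then dividing $n$ by this sum.
Raw Data Expressions:
$\bar{x}_h = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$
Frequency Distribution Expressions:
$\bar{x}_h = \frac{\sum_{i=1}^{k} f_i}{\sum_{i=1}^{k} \frac{f_i}{x_i}} = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{x_i}}$, where $n = \sum_{i=1}^{k} f_i$
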
Relation between the Partitional Ratio and the Dynamic Ratio
  Factory   Turnover (MFt)       Partition of turnover (%)   Ratio (%)
            t0        t1         t0         t1               t1/t0
  C          30        36         20         19              120
  D          40        60         27         32              150
  E          70        77         47         41              110
  F          10        14.5        6          8              145
  Total     150       187.5      100        100              125

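With $A$ the turnover in $t_1$, $B$ the turnover in $t_0$ and $R = A/B$ the ratio of each factory, the average ratio can be obtained in several equivalent ways:
$\bar{R} = \frac{\sum A}{\sum \frac{A}{R}} = \frac{187.5}{\frac{36}{1.2} + \frac{60}{1.5} + \frac{77}{1.1} + \frac{14.5}{1.45}} = 1.25$
$\bar{R} = \frac{\sum B R}{\sum B} = \frac{1.2 \cdot 0.20 + 1.5 \cdot 0.27 + 1.1 \cdot 0.47 + 1.45 \cdot 0.06}{1} \approx 1.25$ (weights: shares of turnover in $t_0$)
$\bar{R} = \frac{\sum B R}{\sum B} = \frac{1.2 \cdot 30 + 1.5 \cdot 40 + 1.1 \cdot 70 + 1.45 \cdot 10}{150} = 1.25$
$\bar{R} = \frac{\sum A}{\sum B} = \frac{187.5}{150} = 1.25$
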
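The equivalence of these routes can be verified with the turnover figures from the table. In this sketch A is taken to be the t1 turnover and B the t0 turnover, as the numbers suggest; the dictionary layout and variable names are assumptions made for illustration.

```python
# Turnover (MFt) per factory, taken from the table.
A = {"C": 36, "D": 60, "E": 77, "F": 14.5}   # period t1
B = {"C": 30, "D": 40, "E": 70, "F": 10}     # period t0
R = {k: A[k] / B[k] for k in A}              # dynamic ratio of each factory

# Direct ratio of the totals.
direct = sum(A.values()) / sum(B.values())

# Weighted arithmetic mean of the ratios, weights = base-period turnover B.
arithmetic = sum(B[k] * R[k] for k in A) / sum(B.values())

# Weighted harmonic mean of the ratios, weights = current-period turnover A.
harmonic = sum(A.values()) / sum(A[k] / R[k] for k in A)

print(direct, arithmetic, harmonic)
```

Which mean applies depends on the weights at hand: base-period weights call for the arithmetic form, current-period weights for the harmonic form.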
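Quadratic Mean
Raw data:
$\bar{x}_q = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n}}$
Frequency distribution:
$\bar{x}_q = \sqrt{\frac{\sum_{i=1}^{k} f_i x_i^2}{\sum_{i=1}^{k} f_i}}$
Relative frequencies:
$\bar{x}_q = \sqrt{\sum_{i=1}^{k} g_i x_i^2}$
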
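The deck mentions the standard deviation as an example of the quadratic form. A small sketch can make that link concrete: the quadratic mean of the deviations from the mean equals the population standard deviation. The data are the five values used in the deck's mean examples; `quadratic_mean` is a hypothetical helper name.

```python
import math
from statistics import pstdev

def quadratic_mean(values):
    """Hypothetical helper: square root of the mean of the squared values."""
    return math.sqrt(sum(v ** 2 for v in values) / len(values))

x = [100, 150, 210, 240, 300]
x_bar = sum(x) / len(x)
deviations = [xi - x_bar for xi in x]

q_dev = quadratic_mean(deviations)   # quadratic mean of the deviations
sigma = pstdev(x)                    # population standard deviation

print(round(q_dev, 2), round(sigma, 2))
```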
I.2. Median
A statistic which has an equal number of variates above and below it.
Raw Data Expressions: the $\frac{n+1}{2}$th ranked value
- Independent of extreme values
- Determined just from the order of the data
- The middle term can be calculated for qualitative ordinal data as well
Frequency Distribution Expressions:
$Me = me + \frac{\frac{n}{2} - f'_{me-1}}{f_{me}} \cdot h$
$me$ = lower boundary of the median class
$n$ = total number of variates in the frequency distribution
$f'_{me-1}$ = cumulative frequency of the class below the median class
$f_{me}$ = frequency of the median class
$h$ = class interval

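  Water consumption (m³)   Number of houses ($f$)   Cumulative frequency ($f'$)
  up to 15                  8                        8
  15 - 25                  19                       27
  25 - 35                  17                       44
  35 - 45                   9                       53
  45 and above              7                       60
  Total                    60                        -
$\frac{n}{2} = \frac{60}{2} = 30$th ranked value
$Me = 25 + \frac{\frac{60}{2} - 27}{17} \cdot 10 = 26.76$ (m³)
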
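The interpolation can be written as a short function. This is a sketch under stated assumptions: the open first class is given a lower boundary of 5 so that every class has width 10, and `grouped_median` is a hypothetical helper name.

```python
def grouped_median(lower_bounds, freqs, width):
    """Median of a frequency distribution by linear interpolation.

    Implements Me = me + (n/2 - f'_{me-1}) / f_me * h.
    """
    n = sum(freqs)
    target = n / 2          # rank of the median
    cumulative = 0          # cumulative frequency below the current class
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("empty frequency distribution")

lower_bounds = [5, 15, 25, 35, 45]   # first boundary (5) is an assumption
freqs = [8, 19, 17, 9, 7]            # number of houses per class

median = grouped_median(lower_bounds, freqs, width=10)
print(round(median, 2))
```

The assumed boundary of the open first class does not affect this result, because the median falls in the 25 - 35 class.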
I.3. Mode
The value that occurs most frequently; the typical value.
$Mo = mo + \frac{k_1}{k_1 + k_2} \cdot h$
$mo$ = lower class boundary of the modal class
$k_1$ = difference between the frequencies of the modal class and the previous class
$k_2$ = difference between the frequencies of the modal class and the next class
$h$ = class interval

  Water consumption (m³)   Number of houses ($f$)   Cumulative frequency ($f'$)
  up to 15                  8                        8
  15 - 25                  19                       27
  25 - 35                  17                       44
  35 - 45                   9                       53
  45 and above              7                       60
  Total                    60                        -
$Mo = 15 + \frac{19 - 8}{(19 - 8) + (19 - 17)} \cdot 10 = 23.46$ m³

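The mode formula can be sketched the same way. The function name `grouped_mode` is hypothetical, the first class boundary of 5 is an assumption, and treating a missing neighbouring class as having frequency 0 is a design choice of this sketch rather than something the deck specifies.

```python
def grouped_mode(lower_bounds, freqs, width):
    """Mode of a frequency distribution: Mo = mo + k1 / (k1 + k2) * h."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])   # index of the modal class
    prev_f = freqs[i - 1] if i > 0 else 0                # assumption: 0 if no previous class
    next_f = freqs[i + 1] if i < len(freqs) - 1 else 0   # assumption: 0 if no next class
    k1 = freqs[i] - prev_f
    k2 = freqs[i] - next_f
    return lower_bounds[i] + k1 / (k1 + k2) * width

lower_bounds = [5, 15, 25, 35, 45]   # first boundary (5) is an assumption
freqs = [8, 19, 17, 9, 7]

mode = grouped_mode(lower_bounds, freqs, width=10)
print(round(mode, 2))
```

The formula presumes equal class widths; with unequal widths the frequencies would first have to be converted to densities.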
  Measurement Scale        Best Measure of the Middle
  Nominal (Categorical)    Mode
  Ordinal                  Median
  Interval                 Symmetrical data: Mean; Skewed data: Median
  Ratio                    Symmetrical data: Mean; Skewed data: Median

II. Percentiles and Quartiles
The $P$th percentile of a group of numbers is that value below which lie P% (P percent) of the numbers in the group.
Q1 (lower quartile): The first quartile is the 25th percentile. It is that point below which lie ¼ of the data.
Q2 (middle quartile): The median is the point below which lie half the data. It is the 50th percentile.
Q3 (upper quartile): The third quartile is the 75th percentile. It is that point below which lie 75 percent of the data.

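  Water consumption (m³)   Number of houses ($f$)   Cumulative frequency ($f'$)
  up to 15                  8                        8
  15 - 25                  19                       27
  25 - 35                  17                       44
  35 - 45                   9                       53
  45 and above              7                       60
  Total                    60                        -
$\frac{n}{4} = 15$th ranked value: $Q_1 = 15 + \frac{\frac{60}{4} - 8}{19} \cdot 10 = 18.68$ (m³)
$\frac{3n}{4} = 45$th ranked value: $Q_3 = 35 + \frac{\frac{3 \cdot 60}{4} - 44}{9} \cdot 10 = 36.11$ (m³)
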
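Quartiles use the same interpolation as the median, with the rank n/2 replaced by p·n. The sketch below generalises it; `grouped_quantile` is a hypothetical helper name, and the lower boundary 5 of the open first class is an assumption.

```python
def grouped_quantile(lower_bounds, freqs, width, p):
    """Quantile of order p (0 < p < 1) of a frequency distribution."""
    n = sum(freqs)
    target = p * n          # rank of the requested quantile
    cumulative = 0          # cumulative frequency below the current class
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("p must lie between 0 and 1")

lower_bounds = [5, 15, 25, 35, 45]   # first boundary (5) is an assumption
freqs = [8, 19, 17, 9, 7]

q1 = grouped_quantile(lower_bounds, freqs, 10, 0.25)
q2 = grouped_quantile(lower_bounds, freqs, 10, 0.50)
q3 = grouped_quantile(lower_bounds, freqs, 10, 0.75)

print(round(q1, 2), round(q2, 2), round(q3, 2))
```

Any other percentile follows by passing p = P/100.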
III. Measures of Dispersion
1. Range
2. Interquartile Range
3. Population and Sample Standard Deviation
4. Population and Sample Variance
5. Coefficient of Variation

III.1. Range
The range of a set of observations is the difference between the largest observation and the smallest observation.
$R = X_{max} - X_{min}$
III.2. IQR
Interquartile range: the difference between the third and first quartiles.
$IQR = Q_3 - Q_1$

III.3. Standard Deviation
The standard deviation is a measure of dispersion around the mean. A low standard deviation indicates that the data points tend to be very close to the mean, whereas a high standard deviation indicates that the data are spread out over a large range of values.
In a normal distribution, about 68% of cases fall within one standard deviation of the mean and about 95% of cases fall within 2 standard deviations.

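The 68% and 95% figures can be checked against the standard normal distribution. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (available from Python 3.8).

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

# Probability mass within 1 and within 2 standard deviations of the mean.
within_1 = z.cdf(1) - z.cdf(-1)
within_2 = z.cdf(2) - z.cdf(-2)

print(round(within_1, 4), round(within_2, 4))
```

Because every normal distribution is a shifted and rescaled standard normal, the same shares hold for any mean and standard deviation.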
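Properties of Standard Deviation
$\sigma = 0$, if $x_i$ = constant
$0 \le \sigma \le \bar{x} \sqrt{N - 1}$ (for non-negative data)
$\sigma^2 = \bar{x}_q^2 - \bar{x}^2$
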
Properties of Standard Deviation
  $x$       $d = x - \bar{x}$   $d^2$      $y = x + 50$   $d = y - \bar{y}$
  100       -100                10 000     150            -100
  150        -50                 2 500     200             -50
  210        +10                   100     260             +10
  240        +40                 1 600     290             +40
  300       +100                10 000     350            +100
  Σ 1 000      0                24 200     1 250             0
$\bar{x} = 200$, $\sigma_x^2 = 4\,840$, $\sigma_x = 69.6$
$\bar{y} = 250$, $\sigma_y^2 = 4\,840$, $\sigma_y = 69.6$
If you add a constant $a$ to every $x_i$, the standard deviation will be the same.

Properties of Standard Deviation
  $x$       $d = x - \bar{x}$   $d^2$      $y = 1.1x$   $d = y - \bar{y}$   $d^2$
  100       -100                10 000     110          -110                12 100
  150        -50                 2 500     165           -55                 3 025
  210        +10                   100     231           +11                   121
  240        +40                 1 600     264           +44                 1 936
  300       +100                10 000     330          +110                12 100
  Σ 1 000      0                24 200     1 100           0                29 282
$\bar{x} = 200$, $\sigma_x^2 = 4\,840$, $\sigma_x = 69.6$
$\bar{y} = 220$, $\sigma_y^2 = 5\,856.4$, $\sigma_y = 76.52$
If you multiply every $x_i$ by a constant $b$, the standard deviation will be $b \cdot \sigma$.

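Both properties can be confirmed on the slide's values with the standard library. The sketch uses the population forms `pvariance` and `pstdev`, which match the divisor N used in the tables; the variable names are illustrative.

```python
from statistics import pstdev, pvariance

x = [100, 150, 210, 240, 300]        # values from the slide
shifted = [xi + 50 for xi in x]      # y = x + 50
scaled = [1.1 * xi for xi in x]      # y = 1.1x

var_x = pvariance(x)                 # population variance (divisor N)
sigma_x = pstdev(x)                  # population standard deviation

sigma_shifted = pstdev(shifted)      # adding a constant leaves sigma unchanged
sigma_scaled = pstdev(scaled)        # multiplying by b scales sigma by b

print(var_x, round(sigma_x, 2))
print(round(sigma_shifted, 2), round(sigma_scaled, 2))
```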
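III.4. Variance
Variance of a set of observations: the average squared deviation of the data points from their mean.
Population variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}$, or with frequencies $\sigma^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{\sum_{i=1}^{k} f_i}$
Sample variance:
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$, or with frequencies $S^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{\sum_{i=1}^{k} f_i - 1}$
III.5. Coefficient of Variation
The measure of dispersion around the mean, in %.
$V = \frac{\sigma}{\bar{X}}$ (population), $V = \frac{S}{\bar{X}}$ (sample)
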
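  Water consumption (m³)   Number of houses ($f$)   Cumulative frequency ($f'$)
  up to 15                  8                        8
  15 - 25                  19                       27
  25 - 35                  17                       44
  35 - 45                   9                       53
  45 and above              7                       60
  Total                    60                        -
Using the class midpoints 10, 20, ..., 50 and $\bar{x} = 28$ m³:
$\sigma = \sqrt{\frac{(10 - 28)^2 \cdot 8 + (20 - 28)^2 \cdot 19 + \dots + (50 - 28)^2 \cdot 7}{60}} = 11.94$ m³
$V = \frac{11.94}{28} = 0.4266 \rightarrow 42.66\%$
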
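The grouped standard deviation and coefficient of variation can be recomputed as follows. The sketch assumes the class midpoints 10, 20, ..., 50 that the slide's calculation appears to use, which implies treating the open classes as 5 - 15 and 45 - 55.

```python
import math

midpoints = [10, 20, 30, 40, 50]   # assumed class midpoints (m^3)
freqs = [8, 19, 17, 9, 7]          # number of houses per class
n = sum(freqs)

# Weighted mean of the midpoints.
mean = sum(f * x for x, f in zip(midpoints, freqs)) / n

# Population variance and standard deviation with frequency weights.
variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n
sigma = math.sqrt(variance)

# Coefficient of variation: dispersion relative to the mean.
cv = sigma / mean

print(mean, round(sigma, 2), round(cv, 4))
```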
IV. Measures of Shape
Skewness is a measure of the degree of asymmetry of a frequency distribution.
Kurtosis is a measure of the flatness (versus peakedness) of a frequency distribution.

IV.1. Kurtosis
The measure of the extent to which observations cluster around the central point.
Positive: the observations cluster more and have longer tails.
Negative: the observations cluster less and have shorter tails.
For a normal distribution, the value of the kurtosis statistic is zero.

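IV.2. Skewness
$A = \frac{\bar{X} - Mo}{\sigma}$
$F = \frac{(Q_3 - Me) - (Me - Q_1)}{(Q_3 - Me) + (Me - Q_1)}$
- Skewed to the left, i.e. peak on the left with a long right tail (positive skew): $Mo < Me < \bar{X}$, $A > 0$
- Symmetry: $\bar{X} = Me = Mo$, $A = 0$
- Skewed to the right, i.e. peak on the right with a long left tail (negative skew): $\bar{X} < Me < Mo$, $A < 0$
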
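As an illustration, both skewness measures can be evaluated for the water-consumption example. The deck itself does not work this case out; the inputs below are the rounded results reported on its earlier slides, and the variable names are illustrative.

```python
# Rounded results for the water-consumption example, as reported in the deck.
x_bar = 28.0     # mean (m^3)
mo = 23.46       # mode
me = 26.76       # median
q1 = 18.68       # lower quartile
q3 = 36.11       # upper quartile
sigma = 11.94    # standard deviation

# Mode-based measure: A = (mean - mode) / sigma.
A = (x_bar - mo) / sigma

# Quartile-based measure: compares the two halves of the interquartile range.
F = ((q3 - me) - (me - q1)) / ((q3 - me) + (me - q1))

print(round(A, 3), round(F, 3))
```

Both values are positive and Mo < Me < mean, which corresponds to a distribution with a long right tail.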
Box Plot
The box plot is a set of five summary measures of the distribution of the data:
- the median of the data
- the lower quartile
- the upper quartile
- the smallest observation
- the largest observation
It also shows the asymmetry of the distribution.

Box & Whiskers
Source: Aczel [1996]

Elements of Box Plot
Source: Aczel [1996]

Source: Aczel [1996]
Box Plot
Figure labels: the highest salary; the least standard deviation; $Q_3$, $Me$, $Q_1$.

Thanks for your attention!