University of Miskolc (Miskolci Egyetem), Faculty of Economics, Institute of Business Information Management and Methodology. Nonparametric Tests. Petra Petrovics, PhD Student.

Hypothesis Testing
Parametric tests: mean of a population, population proportion, population standard deviation
Nonparametric tests: test for independence, analysis of variance, goodness of fit

I. Test for Independence
Independence of events: two events are independent if the probability of their joint occurrence equals the product of their marginal probabilities. For qualitative or territorial variables (measured on a nominal scale), A and B are independent if $P(A \cap B) = P(A)\,P(B)$.

Hypothesis test for independence
$H_0$: the two classification variables are independent of each other
$H_1$: the two classification variables are NOT independent
$H_0: P_{ij} = P_{i.}\,P_{.j}$ for all $i, j$
$H_1: \exists\, ij: P_{ij} \neq P_{i.}\,P_{.j}$

Properties of the Test for Independence
The data are the observed frequencies, arranged in a contingency table. The degrees of freedom are the degrees of freedom of the row variable times the degrees of freedom of the column variable: not one less than the sample size, but the product of the two. It is always a right-tail test. The test statistic has (approximately) a chi-square distribution.

The expected frequency of a cell is its row total times its column total, divided by the grand total. The value of the test statistic does not change if the order of the rows or columns is switched, or if the table is transposed. The test statistic is
$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$

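Test Statistic
Chi-square test statistic for independence:
$\chi^2 = \sum_{i=1}^{s}\sum_{j=1}^{t} \frac{(f_{ij} - f^*_{ij})^2}{f^*_{ij}}$, where $f^*_{ij} = \frac{f_{i.}\,f_{.j}}{n}$ are the expected frequencies
Degrees of freedom: $d = (s-1)(t-1)$, where s and t are the numbers of rows and columns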
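A minimal pure-Python sketch of the statistic above, assuming the contingency table is given as a list of rows of observed counts. The function names `expected_frequencies` and `chi_square_independence` are illustrative, not taken from the slides. The sketch also makes it easy to check the transpose property listed among the properties of the test.

```python
def expected_frequencies(table):
    """Expected counts f*_ij = f_i. * f_.j / n under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an s x t table."""
    expected = expected_frequencies(table)
    stat = sum(
        (f - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for f, e in zip(obs_row, exp_row)
    )
    s, t = len(table), len(table[0])
    return stat, (s - 1) * (t - 1)


# Hypothetical 2 x 3 table, used only to illustrate the call.
tbl = [[10, 20, 30], [20, 20, 20]]
stat, df = chi_square_independence(tbl)
stat_t, df_t = chi_square_independence([list(col) for col in zip(*tbl)])
print(stat, df, stat_t, df_t)
```

Transposing the table swaps s and t, so both the statistic and $(s-1)(t-1)$ stay the same.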

Example
An article in Business Week reports profits and losses of firms by industry. A random sample of 100 firms is selected, and for each firm in the sample we record whether the company made money or lost money, and whether or not the firm is a service company. The data are summarized in a 2×2 contingency table. Using the information in the table, determine whether the two events "the company made a profit this year" and "the company is in the service industry" are independent (α = 1%).

Contingency table of Profit/Loss vs. Industry Type

|        | Service | Nonservice | Total |
|--------|---------|------------|-------|
| Profit | 42      | 18         | 60    |
| Loss   | 6       | 34         | 40    |
| Total  | 48      | 52         | 100   |

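Solution
$H_0: P_{ij} = P_{i.}\,P_{.j}$; $H_1: \exists\, ij: P_{ij} \neq P_{i.}\,P_{.j}$
Expected frequencies:
$f^*_{11} = \frac{60 \cdot 48}{100} = 28.8$, $f^*_{12} = \frac{60 \cdot 52}{100} = 31.2$, $f^*_{21} = \frac{40 \cdot 48}{100} = 19.2$, $f^*_{22} = \frac{40 \cdot 52}{100} = 20.8$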

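$\chi^2 = \frac{(42-28.8)^2}{28.8} + \frac{(18-31.2)^2}{31.2} + \frac{(6-19.2)^2}{19.2} + \frac{(34-20.8)^2}{20.8} = 29.09$
Degrees of freedom: $d = (2-1)(2-1) = 1$
Critical value: $\chi^2_{crit} = \chi^2_{0.99}(1) = 6.63$
[Figure: chi-square density with the acceptance region (H0) below 6.63 and the rejection region (H1) above it]
Since $\chi^2 = 29.09 > 6.63$, we reject the null hypothesis: profit/loss and industry type are probably not independent.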
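The arithmetic of the example can be checked with a short Python sketch. The critical value relies on a fact specific to one degree of freedom (a χ²(1) variable is the square of a standard normal variable), so the standard library's `statistics.NormalDist` suffices here; for other degrees of freedom a chi-square table or a library such as SciPy would be needed.

```python
from statistics import NormalDist

observed = [[42, 18],   # Profit: service, nonservice
            [6, 34]]    # Loss:   service, nonservice

row_totals = [sum(r) for r in observed]          # [60, 40]
col_totals = [sum(c) for c in zip(*observed)]    # [48, 52]
n = sum(row_totals)                              # 100

# Expected frequencies: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]
chi2 = sum((f - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for f, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1 the (1 - alpha) chi-square quantile is z_{1 - alpha/2} squared.
alpha = 0.01
chi2_crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2

print(f"chi2 = {chi2:.2f}, df = {df}, critical = {chi2_crit:.2f}")
print("reject H0" if chi2 > chi2_crit else "do not reject H0")
```

This prints χ² = 29.09, df = 1 and a critical value of 6.63, matching the slide.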

II. Analysis of Variance (ANOVA)
Used for one qualitative and one quantitative variable.
$H_0$: the two variables are independent of each other
$H_1$: the two variables are not independent
$H_0: \beta_1 = \beta_2 = \dots = \beta_m = 0$
$H_1$: not all $\beta_i$ $(i = 1, \dots, m)$ are equal

Assumptions of ANOVA
Independent random sampling
Normally distributed response variable
Equal population variances

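Test Statistic
$F = \frac{SSTR/(m-1)}{SSE/(n-m)}$
where m: number of populations, n: total sample size,
$SSTR = \sum_{i=1}^{m} n_i (\bar{x}_i - \bar{x})^2$, $SSE = \sum_{i=1}^{m} (n_i - 1)\, s_i^2$ ($s_i^2$: sample variance of group i)
Critical value: $F_{crit} = F_{1-\alpha}(\nu_1 = m-1;\ \nu_2 = n-m)$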

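ANOVA table

| Source of variation        | Sum of squares | Degrees of freedom | Mean square         | F ratio      |
|----------------------------|----------------|--------------------|---------------------|--------------|
| Between groups (treatment) | SSTR           | m - 1              | MSTR = SSTR/(m - 1) | F = MSTR/MSE |
| Within groups (error)      | SSE            | n - m              | MSE = SSE/(n - m)   |              |
| Total                      | SST            | n - 1              |                     |              |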
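A minimal sketch that fills in the ANOVA table from raw observations, assuming each group is a list of numbers; `one_way_anova` is a hypothetical helper name. SSE is computed as the sum of squared deviations from each group mean, which equals $\sum (n_i - 1) s_i^2$ with $s_i^2$ the sample variance.

```python
def one_way_anova(groups):
    """Return the ANOVA table entries for a list of samples (one per group)."""
    m = len(groups)                          # number of populations
    n = sum(len(g) for g in groups)          # total sample size
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-groups, within-groups and total sums of squares.
    sstr = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(groups, means))
    sse = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)

    mstr = sstr / (m - 1)
    mse = sse / (n - m)
    return {
        "SSTR": sstr, "SSE": sse, "SST": sst,
        "df_between": m - 1, "df_within": n - m, "df_total": n - 1,
        "MSTR": mstr, "MSE": mse, "F": mstr / mse,
    }


# Hypothetical data: three groups of three observations.
res = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(res)
```

For these data SSTR = 54, SSE = 6, SST = 60 and F = 27 with (2, 6) degrees of freedom; the identity SST = SSTR + SSE holds exactly.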

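Example
A shop assistant examined the demand for bread:

| Days       | Number of days | Mean of sold bread (kg) | Variance (kg²) |
|------------|----------------|-------------------------|----------------|
| Monday     | 6              | 42                      | 84.8           |
| Other days | 10             | 41.8                    | 70.4           |
| Saturday   | 6              | 57.33                   | 43.87          |
| Total      | 22             | 46.09                   | 110.47         |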

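Assuming normal distribution and equal standard deviations:
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_1$: not all $\beta_i$ $(i = 1, 2, 3)$ are equal
$SST = \sum d_y^2 = (n-1)\,s_y^2 = 21 \cdot 110.47 = 2319.87$
$SSE = 5 \cdot 84.8 + 9 \cdot 70.4 + 5 \cdot 43.87 = 1276.95$
$SSTR = 6(42-46.09)^2 + 10(41.8-46.09)^2 + 6(57.33-46.09)^2 = 1042.44$
$SST = SSE + SSTR = 1276.95 + 1042.44 = 2319.39$ (the small gap to 2319.87 comes from rounding)
$F = \frac{1042.44/(3-1)}{1276.95/(22-3)} = 7.76$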

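$\nu_1 = m - 1 = 3 - 1 = 2$, $\nu_2 = n - m = 22 - 3 = 19$
[Figure: F density with the acceptance region (H0) below $F_{crit}$ and the rejection region (H1) above it]
$F_{crit} = F_{0.95}(2;\ 19) = 3.52 < F = 7.76$: the computed test statistic falls in the rejection region. The demand for bread is not the same on the examined days.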
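The bread example can be reproduced from the summary statistics alone. This sketch assumes the variances in the table are sample variances (divisor $n_i - 1$), which is how they enter SSE above; the critical value 3.52 is still read from an F table.

```python
groups = {
    # day: (number of days n_i, mean of sold bread in kg, sample variance s_i^2)
    "Monday": (6, 42.0, 84.8),
    "Other days": (10, 41.8, 70.4),
    "Saturday": (6, 57.33, 43.87),
}

m = len(groups)                                    # number of populations
n = sum(ni for ni, _, _ in groups.values())        # total sample size
grand_mean = sum(ni * mean for ni, mean, _ in groups.values()) / n

sstr = sum(ni * (mean - grand_mean) ** 2 for ni, mean, _ in groups.values())
sse = sum((ni - 1) * var for ni, _, var in groups.values())
f_stat = (sstr / (m - 1)) / (sse / (n - m))

print(f"SSTR = {sstr:.2f}, SSE = {sse:.2f}, F = {f_stat:.2f}, df = ({m - 1}, {n - m})")
```

Run as is, it gives SSTR ≈ 1042.44, SSE ≈ 1276.95 and F ≈ 7.76 with (2, 19) degrees of freedom, matching the slide.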

III. Goodness of Fit
Tests whether a sample of data came from a population with a specific distribution. It is a statistical test of how well our data support an assumption about the distribution of a population or random variable of interest: it determines how well an assumed distribution fits the data.

Steps in a Chi-square Analysis
1. We hypothesize about a population by stating the null and alternative hypotheses.
2. We compute the frequencies of occurrence of certain events that we expect under the null hypothesis. These give the expected counts of data points in the different cells.
3. We note the observed counts of data points falling in the different cells.
4. We consider the difference between the observed and the expected counts, summarized in the chi-square statistic.
5. We compare the value of the statistic with critical points of the chi-square distribution and make a decision.

Test Statistic
Chi-square statistic:
$\chi^2 = \sum_{i=1}^{r} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i} = \sum_{i=1}^{r} \frac{(f_i - n p_i)^2}{n p_i}$
Degrees of freedom: $r - 1 - b$, where r: number of categories, b: number of estimated parameters
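A sketch of the goodness-of-fit statistic and its degrees of freedom, assuming the observed counts and the hypothesised probabilities are given as lists in the same category order. `chi_square_gof` and the die counts in the usage line are hypothetical, chosen only for illustration.

```python
def chi_square_gof(observed, probabilities, estimated_params=0):
    """Chi-square goodness-of-fit statistic and degrees of freedom.

    observed: counts f_i in each of the r categories
    probabilities: hypothesised p_i (must sum to 1)
    estimated_params: b, the number of parameters estimated from the sample
    """
    if len(observed) != len(probabilities):
        raise ValueError("need one probability per category")
    if abs(sum(probabilities) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    n = sum(observed)
    stat = sum((f - n * p) ** 2 / (n * p) for f, p in zip(observed, probabilities))
    df = len(observed) - 1 - estimated_params
    return stat, df


# Hypothetical example: 60 rolls of a die tested against a fair die.
stat, df = chi_square_gof([8, 12, 9, 11, 10, 10], [1 / 6] * 6)
print(stat, df)
```

Here every expected count is 10, so χ² = (4 + 4 + 1 + 1 + 0 + 0)/10 = 1.0 with 6 - 1 - 0 = 5 degrees of freedom; the decision then uses the chi-square critical value for 5 degrees of freedom.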

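Goodness-of-fit test for a multinomial distribution
$H_0$: the probabilities of occurrence of events $E_1, \dots, E_k$ are given by the specified probabilities $p_1, p_2, \dots, p_k$
$H_1$: the probabilities of the k events are not the $p_i$ stated in the null hypothesis
[Figure: chi-square density with the acceptance region (H0) below $\chi^2_{1-\alpha}$ and the rejection region (H1) above it]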

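Goodness-of-fit test for a normal distribution
$H_0$: normal distribution
$H_1$: not normal distribution
E.g.: Kolmogorov-Smirnov test
[Figure: chi-square density with the acceptance region (H0) below $\chi^2_{1-\alpha}$ and the rejection region (H1) above it]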
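The slides name the Kolmogorov-Smirnov test as one option for testing normality. A minimal sketch of its statistic $D = \sup_x |F_n(x) - F(x)|$ against a fully specified normal distribution, using the standard library's `statistics.NormalDist`; `ks_statistic_normal` is a hypothetical helper. For large n the 5% critical value is roughly $1.36/\sqrt{n}$. If μ and σ are estimated from the same sample, the standard KS critical values are too conservative (the Lilliefors variant addresses this).

```python
from statistics import NormalDist


def ks_statistic_normal(sample, mu, sigma):
    """Kolmogorov-Smirnov distance D = sup |F_n(x) - F(x)| to N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


# Hypothetical sample compared with the standard normal distribution.
print(ks_statistic_normal([-1.0, 0.0, 1.0], 0.0, 1.0))
```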

Thanks for your attention!