Nonparametric Tests Petra Petrovics
Hypothesis Testing
Parametric Tests: mean of a population, population proportion, population standard deviation
Nonparametric Tests: test for independence, analysis of variance, goodness of fit
I. Test for Independence
Independence of events: the probability of their joint occurrence equals the product of their marginal probabilities.
Used for qualitative or regional variables (measured on a nominal scale).
A and B are independent if: P(A ∩ B) = P(A) · P(B)
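The definition can be checked numerically. A minimal sketch with made-up probabilities (not from the slides):

```python
# Hypothetical probabilities, chosen only to illustrate P(A∩B) = P(A)·P(B)
p_a = 0.6        # P(A)
p_b = 0.5        # P(B)
p_a_and_b = 0.3  # P(A∩B)

# Compare with a tolerance to avoid floating-point artifacts
independent = abs(p_a_and_b - p_a * p_b) < 1e-12
print(independent)  # True: 0.3 equals 0.6 * 0.5, so A and B are independent
```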
Hypothesis test for independence
H0: the two classification variables are independent of each other
H1: the two classification variables are NOT independent
H0: Pij = Pi. · P.j
H1: ∃(i, j): Pij ≠ Pi. · P.j
Properties of the Test for Independence
The data are the observed frequencies, arranged in a contingency table.
The degrees of freedom are the degrees of freedom for the row variable times the degrees of freedom for the column variable; it is not one less than the sample size.
It is always a right-tailed test.
The test statistic has a chi-square distribution.
Properties of the Test for Independence
The expected value is computed by taking the row total times the column total and dividing by the grand total.
The value of the test statistic does not change if the rows or columns are interchanged (transpose of the matrix).
The test statistic is χ² = Σ (Observed − Expected)² / Expected.
Test Statistic
Chi-square test statistic for independence:
χ² = Σᵢ₌₁ˢ Σⱼ₌₁ᵗ (fij − f*ij)² / f*ij
where f*ij = (fi. · f.j) / n are the expected frequencies.
Degrees of freedom: df = (s − 1)(t − 1)
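The formula can be sketched in plain Python (a minimal implementation, not from the slides):

```python
def chi_square_independence(table):
    """Chi-square statistic and df for an s x t contingency table of observed frequencies."""
    s, t = len(table), len(table[0])
    row_totals = [sum(row) for row in table]                          # fi.
    col_totals = [sum(table[i][j] for i in range(s)) for j in range(t)]  # f.j
    n = sum(row_totals)                                               # grand total
    chi2 = 0.0
    for i in range(s):
        for j in range(t):
            expected = row_totals[i] * col_totals[j] / n  # f*ij = fi. * f.j / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    df = (s - 1) * (t - 1)
    return chi2, df
```

Applied to the 2×2 Business Week table from the example that follows, `chi_square_independence([[42, 18], [6, 34]])` gives χ² ≈ 29.09 with df = 1.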
Example
An article in Business Week reports profits and losses of firms by industry. A random sample of 100 firms is selected, and for each firm in the sample we record whether the company made money or lost money, and whether or not the firm is a service company. The data are summarized in a 2×2 contingency table. Using the information in the table, determine whether you believe that the two events "the company made a profit this year" and "the company is in the service industry" are independent. (α = 1%)
Contingency table of Profit/Loss vs. Industry Type
Industry Type:   Service   Nonservice   Total
Profit              42         18         60
Loss                 6         34         40
Total               48         52        100
Solution
H0: Pij = Pi. · P.j
H1: ∃(i, j): Pij ≠ Pi. · P.j
Expected frequencies f*ij = (fi. · f.j) / n:
f*11 = 60 · 48 / 100 = 28.8
f*12 = 60 · 52 / 100 = 31.2
f*21 = 40 · 48 / 100 = 19.2
f*22 = 40 · 52 / 100 = 20.8
χ² = (42 − 28.8)²/28.8 + (18 − 31.2)²/31.2 + (6 − 19.2)²/19.2 + (34 − 20.8)²/20.8 = 29.09
df = (2 − 1)(2 − 1) = 1
Critical value: χ²crit = 6.63
Since 29.09 > 6.63, the test statistic falls in the rejection region: we reject the null hypothesis. Profit/loss and industry type are probably not independent.
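The hand computation can be verified with SciPy (assuming scipy is installed; `correction=False` disables Yates' continuity correction so the result matches the formula above):

```python
from scipy.stats import chi2_contingency

observed = [[42, 18], [6, 34]]  # Profit/Loss x Service/Nonservice
chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(round(chi2, 2), dof)  # 29.09 1
print(expected)             # expected frequencies match the hand computation above
```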
II. Analysis of Variance (ANOVA)
Used with a qualitative (grouping) variable and a quantitative variable.
H0: the two variables are independent of each other
H1: the two variables are not independent
H0: β1 = β2 = … = βm = 0
H1: not all βi (i = 1, …, m) are zero
Assumptions of ANOVA Independent random sampling Normally distributed response variable Equal variances of populations
Test Statistic
F = [SB / (m − 1)] / [SW / (n − m)]
where
SB = Σᵢ nᵢ(x̄ᵢ − x̄)²  (between-groups sum of squares)
SW = Σᵢ (nᵢ − 1)sᵢ²  (within-groups sum of squares)
m: number of populations, n: total sample size
Critical value: Fcrit = F(1−α)(m − 1; n − m)
ANOVA table
Source of variation          Sum of Squares   Degrees of Freedom   Mean Square        F ratio
Between Groups (Treatment)        SB               m − 1           MB = SB/(m − 1)    F = MB/MW
Within Groups (Error)             SW               n − m           MW = SW/(n − m)
Total                             ST               n − 1
Example
A shop assistant examined the demand for bread:
Days         Number of days   Mean of sold bread (kg)   Variance
Monday             6                  42                   84.8
Other days        10                  41.8                 70.4
Saturday           6                  57.33                43.87
Total             22                  46.09               110.47
Solution: assuming normal distributions and equal standard deviations:
H0: β1 = β2 = β3 = 0
H1: not all βi (i = 1, 2, 3) are zero
ST = 21 · 110.47 = 2319.87
SW = 5 · 84.8 + 9 · 70.4 + 5 · 43.87 = 1276.95
SB = 6(42 − 46.09)² + 10(41.8 − 46.09)² + 6(57.33 − 46.09)² = 1042.44
Check: SW + SB = 1276.95 + 1042.44 = 2319.39 ≈ ST (the small difference is due to rounding)
F = (1042.44 / 2) / (1276.95 / 19) = 7.76
Decision: df1 = m − 1 = 2, df2 = n − m = 19; at α = 5% the critical value is Fcrit = 3.52.
Since Fcrit = 3.52 < F = 7.76, the computed test statistic falls in the rejection region: we reject H0. The demand for bread is not the same on the examined days.
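The F statistic can be recomputed from the summary table alone (a sketch using only the group sizes, means and variances given above; scipy is assumed only for the critical value):

```python
from scipy.stats import f as f_dist

sizes = [6, 10, 6]             # Monday, other days, Saturday
means = [42.0, 41.8, 57.33]    # mean sold bread (kg)
variances = [84.8, 70.4, 43.87]

n = sum(sizes)                 # 22 observations in total
m = len(sizes)                 # 3 groups
grand_mean = sum(ni * xi for ni, xi in zip(sizes, means)) / n        # 46.09
sb = sum(ni * (xi - grand_mean) ** 2 for ni, xi in zip(sizes, means))  # between groups
sw = sum((ni - 1) * si for ni, si in zip(sizes, variances))            # within groups
f_stat = (sb / (m - 1)) / (sw / (n - m))
f_crit = f_dist.ppf(0.95, m - 1, n - m)  # critical value at alpha = 5%
print(round(f_stat, 2), round(f_crit, 2))  # 7.76 3.52
```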
III. Goodness of Fit
Tests whether a sample of data came from a population with a specific distribution. It is a statistical test of how well the data support an assumption about the distribution of a population or random variable of interest. The test determines how well the assumed distribution fits the data.
Steps in a Chi-square Analysis
1. We hypothesize about a population by stating the null and alternative hypotheses.
2. We compute the frequencies of occurrence of certain events that we expect under the null hypothesis. These give us the expected counts of data points in the different cells.
3. We note the observed counts of data points falling in the different cells.
4. We compute the chi-square statistic from the differences between the observed and expected counts.
5. We compare the value of the statistic with critical points of the chi-square distribution and make a decision.
Test Statistic
Chi-square statistic:
χ² = Σᵢ₌₁ᵏ (Observedᵢ − Expectedᵢ)² / Expectedᵢ = Σᵢ₌₁ᵏ (fᵢ − npᵢ)² / (npᵢ) = n(Σᵢ₌₁ᵏ gᵢ²/pᵢ − 1)
where gᵢ = fᵢ/n are the relative frequencies.
Degrees of freedom: df = r − 1 − b, where r: number of categories, b: number of estimated parameters
Goodness-of-fit test for multinomial distribution
H0: the probabilities of occurrence of events E1, E2, …, Ek are given by the specified probabilities p1, p2, …, pk
H1: the probabilities of the k events are not the pᵢ stated in the null hypothesis
The rejection region is the right tail of the chi-square distribution (significance level α).
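A short illustration with made-up counts (not from the slides), testing a fair-die hypothesis with scipy.stats.chisquare:

```python
from scipy.stats import chisquare

# Hypothetical data: 120 rolls of a die, H0: each face has probability 1/6
observed = [18, 22, 19, 21, 24, 16]
expected = [120 / 6] * 6          # 20 expected per face under H0
stat, p_value = chisquare(observed, f_exp=expected)
print(round(stat, 2))  # 2.1
```

Here df = 6 − 1 = 5 and the p-value is large, so the fair-die hypothesis is not rejected at the usual significance levels.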
Goodness-of-fit test for normal distribution
H0: the population is normally distributed
H1: the population is not normally distributed
The rejection region is again the right tail of the chi-square distribution. An alternative is, e.g., the Kolmogorov-Smirnov test.
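A sketch of a Kolmogorov-Smirnov normality check with SciPy, on simulated data (note that when the mean and standard deviation are estimated from the sample, the Lilliefors variant of the test is strictly the appropriate one):

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(42)
sample = rng.normal(loc=50.0, scale=5.0, size=200)  # simulated data for illustration

# Compare the sample against the fully specified N(50, 5) distribution
stat, p_value = kstest(sample, "norm", args=(50.0, 5.0))
print(round(stat, 3))  # KS statistic: the largest gap between empirical and theoretical CDF
```

A small p-value would lead to rejecting H0 (normality); a large one means the sample is consistent with the hypothesized normal distribution.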
Thanks for your attention!