Chapter 16 - Chi-Squared Tests

16.2

A Common Theme…

What to do?                                  Data Type?   Number of Categories?   Statistical Technique:
Describe a population                        Nominal      Two or more             χ² goodness-of-fit test
Compare two populations                      Nominal      Two or more             χ² test of a contingency table
Compare two or more populations              Nominal                              χ² test of a contingency table
Analyze relationship between two variables   Nominal                              χ² test of a contingency table

One data type…

…Two techniques

16.3

Two Techniques…

The first is a goodness-of-fit test applied to data produced by a multinomial experiment (a generalization of a binomial experiment); it is used to describe one population of data.

The second uses data arranged in a contingency table to determine whether two classifications of a population of nominal data are statistically independent; this test can also be interpreted as a comparison of two or more populations.

In both cases, we use the chi-squared (χ²) distribution.

16.4

The Multinomial Experiment…

Unlike a binomial experiment, which has only two possible outcomes (e.g. heads or tails), a multinomial experiment:

• Consists of a fixed number, n, of trials.

• Each trial can have one of k outcomes, called cells.

• Each probability pi remains constant.

• Our usual notion of probabilities holds, namely:

p1 + p2 + … + pk = 1, and

• Each trial is independent of the other trials.
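The properties listed above can be illustrated with a small simulation. This is only a sketch: the function name `run_multinomial_experiment`, the seed, and the cell probabilities (0.5, 0.3, 0.2) are assumptions chosen for illustration and do not come from the slides.

```python
import random
from collections import Counter


def run_multinomial_experiment(n, probabilities, seed=None):
    """Simulate n independent trials and return the count in each of the k cells."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    rng = random.Random(seed)
    cells = range(len(probabilities))
    # Every trial falls into exactly one cell; the probabilities never change.
    outcomes = rng.choices(cells, weights=probabilities, k=n)
    tally = Counter(outcomes)
    return [tally[cell] for cell in cells]


# Hypothetical experiment: n = 200 trials, k = 3 cells.
counts = run_multinomial_experiment(n=200, probabilities=[0.5, 0.3, 0.2], seed=1)
print(counts, "total:", sum(counts))
```

Whatever the seed, the cell counts always add up to n, because each trial contributes to exactly one cell.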

16.5

Chi-squared Goodness-of-Fit Test…

We test whether there is sufficient evidence to reject a specified set of values for pi.

To illustrate, our null hypothesis is:

H0: p1 = a1, p2 = a2, …, pk = ak

(where a1, a2, …, ak are the values of interest)

Our research hypothesis is:

H1: At least one pi ≠ ai

16.6

Chi-squared Goodness-of-Fit Test…

The test builds on comparing the actual frequency and the expected frequency of occurrences in all the cells.

Example 16.1…

We compare market share before and after an advertising campaign to see if there is a difference (i.e. if the advertising was effective in improving market share).

H0: p1 = a1, p2 = a2, …, pk = ak

where ai is the market share before the campaign. If there was no change, we’d expect H0 not to be rejected. If there is evidence to reject H0 in favor of H1: At least one pi ≠ ai, what’s a logical conclusion?

16.7

Example 16.1…

Market shares before the advertising campaign…

Company A – 45%

Company B – 40%

All Others – 15%

200 customers surveyed after the campaign. The results:

Company A – 102 customers preferred their product.

Company B – 82 customers…

All Others – 16 customers.

Before the campaign, we’d expect 45% of 200 customers (i.e. 90 customers) to prefer company A’s product. After the campaign, we observe 102 customers who do. Does this mean the campaign was effective (i.e. at a 5% significance level)?
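The expected counts follow from multiplying the sample size by each pre-campaign share. A minimal sketch of that arithmetic, using the figures from this example (the variable names are illustrative assumptions):

```python
n = 200  # customers surveyed after the campaign

# Market shares before the campaign (the values a_i under H0).
share_before = {"Company A": 0.45, "Company B": 0.40, "All Others": 0.15}

# Observed counts after the campaign.
observed = {"Company A": 102, "Company B": 82, "All Others": 16}

# Expected count in each cell if the shares had not changed: e_i = n * p_i.
expected = {name: n * share for name, share in share_before.items()}

for name in share_before:
    print(f"{name:<11} observed {observed[name]:>3}   expected {expected[name]:>5.1f}")
```

The expected counts add up to the same total, 200, as the observed counts.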

IDENTIFY

16.8

Example 16.1…

Company      Observed Frequency   Expected Frequency
A            102                  90
B            82                   80
All Others   16                   30

Are these changes statistically significant?

16.9

Example 16.1…

Our null hypothesis is:

H0: pCompanyA = .45, pCompanyB = .40, pOthers = .15

(i.e. the market shares pre-campaign), and our alternative hypothesis is:

H1: At least one pi ≠ ai

In order to complete our hypothesis testing we need a test statistic and a rejection region…

IDENTIFY

16.10

Chi-squared Goodness-of-Fit Test…

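Our chi-squared goodness-of-fit test statistic is given by:

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

where f_i is the observed frequency and e_i is the expected frequency of cell i.

Note: this statistic is approximately chi-squared distributed with k – 1 degrees of freedom, provided the sample size is large. The rejection region is:

$$\chi^2 > \chi^2_{\alpha,\,k-1}$$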

16.11

Example 16.1…

In order to calculate our test statistic, we lay out the data in a tabular fashion for easier calculation by hand:

Company   Observed Frequency   Expected Frequency   Delta       Summation Component
          fi                   ei                   (fi – ei)   (fi – ei)²/ei
A         102                  90                   12          1.60
B         82                   80                   2           0.05
Others    16                   30                   -14         6.53
Total     200                  200                              8.18

Check that the observed and expected totals are equal (200 = 200).
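The hand calculation in this table can be reproduced in a few lines. This is a sketch using the observed counts and pre-campaign shares from Example 16.1; the variable names are assumptions made for illustration.

```python
companies = ["A", "B", "Others"]
observed = [102, 82, 16]   # f_i, from the survey of 200 customers
expected = [90.0, 80.0, 30.0]   # e_i = 200 * (0.45, 0.40, 0.15)

# The two totals must agree before the components are summed.
assert sum(observed) == sum(expected)

deltas = [f - e for f, e in zip(observed, expected)]
components = [d ** 2 / e for d, e in zip(deltas, expected)]
chi_squared = sum(components)

for name, f, e, d, c in zip(companies, observed, expected, deltas, components):
    print(f"{name:<7} {f:>4} {e:>6.0f} {d:>5.0f} {c:>6.2f}")
print(f"{'Total':<7} {sum(observed):>4} {sum(expected):>6.0f} {'':>5} {chi_squared:>6.2f}")
```

The last cell of the Total row is the test statistic, about 8.18. Most of it comes from the "Others" cell, where the observed count falls well below the expected count.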

COMPUTE

16.12

Example 16.1…

Our rejection region is:

$$\chi^2 > \chi^2_{\alpha,\,k-1} = \chi^2_{.05,\,2} = 5.99$$

Since our test statistic is 8.18, which is greater than our critical value for chi-squared, we reject H0 in favor of H1; that is,

“There is sufficient evidence to infer that the proportions have changed since the advertising campaigns were implemented.”
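The decision rule can be checked numerically. The sketch below relies on one assumption that applies only to this example: with exactly 2 degrees of freedom, the chi-squared upper-tail probability is exp(-x/2), so the critical value has the closed form -2·ln(α). For other degrees of freedom, a table or statistical software is needed instead.

```python
import math

alpha = 0.05          # significance level
k = 3                 # cells: Company A, Company B, All Others
df = k - 1            # degrees of freedom

# Closed form valid only for df == 2 (assumption stated above).
assert df == 2
critical_value = -2.0 * math.log(alpha)

test_statistic = 8.18  # from the table in Example 16.1
reject_h0 = test_statistic > critical_value

print(f"critical value = {critical_value:.2f}, reject H0: {reject_h0}")
```

The computed critical value is about 5.99, so the statistic of 8.18 lies inside the rejection region.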

INTERPRET

16.13

Example 16.1…

Note: Table 5 in Appendix B does not allow for the direct calculation of the p-value, P(χ² > 8.18), so we have to use Excel.

COMPUTE

16.14

Example 16.1…

Note: There are a couple of different ways to calculate the p-value of the test:

• Computed manually from our table

• Computed directly from the data
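Both routes can be sketched without a spreadsheet. The helper `chi2_upper_tail_even_df` is a hypothetical function written for this illustration, not a library call. It uses the closed form P(χ² > x) = exp(-x/2) · Σ (x/2)^j / j! for j = 0, …, df/2 − 1, which holds only when the degrees of freedom are even, as they are here (df = 2).

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Upper-tail chi-squared probability P(X > x); valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    partial_sum = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * partial_sum


# Route 1: start from the rounded statistic taken from the table.
p_from_table = chi2_upper_tail_even_df(8.18, df=2)

# Route 2: start from the raw counts and the hypothesized shares.
observed = [102, 82, 16]
shares = [0.45, 0.40, 0.15]
n = sum(observed)
expected = [n * p for p in shares]
statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
p_from_data = chi2_upper_tail_even_df(statistic, df=len(observed) - 1)

print(f"p-value from table: {p_from_table:.4f}")
print(f"p-value from data:  {p_from_data:.4f}")
```

Both routes give a p-value of roughly 0.017, which is below the 5% significance level. The tiny difference between them comes from rounding the statistic to 8.18 in the first route.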

16.15

Required Conditions…

In order to use this technique, the sample size must be large enough so that the expected value for each cell is 5 or more (i.e. n × pi ≥ 5).

If the expected frequency of a cell is less than five, combine it with other cells to satisfy the condition.
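One possible way to apply this rule is sketched below. The merging strategy (repeatedly pooling the cell with the smallest expected frequency into the next smallest) is an assumption made for illustration; in practice the cells to combine should also make sense together. The function name and the counts are hypothetical and are not taken from Example 16.1.

```python
def combine_small_cells(observed, expected, minimum=5.0):
    """Merge cells until every expected frequency is at least `minimum`."""
    # Pair each expected value with its observed count; smallest expected first.
    cells = sorted(zip(expected, observed))
    while len(cells) > 1 and cells[0][0] < minimum:
        (e1, f1), (e2, f2) = cells[0], cells[1]
        # Pool the two smallest cells, adding expected and observed together.
        cells = sorted([(e1 + e2, f1 + f2)] + cells[2:])
    merged_observed = [f for _, f in cells]
    merged_expected = [e for e, _ in cells]
    return merged_observed, merged_expected


# Hypothetical data: four cells, one with an expected frequency below 5.
observed = [27, 13, 6, 4]
expected = [25.0, 15.0, 7.0, 3.0]

merged_observed, merged_expected = combine_small_cells(observed, expected)
print(merged_observed, merged_expected)
```

After merging, the number of cells k is smaller, so the degrees of freedom k – 1 used for the test shrink accordingly.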

16.16

Identifying Factors…

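Factors that identify the chi-squared goodness-of-fit test:

• Problem objective: describe a single population.

• Data type: nominal.

• Number of categories: two or more.

Expected frequency of cell i: ei = n × pi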

16.17

Chi-squared Test of a Contingency Table

The chi-squared test of a contingency table is used to:

• determine whether there is enough evidence to infer that two nominal variables are related, and

• infer that differences exist among two or more populations of nominal variables.

In order to use these techniques, we need to classify the data according to two different criteria.

16.18

Example 16.2…

The demand for an MBA program’s optional courses and majors is quite variable year over year.

The research hypothesis is that the academic background of the students (i.e. their undergrad degrees) affects their choice of major.

A random sample of data on last year’s MBA students was collected and summarized in a contingency table…

IDENTIFY

16.19

Example 16.2

The Data

[Contingency table: undergrad degree (rows: BEng, BBA, Other) by MBA major (columns: Accounting, Finance, Marketing), with row and column totals; grand total 152.]

16.20

Example 16.2…

Again, we are interested in determining whether or not the academic background of the students affects their choice of MBA major. Thus our research hypothesis is:

H1: The two variables are dependent

Our null hypothesis then, is:

H0: The two variables are independent.