The method described in Goodness of Fit can also be used to determine whether two sets of data are independent of each other. Such data are organized in what are called contingency tables, as described in Example 1. In these cases df = (row count – 1)(column count – 1).

Excel Function: The CHISQ.TEST function described in Goodness of Fit can be extended to support ranges consisting of multiple rows and columns. For R1 = the array of observed data and R2 = the array of expected values, we have

CHISQ.TEST(R1, R2) = CHISQ.DIST.RT(x, df)

where x is calculated from R1 and R2 as in Definition 2 of Goodness of Fit and df = (row count – 1)(column count – 1). The ranges R1 and R2 must have the same size and shape and can only contain numeric values.

For versions of Excel prior to Excel 2010, the CHISQ.TEST function doesn't exist. Instead you need to use the equivalent function, CHITEST.

Example 1: A survey is conducted of 175 young adults whose parents are classified as wealthy, middle class or poor to determine their highest level of schooling (graduated from university, graduated from high school or neither). The results are summarized on the left side of Figure 1 (Observed Values). Based on the data collected, is a person's level of schooling independent of their parents' wealth?

Figure 1 – Observed data and expected values for Example 1

H0: Highest level of schooling attained is independent of parents' wealth

We use the chi-square test, and so need to calculate the expected values that correspond to the observed values in the table above. To accomplish this we use the fact (by Definition 3 of Basic Probability Concepts) that if A and B are independent events then P(A ∩ B) = P(A) ∙ P(B). We also assume that the proportions for the sample are good estimates for the probabilities of the expected values.

We now show how to construct the table of expected values. We know that 45 of the 175 people in the sample are from wealthy families, and so the probability that someone in the sample is from a wealthy family is 45/175 = 25.7%. Similarly, the probability that someone in the sample graduated from university is 68/175 = 38.9%. But based on the null hypothesis, the event of being from a wealthy family is independent of graduating from university, and so the expected probability of both events is simply the product of the two probabilities, or 25.7% ∙ 38.9% = 10.0%. Thus, based on the null hypothesis, we expect that 10.0% of 175 = 17.5 people are from a wealthy family and have graduated from university. In this way we can fill out the table for expected values.

We start by setting all the totals in the Expected Values table to be the same as the corresponding totals in the Observed Values table. We then set the value of every cell in the Expected Values table to be

(row total ∙ column total) / grand total

E.g. the expected value for people from a wealthy family who graduated from university is 45 ∙ 68 / 175 = 17.5.

An alternative approach for filling in all the cells in the Expected Values table is to place an array formula based on MMULT in range H6:J8 (and then press Ctrl-Shift-Enter). See Matrix Operations for more information about the MMULT array function.

We can now calculate the p-value for the chi-square test statistic as CHISQ.TEST(Obs, Exp), where Obs is the 3 × 3 array of observed values, Exp is the 3 × 3 array of expected values, and df = (row count – 1)(column count – 1) = 2 ∙ 2 = 4. We reject the null hypothesis and conclude that the level of schooling attained is not independent of parents' wealth.

Example 2: A researcher wants to know whether there is a significant difference in two therapies for curing patients of cocaine dependence (defined as not taking cocaine for at least 6 months). She tests 150 patients and obtains the results in the upper left part of the table below (labeled Observed Values).

Figure 2 – Chi-square tests for independence

We establish the following null hypothesis:

H0: There is no difference between the two therapies' ability to cure cocaine dependence

We next calculate the Expected Values from the Observed Values and then the p-value of the chi-square statistic as we did in Example 1. This time, however, we will use the approach employed in Example 2 of Goodness of Fit, namely calculating Pearson's chi-square test statistic directly (using Definition 2 of Goodness of Fit). The value of this statistic is 5.516 (cell D17 in Figure 2). Since we are dealing with a 2 × 2 table of observations, df = (2 – 1)(2 – 1) = 1.

χ²-crit = 3.841 (for α = .05, df = 1) < 5.516 = χ²-obs

And so we reject the null hypothesis and conclude there is a significant difference in the cure rate between the two therapies.

As was mentioned in Goodness of Fit, the maximum likelihood test is a more precise version of the chi-square test employed thus far. The lower right-hand side of the worksheet in Figure 2 shows how to calculate the maximum likelihood statistic (using Definition 1 of Goodness of Fit).
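
As a rough cross-check outside Excel, the calculation can be sketched in Python: expected counts from the row and column totals, Pearson's statistic, and df = (row count – 1)(column count – 1). This is a minimal sketch, not the worksheet's method. The function names are invented for illustration, the 2 × 2 counts are made up (they are not the data from Figure 1 or Figure 2), and the p-value helper uses a closed form that holds only for df = 1.

```python
import math


def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def pearson_chi_square(observed, expected):
    """Pearson's statistic: sum over all cells of (obs - exp)^2 / exp."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(observed):
    """df = (row count - 1) * (column count - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


def p_value_df1(x):
    """Right-tail chi-square probability, valid for df = 1 only.

    For df = 1 the chi-square variable is the square of a standard normal,
    so P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


# Hypothetical 2 x 2 contingency table (illustrative counts only).
observed = [[30, 20],
            [10, 40]]

expected = expected_counts(observed)
chi2 = pearson_chi_square(observed, expected)
df = degrees_of_freedom(observed)

print("expected:", expected)
print("chi-square:", round(chi2, 3), "df:", df)
print("p-value:", p_value_df1(chi2))
```

For tables with df greater than 1, the right-tail probability needs a statistics library, since the Python standard library has no chi-square distribution function.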