AP Statistics Curriculum 2007 Hypothesis Proportion

From Socr

Revision as of 00:29, 17 February 2008 by IvoDinov (Talk | contribs)


General Advance-Placement (AP) Statistics Curriculum - Testing a Claim about Proportion

Testing a Claim about Proportion

Recall that for large samples, the sampling distribution of the sample proportion \hat{p} is approximately Normal, by the CLT, because the sample proportion may be represented as a sample average of Bernoulli random variables. When the sample size is small, the normal approximation may be inadequate. To accommodate this, we modify the sample proportion \hat{p} slightly and obtain the corrected sample proportion \tilde{p}:

\hat{p}={y\over n} \longrightarrow \tilde{p}={y+0.5z_{\alpha \over 2}^2 \over n+z_{\alpha \over 2}^2},

where z_{\alpha \over 2} is the normal critical value we saw earlier.

The standard error of \hat{p} also needs a slight modification:

SE_{\hat{p}} =  \sqrt{\hat{p}(1-\hat{p})\over n} \longrightarrow SE_{\tilde{p}} =  \sqrt{\tilde{p}(1-\tilde{p})\over n+z_{\alpha \over 2}^2}.
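The two corrections above can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name corrected_proportion and its arguments are hypothetical and not part of the original text, and the critical value z_{\alpha \over 2} is obtained from the standard Normal quantile function.

```python
from math import sqrt
from statistics import NormalDist


def corrected_proportion(y, n, alpha=0.05):
    """Return (p_tilde, se_tilde) for y successes observed in n trials.

    Hypothetical helper illustrating the corrected sample proportion
    and its standard error as defined in the formulas above.
    """
    # Normal critical value z_{alpha/2}, e.g. about 1.96 for alpha = 0.05
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Corrected proportion: add 0.5*z^2 successes and z^2 trials
    p_tilde = (y + 0.5 * z**2) / (n + z**2)
    # Corrected standard error uses the inflated sample size n + z^2
    se_tilde = sqrt(p_tilde * (1 - p_tilde) / (n + z**2))
    return p_tilde, se_tilde


# Illustrative (made-up) small sample: 3 successes in 20 trials
p_tilde, se_tilde = corrected_proportion(3, 20)
print(p_tilde, se_tilde)
```

Note that the correction pulls \tilde{p} towards 0.5, which matters most when n is small and y is close to 0 or n.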

Hypothesis Testing about a Single Sample Proportion

  • Null Hypothesis: H_o: p = p_o (e.g., 0), where p is the population proportion of interest.
  • Alternative Research Hypotheses:
    • One sided (uni-directional): H_1: p > p_o, or H_1: p < p_o
    • Double sided: H_1: p \not= p_o
  • Test Statistic: Z_o={\tilde{p} -p_o \over SE_{\tilde{p}}}


Suppose a researcher is interested in studying the effect of aspirin in reducing heart attacks. He randomly recruits 500 subjects with evidence of early heart disease and has them take one aspirin daily for two years. At the end of the two years, he finds that during the study only 17 subjects had a heart attack. Use α = 0.05 to formulate and test the hypothesis that the proportion of subjects on aspirin treatment who have a heart attack within 2 years of treatment is p_o = 0.04.

\tilde{p} = {17+0.5z_{0.025}^2\over 500+z_{0.025}^2} = {17+1.92\over 500+3.84}=0.038
SE_{\tilde{p}}= \sqrt{0.038(1-0.038)\over 500+3.84}=0.0085

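The corresponding test statistic is

Z_o={\tilde{p} - 0.04 \over SE_{\tilde{p}}}={-0.002 \over 0.0085}=-0.2353

The p-value corresponding to this test statistic (about 0.81 for a two-sided alternative) is far larger than α = 0.05, so the result is clearly not significant and we cannot reject the null hypothesis that p = 0.04.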
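The aspirin calculation can be reproduced with a short Python sketch. It uses only the numbers given in the example (17 heart attacks among 500 subjects, p_o = 0.04, α = 0.05); the two-sided alternative is an assumption made here, since the example does not name one. Because the hand calculation rounds \tilde{p} to 0.038 before subtracting, the unrounded statistic comes out somewhat larger in magnitude (roughly -0.29), but the conclusion is the same.

```python
from math import sqrt
from statistics import NormalDist

y, n = 17, 500   # heart attacks and number of subjects (from the example)
p_o = 0.04       # hypothesized proportion
alpha = 0.05

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha / 2)  # z_{0.025}, about 1.96

# Corrected proportion and its standard error
p_tilde = (y + 0.5 * z_crit**2) / (n + z_crit**2)
se_tilde = sqrt(p_tilde * (1 - p_tilde) / (n + z_crit**2))

# Test statistic and two-sided p-value (two-sided is an assumption)
z_o = (p_tilde - p_o) / se_tilde
p_value = 2 * (1 - std_normal.cdf(abs(z_o)))

print(f"p_tilde = {p_tilde:.4f}, SE = {se_tilde:.4f}")
print(f"Z_o = {z_o:.4f}, two-sided p-value = {p_value:.4f}")
print("reject H_o" if p_value < alpha else "fail to reject H_o")
```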

Genders of Siblings Example

Is the gender of a second child influenced by the gender of the first child, in families with more than one child? The research hypothesis needs to be formulated before collecting, looking at, or interpreting the data that will be used to address it. Here the research hypothesis is that mothers whose first child is a girl are more likely to have a girl as a second child, compared to mothers whose first child is a boy. Data: 20 years of birth records from one hospital in Auckland, New Zealand.

                          Second Child
                     Male     Female    Total
First Child  Male    3,202    2,776     5,978
             Female  2,620    2,792     5,412
Total                5,822    5,568    11,390

Let p_1 = true proportion of girls (as second child) among mothers whose first child was a girl, and p_2 = true proportion of girls (as second child) among mothers whose first child was a boy. The parameter of interest is p_1 - p_2.

  • Hypotheses: H_o: p_1 - p_2 = 0 (skeptical reaction). H_1: p_1 - p_2 > 0 (research hypothesis).
                                   Second Child
  Group                            Number of births   Number of girls   Proportion
  1 (Previous child was girl)      n_1 = 5412         2792              \hat{p}_1=0.516
  2 (Previous child was boy)       n_2 = 5978         2776              \hat{p}_2=0.464
  • Test Statistic: Z_o = {Estimate-HypothesizedValue\over SE(Estimate)} = {\hat{p}_1 - \hat{p}_2 - 0 \over SE(\hat{p}_1 - \hat{p}_2)} = {\hat{p}_1 - \hat{p}_2 - 0 \over \sqrt{{\hat{p}_1(1-\hat{p}_1)\over n_1} + {\hat{p}_2(1-\hat{p}_2)\over n_2}}} \sim N(0,1), and Z_o = 5.4996.
  • P-value = P(Z>Z_o) \approx 1.9\times 10^{-8}. This small p-value provides extremely strong evidence to reject the null hypothesis that the proportion of mothers who had a girl as a second child is the same whether their first child was a boy or a girl. Hence there is strong statistical evidence that the genders of siblings are not independent.
  • Practical significance: The practical significance of the effect (of the gender of the first child on the gender of the second child, in this case) can only be assessed using confidence intervals. A 95% CI(p_1 - p_2) = [0.033; 0.070] is computed by \hat{p}_1-\hat{p}_2 \pm 1.96 SE(\hat{p}_1 - \hat{p}_2). Clearly, this is a practically negligible effect and no reasonable person would make important prospective family decisions based on the gender of their (first) child.
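The test statistic, p-value and confidence interval for the siblings example can be checked with a few lines of Python. The counts are taken from the tables above; the variable names are only illustrative. The upper-tail probability is computed with math.erfc, a design choice made here because it stays accurate for very small tail areas.

```python
from math import erfc, sqrt

# Counts from the birth-records table
n1, girls1 = 5412, 2792   # first child was a girl
n2, girls2 = 5978, 2776   # first child was a boy

p1_hat = girls1 / n1
p2_hat = girls2 / n2
diff = p1_hat - p2_hat

# Standard error of the difference of two independent sample proportions
se_diff = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

# One-sided test of H_o: p_1 - p_2 = 0 against H_1: p_1 - p_2 > 0
z_o = (diff - 0) / se_diff
p_value = 0.5 * erfc(z_o / sqrt(2))   # P(Z > z_o) for standard Normal Z

# 95% confidence interval for p_1 - p_2
ci_low = diff - 1.96 * se_diff
ci_high = diff + 1.96 * se_diff

print(f"p1_hat = {p1_hat:.3f}, p2_hat = {p2_hat:.3f}")
print(f"Z_o = {z_o:.4f}, p-value = {p_value:.2e}")
print(f"95% CI = [{ci_low:.3f}; {ci_high:.3f}]")
```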

