AP Statistics Curriculum 2007 Estim Proportion

From Socr

(Difference between revisions)
(Examples)
m (Sample-Size Estimation)
 
(25 intermediate revisions not shown)
Line 3: Line 3:
=== Estimating a Population Proportion===
=== Estimating a Population Proportion===
-
When the sample size is large, the sampling distribution of the sample proportion <math>\hat{p}</math> is approximately Normal, by [[AP_Statistics_Curriculum_2007_Limits_CLT |CLT]], as the sample proportion may be presented as a [[AP_Statistics_Curriculum_2007_Limits_Norm2Bin |sample average or Bernoulli random variables]]. When the sample size is small, the normal approximation may be inadequate. To accommodate this we will modify the '''sample-proportion''' <math>\hat{p}</math> slightly and obtain the '''corrected-sample-proportion''' <math>\tilde{p}</math>:
+
When the sample size is large, the sampling distribution of the sample proportion <math>\hat{p}</math> is approximately Normal, by [[AP_Statistics_Curriculum_2007_Limits_CLT |CLT]], as the sample proportion may be presented as a [[AP_Statistics_Curriculum_2007_Limits_Norm2Bin |sample average of Bernoulli random variables]]. When the sample size is small, the normal approximation may be inadequate. To accommodate this, we will modify the '''sample-proportion''' <math>\hat{p}</math> slightly and obtain the '''corrected-sample-proportion''' <math>\tilde{p}</math>:
-
: <math>\hat{p}={y\over n} \longrightarrow \tilde{y}={y+0.5z_{\alpha \over 2}^2 \over n+z_{\alpha \over 2}^2},</math>
+
: <math>\hat{p}={y\over n} \longrightarrow \tilde{p}={y+0.5z_{\alpha \over 2}^2 \over n+z_{\alpha \over 2}^2},</math>
where [[AP_Statistics_Curriculum_2007_Normal_Critical | <math>z_{\alpha \over 2}</math> is the normal critical value we saw earlier]].
where [[AP_Statistics_Curriculum_2007_Normal_Critical | <math>z_{\alpha \over 2}</math> is the normal critical value we saw earlier]].
Line 18: Line 18:
===Example===
===Example===
-
Suppose a researcher is interested in studying the effect of aspirin in reducing heart attacks. He randomly recruits 500 subjects with evidence of early heart disease and has them take one aspirin daily for two years.  At the end of the two years he finds that during the study only 17 subjects had a heart attack. Calculate a 95% (<math>\alpha=0.05</math>) confidence interval for the true (unknown) proportion of subjects with early heart disease that have a heart attack while taking aspirin daily. Note that [[AP_Statistics_Curriculum_2007_Normal_Critical | <math>z_{\alpha \over 2} = z_{0.025}=1.96</math>]]:
+
Suppose a researcher is interested in studying the effect of aspirin in reducing heart attacks. He randomly recruits 500 subjects with evidence of early heart disease and has them take one aspirin daily for two years.  At the end of the two years, he finds that during the study only 17 subjects had a heart attack. Calculate a 95% (<math>\alpha=0.05</math>) confidence interval for the true (unknown) proportion of subjects with early heart disease that have a heart attack while taking aspirin daily. Note that [[AP_Statistics_Curriculum_2007_Normal_Critical | <math>z_{\alpha \over 2} = z_{0.025}=1.96</math>]]:
: <math>\hat{p} = {17\over 500}=0.034</math> ; <math>\tilde{p} = {17+0.5z_{0.025}^2\over 500+z_{0.025}^2}= {17+1.92\over 500+3.84}=0.038</math>
: <math>\hat{p} = {17\over 500}=0.034</math> ; <math>\tilde{p} = {17+0.5z_{0.025}^2\over 500+z_{0.025}^2}= {17+1.92\over 500+3.84}=0.038</math>
Line 29: Line 29:
: <math>\tilde{p}\pm 1.96 SE_{\tilde{p}}=[0.0213, 0.0547]</math>
: <math>\tilde{p}\pm 1.96 SE_{\tilde{p}}=[0.0213, 0.0547]</math>
-
===Sample-size estimation===
+
===Sample-size Estimation===
-
For a given margin of error we can derive the minimum sample-size that guarantees an interval estimate within the given margin of error. The margin of error is the standard-error of the sample-proportion:
+
For a given margin of error (ME) we can derive the minimum sample-size that guarantees an interval estimate within the given margin of error. The margin of error (ME) is the product of the critical value (t or z) and the standard-error of the sample-proportion:
-
: <math>SE_{\tilde{p}} =  \sqrt{\tilde{p}(1-\tilde{p})\over n+z_{\alpha \over 2}^2}.</math>
+
: <math>ME = z_{\alpha\over 2}\times SE_{\tilde{p}} =  z_{\alpha\over 2}\sqrt{\tilde{p}(1-\tilde{p})\over n+z_{\alpha \over 2}^2}.</math>
-
This equation has one unknown parameter (n), which we can solve for if we are given an upper limit for the margin of error.
+
This equation has one unknown parameter (n), which we can solve for if we are given an upper limit for the margin of error (remember that the critical value <math>z_{\alpha\over 2} \approx 2</math>):
-
: <math>SE_{\tilde{p}} \geq  \sqrt{\tilde{p}(1-\tilde{p})\over n+z_{\alpha \over 2}^2} \longrightarrow n \geq {\tilde{p}(1-\tilde{p})\over {SE_{\tilde{p}}^2} } -z_{\alpha \over 2}^2.</math>
+
: <math>ME \geq  z_{\alpha \over 2}\sqrt{\tilde{p}(1-\tilde{p})\over n+z_{\alpha \over 2}^2} \longrightarrow n \geq z_{\alpha \over 2}^2{\tilde{p}(1-\tilde{p})\over {ME^2} } -z_{\alpha \over 2}^2.</math>
===Examples===
===Examples===
-
====Sample-SIze Estimation====
+
====Sample-Size Estimation====
-
How many subjects are needed if the heart-researchers want <math>SE < 0.005</math> for a 95% CI, and have a guess based on previous research that <math>\tilde{p}= 0.04</math>?
+
How many subjects are needed if the heart-researchers want <math>ME < 0.005</math> for a 95% CI, and have a guess based on previous research that <math>\tilde{p}= 0.04</math>?
-
: <math>n \geq {0.04(1-0.04)\over 0.005^2} - 1.96^2=1533.16 \approx 1534.</math>
+
: <math>n \geq {1.96^2\times 0.04(1-0.04)\over {0.005^2}} - 1.96^2= 5896.856 \approx 5897.</math>
====Siblings Genders====
====Siblings Genders====
Line 61: Line 61:
</center>
</center>
-
Let <math>p_1</math>=true proportion of girls in mothers with girl as first child, <math>p_2</math>=true proportion of girls in mothers with boy as first child. The parameter of interest is <math>p_1- p_2</math>. Hypotheses:
+
Let <math>p_1</math>=true proportion of girls in mothers with girl as first child, <math>p_2</math>=true proportion of girls in mothers with boy as first child. The parameter of interest is <math>p_1- p_2</math>.
-
: <math>H_o: p_1- p_2=0</math> (skeptical reaction). <math>H_1: p_1- p_2>0</math> (research hypothesis).
+
* Hypotheses: <math>H_o: p_1- p_2=0</math> (skeptical reaction). <math>H_1: p_1- p_2>0</math> (research hypothesis).
 +
 
<center>
<center>
{| class="wikitable" style="text-align:center; width:75%" border="1"
{| class="wikitable" style="text-align:center; width:75%" border="1"
|-
|-
-
| colspan=1 rowspan=2|&nbsp;
+
| colspan=2 rowspan=2|&nbsp;
-
| colspan=2| '''Second Child'''
+
| colspan=3| '''Second Child'''
|-
|-
|  Number of births || Number of girls || '''Proportion'''
|  Number of births || Number of girls || '''Proportion'''
|-
|-
-
| rowspan=2| '''Group''' || 1 (Previous child was girl) ||  5412||2792 || 0.516
+
| rowspan=2| '''Group''' || 1 (Previous child was girl) ||  <math>n_1=5412</math>||2792 || <math>\hat{p}_1=0.516</math>
|-
|-
-
|  2 (Previous child was boy) || 5978|| 2776 || 0.464
+
|  2 (Previous child was boy) || <math>n_2=5978</math>|| 2776 || <math>\hat{p}_2=0.464</math>
|}
|}
</center>
</center>
 +
 +
* Test Statistics: <math>Z_o = {Estimate-HypothesizedValue\over SE(Estimate)} = {\hat{p}_1 - \hat{p}_2 - 0 \over SE(\hat{p}_1 - \hat{p}_2)} = {\hat{p}_1 - \hat{p}_2 - 0 \over \sqrt{{\hat{p}_1(1-\hat{p}_1)\over n_1} + {\hat{p}_2(1-\hat{p}_2)\over n_2}}} \sim N(0,1)</math> and <math>Z_o=5.4996</math>.
 +
 +
* <math>P_{value} = P(Z>Z_o)< 1.9\times 10^{-8}</math>. This small p-value provides extremely strong evidence to reject the null hypothesis that there are no differences between the proportions of mothers that had a girl as a second child but had either boy or girl as their first child. Hence there is strong statistical evidence implying that genders of siblings are not independent.
 +
 +
* '''Practical significance''': The practical significance of the effect (of the gender of the first child on the gender of the second child, in this case) can only be assessed using [[AP_Statistics_Curriculum_2007_Estim_Proportion#Confidence_intervals_for_proportions |confidence intervals]]. A 95% <math>CI (p_1- p_2) =[0.033; 0.070]</math> is computed by <math>\hat{p}_1-\hat{p}_2 \pm 1.96 SE(\hat{p}_1 - \hat{p}_2)</math>, or when using the corrected sample proportion the CI is <math>\tilde{p}_1-\tilde{p}_2 \pm 1.96 SE(\tilde{p}_1 - \tilde{p}_2)</math>. Clearly, this is a practically negligible effect and no reasonable person would make important prospective family decisions based on the gender of their (first) child.
 +
 +
* This [[SOCR_EduMaterials_AnalysisActivities_Chi_Contingency | SOCR Analysis Activity]] illustrates how to use the [http://socr.ucla.edu/htmls/SOCR_Analyses.html SOCR Analyses] to compute the p-values and answer the hypothesis testing challenge.
 +
<center>[[Image:SOCR_EBook_Dinov_Hypothesis_020508_Fig6.jpg|700px]]</center>
<hr>
<hr>
===References===
===References===
-
* TBD
+
* [http://muse.jhu.edu/journals/human_biology/v081/81.1.stansfield.pdf William Stansfield and Matthew Carlton (2009) The Most Widely Publicized Gender Problem in Human Genetics, Human Biology, February 2009, v. 81, no. 1, pp. 3–11].
 +
 
 +
===[[EBook_Problems_Estim_Proportion|Problems]]===
<hr>
<hr>

Current revision as of 01:26, 11 March 2011


General Advance-Placement (AP) Statistics Curriculum - Estimating a Population Proportion

Estimating a Population Proportion

When the sample size is large, the sampling distribution of the sample proportion \hat{p} is approximately Normal, by CLT, as the sample proportion may be presented as a sample average of Bernoulli random variables. When the sample size is small, the normal approximation may be inadequate. To accommodate this, we will modify the sample-proportion \hat{p} slightly and obtain the corrected-sample-proportion \tilde{p}:

\hat{p}={y\over n} \longrightarrow \tilde{p}={y+0.5z_{\alpha \over 2}^2 \over n+z_{\alpha \over 2}^2},

where z_{\alpha \over 2} is the normal critical value we saw earlier.

The standard error of \hat{p} also needs a slight modification:

SE_{\hat{p}} =  \sqrt{\hat{p}(1-\hat{p})\over n} \longrightarrow SE_{\tilde{p}} =  \sqrt{\tilde{p}(1-\tilde{p})\over n+z_{\alpha \over 2}^2}.

Confidence intervals for proportions

The confidence intervals for the population proportion, based on the sample proportion \hat{p} and on the corrected-sample-proportion \tilde{p}, are given by

\hat{p}\pm z_{\alpha\over 2} SE_{\hat{p}}
\tilde{p}\pm z_{\alpha\over 2} SE_{\tilde{p}}
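
The two interval formulas above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `proportion_intervals` and its arguments are hypothetical (not part of SOCR), and the critical value is taken from the standard library's `statistics.NormalDist`, so it is not rounded to 1.96.

```python
from math import sqrt
from statistics import NormalDist


def proportion_intervals(y, n, confidence=0.95):
    """Return (standard, corrected) confidence intervals for a proportion.

    y is the number of successes and n the sample size; both intervals
    follow the formulas stated in the text.
    """
    # Normal critical value z_{alpha/2} for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    # Sample proportion and its standard error.
    p_hat = y / n
    se_hat = sqrt(p_hat * (1 - p_hat) / n)

    # Corrected sample proportion and its standard error.
    p_tilde = (y + 0.5 * z**2) / (n + z**2)
    se_tilde = sqrt(p_tilde * (1 - p_tilde) / (n + z**2))

    standard = (p_hat - z * se_hat, p_hat + z * se_hat)
    corrected = (p_tilde - z * se_tilde, p_tilde + z * se_tilde)
    return standard, corrected


if __name__ == "__main__":
    # Hypothetical small sample: 3 successes out of 20 trials.
    standard, corrected = proportion_intervals(3, 20)
    print("standard :", standard)
    print("corrected:", corrected)
```

For small samples the corrected interval is pulled toward 0.5, which is the intended effect of adding half of the squared critical value to the success count.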

Example

Suppose a researcher is interested in studying the effect of aspirin in reducing heart attacks. He randomly recruits 500 subjects with evidence of early heart disease and has them take one aspirin daily for two years. At the end of the two years, he finds that during the study only 17 subjects had a heart attack. Calculate a 95% (α = 0.05) confidence interval for the true (unknown) proportion of subjects with early heart disease that have a heart attack while taking aspirin daily. Note that z_{\alpha \over 2} = z_{0.025}=1.96:

\hat{p} = {17\over 500}=0.034 ; \tilde{p} = {17+0.5z_{0.025}^2\over 500+z_{0.025}^2}= {17+1.92\over 500+3.84}=0.038
SE_{\hat{p}}= \sqrt{0.034(1-0.034)\over 500}=0.0081; SE_{\tilde{p}}= \sqrt{0.038(1-0.038)\over 500+3.84}=0.0085

And the corresponding confidence intervals are given by

\hat{p}\pm 1.96 SE_{\hat{p}}=[0.0181, 0.0499]
\tilde{p}\pm 1.96 SE_{\tilde{p}}=[0.0213, 0.0547]

Sample-size Estimation

For a given margin of error (ME) we can derive the minimum sample-size that guarantees an interval estimate within the given margin of error. The margin of error (ME) is the product of the critical value (t or z) and the standard-error of the sample-proportion:

ME = z_{\alpha\over 2}\times SE_{\tilde{p}} =  z_{\alpha\over 2}\sqrt{\tilde{p}(1-\tilde{p})\over n+z_{\alpha \over 2}^2}.

Given a preliminary guess for \tilde{p}, this equation has one unknown (n), which we can solve for if we are given an upper limit for the margin of error (remember that for a 95% confidence level the critical value z_{\alpha\over 2} = 1.96 \approx 2):

ME \geq  z_{\alpha \over 2}\sqrt{\tilde{p}(1-\tilde{p})\over n+z_{\alpha \over 2}^2} \longrightarrow n \geq z_{\alpha \over 2}^2{\tilde{p}(1-\tilde{p})\over {ME^2} } -z_{\alpha \over 2}^2.
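
The step from the bound on ME to the bound on n can be spelled out as follows (a sketch of the algebra only; it assumes ME > 0 and treats \tilde{p} as a fixed guess):

```latex
\begin{align*}
ME &\geq z_{\alpha \over 2}\sqrt{\tilde{p}(1-\tilde{p})\over n+z_{\alpha \over 2}^2}
  && \text{(required bound)} \\
ME^2 &\geq z_{\alpha \over 2}^2\,{\tilde{p}(1-\tilde{p})\over n+z_{\alpha \over 2}^2}
  && \text{(square both sides)} \\
n+z_{\alpha \over 2}^2 &\geq z_{\alpha \over 2}^2\,{\tilde{p}(1-\tilde{p})\over ME^2}
  && \text{(multiply by } (n+z_{\alpha \over 2}^2)/ME^2 \text{)} \\
n &\geq z_{\alpha \over 2}^2\,{\tilde{p}(1-\tilde{p})\over ME^2} - z_{\alpha \over 2}^2
  && \text{(subtract } z_{\alpha \over 2}^2 \text{)}
\end{align*}
```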

Examples

Sample-Size Estimation

How many subjects are needed if the heart-researchers want ME < 0.005 for a 95% CI, and have a guess based on previous research that \tilde{p}= 0.04?

n \geq {1.96^2\times 0.04(1-0.04)\over {0.005^2}} - 1.96^2= 5896.856 \approx 5897.
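
The same calculation can be sketched in Python. The helper name `min_sample_size` and its parameters are hypothetical; passing `z=1.96` mirrors the rounded critical value used in the text, while leaving `z` unset uses the unrounded value from `statistics.NormalDist`.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(p_guess, margin, confidence=0.95, z=None):
    """Smallest integer n satisfying n >= z^2 * p(1-p) / ME^2 - z^2."""
    if z is None:
        # Unrounded normal critical value z_{alpha/2}.
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z**2 * p_guess * (1 - p_guess) / margin**2 - z**2
    # Round up, since n must be a whole number of subjects.
    return ceil(n)


if __name__ == "__main__":
    # Heart-study example: guess p = 0.04, margin of error 0.005, 95% CI.
    print(min_sample_size(0.04, 0.005, z=1.96))  # 5897
```

Halving the margin of error roughly quadruples the required sample size, because ME enters the formula squared.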

Siblings Genders

Is the gender of a second child influenced by the gender of the first child, in families with more than one child? The research hypothesis must be formulated before collecting, inspecting, or interpreting the data that will be used to address it. Research hypothesis: mothers whose first child is a girl are more likely to have a girl as a second child, compared to mothers whose first child is a boy. Data: 20 years of birth records from one hospital in Auckland, New Zealand.

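{| class="wikitable" style="text-align:center; width:75%" border="1"
|-
| colspan=2 rowspan=2|&nbsp;
| colspan=3| '''Second Child'''
|-
| Male || Female || Total
|-
| rowspan=2| '''First Child''' || Male || 3,202 || 2,776 || 5,978
|-
| Female || 2,620 || 2,792 || 5,412
|-
| colspan=2| Total || 5,822 || 5,568 || 11,390
|}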

Let p_1 = true proportion of girls in mothers with girl as first child, p_2 = true proportion of girls in mothers with boy as first child. The parameter of interest is p_1 - p_2.

  • Hypotheses: H_o: p_1 - p_2 = 0 (skeptical reaction). H_1: p_1 - p_2 > 0 (research hypothesis).
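{| class="wikitable" style="text-align:center; width:75%" border="1"
|-
| colspan=2 rowspan=2|&nbsp;
| colspan=3| '''Second Child'''
|-
| Number of births || Number of girls || '''Proportion'''
|-
| rowspan=2| '''Group''' || 1 (Previous child was girl) || n_1 = 5412 || 2792 || \hat{p}_1=0.516
|-
| 2 (Previous child was boy) || n_2 = 5978 || 2776 || \hat{p}_2=0.464
|}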
  • Test statistic: Z_o = {Estimate-HypothesizedValue\over SE(Estimate)} = {\hat{p}_1 - \hat{p}_2 - 0 \over SE(\hat{p}_1 - \hat{p}_2)} = {\hat{p}_1 - \hat{p}_2 - 0 \over \sqrt{{\hat{p}_1(1-\hat{p}_1)\over n_1} + {\hat{p}_2(1-\hat{p}_2)\over n_2}}} \sim N(0,1) and Z_o = 5.4996.
  • P_{value} = P(Z>Z_o)< 1.9\times 10^{-8}. This small p-value provides extremely strong evidence to reject the null hypothesis that the proportion of girls among second children is the same whether the first child was a boy or a girl. Hence there is strong statistical evidence implying that genders of siblings are not independent.
  • Practical significance: The practical significance of the effect (of the gender of the first child on the gender of the second child, in this case) can only be assessed using confidence intervals. A 95% CI(p_1 - p_2) = [0.033; 0.070] is computed by \hat{p}_1-\hat{p}_2 \pm 1.96 SE(\hat{p}_1 - \hat{p}_2), or when using the corrected sample proportion the CI is \tilde{p}_1-\tilde{p}_2 \pm 1.96 SE(\tilde{p}_1 - \tilde{p}_2). Clearly, this is a practically negligible effect and no reasonable person would make important prospective family decisions based on the gender of their (first) child.
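
The test statistic, one-sided p-value and confidence interval for the siblings data can be recomputed with a short Python sketch. The function name `two_proportion_test` is hypothetical, and the sketch uses the unpooled standard error exactly as written in the test-statistic formula above; it is not the SOCR Analyses implementation.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(x1, n1, x2, n2, confidence=0.95):
    """One-sided z-test of H1: p1 - p2 > 0 plus a CI for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2

    # Unpooled standard error of the difference of two sample proportions.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

    z_obs = diff / se
    # P(Z > z_obs); by symmetry of the normal distribution this is cdf(-z_obs).
    p_value = NormalDist().cdf(-z_obs)

    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return z_obs, p_value, ci


if __name__ == "__main__":
    # Group 1: first child a girl; group 2: first child a boy.
    z_obs, p_value, ci = two_proportion_test(2792, 5412, 2776, 5978)
    print(f"Z_o = {z_obs:.4f}, p-value = {p_value:.2e}")
    print(f"95% CI for p1 - p2: [{ci[0]:.3f}, {ci[1]:.3f}]")
```

Because the sketch works with unrounded proportions and an unrounded critical value, its output may differ from the figures in the text in the last reported digit.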

References

Problems



