LearningActivities CauchyTGaussian

From Socr

(Difference between revisions)
(Solutions)
(Generalized Cauchy distribution CDF derivation)
 
(35 intermediate revisions not shown)
Line 1: Line 1:
== [[LearningActivities| Distributome Learning Activities]] - Distributome Activity on the relations between Cauchy, Student's T and Gaussian Distributions ==
-
===Overview===
+
===Introduction===
[[Image:LearningActivities_CauchyTGaussian_Fig1.png|150px|thumbnail|right| The relation between Cauchy and Gaussian distributions ]]
-
 
-
This activity illustrates the inter-distribution relationships between Cauchy, Student's T and Standard Normal (Gaussian) distributions.
 
Surveys about public opinions on controversial social issues are becoming increasingly frequent as topics such as the legalization of marijuana, abortion policy, marriage rights for homosexuals, and immigration policy are hotly debated in the media.  For example, both the [http://i2.cdn.turner.com/cnn/2011/images/04/19/rel6h.pdf Opinion Research Corporation (polling for CNN)] and the [http://www.people-press.org/2011/03/03/section-3-attitudes-toward-social-issues/ Pew Research Center for The People and The Press conducted surveys] of American adults in spring of 2011 to estimate the percentage of the public that favors the legalization of marijuana.  The sample sizes in the two polls were 824 for the Opinion Research Corporation poll and 1504 for the Pew poll.  
===Goals===
-
TBD
+
This activity illustrates the inter-distribution relationships between Cauchy, Student's T and Standard Normal (Gaussian) distributions.
===Hands-on Activity===
-
In this activity you may assume that both of these pollsters use similar techniques that involve telephone interviews and weighting the answers given by individuals to align the respondents demographics with population values and finally averaging to produce unbiased and essentially normally distributed estimates.
+
In this activity you may assume that both pollsters use similar techniques: they conduct telephone interviews, weight the individual answers to align the respondents' demographics with population values, and then average the weighted answers to produce unbiased and essentially normally distributed estimates. Below are four related, complementary problems regarding this study.
-
* The Pew poll had almost twice the sample size of the Opinion Research Corporation poll.  What is the chance that it was more accurate than that poll for estimating ''p = the percentage of American adults that favored the legalization of marijuana in spring, 2011''?  Be sure to clearly define how you are interpreting “more accurate.”  Also state and justify any assumptions you make in solving for this probability.
+
: '''Note''': The problems below may be appropriate for an undergraduate course in probability. The [[LearningActivities_CauchyTGaussian#Problem_4:_Accuracy_of_probability_estimates.3F|last part (Problem 4)]] would be more appropriate for a master's-level course (and should have a General Cauchy distribution tag and a tag for its relationship to the bivariate normal).
-
{{hidden|See a Hint|Decide on the measure of the accuracy of a poll and then evaluate <math>P[\frac{Accuracy_{poll1}}{Accuracy_{poll2}} < 1]</math> to gauge the chance that one poll is more accurate than the other.}}
+
===Specific Problems and their Solutions===
-
* Describe how you would combine the data from these two polls to form a single estimate of <math>p</math>.
+
====Problem 1: Difference in Poll Accuracies?====
 +
The Pew poll had almost twice the sample size of the Opinion Research Corporation poll.  What is the chance that it was more accurate than that poll for estimating ''p = the percentage of American adults that favored the legalization of marijuana in spring, 2011''?  Be sure to clearly define how you are interpreting “more accurate.”  Also state and justify any assumptions you make in solving for this probability.
-
* What is the correlation between your estimate above and the individual estimate produced by the Pew poll?
+
{{hidden|See a Hint|Decide on a measure of the accuracy of a poll and then evaluate \(P[\frac{Accuracy_{poll1} }{Accuracy_{poll2} } < 1]\) to gauge the chance that one poll is more accurate than the other.}}
-
* What is the probability that your combined estimate from part b) is more accurate than the estimate based only on the Pew poll?
+
We want the probability that the Pew estimate, based on a sample size of \(n_1=1504\), comes closer to the true value of \(p\) than the Opinion Research estimate, based on \(n_2=824\) respondents.
-
: '''Note''': The questions above may be appropriate for an undergraduate course in probability. The last part would be more appropriate for masters level course (and should have a General Cauchy distribution tag and a tag for its relationship to the bivariate normal).  
+
Let \(\hat{p_1}\) be the estimate of \(p\) from Pew and \(\hat{p_2}\) be the estimate of \(p\) from Opinion Research.
-
===Solutions===
+
{{hidden|See a Solution: Step 1| With this notation, the problem asks for \(P(\vert\hat{p_1}-p\vert < \vert\hat{p_2}-p\vert ) \) }}
-
====Part 1====
+
{{hidden|See a Solution: Step 2| Taking \(X \equiv \hat{p_1}-p\) and \(Y\equiv \hat{p_2}-p\), the problem statement indicates that \(X\sim N(0, \sigma_1^2)\) and \(Y\sim N(0, \sigma_2^2)\).}}
-
We want the probability that the Pew estimate based on a sample size of <math>n_1=1504</math>, which comes closer to the true value of <math>p</math> than the Opinion Research poll based on <math>n_2=824</math> respondents.
+
-
If <math>\hat{p_1}</math> = the estimate of <math>p</math> from Pew, and <math>\hat{p_2}</math> = the estimate of <math>p</math> from Opinion Research.
+
{{hidden|See a Solution: Step 3| Since the two polls use a similar methodology we may assume that the technique produces estimates with variance \(\frac{\sigma^2}{n}\) when a sample size of \(n\) is used (for \(n\) large enough for the normal approximation to be valid as in these cases). }}
-
{{hidden|See a Solution to First Problem: Step 1| With this notation, the problem asks for <math>P[|\hat{p_1}-p| < |\hat{p_2}-p| ]</math>. }}
+
{{hidden|See a Solution: Step 4| Thus, \(\frac{\sigma_1^2}{\sigma_2^2}\equiv \frac{824}{1504}\approx 0.548\). Finally, we should assume that the estimates from the two polls are stochastically independent, which is reasonable here since they were conducted by two separate companies and the population of U.S. adults is much larger than either sample size.}}
-
{{hidden|See a Solution to First Problem: Step 2| Taking <math>X=\hat{p_1}-p</math> and <math>Y=\hat{p_2}-p</math>, the problem statement indicates that <math>X\sim N(0, \sigma_1^2)</math> and <math>Y\sim N(0, \sigma_2^2)</math>.}}
+
{{hidden|See a Solution: Step 5| Now \(P(\vert X\vert < \vert Y\vert)\equiv P(-1 < \frac{X}{Y} < 1)\equiv F(1)-F(-1)\), where F is the CDF of the [http://socr.ucla.edu/htmls/dist/GeneralCauchy_Distribution.html Cauchy distribution] with location 0 and scale parameter \(\frac{\sigma_1}{\sigma_2}\equiv \sqrt{0.548}\approx 0.740\), so \(F(u)\equiv 0.5+\frac{1}{\pi}\arctan(\frac{u}{\sqrt{0.548} })\), and the answer is \(F(1)-F(-1)\equiv \frac{1}{\pi}[\arctan(\frac{1}{\sqrt{0.548} }) - \arctan(\frac{-1}{\sqrt{0.548} })]\approx 0.594.\) Many people are surprised by how low this answer is: despite having nearly double the sample size, the chance that the Pew poll is more accurate is only 0.594.}}
-
{{hidden|See a Solution to First Problem: Step 3| Since the two polls use a similar methodology we may assume that the technique produces estimates with variance <math>\frac{\sigma^2}{n}</math> when a sample size of <math>n</math> is used (for <math>n</math> large enough for the normal approximation to be valid as in these cases). }}
+
=====Alternative approaches=====
 +
* Alternative 1: Ratio of bivariate Normal variables.
 +
{{hidden|See a Solution to First Problem: Alternative 1| You may solve \(P(-1 < \frac{X}{Y} < 1)\equiv P(-\frac{\sigma_2}{\sigma_1} < \frac{X/\sigma_1}{Y/\sigma_2} < \frac{\sigma_2}{\sigma_1})\)
 +
\(\equiv P(-\frac{1}{\sqrt{0.548} }< \frac{Z_1}{Z_2} < \frac{1}{\sqrt{0.548} })\), where \(Z_1\) and \(Z_2\) are independent standard normal variables whose ratio follows the standard Cauchy distribution, which then gives the same numerical answer.}}
-
{{hidden|See a Solution to First Problem: Step 4| Thus, <math>\frac{\sigma_1^2}{\sigma_2^2}=\frac{824}{1504}\approx 0.548</math>. Finally, we should assume that the estimates from the two polls are stochastically independent, which is reasonable here since they were conducted by two separate companies and the population of U.S. adults is much larger than either sample size.}}
+
* Alternative 2: Direct calculation of the marginal distribution
 +
{{hidden|See a Solution to First Problem: Alternative 2| A second, more cumbersome, alternative is to work directly to find the marginal distribution of \(\frac{X}{Y}\) from the bivariate normal – see [[LearningActivities_CauchyTGaussian#Generalized_Cauchy_distribution_CDF_derivation|solution to the last part below]] with \(\rho\equiv 0\).}}
-
{{hidden|See a Solution to First Problem: Step 5| Now <math>P(|X| < |Y|)=P(-1 < \frac{X}{Y} < 1)=F(1)-F(-1)</math>, where F is the CDF of the [http://socr.ucla.edu/htmls/dist/GeneralCauchy_Distribution.html Cauchy distribution] with location 0 and scale parameter <math>\sqrt{0.548}\approx 0.740</math>, so <math>F(u)=0.5+\frac{1}{\pi}\arctan(\frac{u}{\sqrt{0.548}})</math>, and the answer is <math>F(1)-F(-1)=\frac{1}{\pi}[\arctan(\frac{1}{\sqrt{0.548}}) - \arctan(\frac{-1}{\sqrt{0.548}})]\approx 0.594.</math> Many people are surprised by how low this answer is: despite having nearly double the sample size, the chance that the Pew poll is more accurate is only 0.594.}}
+
====Problem 2: Pooling Data across Polls?====
 +
Describe how you would combine the data from these two polls to form a single estimate of \(p\).
-
====Alternative approaches====
+
The obvious choice is to propose a linear combination of the two estimates weighting inversely proportional to the variances to get the smallest overall variance amongst such linear combinations.
-
* Alternative 1:
+
-
{{hidden|See a Solution to First Problem: Alternative 1| You may solve <math>P(-1 < \frac{X}{Y} < 1)=P(-\frac{\sigma_2}{\sigma_1} < \frac{X/\sigma_1}{Y/\sigma_2} < \frac{\sigma_2}{\sigma_1})</math>
+
-
<math>= P(-\frac{1}{\sqrt{0.548}}< \frac{Z_1}{Z_2} < \frac{1}{\sqrt{0.548}})</math>, where <math>Z_1</math> and <math>Z_2</math> are independent standard normal variables whose ratio follows the standard Cauchy distribution, which then gives the same numerical answer.}}
+
-
* Alternative 2:
+
{{hidden| See a Solution to Problem 2| \(\hat{p}\equiv \frac{1504}{2328}\hat{p_1}+\frac{824}{2328}\hat{p_2}\). Of course, this is just the estimate that comes from combining the samples into one large sample of \(n\equiv 2328\) and has variance \(\frac{\sigma^2}{2328}\).}}
-
{{hidden|See a Solution to First Problem: Alternative 2| A second, more cumbersome, alternative is to work directly to find the marginal distribution of <math>\frac{X}{Y}</math> from the bivariate normal – see solution to the last part below with <math>\rho=0</math>.}}
+
-
====Part 2====
+
====Problem 3: Are these probability estimates correlated?====
-
The obvious choice is to propose a linear combination of the two estimates weighting inversely proportional to the variances to get the smallest overall variance amongst such linear combinations: 
+
What is the correlation between your estimate above and the individual estimate produced by the Pew poll?
-
: <math>\hat{p}=\frac{1504}{2328}\hat{p_1}+\frac{824}{2328}\hat{p_2}</math>.
+
-
Of course, this is just the estimate that comes from combining the samples into one large sample of <math>n=2328</math> and has variance <math>\frac{\sigma^2}{2328}</math>.
+
-
====Part 3====
+
Note that \(Cov(\hat{p},\hat{p_1})=Cov(\frac{1504}{2328}\hat{p_1}+\frac{824}{2328}\hat{p_2}, \hat{p_1})=\frac{1504}{2328}\sigma_1^2=\frac{\sigma^2}{2328}\).  
-
Note that <math>Cov(\hat{p},\hat{p_1})=Cov(\frac{1504}{2328}\hat{p_1}+\frac{824}{2328}\hat{p_2}, \hat{p_1})=\frac{1504}{2328}\sigma_1^2=\frac{\sigma^2}{2328}</math>. Thus,
+
-
<math>Corr(\hat{p},\hat{p_1})=\frac{\sigma^2/2328}{\sqrt{\sigma^2(\frac{1504}{2328^2}+\frac{824}{2328^2})}\sqrt{\frac{\sigma^2}{1504}}}=\sqrt{\frac{1504}{2328}}=0.8038.</math>.
+
 +
{{hidden| See a Solution to Problem 3| Thus, \( Corr(\hat{p},\hat{p_1})\equiv \frac{\sigma^2/2328} { \sqrt{\sigma^2(\frac{1504}{2328^2}+\frac{824}{2328^2})} \sqrt{\frac{\sigma^2}{1504}  }  } \equiv \sqrt{\frac{1504}{2328} } \equiv 0.8038.\)}}
-
====Part 4====
+
====Problem 4: Accuracy of probability estimates?====
-
d) We want <math>P[|\hat{p}-p| < |\hat{p_1}-p| ]</math>. 
+
What is the probability that your combined estimate (from the second problem) is more accurate than the estimate based only on the Pew poll?
-
Taking <math>X=\hat{p}-p</math> and <math>Y=\hat{p_1}-p</math>, then <math>X\sim N(0, \frac{\sigma^2}{2328})</math>, <math>Y\sim  N(0, \frac{\sigma^2}{1504})</math> and <math>Corr(X,Y)=0.8038</math>.
+
-
Hence, <math>P(|X| < |Y|) = P(-1< \frac{X}{Y} < 1) = G(1) - G(-1)</math>, where G is the CDF of the [http://socr.ucla.edu/htmls/dist/GeneralCauchy_Distribution.html generalized Cauchy distribution] (see below):
+
We want \(P[|\hat{p}-p| < |\hat{p_1}-p| ]\).
-
: <math>G(u) = 0.5 +\frac{1}{\pi}\arctan(\frac{u-\rho\frac{\sigma_X}{\sigma_Y}}{\frac{\sigma_X}{\sigma_Y}\sqrt{1-\rho^2}})</math>.
+
-
:: So the answer is <math>\frac{1}{\pi}(\arctan(\sqrt{\frac{824}{1504}}) -\arctan(\frac{-3832}{\sqrt{1504 \times 824}}))\approx 0.613</math>.
+
{{hidden| See a Solution to Problem 4| Taking \(X\equiv \hat{p}-p\) and \(Y\equiv \hat{p_1}-p\), then \(X\sim N(0, \frac{\sigma^2}{2328})\), \(Y\sim  N(0, \frac{\sigma^2}{1504})\) and \(Corr(X,Y)\equiv 0.8038\). <br/> Hence, \(P(\vert X\vert < \vert Y\vert) \equiv P(-1< \frac{X}{Y} < 1) \equiv G(1) - G(-1)\), where G is the CDF of the [http://socr.ucla.edu/htmls/dist/GeneralCauchy_Distribution.html generalized Cauchy distribution] ([[LearningActivities_CauchyTGaussian#Generalized_Cauchy_distribution_CDF_derivation|see below]]): <br/> \(G(u) \equiv 0.5 +\frac{1}{\pi}\arctan(\frac{u-\rho\frac{\sigma_X}{\sigma_Y} }{\frac{\sigma_X}{\sigma_Y}\sqrt{1-\rho^2} })\). <br/> So, the answer is \(\frac{1}{\pi}(\arctan(\sqrt{\frac{824}{1504} }) -\arctan(\frac{-3832}{\sqrt{1504 \times 824} }))\approx 0.613\).}}
===Generalized Cauchy distribution CDF derivation===
To derive the [http://socr.ucla.edu/htmls/dist/GeneralCauchy_Distribution.html generalized Cauchy distribution] CDF directly, we start with the bivariate normal distribution of ''X'' and ''Y'':
-
: <math>f(x,y) =\frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp(-\frac{(x/\sigma_x)^2-2\rho xy/(\sigma_x\sigma_y)+(y/\sigma_y)^2}{2(1-\rho^2)})</math>.
+
{{hidden| See the Derivation of the Cauchy CDF| We start with \(f(x,y) \equiv \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2} } \exp(-\frac{(x/\sigma_x)^2-2\rho xy/(\sigma_x\sigma_y)+(y/\sigma_y)^2}{2(1-\rho^2)})\). <br/> Then the pdf of \(U\equiv \frac{X}{Y}\) is <br/> \(g(u) \equiv \int_{-\infty}^{\infty}{\vert y\vert f(uy,y)dy}\equiv \) \(\frac{1}{\pi\sigma_x\sigma_y\sqrt{1-\rho^2} } \int_0^{\infty} {y\exp(-y^2\frac{(u/\sigma_x)^2-2\rho u/(\sigma_x\sigma_y)+(1/\sigma_y)^2}{2(1-\rho^2)})dy}\). Here the integrand is even in \(y\), so the integral over the whole line is twice the integral over \((0,\infty)\), which cancels the factor 2 in \(2\pi\).<br/> Since \(\int_0^{\infty}{y \exp(-ay^2)dy} \equiv \frac{1}{2a}\) for \(a>0\), we find that <br/> \(g(u)\equiv \frac{1}{\pi\sigma_x\sigma_y\sqrt{1-\rho^2} } (\frac{1-\rho^2}{(u/ \sigma_x)^2 - 2\rho u / (\sigma_x\sigma_y) +(1/\sigma_y)^2})\equiv \) \(\frac{1}{\pi}\frac{\frac{\sigma_x}{\sigma_y}\sqrt{1-\rho^2} }{u^2-2\rho u \frac{\sigma_x}{\sigma_y} +(\frac{\sigma_x}{\sigma_y})^2}.\)<br/> Completing the square in the denominator gives \(g(u)\equiv \frac{1}{\pi}\frac{\frac{\sigma_x}{\sigma_y}\sqrt{1-\rho^2} }{(u-\rho \sigma_x / \sigma_y)^2  +(\sigma_x / \sigma_y)^2(1-\rho^2)}.\) <br/> Finally, we know that \(\int_{-\infty}^y {\frac{b}{(u-m)^2+b^2}du} \equiv \frac{\pi}{2}+\arctan(\frac{y-m}{b})\) for \(b>0\). <br/> Therefore, with \(m\equiv \rho\frac{\sigma_x}{\sigma_y}\) and \(b\equiv \frac{\sigma_x}{\sigma_y}\sqrt{1-\rho^2}\), the generalized Cauchy distribution CDF is <br/> \(G(u) \equiv 0.5 +\frac{1}{\pi}\arctan(\frac{u-\rho\frac{\sigma_X}{\sigma_Y} }{\frac{\sigma_X}{\sigma_Y}\sqrt{1-\rho^2} })\).}}
-
: Then the pdf of <math>U=\frac{X}{Y}</math> is:
+
-
:: <math>g(u) = \int_{-\infty}^{\infty}{|y|f(uy,y)dy}=
+
-
\frac{1}{\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \int_0^{\infty} {y\exp(-y^2\frac{(u/\sigma_x)^2-2\rho u/(\sigma_x\sigma_y)+(1/\sigma_y)^2}{2(1-\rho^2)})dy}</math>.
+
-
: Since  <math>\int_0^{\infty}{y \exp(-ay^2)dy}=\frac{1}{2a}</math>, we find that
+
-
:: <math>g(u)=\frac{1}{\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} (\frac{1-\rho^2}{(u/ \sigma_x)^2 - 2\rho u / (\sigma_x\sigma_y) +(1/\sigma_y)^2})=</math>
+
-
::: <math>=\frac{1}{\pi}\frac{\frac{\sigma_x}{\sigma_y}\sqrt{1-\rho^2}}{u^2-2\rho u \frac{\sigma_x}{\sigma_y} +(\frac{\sigma_x}{\sigma_y})^2}.</math>
+
-
+
-
:: Thus, <math>g(u)=\frac{1}{\pi}\frac{\frac{\sigma_x}{\sigma_y}\sqrt{1-\rho^2}}{(u-\rho \sigma_x / \sigma_y)^2  +(\sigma_x / \sigma_y)^2(1-\rho^2)}.</math>
+
-
:: Finally, we know that <math>\int_{-\infty}^y {\frac{b}{(u-m)^2+b^2}du} = \frac{\pi}{2}+\arctan(\frac{y-m}{b})</math>
+
-
 
+
-
: Therefore, the generalized Cauchy distribution CDF is:
+
-
:: <math>G(u) = 0.5 +\frac{1}{\pi}\arctan(\frac{u-\rho\frac{\sigma_X}{\sigma_Y}}{\frac{\sigma_X}{\sigma_Y}\sqrt{1-\rho^2}})</math>.
+
===Cauchy, Student's T and Gaussian distribution interrelations===
The [http://wiki.stat.ucla.edu/socr/index.php/AP_Statistics_Curriculum_2007_StudentsT Student's T-distribution] represents a one-parameter homotopy path connecting the Cauchy and [http://wiki.stat.ucla.edu/socr/index.php/AP_Statistics_Curriculum_2007_Normal_Std Gaussian] distributions:
-
: <math>Cauchy=T_{(df=1)} \longrightarrow T_{(df)}\longrightarrow N(0,1)=T_{(df=\infty)}</math>.
+
: \(Cauchy=T_{(df=1)} \longrightarrow T_{(df)}\longrightarrow N(0,1)=T_{(df=\infty)}\).
: See the [http://www.distributome.org/js/DistributomeNavigator.html Cauchy], [http://www.distributome.org/js/DistributomeNavigator.html Student's T] and [http://socr.ucla.edu/htmls/dist/Normal_Distribution.html Gaussian] distribution calculators.
: Explore the relations between these and many other probability distributions using the [http://www.distributome.org/js/DistributomeNavigator.html interactive graphical Distributome Navigator].
===Conclusions===
-
TBD
+
Increasing the sample size may help significantly in certain situations - but not as much as intuition often suggests.
 +
 
 +
===Short link===
 +
http://ucla.in/ua1x3z
{{translate|pageName=http://wiki.stat.ucla.edu/distributome/index.php?title=LearningActivities_CauchyTGaussian}}

Current revision as of 21:51, 17 November 2011


Distributome Learning Activities - Distributome Activity on the relations between Cauchy, Student's T and Gaussian Distributions

Introduction

The relation between Cauchy and Gaussian distributions

Surveys about public opinions on controversial social issues are becoming increasingly frequent as topics such as the legalization of marijuana, abortion policy, marriage rights for homosexuals, and immigration policy are hotly debated in the media. For example, both the Opinion Research Corporation (polling for CNN) and the Pew Research Center for The People and The Press conducted surveys of American adults in spring of 2011 to estimate the percentage of the public that favors the legalization of marijuana. The sample sizes in the two polls were 824 for the Opinion Research Corporation poll and 1504 for the Pew poll.

Goals

This activity illustrates the inter-distribution relationships between Cauchy, Student's T and Standard Normal (Gaussian) distributions.

Hands-on Activity

In this activity you may assume that both pollsters use similar techniques: they conduct telephone interviews, weight the individual answers to align the respondents' demographics with population values, and then average the weighted answers to produce unbiased and essentially normally distributed estimates. Below are four related, complementary problems regarding this study.

Note: The problems below may be appropriate for an undergraduate course in probability. The last part (Problem 4) would be more appropriate for a master's-level course (and should have a General Cauchy distribution tag and a tag for its relationship to the bivariate normal).

Specific Problems and their Solutions

Problem 1: Difference in Poll Accuracies?

The Pew poll had almost twice the sample size of the Opinion Research Corporation poll. What is the chance that it was more accurate than that poll for estimating p = the percentage of American adults that favored the legalization of marijuana in spring, 2011? Be sure to clearly define how you are interpreting “more accurate.” Also state and justify any assumptions you make in solving for this probability.


We want the probability that the Pew estimate, based on a sample size of \(n_1=1504\), comes closer to the true value of \(p\) than the Opinion Research estimate, based on \(n_2=824\) respondents.

Let \(\hat{p_1}\) be the estimate of \(p\) from Pew and \(\hat{p_2}\) be the estimate of \(p\) from Opinion Research.






Alternative approaches
  • Alternative 1: Ratio of bivariate Normal variables.


  • Alternative 2: Direct calculation of the marginal distribution
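
As a quick numerical check of Problem 1, the sketch below evaluates the zero-centered Cauchy CDF at ±1 and compares the result with a seeded simulation. It assumes, as the activity does, independent normal errors with variance \(\sigma^2/n\); the function names, the choice \(\sigma=1\), and the number of simulated trials are illustrative and not part of the original activity.

```python
import math
import random

def cauchy_cdf(u, scale):
    """CDF of a Cauchy distribution centered at 0 with the given scale."""
    return 0.5 + math.atan(u / scale) / math.pi

def simulate_pew_closer(n_pew, n_orc, trials=200_000, seed=1):
    """Estimate P(|X| < |Y|) for independent X ~ N(0, 1/n_pew), Y ~ N(0, 1/n_orc).

    sigma is set to 1 because only the ratio of the two variances matters.
    """
    rng = random.Random(seed)
    sd_pew = 1 / math.sqrt(n_pew)
    sd_orc = 1 / math.sqrt(n_orc)
    hits = sum(
        abs(rng.gauss(0, sd_pew)) < abs(rng.gauss(0, sd_orc))
        for _ in range(trials)
    )
    return hits / trials

n_pew, n_orc = 1504, 824
scale = math.sqrt(n_orc / n_pew)  # sigma_1 / sigma_2, about 0.740
exact = cauchy_cdf(1, scale) - cauchy_cdf(-1, scale)
approx = simulate_pew_closer(n_pew, n_orc)
print(f"exact = {exact:.3f}, simulated = {approx:.3f}")
```

The exact value agrees with the 0.594 quoted in the solution, and the simulated frequency should land close to it.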


Problem 2: Pooling Data across Polls?

Describe how you would combine the data from these two polls to form a single estimate of \(p\).

The obvious choice is to propose a linear combination of the two estimates weighting inversely proportional to the variances to get the smallest overall variance amongst such linear combinations.
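
The weighting rule can be illustrated with a minimal sketch. It assumes independent polls with variance \(\sigma^2/n\), so weights inversely proportional to the variances are simply proportional to the sample sizes; variances are reported in units of \(\sigma^2\), and the helper names are hypothetical.

```python
def inverse_variance_weights(n1, n2):
    """Weights proportional to 1/variance when each variance is sigma^2 / n."""
    total = n1 + n2
    return n1 / total, n2 / total

def combined_variance(w1, n1, n2):
    """Variance of w1*p1 + (1 - w1)*p2, in units of sigma^2, for independent polls."""
    w2 = 1 - w1
    return w1**2 / n1 + w2**2 / n2

n_pew, n_orc = 1504, 824
w_pew, w_orc = inverse_variance_weights(n_pew, n_orc)
best = combined_variance(w_pew, n_pew, n_orc)   # equals 1 / 2328
equal = combined_variance(0.5, n_pew, n_orc)    # equal weighting, for comparison
print(f"weights = ({w_pew:.4f}, {w_orc:.4f})")
print(f"variance: inverse-variance = {best:.6f}, equal weights = {equal:.6f}")
```

The inverse-variance weights reproduce the pooled-sample variance \(\sigma^2/2328\), and equal weighting gives a strictly larger variance.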


Problem 3: Are these probability estimates correlated?

What is the correlation between your estimate above and the individual estimate produced by the Pew poll?

Note that \(Cov(\hat{p},\hat{p_1})=Cov(\frac{1504}{2328}\hat{p_1}+\frac{824}{2328}\hat{p_2}, \hat{p_1})=\frac{1504}{2328}\sigma_1^2=\frac{\sigma^2}{2328}\).
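
The correlation that follows from this covariance can be computed directly. The sketch below works in units of \(\sigma^2\) and assumes the two polls are independent, as in Problem 1; the variable names are illustrative.

```python
import math

n_pew, n_orc = 1504, 824
n_all = n_pew + n_orc

w_pew = n_pew / n_all          # weight of the Pew estimate in the combined estimate
var_pew = 1 / n_pew            # Var(p1_hat) in units of sigma^2
var_combined = 1 / n_all       # Var(p_hat) in units of sigma^2

# Independence makes Cov(p2_hat, p1_hat) = 0, so only the Pew term survives.
cov = w_pew * var_pew
corr = cov / math.sqrt(var_combined * var_pew)
print(round(corr, 4))  # 0.8038
```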


Problem 4: Accuracy of probability estimates?

What is the probability that your combined estimate (from the second problem) is more accurate than the estimate based only on the Pew poll?

We want \(P[|\hat{p}-p| < |\hat{p_1}-p| ]\).
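
A minimal sketch of this calculation, assuming the generalized Cauchy CDF \(G(u) = 0.5 + \frac{1}{\pi}\arctan\left(\frac{u-\rho\,\sigma_X/\sigma_Y}{(\sigma_X/\sigma_Y)\sqrt{1-\rho^2}}\right)\) for the ratio \(X/Y\) of centered, jointly normal variables. Here \(X=\hat{p}-p\) and \(Y=\hat{p_1}-p\), so both \(\sigma_X/\sigma_Y\) and \(\rho\) equal \(\sqrt{1504/2328}\). The function name is hypothetical.

```python
import math

def generalized_cauchy_cdf(u, ratio, rho):
    """G(u) for U = X/Y, where ratio = sigma_X / sigma_Y and rho = Corr(X, Y)."""
    location = rho * ratio
    scale = ratio * math.sqrt(1 - rho**2)
    return 0.5 + math.atan((u - location) / scale) / math.pi

n_pew, n_orc = 1504, 824
n_all = n_pew + n_orc

ratio = math.sqrt(n_pew / n_all)  # sigma_X / sigma_Y
rho = math.sqrt(n_pew / n_all)    # Corr(X, Y), about 0.8038
prob_combined_closer = (generalized_cauchy_cdf(1, ratio, rho)
                        - generalized_cauchy_cdf(-1, ratio, rho))

# Setting rho = 0 recovers the independent case of Problem 1.
ratio_p1 = math.sqrt(n_orc / n_pew)
prob_problem1 = (generalized_cauchy_cdf(1, ratio_p1, 0.0)
                 - generalized_cauchy_cdf(-1, ratio_p1, 0.0))
print(f"{prob_combined_closer:.3f} {prob_problem1:.3f}")
```

The first value is approximately 0.613, so the pooled estimate beats the Pew estimate alone only about 61% of the time.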


Generalized Cauchy distribution CDF derivation

To derive the generalized Cauchy distribution CDF directly, we start with the bivariate normal distribution of X and Y:
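
The closed form that this derivation arrives at, \(G(u) = 0.5 + \frac{1}{\pi}\arctan\left(\frac{u-\rho\,\sigma_X/\sigma_Y}{(\sigma_X/\sigma_Y)\sqrt{1-\rho^2}}\right)\), can be checked independently by simulation. The sketch below draws correlated centered normal pairs and compares the empirical frequency of \(X/Y \le u\) with \(G(u)\). The parameter values, seed, and function names are arbitrary illustrations, not values from the activity.

```python
import math
import random

def generalized_cauchy_cdf(u, ratio, rho):
    """Closed-form CDF of X/Y with ratio = sigma_X / sigma_Y and rho = Corr(X, Y)."""
    location = rho * ratio
    scale = ratio * math.sqrt(1 - rho**2)
    return 0.5 + math.atan((u - location) / scale) / math.pi

def empirical_ratio_cdf(u, sigma_x, sigma_y, rho, trials=200_000, seed=7):
    """Monte Carlo estimate of P(X/Y <= u) for a centered bivariate normal (X, Y)."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = sigma_x * z1
        # Mixing z1 into y gives Corr(X, Y) = rho.
        y = sigma_y * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        if x / y <= u:
            count += 1
    return count / trials

# Arbitrary test values chosen only for illustration.
sigma_x, sigma_y, rho, u = 1.0, 2.0, 0.5, 0.3
closed_form = generalized_cauchy_cdf(u, sigma_x / sigma_y, rho)
simulated = empirical_ratio_cdf(u, sigma_x, sigma_y, rho)
print(f"closed form = {closed_form:.4f}, simulated = {simulated:.4f}")
```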


Cauchy, Student's T and Gaussian distribution interrelations

The Student's T-distribution represents a one-parameter homotopy path connecting the Cauchy and Gaussian distributions:

\(Cauchy=T_{(df=1)} \longrightarrow T_{(df)}\longrightarrow N(0,1)=T_{(df=\infty)}\).
See the Cauchy, Student's T and Gaussian distribution calculators.
Explore the relations between these and many other probability distributions using the interactive graphical Distributome Navigator.
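
The two endpoints of this path can also be checked numerically. The sketch below uses only the Python standard library and the textbook density of Student's T with \(df\) degrees of freedom; the evaluation point 1.5 and the list of \(df\) values are arbitrary choices.

```python
import math

def t_pdf(x, df):
    """Density of Student's T with df degrees of freedom (log-gamma for stability)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def cauchy_pdf(x):
    """Density of the standard Cauchy distribution."""
    return 1 / (math.pi * (1 + x * x))

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

x = 1.5
for df in (1, 5, 30, 1000):
    print(f"df = {df:5d}: t_pdf = {t_pdf(x, df):.6f}")
print(f"Cauchy: {cauchy_pdf(x):.6f}   N(0,1): {normal_pdf(x):.6f}")
```

At \(df=1\) the T density coincides with the Cauchy density, and as \(df\) grows it moves toward the standard normal density.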

Conclusions

Increasing the sample size may help significantly in certain situations - but not as much as intuition often suggests.

Short link

http://ucla.in/ua1x3z



