LearningActivities CauchyTGaussian

From Socr

Revision as of 00:27, 25 October 2011 by IvoDinov

Distributome Learning Activities - Distributome Activity on the relations between Cauchy, Student's t and Gaussian Distributions

Overview

Crime Trends

Surveys of public opinion on controversial social issues are becoming increasingly frequent as topics such as the legalization of marijuana, abortion policy, marriage rights for homosexuals, and immigration policy are hotly debated in the media. For example, both the Opinion Research Corporation (polling for CNN) and the Pew Research Center for The People and The Press conducted surveys of American adults in the spring of 2011 to estimate the percentage of the public that favors the legalization of marijuana. The sample sizes were 824 for the Opinion Research Corporation poll and 1504 for the Pew poll. This activity uses these two polls to illustrate the inter-distribution relationships between the Cauchy, Student's t and Standard Normal (Gaussian) distributions.

Goals

Hands-on Activity

<example id = “sample size and accuracy”> <type>real</type> <data>none</data> <distribution id = “Cauchy”> inter-distribution relationship: from Standard normal to Standard Cauchy </distribution>

Surveys of public opinion on controversial social issues are becoming increasingly frequent as topics such as the legalization of marijuana, abortion policy, marriage rights for homosexuals, and immigration policy are hotly debated in the media. For example, both the Opinion Research Corporation (polling for CNN) and the Pew Research Center for The People and The Press conducted surveys of American adults in the spring of 2011 to estimate the percentage of the public that favors the legalization of marijuana. The sample sizes were 824 for the Opinion Research Corporation poll and 1504 for the Pew poll. In this problem you may assume that both pollsters use similar techniques: telephone interviews, weighting of individual answers to align the respondents' demographics with population values, and finally averaging to produce unbiased and essentially normally distributed estimates.

a) The Pew poll had almost twice the sample size of the Opinion Research Corporation poll. What is the chance that it was more accurate than that poll for estimating p = the percentage of American adults that favored the legalization of marijuana in spring 2011? Be sure to clearly define how you are interpreting “more accurate.” Also state and justify any assumptions you make in solving for this probability. <hint> Decide on a measure of the accuracy of a poll and then evaluate the corresponding probability to gauge the chance that one poll is more accurate than the other.</hint>

b) Describe how you would combine the data from these two polls to form a single estimate of p.

c) What is the correlation between your estimate from part b) and the individual estimate produced by the Pew poll?

d) What is the probability that your combined estimate from part b) is more accurate than the estimate based only on the Pew poll?

Note: parts a) to c) are appropriate for an undergraduate course in probability; part d) would be more appropriate for the master's level (and should have a General Cauchy distribution tag and a tag for its relationship to the bivariate normal).

Solutions:

a) We want the probability that the Pew estimate, based on a sample size of $n_1 = 1504$, comes closer to the true value of p than the Opinion Research estimate, based on $n_2 = 824$ respondents. Thus if $\hat{p}_1$ = the estimate of p from Pew and $\hat{p}_2$ = the estimate of p from Opinion Research, then the problem asks for $P(|\hat{p}_1 - p| < |\hat{p}_2 - p|)$. Taking $X = \hat{p}_1 - p$ and $Y = \hat{p}_2 - p$, the problem statement indicates that X and Y are approximately normal with mean 0. Also, since the two polls use a similar methodology, we may assume that the technique produces estimates with variance $\sigma^2/n$ when a sample size of n is used (for n large enough for the normal approximation to be valid, as in these cases). Thus, $X \sim N(0, \sigma^2/n_1)$ and $Y \sim N(0, \sigma^2/n_2)$. Finally, we should assume that the estimates from the two polls are stochastically independent, which is reasonable here since they were conducted by two separate companies and the population of U.S. adults is much larger than either sample size.

Now $P(|X| < |Y|) = P(-1 < X/Y < 1) = F(1) - F(-1)$, where F is the cdf of the Cauchy distribution with location 0 and scale parameter $\sqrt{n_2/n_1} = \sqrt{824/1504} = \sqrt{0.548} \approx 0.740$, so $F(x) = \frac{1}{2} + \frac{1}{\pi}\arctan(x/0.740)$ and the answer is $F(1) - F(-1) = \frac{2}{\pi}\arctan(1/0.740) \approx 0.594$. Many people are surprised by how low this answer is: despite having nearly double the sample size, the chance that the Pew poll is more accurate is only 0.594.

Alternatively, you may solve $P\left(|Z_1/Z_2| < \sqrt{n_1/n_2}\right)$, where $Z_1$ and $Z_2$ are independent standard normal variables, so the ratio $Z_1/Z_2$ is standard Cauchy; its cdf can then be used to get the same numerical answer.

A second, more cumbersome, alternative is to work directly to find the marginal distribution of X/Y from the bivariate normal; see the solution to part d) below with $\rho = 0$.
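The answer to part a) can be checked numerically. The following is a minimal sketch, not part of the original activity: it evaluates the closed form $\frac{2}{\pi}\arctan\sqrt{n_1/n_2}$ and compares it with a Monte Carlo estimate. The function names, the number of trials and the seed are illustrative choices, and the common variance constant is set to 1 on the assumption (made in the solution) that both polls share it, so that it cancels.

```python
import math
import random

N_PEW = 1504  # Pew sample size (from the problem statement)
N_ORC = 824   # Opinion Research Corporation sample size


def prob_first_poll_closer(n1: int, n2: int) -> float:
    """P(|X| < |Y|) for independent X ~ N(0, s^2/n1), Y ~ N(0, s^2/n2).

    X/Y is Cauchy with location 0 and scale sqrt(n2/n1), which gives
    (2/pi) * arctan(sqrt(n1/n2)).
    """
    return (2.0 / math.pi) * math.atan(math.sqrt(n1 / n2))


def simulate_first_poll_closer(n1: int, n2: int,
                               trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability (assumed s = 1; it cancels)."""
    rng = random.Random(seed)
    sd1, sd2 = 1.0 / math.sqrt(n1), 1.0 / math.sqrt(n2)
    hits = sum(
        abs(rng.gauss(0.0, sd1)) < abs(rng.gauss(0.0, sd2))
        for _ in range(trials)
    )
    return hits / trials


exact = prob_first_poll_closer(N_PEW, N_ORC)
approx = simulate_first_poll_closer(N_PEW, N_ORC)
print(f"closed form: {exact:.3f}, simulation: {approx:.3f}")
```

Calling `prob_first_poll_closer(n, n)` returns 0.5, as it should when both polls have the same size.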

b) The obvious choice is a linear combination of the two estimates, with weights inversely proportional to the variances, which gives the smallest variance among such linear combinations: $\hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} = \frac{1504\,\hat{p}_1 + 824\,\hat{p}_2}{2328}$. Of course, this is just the estimate that comes from combining the samples into one large sample of n = 2328, and it has variance $\sigma^2/2328$.
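The choice of weights in part b) can be justified in a few lines. The sketch below assumes, as in part a), that the two estimates are independent with variances $\sigma^2/n_1$ and $\sigma^2/n_2$; the symbol $w$ (the weight placed on the Pew estimate) is introduced here for illustration only.

```latex
\begin{aligned}
\operatorname{Var}\bigl(w\,\hat{p}_1 + (1-w)\,\hat{p}_2\bigr)
  &= \sigma^2\left[\frac{w^2}{n_1} + \frac{(1-w)^2}{n_2}\right], \\
\frac{d}{dw}:\quad \frac{2w}{n_1} - \frac{2(1-w)}{n_2} = 0
  &\;\Longrightarrow\; w = \frac{n_1}{n_1+n_2} = \frac{1504}{2328} \approx 0.646, \\
\operatorname{Var}(\hat{p})
  &= \sigma^2\,\frac{n_1 + n_2}{(n_1+n_2)^2} = \frac{\sigma^2}{n_1+n_2} = \frac{\sigma^2}{2328}.
\end{aligned}
```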

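c) Write $\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1+n_2}$ for the combined estimate from part b). Since $\hat{p}_1$ and $\hat{p}_2$ are independent, $\mathrm{Cov}(\hat{p}, \hat{p}_1) = \frac{n_1}{n_1+n_2}\mathrm{Var}(\hat{p}_1) = \frac{\sigma^2}{n_1+n_2}$. So $\rho = \mathrm{Corr}(\hat{p}, \hat{p}_1) = \frac{\sigma^2/(n_1+n_2)}{\sqrt{\sigma^2/(n_1+n_2)}\,\sqrt{\sigma^2/n_1}} = \sqrt{\frac{n_1}{n_1+n_2}} = \sqrt{\frac{1504}{2328}} \approx 0.804$.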
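The correlation in part c) can also be checked by simulation. This is a rough sketch under the same assumptions as the solution (independent, zero-mean normal estimation errors with variance proportional to 1/n); the helper names, trial count and seed are hypothetical choices made for illustration.

```python
import math
import random

N_PEW, N_ORC = 1504, 824  # sample sizes from the problem statement


def exact_correlation(n1: int, n2: int) -> float:
    """Corr(combined estimate, first poll's estimate) = sqrt(n1 / (n1 + n2))."""
    return math.sqrt(n1 / (n1 + n2))


def simulated_correlation(n1: int, n2: int,
                          trials: int = 100_000, seed: int = 1) -> float:
    """Sample correlation between the combined error and the first poll's error."""
    rng = random.Random(seed)
    w = n1 / (n1 + n2)  # weight on the first poll in the combined estimate
    pairs = []
    for _ in range(trials):
        e1 = rng.gauss(0.0, 1.0 / math.sqrt(n1))  # error of poll 1 (sigma = 1 assumed)
        e2 = rng.gauss(0.0, 1.0 / math.sqrt(n2))  # error of poll 2
        pairs.append((w * e1 + (1.0 - w) * e2, e1))
    mean_c = sum(c for c, _ in pairs) / trials
    mean_1 = sum(e for _, e in pairs) / trials
    cov = sum((c - mean_c) * (e - mean_1) for c, e in pairs)
    var_c = sum((c - mean_c) ** 2 for c, _ in pairs)
    var_1 = sum((e - mean_1) ** 2 for _, e in pairs)
    return cov / math.sqrt(var_c * var_1)


rho = exact_correlation(N_PEW, N_ORC)
rho_sim = simulated_correlation(N_PEW, N_ORC)
print(f"exact: {rho:.3f}, simulated: {rho_sim:.3f}")
```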


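d) We want $P(|\hat{p} - p| < |\hat{p}_1 - p|)$, where $\hat{p}$ is the combined estimate from part b) and $\hat{p}_1$ is the Pew estimate. Taking $X = \hat{p} - p$ and $Y = \hat{p}_1 - p$, then (X, Y) is bivariate normal with zero means, variances $\sigma_X^2 = \sigma^2/2328$ and $\sigma_Y^2 = \sigma^2/1504$, and correlation $\rho = \sqrt{1504/2328} \approx 0.804$ from part c). Hence,

$P(|X| < |Y|) = P(-1 < X/Y < 1) = G(1) - G(-1)$,

where G is the cdf of the generalized Cauchy distribution (derived below), $G(u) = \frac{1}{2} + \frac{1}{\pi}\arctan\left(\frac{u - \alpha}{\beta}\right)$, with location $\alpha = \rho\,\sigma_X/\sigma_Y = 1504/2328 \approx 0.646$ and scale $\beta = (\sigma_X/\sigma_Y)\sqrt{1-\rho^2} = \sqrt{1504 \cdot 824}/2328 \approx 0.478$. So the answer is
$\frac{1}{\pi}\left[\arctan\left(\frac{1 - 0.646}{0.478}\right) + \arctan\left(\frac{1 + 0.646}{0.478}\right)\right] \approx 0.613$.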
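As a numerical check on part d), the sketch below evaluates the generalized Cauchy cdf at 1 and -1 and compares the difference with a Monte Carlo estimate. It assumes the location and scale implied by the setup of the problem, namely $n_1/(n_1+n_2)$ and $\sqrt{n_1 n_2}/(n_1+n_2)$; the function names, trial count and seed are illustrative and not part of the original activity.

```python
import math
import random

N_PEW, N_ORC = 1504, 824  # sample sizes from the problem statement


def cauchy_cdf(u: float, location: float, scale: float) -> float:
    """cdf of a Cauchy distribution with the given location and scale."""
    return 0.5 + math.atan((u - location) / scale) / math.pi


def prob_combined_closer(n1: int, n2: int) -> float:
    """P(|combined error| < |poll-1 error|) via the ratio's Cauchy cdf."""
    total = n1 + n2
    location = n1 / total                # rho * sigma_X / sigma_Y
    scale = math.sqrt(n1 * n2) / total   # (sigma_X / sigma_Y) * sqrt(1 - rho^2)
    return cauchy_cdf(1.0, location, scale) - cauchy_cdf(-1.0, location, scale)


def simulate_combined_closer(n1: int, n2: int,
                             trials: int = 200_000, seed: int = 2) -> float:
    """Monte Carlo estimate of the same probability (sigma = 1 assumed; it cancels)."""
    rng = random.Random(seed)
    w = n1 / (n1 + n2)
    hits = 0
    for _ in range(trials):
        e1 = rng.gauss(0.0, 1.0 / math.sqrt(n1))  # error of poll 1
        e2 = rng.gauss(0.0, 1.0 / math.sqrt(n2))  # error of poll 2
        combined = w * e1 + (1.0 - w) * e2        # error of the pooled estimate
        hits += abs(combined) < abs(e1)
    return hits / trials


exact_d = prob_combined_closer(N_PEW, N_ORC)
approx_d = simulate_combined_closer(N_PEW, N_ORC)
print(f"closed form: {exact_d:.3f}, simulation: {approx_d:.3f}")
```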

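To derive this result directly, we start with the bivariate normal distribution of X and Y:

$(X, Y) \sim N\left(\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}\sigma_X^2 & \rho\,\sigma_X\sigma_Y\\ \rho\,\sigma_X\sigma_Y & \sigma_Y^2\end{pmatrix}\right)$, then the pdf of U = X/Y is

$f_U(u) = \int_{-\infty}^{\infty} |y|\, f_{X,Y}(uy, y)\, dy = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \int_{-\infty}^{\infty} |y| \exp\left(-\frac{a(u)\,y^2}{2}\right) dy$, where $a(u) = \frac{1}{1-\rho^2}\left(\frac{u^2}{\sigma_X^2} - \frac{2\rho u}{\sigma_X\sigma_Y} + \frac{1}{\sigma_Y^2}\right)$,

but since $\int_{-\infty}^{\infty} |y|\, e^{-a y^2/2}\, dy = \frac{2}{a}$ for a > 0, we find that

$f_U(u) = \frac{\sqrt{1-\rho^2}\;\sigma_X\sigma_Y}{\pi\left(\sigma_Y^2 u^2 - 2\rho\,\sigma_X\sigma_Y u + \sigma_X^2\right)}$

or

$f_U(u) = \frac{1}{\pi}\,\frac{\beta}{(u-\alpha)^2 + \beta^2}$, with $\alpha = \rho\,\sigma_X/\sigma_Y$ and $\beta = (\sigma_X/\sigma_Y)\sqrt{1-\rho^2}$,

and since $\int \frac{\beta}{(t-\alpha)^2 + \beta^2}\, dt = \arctan\left(\frac{t-\alpha}{\beta}\right) + C$, we get the cdf G above: $G(u) = \frac{1}{2} + \frac{1}{\pi}\arctan\left(\frac{u-\alpha}{\beta}\right)$.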
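The key step of the derivation, that integrating $|y|\,f_{X,Y}(uy, y)$ over y yields a Cauchy density with location $\rho\,\sigma_X/\sigma_Y$ and scale $(\sigma_X/\sigma_Y)\sqrt{1-\rho^2}$, can be spot-checked numerically. The sketch below uses a plain midpoint rule; the parameter values (`SX`, `SY`, `RHO`), the integration range and the step count are arbitrary illustrative assumptions, not values from the activity.

```python
import math

# Hypothetical parameters chosen only to exercise the formula.
SX, SY, RHO = 1.0, 2.0, 0.5


def bivariate_normal_pdf(x: float, y: float) -> float:
    """Zero-mean bivariate normal density with sds SX, SY and correlation RHO."""
    q = (x / SX) ** 2 - 2.0 * RHO * x * y / (SX * SY) + (y / SY) ** 2
    norm = 2.0 * math.pi * SX * SY * math.sqrt(1.0 - RHO ** 2)
    return math.exp(-q / (2.0 * (1.0 - RHO ** 2))) / norm


def ratio_pdf_numeric(u: float, half_width: float = 20.0, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of |y| f(uy, y) over y."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        y = -half_width + (k + 0.5) * h
        total += abs(y) * bivariate_normal_pdf(u * y, y)
    return total * h


def ratio_pdf_closed_form(u: float) -> float:
    """Cauchy density with location RHO*SX/SY and scale (SX/SY)*sqrt(1 - RHO^2)."""
    alpha = RHO * SX / SY
    beta = (SX / SY) * math.sqrt(1.0 - RHO ** 2)
    return beta / (math.pi * ((u - alpha) ** 2 + beta ** 2))


for u in (-1.0, 0.0, 0.3, 2.0):
    print(u, ratio_pdf_numeric(u), ratio_pdf_closed_form(u))
```

The two columns should agree to several decimal places; with `RHO = 0` the closed form reduces to the centered Cauchy density used in part a).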

Conclusions


