SOCR EduMaterials Activities LawOfLargeNumbers

== [[SOCR_EduMaterials_Activities | SOCR Educational Materials - Activities]] - SOCR Law of Large Numbers Activity ==

== Overview==
This is part I of a heterogeneous activity that demonstrates the theory and applications of the Law of Large Numbers (LLN). [[SOCR_EduMaterials_Activities_LawOfLargeNumbers2 | Part II]] and [[SOCR_EduMaterials_Activities_LawOfLargeNumbersExperiment | Part III]] of this activity contain more examples and diverse experiments. The [http://socr.ucla.edu/htmls/exp/LLN_Simple_Experiment.html SOCR LLN applet is available here].

===Goals of the SOCR LLN activity===
The goals of this activity are to:
* illustrate the theoretical meaning and practical implications of the LLN;
* present the LLN in a variety of situations;
* provide empirical evidence in support of LLN convergence and dispel common LLN misconceptions.

===Example===
The average weight of 10 students from a class of 100 students is most likely closer to the ''real average'' weight of all 100 students than the average weight of 3 randomly chosen students from that same class. This is because the sample of 10 is a ''larger number'' than the sample of only 3 and better represents the entire class. At one extreme, a sample of 99 of the 100 students will produce a sample average almost exactly equal to the average for all 100 students. At the other extreme, sampling a single student will produce a highly variable estimate of the overall class average weight.

===Statement of the Law of Large Numbers===
If an event of probability p is observed repeatedly during '''independent repetitions''', the ratio of the observed frequency of that event to the total number of repetitions converges towards p as the number of repetitions becomes arbitrarily large.

===The theory behind the LLN===
Complete details about the ''weak'' and ''strong'' laws of large numbers may be found [http://en.wikipedia.org/wiki/Law_of_large_numbers here].

== Exercise 1==
This exercise illustrates the statement and validity of the LLN in the situation of tossing (biased or fair) coins repeatedly. Let H and T denote Heads and Tails; the probabilities of observing a Head or a Tail at each trial are <math>0<p<1</math> and <math>0<1-p<1</math>, respectively. The sample space of this experiment consists of sequences of H's and T's. For example, an outcome may be <math>\{H, H, T, H, H, T, T, T, ....\}</math>. If we toss a coin n times, the size of the sample-space is <math>2^n</math>, as the coin tosses are independent. The [[About_pages_for_SOCR_Distributions | Binomial Distribution]] governs the probability of observing <math>0\le k\le n</math> Heads in <math>n</math> experiments, which is evaluated by the binomial density at <math>k</math>.

In this case we will be interested in two random variables associated with this process. The first variable will be the ''proportion of Heads'' and the second will be the ''difference of the numbers of Heads and Tails''. These will empirically demonstrate the LLN and its most common misconceptions (presented below). Point your browser to the [http://socr.ucla.edu/htmls/SOCR_Experiments.html SOCR Experiments] and select the '''Coin Toss LLN Experiment''' from the drop-down list of experiments in the top-left panel. This applet consists of a control toolbar on the top, followed by a graph panel in the middle and a results table at the bottom. Use the toolbar to flip coins one, 10, 100, or 1,000 at a time, or continuously! The toolbar also allows you to stop or reset an experiment and select the probability of Heads ('''p''') using the slider. The graph panel in the middle will dynamically plot the values of the two variables of interest (''proportion of heads'' and ''difference of Heads and Tails''). The outcome table at the bottom presents the summaries of all trials of this experiment. From this table, you can copy and paste the summary for further processing using other computational resources (e.g., [http://socr.ucla.edu/htmls/SOCR_Modeler.html SOCR Modeler] or [http://office.microsoft.com/excel MS Excel]).
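The binomial probabilities referenced above can be computed directly. A minimal Python sketch (the function name is illustrative; this is not part of the SOCR applet):

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of observing exactly k Heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 5 Heads in 10 tosses of a fair coin
print(binomial_pmf(5, 10, 0.5))  # 0.24609375
```

Summing this density over <math>0\le k\le n</math> returns 1, as expected for a probability distribution.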
* '''Note:''' We report the normalized differences of the number of Heads minus the number of Tails in the graph and result table. Let <math>H</math> and <math>T</math> be the numbers of Heads and Tails up to the current trial (<math>k</math>), respectively. Then we define the normalized difference <math>| H - T|</math> = <math>p+ ((1-p)H-pT )/(2/3 \times Max_k)</math>, where <math>Max_k = \max_{1 \le i \le k}{||H-T||_i}</math> and <math>||H-T||_i</math> is the maximum difference of Heads and Tails up to the <math>i^{th}</math> trial. Observe that the expectation of the normalized difference is <math>E(|H-T|)=p</math>, since <math>E((1-p)H-pT)=0</math>. This ensures that the normalized differences oscillate around the chosen <math>p</math> (the LLN limit of the proportion of Heads) and remain visible within the graph window.
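The central claim of this exercise, that the running proportion of Heads approaches ''p'', is easy to check outside the applet. A short Python simulation (a sketch with an assumed seed and checkpoints; not the applet's actual code):

```python
import random

random.seed(42)  # fixed seed so the run is reproducible
p = 0.5          # probability of Heads
heads, k = 0, 0
for checkpoint in (100, 10_000, 1_000_000):
    while k < checkpoint:
        heads += random.random() < p  # True counts as 1
        k += 1
    # by the LLN, heads/k approaches p as k grows
    print(k, round(heads / k, 4))
```

The printed proportions settle near 0.5 as the number of flips grows, while early proportions can be noticeably off.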
<center>[[Image:SOCR_Activities_LLN_Dinov_022007_Fig1.jpg|400px]]</center>
Now, select '''n=100''' and '''p=0.5'''. The figure below shows a snapshot of the applet. Remember that each time you run the applet the random samples will be different and the figures and results will generally vary. Click on the '''Run''' or '''Step''' buttons to perform the experiment and observe the ''proportion of heads'' and the ''differences'' evolve over time. Choosing '''Continuous''' from the number-of-experiments drop-down list in the toolbar will run the experiment in a continuous mode (use the '''Stop''' button to terminate the experiment in this case). The statement of the LLN in this experiment is simply that '''as the number of experiments increases, the sample proportion of Heads (red curve) will approach the theoretical (user preset) value of p (in this case ''p=0.5'')'''. Try changing the value of '''p''' and running the experiment interactively several times. Notice the behavior of the graphs of the two variables we study. Try to pose and answer questions like these:

* If we set '''p=0.4''', how large a sample-size is needed to ensure that the sample-proportion stays within [0.3; 0.5]?
* What is the behavior of the curve representing the differences of Heads and Tails (red curve)?
* Is the convergence of the sample-proportion to the theoretical proportion (that we preset) dependent on p?
* Remember that the more experiments you run, the closer the theoretical and sample proportions will be (by the LLN). Go into '''Continuous''' run mode and watch the convergence of the sample proportion to <math>p</math>. Can you explain in words why we can't expect the second variable of interest (the difference of Heads and Tails) to converge? [[Image:SOCR_Activities_LLN_Dinov_022007_Fig2.jpg|200px]]

==Exercise 2==
The second SOCR demonstration of the law of large numbers will be quite different and practically useful. Here we show how the LLN implies practical algorithms for estimation of [http://en.wikipedia.org/wiki/Transcendental_number transcendental numbers]. The two most popular transcendental numbers are [http://en.wikipedia.org/wiki/Pi <math>\pi</math>] and [http://en.wikipedia.org/wiki/E_%28mathematical_constant%29 ''e''].
===Estimating ''e'' using SOCR simulation===
The [[SOCR_EduMaterials_Activities_Uniform_E_EstimateExperiment | SOCR E-Estimate Experiment]] provides the complete details of this simulation. In a nutshell, we can estimate the value of the [http://en.wikipedia.org/wiki/E_%28mathematical_constant%29 natural number e] using random sampling from the Uniform distribution. Suppose <math>X_1, X_2, ..., X_n</math> are drawn from the [http://www.socr.ucla.edu/htmls/SOCR_Distributions.html uniform distribution on (0, 1)] and define <math>U= {\operatorname{argmin}}_n { \left (X_1+X_2+...+X_n > 1 \right )}</math>; that is, ''U'' is the smallest number of draws whose running sum exceeds 1 (note that all <math>X_i \ge 0</math>, so ''U'' is well defined).

Now, the expected value <math>E(U) = e \approx 2.7182</math>. Therefore, by the LLN, taking averages of <math>\left \{ U_1, U_2, U_3, ..., U_k \right \}</math> values, each computed from random samples <math>X_1, X_2, ..., X_n \sim U(0,1)</math> as described above, will provide a more accurate estimate (as <math>k \rightarrow \infty</math>) of the natural number ''e''.
The '''Uniform E-Estimate Experiment''', part of [http://www.socr.ucla.edu/htmls/SOCR_Experiments.html SOCR Experiments], provides a hands-on demonstration of how the LLN facilitates stochastic simulation-based estimation of ''e''.
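The LLN-averaging step is simple to reproduce outside the applet. A Python sketch (the sample size and seed are arbitrary choices, not applet settings):

```python
import random

random.seed(2024)  # reproducible run

def sample_U():
    """Smallest n such that X_1 + ... + X_n > 1, for X_i ~ Uniform(0,1)."""
    total, n = 0.0, 0
    while total <= 1:
        total += random.random()
        n += 1
    return n

k = 100_000
estimate = sum(sample_U() for _ in range(k)) / k
print(round(estimate, 3))  # close to e ≈ 2.718
```

By the LLN, the average of the <math>U_i</math> converges to <math>E(U)=e</math> as <math>k \rightarrow \infty</math>.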
<center>[[Image:SOCR_Activities_Uniform_E_EstimateExperiment_Dinov_121907_Fig1.jpg|400px]]</center>
===Estimating <math>\pi</math> using SOCR simulation===
Similarly, one may approximate the transcendental number <math>\pi</math> using the [[SOCR_EduMaterials_Activities_BuffonNeedleExperiment#Buffon.27s_needle_experiment_and_estimation_of_the_constant_.CF.80 | SOCR Buffon’s Needle Experiment]]. Here, the LLN again provides the foundation for a better approximation of <math>\pi</math> by virtually dropping needles (many times) on a tiled surface and observing whether the needle crosses a tile grid-line. For a tile grid of size 1 and needles of length 1, the probability of a needle-line intersection is <math>{ 2 \over \pi} \approx 0.63662</math>. In practice, to estimate <math>\pi</math> from a number of needle drops (N), we divide 2 by the sample proportion of intersections, i.e., <math>\pi \approx 2N/(\text{number of intersections})</math>.

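The needle-drop scheme can be sketched in a few lines of Python (the drop count and seed are arbitrary; note that the angle is sampled using ''math.pi'', which is fine for demonstrating the LLN even though <math>\pi</math> is the quantity being estimated):

```python
import math
import random

random.seed(7)  # reproducible run

def buffon_pi(drops):
    """Estimate pi with unit-length needles on parallel lines spaced 1 apart."""
    hits = 0
    for _ in range(drops):
        d = random.uniform(0, 0.5)              # center-to-nearest-line distance
        theta = random.uniform(0, math.pi / 2)  # acute angle with the lines
        if d <= 0.5 * math.sin(theta):          # the needle crosses a grid line
            hits += 1
    return 2 * drops / hits  # P(cross) = 2/pi, so pi is about 2*drops/hits

print(round(buffon_pi(200_000), 2))
```

By the LLN, the hit proportion converges to <math>2/\pi</math>, so the estimate improves as the number of drops grows.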
==Exercise 3==
Suppose we roll 10 loaded hexagonal (6-face) dice and we are interested in the probability of observing the event A={3 ones, 3 twos, 2 threes, and 2 fours}. Assume the dice are loaded toward the smaller outcomes according to the following probabilities of the 6 outcomes (''one'' is the most likely and ''six'' the least likely outcome).
<center>
{| class="wikitable" style="text-align:center; width:75%" border="1"
|-
| ''x'' || 1 || 2 || 3 || 4 || 5 || 6
|-
| ''P(X=x)'' || 0.286 || 0.238 || 0.19 || 0.143 || 0.095 || 0.048
|}
</center>

: ''P(A)=?''

Of course, we can compute this number exactly as:

: <math>P(A) = {10! \over 3!\times 3! \times 2! \times 2! } \times 0.286^3 \times 0.238^3\times 0.19^2 \times 0.143^2 = 0.00586690138260962656816896.</math>
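Both the exact computation and an LLN-based simulation can be sketched in Python (the seed and trial count are arbitrary choices; this is not the SOCR applet's code):

```python
import random
from collections import Counter
from math import factorial

probs = {1: 0.286, 2: 0.238, 3: 0.19, 4: 0.143, 5: 0.095, 6: 0.048}

# Exact multinomial probability of A = {3 ones, 3 twos, 2 threes, 2 fours}
coeff = factorial(10) // (factorial(3) * factorial(3) * factorial(2) * factorial(2))
p_exact = coeff * probs[1]**3 * probs[2]**3 * probs[3]**2 * probs[4]**2
print(round(p_exact, 7))  # 0.0058669

# LLN-based Monte Carlo: roll 10 loaded dice repeatedly and count event A
random.seed(3)
target = Counter({1: 3, 2: 3, 3: 2, 4: 2})
faces, weights = zip(*probs.items())
trials = 100_000
hits = sum(Counter(random.choices(faces, weights, k=10)) == target
           for _ in range(trials))
print(hits / trials)  # approaches p_exact as the number of trials grows
```

The multinomial coefficient counts the orderings of the 10 outcomes; the simulation's relative frequency converges to the exact probability by the LLN.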

However, we can also find a pretty close empirically-driven estimate using the [[SOCR_EduMaterials_Activities_DiceExperiment | SOCR Dice Experiment]].

For instance, running the [http://socr.ucla.edu/htmls/SOCR_Experiments.html SOCR Dice Experiment] 1,000 times with number of dice n=10, and the loading probabilities listed above, we get an output like the one shown below.

<center>[[Image:SOCR_EBook_Dinov_Multinimial_030508_Fig1.jpg|500px]]</center>

Now, we can actually count how many of these 1,000 trials generated the event ''A'' as an outcome. In one such experiment of 1,000 trials, there were 8 outcomes of the type {3 ones, 3 twos, 2 threes and 2 fours}. Therefore, the relative proportion of these outcomes to 1,000 gives us a fairly accurate estimate of the exact probability we computed above:

: <math>P(A) \approx {8 \over 1,000}=0.008</math>.

Note that this approximation is close to the exact answer above. By the Law of Large Numbers, we know that this SOCR empirical approximation to the exact multinomial probability of interest will improve significantly as we increase the number of trials in this experiment to, say, 10,000.

== Hands-on activities==

The following practice problems will help students experiment with the SOCR LLN activity and understand the meaning, ramifications and limitations of the LLN.

* Run the [http://socr.ucla.edu/htmls/SOCR_Experiments.html SOCR Coin Toss LLN Experiment] twice with stop=100 and p=0.5. This corresponds to flipping a fair coin 100 times and observing the behavior of the proportion of heads across (discrete) time.
** What will be different in the outcomes of the 2 experiments?
** What properties of the 2 outcomes will be very similar?
** If we did this 10 times, what is expected to vary and what may be predicted accurately?
* Use the [http://socr.ucla.edu/htmls/SOCR_Experiments.html SOCR Uniform ''e''-Estimate Experiment] to obtain stochastic estimates of the natural number <math>e \approx  2.7182</math>.
** Try to explain in words, and support your argument with data/results from this simulation, why the expected value of the variable ''U'' (defined above) equals ''e'', <math>E(U)=e</math>.
** How does the LLN come into play in this experiment?
** How would you proceed in practice if you had to estimate <math>e^2 \approx 7.38861124</math>?
** Similarly, try to estimate <math>\pi \approx 3.141592</math> and <math>\pi^2 \approx 9.8696044</math> using the [[SOCR_EduMaterials_Activities_BuffonNeedleExperiment#Buffon.27s_needle_experiment_and_estimation_of_the_constant_.CF.80 | SOCR Buffon’s Needle Experiment]].
* Run the [http://socr.ucla.edu/htmls/SOCR_Experiments.html SOCR Roulette Experiment] and bet on 1-18 (out of the 38 possible numbers/outcomes).
** What is the probability of success (''p'')?
** What does the LLN imply about p and repeated runs of this experiment?
** Run this experiment 3 times. What is the sample estimate of p (<math>\hat{p}</math>)? What is the difference <math>p-\hat{p}</math>? Would this difference change if we ran the experiment 10 or 100 times? How?
** In 100 Roulette experiments, what can you say about the difference of the number of successes (outcome in 1-18) and the number of failures? How about the proportion of successes?

== Other SOCR LLN Activities==
* [[SOCR_EduMaterials_Activities_LawOfLargeNumbers2 | Part II of this activity]]
* [[SOCR_EduMaterials_Activities_LawOfLargeNumbersExperiment | Part III of this activity]]

== Common Misconceptions regarding the LLN==
* '''Misconception 1''': If we observe a streak of 10 consecutive heads (when p=0.5, say), the probability of the <math>11^{th}</math> trial being a Head is > p! This is, of course, incorrect, as the coin tosses are independent trials (an example of a ''memoryless'' process).
* '''Misconception 2''': If we run a large number of coin tosses, the '''number of heads''' and the '''number of tails''' become more and more equal. This is incorrect, as the LLN only guarantees that the sample proportion of heads will converge to the true population proportion (the p parameter that we selected). In fact, the difference |Heads - Tails| diverges!

==References==
* Dinov, ID., Christou, N., Gould, R. [http://www.amstat.org/publications/jse/v17n1/dinov.html Law of Large Numbers: the Theory, Applications and Technology-based Education]. [http://www.amstat.org/publications/jse JSE], [http://www.amstat.org/publications/jse/ Vol. 17, No. 1, 1-15, 2009].
<hr>

''Current revision as of 22:27, 16 January 2009''