# SOCR EduMaterials Activities General CI Experiment

(Difference between revisions: revision as of 17:06, 27 July 2009 by IvoDinov, and revision as of 19:27, 27 July 2009 by IvoDinov)
## Revision as of 19:27, 27 July 2009

## SOCR Experiments Activities - General Confidence Interval Activity

## Summary

There are two types of parameter estimates: point-based and interval-based estimates. Point estimates refer to unique quantitative estimates of various parameters. Interval estimates represent ranges of plausible values for the parameters of interest. There are different algorithmic approaches, prior assumptions and principles for computing data-driven parameter estimates. Both point and interval estimates depend on the distribution of the process of interest, the available computational resources and other criteria that may be desirable (Stewarty 1999), e.g., bias and robustness of the estimates. Accurate, robust and efficient parameter estimation is critical in making inference about observable experiments, summarizing process characteristics and predicting experimental behaviors.

This activity demonstrates the usage and functionality of the SOCR General Confidence Interval Applet. This applet is complementary to the SOCR Simple Confidence Interval Applet and its corresponding activity.

## Goals

The aims of this activity are to:

• TBD
• TBD
• TBD

## Motivational example

A 2005 study proposing a new computational brain atlas for Alzheimer’s disease (Mega et al., 2005) investigated the mean volumetric characteristics and the spectra of shapes and sizes of different cortical and subcortical brain regions for Alzheimer’s patients, individuals with minor cognitive impairment and asymptomatic subjects. This study estimated a number of centrality and variability parameters for these three populations. Based on these point and interval estimates, the study analyzed a number of digital scans to derive criteria for imaging-based classification of subjects based on the intensities of their 3D brain scans.
Their results enabled a number of subsequent inference studies that quantified the effects of subject demographics (e.g., education level, familial history, APOE allele, etc.), stage of the disease and the efficacy of new drug treatments targeting Alzheimer’s disease. The Figure to the right illustrates the shape, center and distribution parameters for the 3D geometric structure of the right hippocampus in the Alzheimer’s disease brain atlas. New imaging data can then be coregistered and compared relative to the amount of anatomical variability encoded in this atlas. This enables automated, efficient and quantitative inference on a large number of brain volumes. Examples of point and interval estimates computed in this atlas framework include the mean intensity and mean shape location, and the standard deviation of intensities and the mean deviation of shape, respectively.

## Activity

### Confidence intervals (CI) for the population mean $\mu$ of a normal population with known population variance $\sigma^2$

Let $X_1, X_2, \cdots, X_n$ be a random sample from $N(\mu, \sigma)$. We know that $\bar X \sim N(\mu, \frac{\sigma}{\sqrt{n}})$. Therefore,

$P\left(-z_{\frac{\alpha}{2}} \le \frac{\bar X - \mu}{\frac{\sigma}{\sqrt{n}}} \le z_{\frac{\alpha}{2}} \right)=1-\alpha,$

where $-z_{\frac{\alpha}{2}}$ and $z_{\frac{\alpha}{2}}$ are the standard normal values that leave an area of $\frac{\alpha}{2}$ in each tail and an area of $1-\alpha$ between them, as shown in the figure below.

The area $1-\alpha$ is called the confidence level. Usually, the choices for confidence levels are the following:

| $1-\alpha$ | $z_{\frac{\alpha}{2}}$ |
|---|---|
| 0.90 | 1.645 |
| 0.95 | 1.960 |
| 0.98 | 2.325 |
| 0.99 | 2.575 |

The expression above can be written as:

$P\left(\bar x -z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar x + z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \right)=1-\alpha.$

We say that we are $1-\alpha$ confident that the mean $\mu$ falls in the interval $\bar x \pm z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$.
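The interval formula above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names `z_critical` and `mean_ci_known_sigma` and the numeric inputs in the demo are hypothetical and are not part of the SOCR applet.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Return z_{alpha/2}: the standard normal value with upper-tail area alpha/2."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def mean_ci_known_sigma(xbar: float, sigma: float, n: int, confidence: float = 0.95):
    """Return (lower, upper) for xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return xbar - margin, xbar + margin


if __name__ == "__main__":
    # Recompute the critical values for the usual confidence levels.
    for level in (0.90, 0.95, 0.98, 0.99):
        print(f"1 - alpha = {level:.2f}   z = {z_critical(level):.3f}")

    # Hypothetical inputs, chosen only to illustrate the call.
    lower, upper = mean_ci_known_sigma(xbar=10.0, sigma=3.0, n=36, confidence=0.95)
    print(f"95% CI: {lower:.2f} <= mu <= {upper:.2f}")
```

The exact quantiles for the 0.98 and 0.99 levels round to 2.326 and 2.576, so they differ in the third decimal from the table values 2.325 and 2.575; the effect on the resulting intervals is small.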
### Example 1

Suppose that the length of iron rods from a certain factory follows the normal distribution with known standard deviation $\sigma=0.2\ m$ but unknown mean $\mu$. Construct a 95% confidence interval for the population mean $\mu$ if a random sample of $n=16$ of these iron rods has sample mean $\bar x=6 \ m$.

We solve this problem by using our CI recipe:

$6 \pm 1.96 \frac{0.2}{\sqrt{16}}$

$6 \pm 0.098$

$5.902 \le \mu \le 6.098.$

### Sample size determination for a given length of the confidence interval

Find the sample size $n$ needed when we want the confidence interval to be $\bar x \pm E$ with confidence level $1-\alpha$.

#### Solution

In the expression $\bar x \pm z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$ the half-width of the confidence interval is given by $z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$ (also called the margin of error). We want this half-width to be equal to $E$. Therefore,

$E=z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \Rightarrow n=\left(\frac{z_{\frac{\alpha}{2}} \sigma}{E}\right)^2.$

### Example 2

Following our first example above, suppose that we want the entire width of the confidence interval to be equal to $0.05 \ m$, so that the margin of error is $E = 0.05/2 = 0.025 \ m$. Find the sample size $n$ needed.

$n=\left(\frac{1.96 \times 0.2}{0.025}\right)^2=245.9 \Rightarrow n \approx 246.$

## Introduction of the SOCR Confidence Interval Applet

To access the SOCR applet on confidence intervals go to http://socr.ucla.edu/htmls/exp/Confidence_Interval_Experiment_General.html. To select the type and parameters of the specific confidence interval of interest, click on the Confidence Interval button on the top. This will open a new pop-up window as shown below.

A confidence interval of interest can be selected from the drop-down list under CI Settings. In this case, we selected Mean - Population Variance Known.

In the same pop-up window, under SOCR Distributions, the drop-down menu offers a list of all the available distributions of SOCR.
These distributions are the same as the ones included in the SOCR Distributions applet.

Once the desired distribution is selected, its parameters can be chosen numerically or via the sliders. In this example we select:

• normal distribution with mean 5 and standard deviation 2,
• sample size (number of observations selected from the distribution) of 20,
• confidence level $1-\alpha=0.95$, and
• number of intervals to be constructed equal to 50 (see screenshot below).

Note: Make sure to press Enter after you enter any of the parameters above.

To run the SOCR CI simulation, go back to the applet in the main browser window. We can run the experiment once, by clicking on the Step button, or many times by clicking on the Run button. The number of experiments can be controlled by the value of the Number of Experiments variable (10, 100, 1,000, 10,000, or continuously).

In the screenshot above we observe the following:

• The shape of the distribution that was selected (in this case Normal).
• The observations selected from the distribution for the construction of each of the 50 intervals, shown in blue on the top-left graph panel.
• The confidence intervals, shown as red line segments on the bottom-left panel.
• The green dots, which represent instances of confidence intervals that do not include the estimated parameter (in this case the population mean of 5).
• All the parameters and simulation results, summarized on the right panel of the applet.

### Practice

Run the same experiment using sample sizes of 20, 30, 40, 50 with the same confidence level ($1-\alpha=0.95$). What are your observations and conclusions?

## Confidence intervals for the population mean $\mu$ with known population variance $\sigma^2$

From the central limit theorem we know that when the sample size is large (usually $n \ge 30$) the distribution of the sample mean $\bar X$ approximately follows $\bar X \sim N(\mu, \frac{\sigma}{\sqrt{n}})$.
Therefore, the confidence interval for the population mean μ is approximately given by the expression we previously discussed:

$P\left(\bar x -z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar x + z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \right) \approx 1-\alpha.$

We say that we are approximately $1-\alpha$ confident that the mean μ falls in the interval $\bar x \pm z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$. Also, the sample size determination is given by the same formula:

$E=z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \Rightarrow n=\left(\frac{z_{\frac{\alpha}{2}} \sigma}{E}\right)^2.$

### Example 3

A sample of size n=50 is taken from the production of light bulbs at a certain factory. The sample mean of the lifetime of these 50 light bulbs is found to be $\bar x = 1{,}570$ hours. Assume that the population standard deviation is σ = 120 hours.

• Construct a 95% confidence interval for μ.
• Construct a 99% confidence interval for μ.
• What sample size is needed so that the length of the interval is 30 hours with 95% confidence?

## An empirical investigation

Two dice are rolled and the sum X of the two numbers that occurred is recorded. The probability distribution of X is as follows:

| X | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P(X) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |

This distribution has mean μ = 7 and standard deviation σ = 2.42. We take 100 samples of size n=50 each from this distribution and compute for each sample the sample mean $\bar x$. Pretend now that we only know that σ = 2.42, and that μ is unknown. We are going to use these 100 sample means to construct 100 confidence intervals, each one with 95% confidence level for the true population mean μ. Here are the results:

| Sample | $\bar x$ | 95% CI for μ: $\bar x - 1.96 \frac{2.42}{\sqrt {50}} \le \mu \le \bar x + 1.96 \frac{2.42}{\sqrt {50}}$ | Is μ = 7 included? |
|---|---|---|---|
| 1 | 6.9 | $6.23\leq \mu\leq 7.57$ | YES |
| 2 | 6.3 | $5.63\leq\mu\leq 6.97$ | NO |
| 3 | 6.58 | $5.91\leq\mu\leq 7.25$ | YES |
| 4 | 6.54 | $5.87\leq\mu\leq 7.21$ | YES |
| 5 | 6.7 | $6.03\leq\mu\leq 7.37$ | YES |
| 6 | 6.58 | $5.91\leq\mu\leq 7.25$ | YES |
| 7 | 7.2 | $6.53\leq\mu\leq 7.87$ | YES |
| 8 | 7.62 | $6.95\leq\mu\leq 8.29$ | YES |
| 9 | 6.94 | $6.27\leq\mu\leq 7.61$ | YES |
| 10 | 7.36 | $6.69\leq\mu\leq 8.03$ | YES |
| 11 | 7.06 | $6.39\leq\mu\leq 7.73$ | YES |
| 12 | 7.08 | $6.41\leq\mu\leq 7.75$ | YES |
| 13 | 7.42 | $6.75\leq\mu\leq 8.09$ | YES |
| 14 | 7.42 | $6.75\leq\mu\leq 8.09$ | YES |
| 15 | 6.8 | $6.13\leq\mu\leq 7.47$ | YES |
| 16 | 6.94 | $6.27\leq\mu\leq 7.61$ | YES |
| 17 | 7.2 | $6.53\leq\mu\leq 7.87$ | YES |
| 18 | 6.7 | $6.03\leq\mu\leq 7.37$ | YES |
| 19 | 7.1 | $6.43\leq\mu\leq 7.77$ | YES |
| 20 | 7.04 | $6.37\leq\mu\leq 7.71$ | YES |
| 21 | 6.98 | $6.31\leq\mu\leq 7.65$ | YES |
| 22 | 7.18 | $6.51\leq\mu\leq 7.85$ | YES |
| 23 | 6.8 | $6.13\leq\mu\leq 7.47$ | YES |
| 24 | 6.94 | $6.27\leq\mu\leq 7.61$ | YES |
| 25 | 8.1 | $7.43\leq\mu\leq 8.77$ | NO |
| 26 | 7 | $6.33\leq\mu\leq 7.67$ | YES |
| 27 | 7.06 | $6.39\leq\mu\leq 7.73$ | YES |
| 28 | 6.82 | $6.15\leq\mu\leq 7.49$ | YES |
| 29 | 6.96 | $6.29\leq\mu\leq 7.63$ | YES |
| 30 | 7.46 | $6.79\leq\mu\leq 8.13$ | YES |
| 31 | 7.04 | $6.37\leq\mu\leq 7.71$ | YES |
| 32 | 7.06 | $6.39\leq\mu\leq 7.73$ | YES |
| 33 | 7.06 | $6.39\leq\mu\leq 7.73$ | YES |
| 34 | 6.8 | $6.13\leq\mu\leq 7.47$ | YES |
| 35 | 7.12 | $6.45\leq\mu\leq 7.79$ | YES |
| 36 | 7.18 | $6.51\leq\mu\leq 7.85$ | YES |
| 37 | 7.08 | $6.41\leq\mu\leq 7.75$ | YES |
| 38 | 7.24 | $6.57\leq\mu\leq 7.91$ | YES |
| 39 | 6.82 | $6.15\leq\mu\leq 7.49$ | YES |
| 40 | 7.26 | $6.59\leq\mu\leq 7.93$ | YES |
| 41 | 7.34 | $6.67\leq\mu\leq 8.01$ | YES |
| 42 | 6.62 | $5.95\leq\mu\leq 7.29$ | YES |
| 43 | 7.1 | $6.43\leq\mu\leq 7.77$ | YES |
| 44 | 6.98 | $6.31\leq\mu\leq 7.65$ | YES |
| 45 | 6.98 | $6.31\leq\mu\leq 7.65$ | YES |
| 46 | 7.06 | $6.39\leq\mu\leq 7.73$ | YES |
| 47 | 7.14 | $6.47\leq\mu\leq 7.81$ | YES |
| 48 | 7.5 | $6.83\leq\mu\leq 8.17$ | YES |
| 49 | 7.08 | $6.41\leq\mu\leq 7.75$ | YES |
| 50 | 7.32 | $6.65\leq\mu\leq 7.99$ | YES |
| 51 | 6.54 | $5.87\leq\mu\leq 7.21$ | YES |
| 52 | 7.14 | $6.47\leq\mu\leq 7.81$ | YES |
| 53 | 6.64 | $5.97\leq\mu\leq 7.31$ | YES |
| 54 | 7.46 | $6.79\leq\mu\leq 8.13$ | YES |
| 55 | 7.34 | $6.67\leq\mu\leq 8.01$ | YES |
| 56 | 7.28 | $6.61\leq\mu\leq 7.95$ | YES |
| 57 | 6.56 | $5.89\leq\mu\leq 7.23$ | YES |
| 58 | 7.72 | $7.05\leq\mu\leq 8.39$ | NO |
| 59 | 6.66 | $5.99\leq\mu\leq 7.33$ | YES |
| 60 | 6.8 | $6.13\leq\mu\leq 7.47$ | YES |
| 61 | 7.08 | $6.41\leq\mu\leq 7.75$ | YES |
| 62 | 6.58 | $5.91\leq\mu\leq 7.25$ | YES |
| 63 | 7.3 | $6.63\leq\mu\leq 7.97$ | YES |
| 64 | 7.1 | $6.43\leq\mu\leq 7.77$ | YES |
| 65 | 6.68 | $6.01\leq\mu\leq 7.35$ | YES |
| 66 | 6.98 | $6.31\leq\mu\leq 7.65$ | YES |
| 67 | 6.94 | $6.27\leq\mu\leq 7.61$ | YES |
| 68 | 6.78 | $6.11\leq\mu\leq 7.45$ | YES |
| 69 | 7.2 | $6.53\leq\mu\leq 7.87$ | YES |
| 70 | 6.9 | $6.23\leq\mu\leq 7.57$ | YES |
| 71 | 6.42 | $5.75\leq\mu\leq 7.09$ | YES |
| 72 | 6.48 | $5.81\leq\mu\leq 7.15$ | YES |
| 73 | 7.12 | $6.45\leq\mu\leq 7.79$ | YES |
| 74 | 6.9 | $6.23\leq\mu\leq 7.57$ | YES |
| 75 | 7.24 | $6.57\leq\mu\leq 7.91$ | YES |
| 76 | 6.6 | $5.93\leq\mu\leq 7.27$ | YES |
| 77 | 7.28 | $6.61\leq\mu\leq 7.95$ | YES |
| 78 | 7.18 | $6.51\leq\mu\leq 7.85$ | YES |
| 79 | 6.76 | $6.09\leq\mu\leq 7.43$ | YES |
| 80 | 7.06 | $6.39\leq\mu\leq 7.73$ | YES |
| 81 | 7 | $6.33\leq\mu\leq 7.67$ | YES |
| 82 | 7.08 | $6.41\leq\mu\leq 7.75$ | YES |
| 83 | 7.18 | $6.51\leq\mu\leq 7.85$ | YES |
| 84 | 7.26 | $6.59\leq\mu\leq 7.93$ | YES |
| 85 | 6.88 | $6.21\leq\mu\leq 7.55$ | YES |
| 86 | 6.28 | $5.61\leq\mu\leq 6.95$ | NO |
| 87 | 7.06 | $6.39\leq\mu\leq 7.73$ | YES |
| 88 | 6.66 | $5.99\leq\mu\leq 7.33$ | YES |
| 89 | 7.18 | $6.51\leq\mu\leq 7.85$ | YES |
| 90 | 6.86 | $6.19\leq\mu\leq 7.53$ | YES |
| 91 | 6.96 | $6.29\leq\mu\leq 7.63$ | YES |
| 92 | 7.26 | $6.59\leq\mu\leq 7.93$ | YES |
| 93 | 6.68 | $6.01\leq\mu\leq 7.35$ | YES |
| 94 | 6.76 | $6.09\leq\mu\leq 7.43$ | YES |
| 95 | 7.3 | $6.63\leq\mu\leq 7.97$ | YES |
| 96 | 7.04 | $6.37\leq\mu\leq 7.71$ | YES |
| 97 | 7.34 | $6.67\leq\mu\leq 8.01$ | YES |
| 98 | 6.72 | $6.05\leq\mu\leq 7.39$ | YES |
| 99 | 6.64 | $5.97\leq\mu\leq 7.31$ | YES |
| 100 | 7.3 | $6.63\leq\mu\leq 7.97$ | YES |

We observe that four confidence intervals among the 100 that we constructed (samples 2, 25, 58, and 86) fail to include the true population mean μ = 7. This is 4%, close to the expected 5%.

### Example 4

For this example, we will select the Exponential distribution with λ = 5 (mean of 1/5 = 0.2), sample size 60, confidence level 0.95, and number of intervals 50 as shown on the screenshot below.
\begin{figure}[h] \hspace{.3in} \includegraphics[height=2.0in,width=4.5in]{ci7.png} \end{figure}

\noindent The results of the simulations are shown below:

\begin{figure}[h] \hspace{.3in} \includegraphics[height=2.0in,width=4.5in]{ci8.png} \end{figure}

\clearpage \noindent {\bf C. Confidence intervals for the population mean of normal distribution when the population variance $\mbox{\boldmath$\sigma^2$}$ is unknown:} \\

\noindent Let $X_1, X_2, \cdots, X_n$ be a random sample from $N(\mu, \sigma)$. It is known that $\frac{\bar X - \mu}{\frac{s}{\sqrt{n}}} \sim t_{n-1}$. Therefore, $$P\left(-t_{\frac{\alpha}{2}; n-1} \le \frac{\bar X - \mu}{\frac{s}{\sqrt{n}}} \le t_{\frac{\alpha}{2}; n-1} \right)=1-\alpha$$ where $-t_{\frac{\alpha}{2};n-1}$ and $t_{\frac{\alpha}{2};n-1}$ are defined as follows:

\begin{figure}[h] \vspace{-0.9in} \hspace{.3in} \includegraphics[height=5.0in,width=6.5in]{conf_alpha_t.pdf} \vspace{-0.9in} \end{figure}

\noindent The area $1-\alpha$ is called {\it confidence level}. The values of $t_{\frac{\alpha}{2};n-1}$ can be found from the $t$ table. Here are some examples: \\

\begin{tabular}{|l|c|c|} \hline $1-\alpha$ &$n$ &$t_{\frac{\alpha}{2};n-1}$ \\ \hline 0.90 &13 &1.782 \\ \hline 0.95 & 21 &2.086 \\ \hline 0.98 &31 &2.457 \\ \hline 0.99 &61 &2.660 \\ \hline \end{tabular}

\vspace{.2in}

\noindent Note: \\ The sample standard deviation is computed as follows: $s=\sqrt{\frac{\sum_{i=1}^{n} (x_i-\bar x)^2}{n-1}}$ or, more easily, using the shortcut formula: $s=\sqrt{\frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{(\sum_{i=1}^{n} x_i)^2}{n}\right]}$
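
The two formulas for $s$ in the note above can be checked against each other numerically. The following is a minimal sketch using only the Python standard library; the function names and the five data values are hypothetical and chosen only for illustration.

```python
import math
import statistics


def sd_definition(xs):
    """Sample standard deviation from the definition (divisor n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


def sd_shortcut(xs):
    """Sample standard deviation from the shortcut formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


# Hypothetical data, chosen only for illustration.
sample = [2.0, 4.0, 4.0, 5.0, 7.0]
print(sd_definition(sample), sd_shortcut(sample), statistics.stdev(sample))
```

All three values should agree up to rounding. The shortcut formula is convenient by hand, but in floating-point arithmetic it can lose precision when the mean is large relative to the spread, so the definition is usually the safer choice in code.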

\clearpage \noindent After some rearranging the expression above can be written as: $$P\left(\bar x -t_{\frac{\alpha}{2};n-1} \frac{s}{\sqrt{n}} \le \mu \le \bar x + t_{\frac{\alpha}{2};n-1} \frac{s}{\sqrt{n}} \right)=1-\alpha$$ We say that we are $1-\alpha$ confident that $\mu$ falls in the interval: $\bar x \pm t_{\frac{\alpha}{2};n-1} \frac{s}{\sqrt{n}}.$

\noindent Example: \\ The daily production of a chemical product last week in tons was: 785, 805, 790, 793, and 802. \begin{itemize} \item[a.] Construct a $95 \%$ confidence interval for the population mean $\mu$. \item[b.] What assumptions are necessary? \end{itemize}
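
Part (a) of the example can be worked through as in the sketch below. The Python standard library has no $t$ quantile function, so the critical value $t_{0.025;4}=2.776$ is entered by hand from a standard $t$ table; it does not appear in the table above, and the function name is hypothetical.

```python
import math


def t_interval(xs, t_crit):
    """Return (lower, upper) for x_bar +/- t_crit * s / sqrt(n)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin


production = [785, 805, 790, 793, 802]  # daily production in tons
T_CRIT_95_DF4 = 2.776  # assumed: t table value for 95% confidence, n - 1 = 4

lower, upper = t_interval(production, T_CRIT_95_DF4)
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```

Here $\bar x = 795$ and $s^2 = 278/4 = 69.5$, so the interval is roughly 784.65 to 805.35 tons. For part (b), the interval relies on the observations being a random sample from a normal population, as stated at the start of this section.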

\noindent {\bf SOCR investigation:} \\ \noindent For this case we have selected the normal distribution with mean 5 and standard deviation 2, sample size of 25, number of intervals 50, and confidence level 0.95. The simulation results are shown next:

\begin{figure}[h] \hspace{.3in} \includegraphics[height=2.0in,width=4.5in]{ci9.png} \end{figure}

\noindent We observe that the lengths of the confidence intervals differ from one interval to the next, because the margin of error is computed using the sample standard deviation, which changes from sample to sample.

\clearpage \noindent {\bf D. Confidence interval for the population proportion $\mbox{\boldmath$p$}$:} \\

\noindent Let $X_1, X_2, \cdots, X_n$ be a random sample from the Bernoulli distribution with probability of success $p$, and let $X=\sum_{i=1}^{n} X_i$ be the number of successes. To construct a confidence interval for $p$ the following result, based on the normal approximation, is used: $\frac{X-np}{\sqrt{np(1-p)}} \sim N(0,1)$ Therefore, $P\left(-z_{\frac{\alpha}{2}} \le \frac{X-np}{\sqrt{np(1-p)}} \le z_{\frac{\alpha}{2}} \right)=1-\alpha,$ where $-z_{\frac{\alpha}{2}}$ and $z_{\frac{\alpha}{2}}$ are defined as above. After rearranging we get: $P\left(\frac{X}{n} - z_{\frac{\alpha}{2}} \sqrt{\frac{p(1-p)}{n}} \le p \le \frac{X}{n} + z_{\frac{\alpha}{2}} \sqrt{\frac{p(1-p)}{n}}\right)=1-\alpha.$ The ratio $\frac{x}{n}$ is the point estimate of the population proportion $p$ and it is denoted by $\hat p=\frac{x}{n}$. The problem with this interval is that the unknown $p$ also appears at the end points of the interval. As an approximation we can simply replace $p$ with its estimate $\hat p=\frac{x}{n}$. Finally the confidence interval is given below: $$P\left(\hat p - z_{\frac{\alpha}{2}} \sqrt{\frac{\hat p(1-\hat p)}{n}} \le p \le \hat p + z_{\frac{\alpha}{2}} \sqrt{\frac{\hat p(1-\hat p)}{n}}\right)=1-\alpha.$$ We say that we are $1-\alpha$ confident that $p$ falls in $\hat p \pm z_{\frac{\alpha}{2}} \sqrt{\frac{\hat p(1-\hat p)}{n}}$.

\noindent {\bf Sample size determination:} \\ Determine the sample size needed so that the resulting confidence interval will have margin of error $E$ with confidence level $1-\alpha$. \\[.1in] \noindent {\bf Answer:} \\ In the expression $\hat p \pm z_{\frac{\alpha}{2}} \sqrt{\frac{\hat p(1-\hat p)}{n}}$ the half-width of the confidence interval is the margin of error $E=z_{\frac{\alpha}{2}} \sqrt{\frac{\hat p(1-\hat p)}{n}}$. We simply solve for $n$: $$E=z_{\frac{\alpha}{2}} \sqrt{\frac{\hat p(1-\hat p)}{n}} \Rightarrow n = \frac {z_{\frac{\alpha}{2}}^2\hat p (1-\hat p)}{E^2}.$$ However the value of $\hat p$ is not known because we have not selected our sample yet. If we use $\hat p=0.5$ we will obtain the largest possible sample size. Of course if we have an idea about its value (from another study, etc.) we can use it.

\clearpage \noindent {\bf Example:} \\ In a survey poll before the elections, candidate $A$ receives the support of 650 voters in a sample of 1200 voters. \begin{itemize} \item[a.] Construct a $95 \%$ confidence interval for the population proportion $p$ that supports candidate $A$. \item[b.] Find the sample size needed so that the margin of error will be $\pm 0.01$ with confidence level $95 \%$. \end{itemize}
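
Both parts of the poll example can be sketched with the Python standard library as follows. The function names are hypothetical, and `statistics.NormalDist` supplies $z_{\frac{\alpha}{2}}$ (about 1.96 for 95%) instead of a table lookup.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Upper alpha/2 point of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)


def proportion_interval(x, n, confidence=0.95):
    """Normal-approximation interval p_hat +/- z * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = x / n
    margin = z_critical(confidence) * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def sample_size_for_margin(margin, confidence=0.95, p_guess=0.5):
    """Smallest n with z^2 * p (1 - p) / n <= margin^2, for a guessed p."""
    z = z_critical(confidence)
    return math.ceil(z ** 2 * p_guess * (1.0 - p_guess) / margin ** 2)


lower, upper = proportion_interval(650, 1200)
print(f"(a) 95% CI for p: ({lower:.4f}, {upper:.4f})")
print("(b) n with p = 0.5:", sample_size_for_margin(0.01))
print("(b) n with p = 650/1200:", sample_size_for_margin(0.01, p_guess=650 / 1200))
```

The interval in (a) is roughly 0.5135 to 0.5699. In (b) the conservative choice $\hat p = 0.5$ gives $n = 9604$, while plugging in the poll estimate gives a slightly smaller value, which illustrates why 0.5 yields the largest possible sample size.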

\vspace{2.0in} \noindent {\bf Another formula for the confidence interval for the population proportion $\mbox{\boldmath$p$}$:} \\

\noindent Another way to solve for $p$ is presented below: \begin{eqnarray*} P\left(-z_{\frac{\alpha}{2}} \le \frac{X-np}{\sqrt{np(1-p)}} \le z_{\frac{\alpha}{2}} \right)&=&1-\alpha \\ P\left(-z_{\frac{\alpha}{2}} \le \frac{\frac{X}{n}-p}{\sqrt{\frac{p(1-p)}{n}}} \le z_{\frac{\alpha}{2}} \right)&=&1-\alpha \\ P\left(\frac{|\hat p - p|}{\sqrt{\frac{p(1-p)}{n}}} \le z_{\frac{\alpha}{2}} \right) &=&1-\alpha \\ P\left(\frac{(\hat p - p)^2}{\frac{p(1-p)}{n}} \le z_{\frac{\alpha}{2}}^2 \right) &=&1-\alpha \\ \end{eqnarray*} The event inside the last probability is a quadratic inequality in $p$: \begin{eqnarray*} (\hat p - p)^2 - z_{\frac{\alpha}{2}}^2 \frac{p(1-p)}{n} &\le& 0 \\ \left(1+\frac{z_{\frac{\alpha}{2}}^2}{n}\right)p^2 - \left(2\hat p + \frac{z_{\frac{\alpha}{2}}^2}{n}\right)p + \hat p^2 &\le& 0 \\ \end{eqnarray*} The two roots of this quadratic are the end points of the following confidence interval: $$\frac{\hat p +\frac{z_{\frac{\alpha}{2}}^2}{2n} \pm z_{\frac{\alpha}{2}} \sqrt{\frac{\hat p(1-\hat p)}{n}+\frac{z_{\frac{\alpha}{2}}^2}{4n^2}}} {1+\frac{z_{\frac{\alpha}{2}}^2}{n}}.$$ When $n$ is large this is approximately the same as the normal-approximation interval $\hat p \pm z_{\frac{\alpha}{2}} \sqrt{\frac{\hat p(1-\hat p)}{n}}$ above.
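
The closed-form end points above can be evaluated directly. The sketch below uses only the Python standard library; the function name is hypothetical, and the counts 650 out of 1200 are reused from the poll example purely as input. This interval is commonly known as the Wilson score interval, a name the text above does not use.

```python
import math
from statistics import NormalDist


def quadratic_interval(x, n, confidence=0.95):
    """End points obtained by solving (p_hat - p)^2 = z^2 p (1 - p) / n for p."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    p_hat = x / n
    centre = p_hat + z ** 2 / (2 * n)
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z ** 2 / (4 * n ** 2))
    denom = 1.0 + z ** 2 / n
    return (centre - half) / denom, (centre + half) / denom


lower, upper = quadratic_interval(650, 1200)
print(f"95% CI for p: ({lower:.4f}, {upper:.4f})")
```

With $n = 1200$ the result differs from the normal-approximation interval only in the third or fourth decimal place, which is consistent with the large-sample remark above.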

\noindent {\bf Exact confidence interval for $\mbox{\boldmath$p$}$:} \\ The first interval for proportions above (normal approximation) produces intervals that are too narrow when the sample size is small, so its coverage is below $1-\alpha$. The following exact (or Clopper-Pearson) interval improves the low coverage of the normal approximation confidence interval. The exact confidence interval, however, is conservative: its coverage is at least $1-\alpha$ and usually higher. $\left[1+\frac{n-x+1}{xF_{1-\frac{\alpha}{2};2x,2(n-x+1)}}\right]^{-1} < p < \left[1+\frac{n-x}{(x+1)F_{\frac{\alpha}{2};2(x+1),2(n-x)}}\right]^{-1}$ where $x$ is the number of successes among $n$ trials, and $F_{a;b,c}$ is the value of the $F$ distribution with numerator degrees of freedom $b$ and denominator degrees of freedom $c$ that has area $a$ to its right (the same upper-tail convention as $z_{\frac{\alpha}{2}}$ and $t_{\frac{\alpha}{2};n-1}$), that is, the $1-a$ quantile. \\
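
The Python standard library has no $F$ quantile function, so the sketch below computes the Clopper-Pearson end points from their equivalent definition in terms of binomial tail probabilities: the lower end point solves $P(X \ge x \mid p) = \frac{\alpha}{2}$ and the upper end point solves $P(X \le x \mid p) = \frac{\alpha}{2}$. The equations are solved by bisection. All function names and the data (3 successes in 20 trials) are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); terms are built in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log(1.0 - p)
    total = 0.0
    for i in range(k + 1):
        log_comb = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        total += math.exp(log_comb + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def solve_increasing(g, target, iters=100):
    """Bisection on [0, 1] for g(p) = target, assuming g is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(x, n, confidence=0.95):
    alpha = 1.0 - confidence
    # Lower end point: P(X >= x | p) = alpha / 2 (taken as 0 when x = 0).
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2.0)
    # Upper end point: P(X <= x | p) = alpha / 2 (taken as 1 when x = n).
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1.0 - binom_cdf(x, n, p), 1.0 - alpha / 2.0)
    return lower, upper


# Hypothetical data: 3 successes in 20 trials.
print(clopper_pearson(3, 20))
```

Bisection is used here because both tail probabilities are monotone in $p$; a statistics package with $F$ or beta quantiles would give the same end points from the closed-form expression.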

\noindent {\bf SOCR investigation:}

\clearpage \noindent {\bf E. Confidence interval for the population variance $\mbox{\boldmath$\sigma^2$}$ of normal distribution:} \\

\noindent Let $X_1, X_2, \cdots, X_n$ be a random sample from $N(\mu, \sigma)$. It is known that $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$. Therefore, $P\left(\chi^2_{\frac{\alpha}{2}; n-1} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{1-\frac{\alpha}{2}; n-1} \right)=1-\alpha$ where $\chi^2_{\frac{\alpha}{2};n-1}$ and $\chi^2_{1-\frac{\alpha}{2};n-1}$ are defined as follows:

\begin{figure}[h] \vspace{-0.9in} \hspace{.3in} \includegraphics[height=5.0in,width=6.5in]{chi_conf_alpha.pdf} \vspace{-0.9in} \end{figure}

\noindent Some examples on how to find the values $\chi^2_{\frac{\alpha}{2};n-1}$ and $\chi^2_{1-\frac{\alpha}{2};n-1}$: \\

\begin{tabular}{|l|r|r|r|} \hline $1-\alpha$ &$n$ &$\chi^2_{\frac{\alpha}{2};n-1}$ &$\chi^2_{1-\frac{\alpha}{2};n-1}$ \\ \hline 0.90 &4 &0.352 &7.81 \\ \hline 0.95 & 16 &6.26 &27.49 \\ \hline 0.98 &25 &10.86 &42.98 \\ \hline 0.99 &41 &20.71 &66.77 \\ \hline \end{tabular}

\vspace{0.15in} \noindent After rearranging the inequality above we get: $$P\left(\frac{(n-1)s^2}{\chi_{1-\frac{\alpha}{2};n-1}^2} \le \sigma^2 \le \frac{(n-1)s^2}{\chi_{\frac{\alpha}{2};n-1}^2}\right)=1-\alpha$$ We say that we are $1-\alpha$ confident that the population variance $\sigma^2$ falls in the interval: $\left[\frac{(n-1)s^2}{\chi_{1-\frac{\alpha}{2};n-1}^2}, \ \ \frac{(n-1)s^2}{\chi_{\frac{\alpha}{2};n-1}^2}\right]$

\clearpage \noindent {\bf Comment:} When the sample size $n$ is large the $\chi^2_{n-1}$ distribution can be approximated by $N(n-1, \sqrt{2(n-1)})$. Replacing the two $\chi^2$ values by $(n-1) \mp z_{\frac{\alpha}{2}}\sqrt{2(n-1)}$ and dividing through by $n-1$, the confidence interval for the variance can be computed in this situation as follows:

$\frac{s^2}{1+z_{\frac{\alpha}{2}}\sqrt{\frac{2}{n-1}}} \le \sigma^2 \le \frac{s^2}{1-z_{\frac{\alpha}{2}}\sqrt{\frac{2}{n-1}}}$

\noindent {\bf Example:} \\ A precision instrument is guaranteed to read accurately to within 2 units. A sample of 4 instrument readings on the same object yielded the measurements 353, 351, 351, and 355. Find a $90 \%$ confidence interval for the population variance. Assume that these observations were selected from a population that follows the normal distribution. \\
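
The example can be worked through as in the sketch below. The two $\chi^2$ values are taken from the first row of the table above ($1-\alpha=0.90$, $n=4$); the function name is hypothetical.

```python
def variance_interval(xs, chi2_lower, chi2_upper):
    """Return ((n-1)s^2 / chi2_upper, (n-1)s^2 / chi2_lower)."""
    n = len(xs)
    mean = sum(xs) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in xs)  # equals (n - 1) * s^2
    return sum_sq_dev / chi2_upper, sum_sq_dev / chi2_lower


readings = [353, 351, 351, 355]
# Table values for 1 - alpha = 0.90 and n = 4 (3 degrees of freedom).
lower, upper = variance_interval(readings, chi2_lower=0.352, chi2_upper=7.81)
print(f"90% CI for sigma^2: ({lower:.3f}, {upper:.2f})")
```

Here $(n-1)s^2 = 11$, so the interval is about 1.408 to 31.25. The interval is very wide because it is based on only four readings.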

\noindent {\bf SOCR investigation:} \\ Using the SOCR confidence intervals applet we run the following simulation experiment: normal distribution with mean 5 and standard deviation 2, sample size 30, confidence intervals 50, and confidence level 0.95.

\begin{figure}[h] \hspace{.3in} \includegraphics[height=2.0in,width=4.5in]{ci10.png} \end{figure}

\clearpage \noindent However, if the population is not normal the coverage is poor, as can be seen in the following SOCR example. Consider the exponential distribution with $\lambda=2$ (variance is $\sigma^2=0.25$). If we use the confidence interval based on the $\chi^2$ distribution described above we obtain the following results (first with sample size 30 and then sample size 300).

\begin{figure}[h] \hspace{.3in} \includegraphics[height=2.0in,width=4.5in]{ci11.png} \end{figure}

\begin{figure}[h] \hspace{.3in} \includegraphics[height=2.0in,width=4.5in]{ci12.png} \end{figure}

\vspace{0.2in} We observe that regardless of the sample size the coverage is poor.

\clearpage \noindent In these situations (sampling from non-normal populations) an asymptotically distribution-free confidence interval for the variance can be obtained using the large sample theory result: $\sqrt{n}(s^2-\sigma^2) \rightarrow N\left(0, \sqrt{\mu_4-\sigma^4}\right)$ or, $\frac{\sqrt{n}(s^2-\sigma^2)}{ \sqrt{\mu_4-\sigma^4}} \rightarrow N(0,1)$ where $\mu_4=E(X-\mu)^4$ is the fourth central moment of the distribution. Of course, $\mu_4$ is unknown and will be estimated by the fourth sample central moment $m_4=\frac{1}{n}\sum_{i=1}^n(X_i-\bar X)^4$. The confidence interval for the population variance is computed as follows: $s^2 - z_{\frac{\alpha}{2}} \frac{\sqrt{m_4-s^4}}{\sqrt{n}} \le \sigma^2 \le s^2 + z_{\frac{\alpha}{2}} \frac{\sqrt{m_4-s^4}}{\sqrt{n}}$
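
The interval above is straightforward to compute, and its coverage can be explored by simulation outside the applet. The following is a rough sketch using only the Python standard library. The function names, the seed, and the choice of 200 simulated intervals are assumptions made for illustration; this is not the SOCR implementation.

```python
import math
import random
from statistics import NormalDist


def variance_interval_large_sample(xs, confidence=0.95):
    """s^2 +/- z * sqrt(m4 - s^4) / sqrt(n), with m4 the 4th sample central moment."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)
    m4 = sum((x - mean) ** 4 for x in xs) / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    # Guard against a slightly negative value under the root in tiny samples.
    margin = z * math.sqrt(max(m4 - s2 ** 2, 0.0)) / math.sqrt(n)
    return s2 - margin, s2 + margin


def estimated_coverage(rate=2.0, n=300, intervals=200, seed=2009):
    """Fraction of simulated intervals that contain the true variance 1 / rate^2."""
    rng = random.Random(seed)
    true_variance = 1.0 / rate ** 2
    hits = 0
    for _ in range(intervals):
        xs = [rng.expovariate(rate) for _ in range(n)]
        lower, upper = variance_interval_large_sample(xs)
        hits += lower <= true_variance <= upper
    return hits / intervals


print(estimated_coverage())
```

The printed fraction depends on the seed and on the number of simulated intervals, so it should be read as a rough estimate of the coverage rather than an exact value.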

\noindent Using SOCR (exponential distribution with $\lambda=2$, sample size 300, number of intervals 50, confidence level 0.95) we see that the coverage of this interval is approximately $95 \%$. \begin{figure}[h] \hspace{.3in} \includegraphics[height=2.0in,width=4.5in]{ci13.png} \end{figure}

\noindent The coverage is about $95 \%$.

\clearpage \noindent {\bf F. Confidence intervals for the population parameters of a distribution based on the asymptotic properties of maximum likelihood estimates:} \\

\noindent To construct confidence intervals for a parameter of a distribution, the following method, based on the large sample theory of maximum likelihood estimates, can be used. \\

\noindent As the sample size $n$ increases it can be shown that the maximum likelihood estimate $\hat \theta$ of a parameter $\theta$ follows approximately a normal distribution with mean $\theta$ and variance equal to the lower bound of the Cramer-Rao inequality: $\hspace{-.3in} \hat \theta \sim N\left(\theta, \sqrt{\frac{1}{nI(\theta)}}\right), \ \ \mbox{where} \ \ \frac{1}{nI(\theta)} \ \ \mbox{is the lower bound of the Cramer-Rao inequality}.$ As elsewhere in this activity, the second argument of $N(\cdot,\cdot)$ is the standard deviation, hence the square root. Because $I(\theta)$ (Fisher's information) is a function of the unknown parameter $\theta$ we replace $\theta$ with its maximum likelihood estimate $\hat \theta$ to get $I(\hat \theta)$. \\

\noindent Since $\hspace{-.3in} Z=\frac{\hat \theta - \theta}{\sqrt{\frac{1}{nI(\hat \theta)}}}$ follows approximately $N(0,1)$, we can write $\hspace{-.3in} P(-z_{\frac{\alpha}{2}} \le Z \le z_{\frac{\alpha}{2}}) \approx 1-\alpha.$ We replace $Z$ with $Z=\frac{\hat \theta - \theta}{\sqrt{\frac{1}{nI(\hat \theta)}}}$ to get $\hspace{-.3in} P\left(-z_{\frac{\alpha}{2}} \le \frac{\hat \theta - \theta}{\sqrt{\frac{1}{nI(\hat \theta)}}} \le z_{\frac{\alpha}{2}}\right) \approx 1-\alpha.$ And finally, $\hspace{-.3in} P\left(\hat \theta -z_{\frac{\alpha}{2}} \sqrt{\frac{1}{nI(\hat \theta)}} \le \theta \le \hat \theta + z_{\frac{\alpha}{2}} \sqrt{\frac{1}{nI(\hat \theta)}} \right) \approx 1-\alpha.$ Therefore we are approximately $1-\alpha$ confident that $\theta$ falls in the interval $\hspace{-.3in} \hat \theta \pm z_{\frac{\alpha}{2}} \sqrt{\frac{1}{nI(\hat \theta)}}$.

\clearpage \noindent {\bf Example:} \\ Use the result above to construct a confidence interval for the Poisson parameter $\lambda$. Let $X_1, X_2, \cdots, X_n$ be independent and identically distributed random variables from a Poisson distribution with parameter $\lambda$. \\

\noindent We know that the maximum likelihood estimate of $\lambda$ is $\hat \lambda=\bar x$. We need to find the lower bound of the Cramer-Rao inequality: $f(x)=\frac{\lambda^x e^{-\lambda}}{x!} \Rightarrow \ln f(x) = x\ln\lambda - \lambda -\ln x!$ Let's find the first and second derivatives w.r.t. $\lambda$. $\frac{\partial {\ln f(x)}}{\partial \lambda}=\frac{x}{\lambda}-1 \ \ \mbox{and} \ \ \ \frac{\partial^2{\ln f(x)}}{\partial \lambda^2}=-\frac{x}{\lambda^2}.$

Therefore, since $E(X)=\lambda$, $\frac{1}{-nE\left(\frac{\partial^2 \ln f(x)}{\partial \lambda^2}\right)}=\frac{1}{-nE(-\frac{X}{\lambda^2})}= \frac{\lambda^2}{nE(X)}=\frac{\lambda^2}{n\lambda}=\frac{\lambda}{n}.$ Therefore when $n$ is large $\hat \lambda$ follows approximately $\hat \lambda \sim N\left(\lambda, \sqrt{\frac{\lambda}{n}}\right).$ Because $\lambda$ is unknown we replace it with its maximum likelihood estimate $\hat \lambda$: $\hat \lambda \sim N\left(\lambda, \sqrt{\frac{\hat \lambda}{n}}\right) \ \ \mbox{or} \ \ \hat \lambda \sim N\left(\lambda, \sqrt{\frac{\bar X}{n}}\right).$ Therefore, the confidence interval for $\lambda$ is: $\bar X \pm z_{\frac{\alpha}{2}} \sqrt{\frac{\bar X}{n}}$

\noindent {\bf Application:} \\ The number of pine trees in a certain forest follows the Poisson distribution with unknown parameter $\lambda$ per acre. A random sample of size $n=50$ acres is selected and the number of pine trees in each acre is counted. Here are the results: \begin{verbatim} 7 4 5 3 1 5 7 6 4 3 2 6 6 9 2 3 3 7 2 5 5 4 4 8 8 7 2 6 3 5 0 5 8 9 3 4 5 4 6 1 0 5 4 6 3 6 9 5 7 6 \end{verbatim} The sample mean is $\bar x=4.76$. Therefore a $95 \%$ confidence interval for the parameter $\lambda$ is $4.76 \pm 1.96 \sqrt{\frac{4.76}{50}} \ \ \mbox{or} \ \ 4.76 \pm 0.60$ (the standard error is $\sqrt{4.76/50} \approx 0.31$, and $1.96 \times 0.31 \approx 0.60$). Therefore $4.16 \le \lambda \le 5.36$.
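
The arithmetic of this application can be reproduced with the short sketch below, which uses only the Python standard library. The function name is hypothetical; the 50 counts are copied from the data listed above.

```python
import math
from statistics import NormalDist

# Pine tree counts per acre, copied from the application above (n = 50).
counts = [
    7, 4, 5, 3, 1, 5, 7, 6, 4, 3,
    2, 6, 6, 9, 2, 3, 3, 7, 2, 5,
    5, 4, 4, 8, 8, 7, 2, 6, 3, 5,
    0, 5, 8, 9, 3, 4, 5, 4, 6, 1,
    0, 5, 4, 6, 3, 6, 9, 5, 7, 6,
]


def poisson_mle_interval(xs, confidence=0.95):
    """Return (lambda_hat, lower, upper) for x_bar +/- z * sqrt(x_bar / n)."""
    n = len(xs)
    lam_hat = sum(xs) / n  # maximum likelihood estimate of lambda
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * math.sqrt(lam_hat / n)
    return lam_hat, lam_hat - margin, lam_hat + margin


lam_hat, lower, upper = poisson_mle_interval(counts)
print(f"lambda_hat = {lam_hat:.2f}, 95% CI: ({lower:.3f}, {upper:.3f})")
```

The margin of error is $z_{\frac{\alpha}{2}}$ times the standard error $\sqrt{\bar x / n}$, so it is roughly twice the standard error at the 95% level.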

\clearpage \noindent Exponential distribution: \\ Verify that for the parameter $\lambda$ of the exponential distribution the confidence interval obtained by this method is given as follows: $\frac{1}{\bar x} \pm z_{\frac{\alpha}{2}} \sqrt{\frac{1}{n \bar x^2}}$
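
One way to carry out this verification is sketched below. It assumes the rate parameterization $f(x)=\lambda e^{-\lambda x}$ for $x>0$, which is consistent with the earlier remark that $\lambda = 5$ corresponds to a mean of $1/5$, and it uses the fact that the maximum likelihood estimate under this parameterization is $\hat \lambda = 1/\bar x$.

```latex
\begin{eqnarray*}
\ln f(x) &=& \ln\lambda - \lambda x \\
\frac{\partial \ln f(x)}{\partial \lambda} &=& \frac{1}{\lambda} - x \\
\frac{\partial^2 \ln f(x)}{\partial \lambda^2} &=& -\frac{1}{\lambda^2} \\
\frac{1}{-nE\left(\frac{\partial^2 \ln f(X)}{\partial \lambda^2}\right)} &=& \frac{\lambda^2}{n} \\
\hat \lambda \pm z_{\frac{\alpha}{2}} \sqrt{\frac{\hat \lambda^2}{n}} &=& \frac{1}{\bar x} \pm z_{\frac{\alpha}{2}} \sqrt{\frac{1}{n \bar x^2}}
\end{eqnarray*}
```

The last line replaces $\lambda$ with $\hat \lambda = 1/\bar x$ in the Cramer-Rao lower bound, exactly as was done for the Poisson parameter.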

The following SOCR simulations refer to \begin{itemize} \item[a.] Poisson distribution, $\lambda=5$, sample size 40, number of intervals 50, confidence level 0.95. \item[b.] Exponential distribution, $\lambda=0.5$, sample size 30, number of intervals 50, confidence level 0.95. \end{itemize}

\begin{figure}[h] \hspace{.3in} \includegraphics[height=2.0in,width=4.5in]{ci14.png} \end{figure}

\begin{figure}[h] \hspace{.3in} \includegraphics[height=2.0in,width=4.5in]{ci15.png} \end{figure}

## References

• Mega, M., Dinov, I., Thompson, P., Manese, M., Lindshield, C., Moussai, J., Tran, N., Olsen, K., Felix, J., Zoumalan, C., Woods, R., Toga, A., and Mazziotta, J. (2005). Automated brain tissue assessment in the elderly and demented population: Construction and validation of a sub-volume probabilistic brain atlas. NeuroImage, 26(4), 1009-1018.
• Stewarty, C. (1999). Robust Parameter Estimation in Computer Vision. SIAM Review, 41(3), 513–537.
• Wolfram, S. (2002). A New Kind of Science, Wolfram Media Inc.