AP Statistics Curriculum 2007 Distrib Dists

From Socr

Current revision as of 19:35, 23 June 2012

General Advance-Placement (AP) Statistics Curriculum - Geometric, HyperGeometric, Negative Binomial Random Variables and Experiments

Geometric

Definition: The Geometric Distribution is the probability distribution of the number X of Bernoulli trials needed to get one success, supported on the set {1, 2, 3, ...}. The name geometric is a direct derivative from the mathematical notion of geometric series.

Mass Function: If the probability of successes on each trial is P(success)=p, then the probability that x trials are needed to get one success is $P(X = x) = (1 - p)^{x-1} \times p$ , for x = 1, 2, 3, 4,....

Expectation: The Expected Value of a geometrically distributed random variable X is ${1\over p}.$ This is because geometric series have this property:

$\sum_{k=0}^{n} p(1-p)^k = p(1-p)^0+p(1-p)^1+p(1-p)^2+p(1-p)^3+\cdots+p(1-p)^n.$

Let r=(1-p), then p=(1-r) and $\sum_{k=0}^{n} p(1-p)^k = \begin{align} (1-r) \sum_{k=0}^{n} r^k & = (1-r)(r^0 + r^1+r^2+r^3+\cdots+r^n) \\ & = r^0 + r^1+r^2+r^3+\cdots+r^n \\ & -( r^1+r^2+r^3+\cdots +r^n + r^{n+1}) \\ & = r^0 - r^{n+1} = 1 - r^{n+1}. \end{align}$

Thus: $\sum_{k=0}^{n} p(1-p)^k = \frac{p - pr^{n+1}}{1-r} = 1-r^{n+1},$ which converges to 1 as $n\longrightarrow \infty$ (since $0 \le r < 1$), and hence the above geometric density is well defined.

Denote the geometric expectation by E = E(X) = $\sum_{k=1}^{\infty} kpr^{k-1}$ , where r=1-p. Then $pE = E - (1-p)E = \sum_{k=1}^{\infty} kpr^{k-1} - \sum_{k=1}^{\infty} kpr^{k} = \sum_{k=1}^{\infty} (k-(k-1))pr^{k-1} = \sum_{k=0}^{\infty} pr^k = 1$ . Therefore, $E = \frac{1}{p}$ .

Variance: The Variance is ${1-p\over p^2}.$

Example: See this SOCR Geometric distribution activity.

The Geometric distribution gets its name because its probability mass function is a geometric progression. It is the discrete analogue of the Exponential distribution and is also known as the Furry distribution.
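The mass function, mean and variance stated above can be checked numerically. The following is a minimal sketch, assuming an illustrative value p = 0.3 and a truncation point of 200 terms (both are assumptions, not values from the text); the neglected tail mass is $(1-p)^{200}$, which is negligible here.

```python
def geometric_pmf(x: int, p: float) -> float:
    """P(X = x) = (1 - p)**(x - 1) * p for x = 1, 2, 3, ..."""
    if x < 1:
        return 0.0
    return (1.0 - p) ** (x - 1) * p


p = 0.3        # illustrative success probability (assumption)
cutoff = 200   # truncation point of the infinite sums (assumption)

xs = range(1, cutoff + 1)
total = sum(geometric_pmf(x, p) for x in xs)                        # should be close to 1
mean = sum(x * geometric_pmf(x, p) for x in xs)                     # should be close to 1/p
variance = sum((x - mean) ** 2 * geometric_pmf(x, p) for x in xs)   # close to (1-p)/p**2

print(total, mean, variance)
```

The truncated sums agree with the closed forms 1, 1/p and $(1-p)/p^2$ up to floating-point error.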

HyperGeometric

The hypergeometric distribution is a discrete probability distribution that describes the number of successes in a sequence of n draws from a finite population without replacement. An experimental design for using Hypergeometric distribution is illustrated in this table:

Type	Drawn	Not-Drawn	Total
Defective	k	m-k	m
Non-Defective	n-k	N+k-n-m	N-m
Total	n	N-n	N

Explanation: Suppose there is a shipment of N objects in which m are defective. The Hypergeometric Distribution describes the probability that in a sample of n distinct objects drawn from the shipment exactly k objects are defective.

Mass function: If the random variable X follows the Hypergeometric Distribution with parameters N, m and n, then the probability of getting exactly k successes is given by

$P(X=k) = {{{m \choose k} {{N-m} \choose {n-k}}}\over {N \choose n}}.$

This formula for the Hypergeometric Mass Function may be interpreted as follows: There are ${{N}\choose{n}}$ possible samples (without replacement). There are ${{m}\choose{k}}$ ways to obtain k defective objects and there are ${{N-m}\choose{n-k}}$ ways to fill out the rest of the sample with non-defective objects.

The mean and variance of the hypergeometric distribution have the following closed forms:

Mean: $n \times m\over N$

Variance: ${ {nm\over N} ( 1-{m\over N} ) (N-n)\over N-1}$

Examples

SOCR Activity: The SOCR Ball and Urn Experiment provides a hands-on demonstration of the use of the Hypergeometric distribution in practice. This activity consists of selecting n balls at random from an urn with N balls, R of which are red and the other N - R green. The number of red balls Y in the sample is recorded on each update. The distribution and moments of Y are shown in blue in the distribution graph and are recorded in the distribution table. On each update, the empirical density and moments of Y are shown in red in the distribution graph and are recorded in the distribution table. Either of two sampling models can be selected with the list box: with replacement and without replacement. The parameters N, R, and n can be varied with scroll bars.

A lake contains 1,000 fish; 100 are randomly caught and tagged. Suppose that later we catch 20 fish. Use SOCR Hypergeometric Distribution to:
- Compute the probability mass function of the number of tagged fish in the sample of 20.
- Compute the expected value and the variance of the number of tagged fish in this sample.
- Compute the probability that this random sample contains more than 3 tagged fish.
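
The three fish-tagging tasks above can also be sketched without the SOCR applet. The snippet below is a minimal illustration using only the Python standard library; the function name `hypergeom_pmf` is hypothetical, and the parameters follow the notation of this section (N = population size, m = tagged fish, n = sample size).

```python
from math import comb


def hypergeom_pmf(k: int, N: int, m: int, n: int) -> float:
    """P(X = k) = C(m, k) * C(N - m, n - k) / C(N, n)."""
    if k < max(0, n - (N - m)) or k > min(n, m):
        return 0.0
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


N, m, n = 1000, 100, 20   # lake size, tagged fish, fish caught later

# Probability mass function of the number of tagged fish in the sample
pmf = [hypergeom_pmf(k, N, m, n) for k in range(n + 1)]

# Expected value and variance computed directly from the mass function
mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

# Probability of more than 3 tagged fish in the sample
p_more_than_3 = sum(pmf[4:])

print(mean, variance, p_more_than_3)
```

The mean and variance obtained this way can be compared with the closed forms $n m/N$ and ${nm\over N}(1-{m\over N}){N-n\over N-1}$ given earlier.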

Hypergeometric distribution may also be used to estimate the population size: Suppose we are interested in determining the population size. Let N = number of fish in a particular isolated region. Suppose we catch, tag and release back M=200 fish. Several days later, when the fish are randomly mixed with the untagged fish, we take a sample of n=100 and observe m=5 tagged fish. Suppose p=200/N is the population proportion of tagged fish. Notice that when sampling fish, we sample without replacement. Thus, hypergeometric is the exact model for this process. Assuming the sample-size (n) is < 5% of the population size(N), we can use binomial approximation to hypergeometric. Thus if the sample of n=100 fish had 5 tagged, the sample-proportion (estimate of the population proportion) will be $\hat{p}={5\over 100}=0.05$ . Thus, we can estimate that $0.05=\hat{p}={200\over N}$ , and $N\approx 4,000$ , as shown in the figure below.
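
The population-size estimate in the preceding paragraph amounts to a single proportion argument. The sketch below reproduces that arithmetic and checks, after the fact, the 5% sampling-fraction condition used to justify the binomial approximation; the variable names are illustrative assumptions.

```python
M = 200   # fish caught, tagged and released
n = 100   # size of the later sample
m = 5     # tagged fish observed in that sample

p_hat = m / n          # sample proportion of tagged fish
N_hat = M * n / m      # solves p_hat = M / N for N

sampling_fraction = n / N_hat   # should be below 0.05 for the binomial approximation

print(p_hat, N_hat, sampling_fraction)
```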

You can also see a manual calculation example using the hypergeometric distribution here.

Negative Binomial

The family of Negative Binomial Distributions is a two-parameter family, with parameters p and r, where 0 < p < 1 and r > 0. There are two (equivalent) combinatorial interpretations of Negative Binomial processes (X or Y).

X=Trial index (n) of the r^th success, or Total # of experiments (n) to get r successes

Probability Mass Function: $P(X=n) = {n-1 \choose r-1}\cdot p^r \cdot (1-p)^{n-r} \!$ , for n = r,r+1,r+2,.... (n=trial number of the r^th success)
Mean: $E(X)= {r \over p}$
Variance: $Var(X)= {r(1-p) \over p^2}$

Y = Number of failures (k) to get r successes

Probability Mass Function: $P(Y=k) = {k+r-1 \choose k}\cdot p^r \cdot (1-p)^k \!$ , for k = 0,1,2,.... (k=number of failures before the r^th success)
$Y \sim NegBin(r, p)$ , the probability of k failures and r successes in n=k+r Bernoulli(p) trials with success on the last trial.
Mean: $E(Y)= {r(1-p) \over p}$ .
Variance: $Var(Y)= {r(1-p) \over p^2}$ .
Note that X = Y + r, and E(X) = E(Y) + r, whereas VAR(X)=VAR(Y).

SOCR Negative Binomial Experiment

Application

Suppose Jane is promoting and fund-raising for a presidential candidate. She wants to visit all 50 states and she's pledged to get all electoral votes of 6 states before she and the candidate she represents are satisfied. In every state, there is a 30% chance that Jane will be able to secure all electoral votes and a 70% chance that she'll fail.

What's the probability mass function of the number of failures (k=n-r) to get r=6 successes?

In other words, what's the probability that the 6^th state in which she succeeds in securing all electoral votes happens to be the n^th state she campaigns in?

The NegBin(r, p) distribution describes the probability of k failures and r successes in n=k+r Bernoulli(p) trials with success on the last trial. Looking to secure the electoral votes for 6 states means Jane needs to get 6 successes before she (and her candidate) is happy. The number of trials (i.e., states visited) needed is n=k+6. The random variable we are interested in is X={number of states visited to achieve 6 successes (secure all electoral votes within these states)}. So, n = k+6, and $X\sim NegBin(r=6, p=0.3)$ . Thus, for $n \geq 6$ , the mass function (giving the probabilities that Jane will visit n states before her ultimate success) is:

$P(X=n) = {n-1 \choose r-1}\cdot p^r \cdot (1-p)^{n-r} = {n - 1 \choose r-1} \cdot 0.3^6 \cdot 0.7^{n-r}$

What's the probability that Jane finishes her campaign in the 10^th state?

Let $X\sim NegBin(r=6, p=0.3)$ , then $P(X=10) = {10-1 \choose 6-1}\cdot 0.3^6 \cdot 0.7^{10-6} = 0.022054.$

What's the probability that Jane finishes campaigning on or before reaching the 8^th state?

$P(X\leq 8) = 0.011292$
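
The two probabilities above can be reproduced from the trial-index form of the mass function. This is a minimal sketch using only the Python standard library; the function name `negbin_trials_pmf` is hypothetical.

```python
from math import comb


def negbin_trials_pmf(n: int, r: int, p: float) -> float:
    """P(X = n) = C(n - 1, r - 1) * p**r * (1 - p)**(n - r), for n = r, r + 1, ..."""
    if n < r:
        return 0.0
    return comb(n - 1, r - 1) * p ** r * (1 - p) ** (n - r)


r, p = 6, 0.3

# Jane finishes exactly in the 10th state
p_exactly_10 = negbin_trials_pmf(10, r, p)

# Jane finishes on or before the 8th state: sum over n = 6, 7, 8
p_at_most_8 = sum(negbin_trials_pmf(n, r, p) for n in range(r, 9))

print(round(p_exactly_10, 6), round(p_at_most_8, 6))
```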

Suppose the chance of getting all electoral votes within a state is reduced to only 10%, then X~NegBin(r=6, p=0.1). Notice that the shape and domain of the Negative-Binomial distribution change significantly now (see image below).

What's the probability that Jane covers all 50 states but fails to get all electoral votes in any 6 states (as she had hoped for)?

$P(X\geq 50) = 0.632391$

SOCR Activity: If you want to see an interactive Negative-Binomial Graphical calculator you can go to this applet (select Negative Binomial) and see this activity.

Normal approximation to Negative Binomial distribution

The central limit theorem provides the foundation for approximation of negative binomial distribution by Normal distribution. Each negative binomial random variable, $V_k \sim NB(k,p)$, may be expressed as a sum of k independent, identically distributed (geometric) random variables $\{X_i\}$, i.e., $ V_k = \sum_{i=1}^k{X_i}$, where $ X_i \sim Geometric(p)$. In various scientific applications, given a large k, the distribution of $V_k$ is approximately normal with mean and variance given by $\mu=k\frac{1}{p}$ and $\sigma^2=k\frac{1-p}{p^2}$, as $k \longrightarrow \infty$. Depending on the parameter p, k may need to be rather large for the approximation to work well. Also, when using the normal approximation, we should remember to use the continuity correction, since the negative binomial and Normal distributions are discrete and continuous, respectively.

For the above example, $P(X\le 8)$ with $V_k \sim NegBin(k=r=6, p=0.3)$, the normal distribution approximation, $N(\mu=\frac{k}{p}=20, \sigma=\sqrt{k\frac{1-p}{p^2}}=6.83)$, is shown in the following image and table:

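The probabilities under the exact Negative Binomial and approximating Normal distributions (for the range $X\le 8$ and its complement, as tabulated below) are not identical; the summary statistics agree closely, while the tail probabilities differ more noticeably.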

Summary	$NegativeBinomial(k=6,p=0.3)$	$Normal(\mu=20, \sigma=6.83)$
Mean	20.0	20.0
Median	19.0	20.0
Variance	46.666667	46.6489
Standard Deviation	6.831301	6.83
Max Density	0.062439	0.058410
Probability Areas
$\le 8$	0.011292	0.039433
>8	0.988708	0.960537
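
The comparison in the table can be sketched as follows. The snippet computes the exact Negative Binomial probability $P(X\le 8)$ and the Normal approximation with and without a continuity correction; the Normal CDF is written with `math.erf`, and the helper name `normal_cdf` is an assumption made for this illustration.

```python
from math import comb, erf, sqrt


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma^2), expressed through the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


k, p = 6, 0.3
mu = k / p                      # 20
sigma = sqrt(k * (1 - p)) / p   # about 6.83

# Exact P(X <= 8) for X = trial index of the 6th success
exact = sum(comb(n - 1, k - 1) * p ** k * (1 - p) ** (n - k) for n in range(k, 9))

approx_plain = normal_cdf(8.0, mu, sigma)   # no continuity correction
approx_cc = normal_cdf(8.5, mu, sigma)      # with continuity correction

print(exact, approx_plain, approx_cc)
```

With k = 6 the distribution is still noticeably skewed, so this is a case where k may need to be larger for the approximation to work well in the tails.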

Negative Multinomial Distribution (NMD)

The Negative Multinomial Distribution is a generalization of the two-parameter Negative Binomial distribution (NB(r,p)) to $m\ge 1$ outcomes in addition to the default one. Suppose we have an experiment that generates $m+1\ge 2$ possible outcomes, $\{X_0,\cdots,X_m\}$ , each occurring with probability $\{p_0,\cdots,p_m\}$ , respectively, where $0 < p_i < 1$ and $\sum_{i=0}^m{p_i}=1$ . That is, $p_0 = 1-\sum_{i=1}^m{p_i}$ . If the experiment proceeds to generate independent outcomes until $\{X_0, X_1, \cdots, X_m\}$ occur exactly $\{k_0, k_1, \cdots, k_m\}$ times, then the distribution of the m-tuple $\{X_1, \cdots, X_m\}$ is Negative Multinomial with parameter vector $(k_0,\{p_1,\cdots,p_m\})$ . Notice that the number of degrees of freedom here is actually m, not (m+1). That is why we only have a probability parameter vector of size m, not (m+1), as all probabilities add up to 1 (so this introduces one relation). Contrast this with the combinatorial interpretation of Negative Binomial (special case with m=1):

$X \sim NegativeBinomial(NumberOfSuccesses=r, ProbOfSuccess=p)$ ,

X=Total # of experiments (n) to get r successes (and therefore n-r failures);

$X \sim NegativeMultinomial(k_0,\{p_0, p_1\})$ ,

X=Total # of experiments (n) to get $k_0$ (default variable, $X_0$ ) and $n - k_0$ outcomes of the other possible outcome ( $X_1$ ).

Negative Multinomial Summary

Probability Mass Function: $P(k_1, \cdots, k_m|k_0,\{p_1,\cdots,p_m\}) = \left (\sum_{i=0}^m{k_i}-1\right)!\frac{p_0^{k_0}}{(k_0-1)!} \prod_{i=1}^m{\frac{p_i^{k_i}}{k_i!}}$ , or equivalently:

$P(k_1, \cdots, k_m|k_0,\{p_1,\cdots,p_m\}) = \Gamma\left(\sum_{i=0}^m{k_i}\right)\frac{p_0^{k_0}}{\Gamma(k_0)} \prod_{i=1}^m{\frac{p_i^{k_i}}{k_i!}}$ , where $\Gamma(x)$ is the Gamma function.

Mean (vector): $\mu=E(X_1,\cdots,X_m)= (\mu_1=E(X_1), \cdots, \mu_m=E(X_m)) = \left ( \frac{k_0p_1}{p_0}, \cdots, \frac{k_0p_m}{p_0} \right)$ .
Variance-Covariance (matrix): $Cov(X_i, X_j) = \{cov[i,j]\}$ , where

$cov[i,j] = \begin{cases} \frac{k_0 p_i p_j}{p_0^2},& i\not= j,\\ \frac{k_0 p_i (p_i + p_0)}{p_0^2},& i=j.\end{cases}$ .

Cancer Example

The Probability Theory Chapter of the EBook shows the following example using 400 Melanoma (skin cancer) Patients where the Type and Site of the cancer are recorded for each subject, as shown in the Table below.

Type	Site			Totals
Type	Head and Neck	Trunk	Extremities	Totals
Hutchinson's melanomic freckle	22	2	10	34
Superficial	16	54	115	185
Nodular	19	33	73	125
Indeterminant	11	17	28	56
Column Totals	68	106	226	400

The sites (locations) of the cancer may be independent, but there may be positive dependencies of the type of cancer for a given location (site). For example, localized exposure to radiation implies that elevated level of one type of cancer (at a given location) may indicate higher level of another cancer type at the same location. We want to use the Negative Multinomial distribution to model the sites cancer rates and try to measure some of the cancer type dependencies within each location.

Let's denote by $x_{i,j}$ the cancer rates for each site ( $0\leq i \leq 2$ ) and each type of cancer ( $0\leq j \leq 3$ ). For each (fixed) site ( $0\leq i \leq 2$ ), the cancer rates are independent Negative Multinomial distributed random variables. That is, for each column index (site) the column-vector X has the following distribution:

$X = \{X_1, X_2, X_3\} \sim NMD(k_0,\{p_1, p_2, p_3\})$ .

Different columns (sites) are considered to be different instances of the random negative-multinomially distributed vector, X. Then we have the following estimates:

MLE estimate of the Mean is given by:

$\hat{\mu}_{i,j} = \frac{x_{i,.}\times x_{.,j}}{x_{.,.}}$

$x_{i,.} = \sum_{j=0}^{3}{x_{i,j}}$

$x_{.,j} = \sum_{i=0}^{2}{x_{i,j}}$

$x_{.,.} = \sum_{i=0}^{2}\sum_{j=0}^{3}{{x_{i,j}}}$

Example: for the Head and Neck site (i=0) and the Hutchinson's melanomic freckle type (j=0), $\hat{\mu}_{0,0} = \frac{x_{0,.}\times x_{.,0}}{x_{.,.}}=\frac{68\times 34}{400}=5.78$
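
The marginal totals and the estimate $\hat{\mu}$ can be recomputed from the melanoma table for every cell. The following is a small sketch; the dictionary layout and variable names are assumptions chosen for readability, and the counts are copied from the table above.

```python
sites = ["Head and Neck", "Trunk", "Extremities"]

# Counts per cancer type, ordered by site as in the table above
counts = {
    "Hutchinson's melanomic freckle": [22, 2, 10],
    "Superficial": [16, 54, 115],
    "Nodular": [19, 33, 73],
    "Indeterminant": [11, 17, 28],
}

type_totals = {t: sum(row) for t, row in counts.items()}
site_totals = [sum(row[s] for row in counts.values()) for s in range(len(sites))]
grand_total = sum(site_totals)

# Estimated mean for each (type, site) cell: (type total * site total) / grand total
mu_hat = {
    (t, sites[s]): type_totals[t] * site_totals[s] / grand_total
    for t in counts
    for s in range(len(sites))
}

print(mu_hat[("Hutchinson's melanomic freckle", "Head and Neck")])
```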

Variance-Covariance: For a single column vector, $X = \{X_1, X_2, X_3\} \sim NMD(k_0,\{p_1, p_2, p_3\})$ , the covariance between any pair of Negative Multinomial counts ( $X_i$ and $X_j$ ) is:

$cov[X_i,X_j] = \begin{cases} \frac{k_0 p_i p_j}{p_0^2},& i\not= j,\\ \frac{k_0 p_i (p_i + p_0)}{p_0^2},& i=j.\end{cases}$ .

Example: For the first site (Head and Neck, i=0), suppose that $X=\left \{X_1=5, X_2=1, X_3=5\right \}$ and $X \sim NMD(k_0=10,\{p_1=0.2, p_2=0.1, p_3=0.2\})$ . Then:

$p_0 = 1 - \sum_{i=1}^3{p_i}=0.5$

$NMD(X | k_0,\{p_1, p_2, p_3\}) = 0.00465585119998784$

$cov[X_1,X_3] = \frac{10 \times 0.2 \times 0.2}{0.5^2}=1.6$

$\mu_2=\frac{k_0 p_2}{p_0} = \frac{10\times 0.1}{0.5}=2.0$

$\mu_3=\frac{k_0 p_3}{p_0} = \frac{10\times 0.2}{0.5}=4.0$

$corr[X_2,X_3] = \left (\frac{\mu_2 \times \mu_3}{(k_0+\mu_2)(k_0+\mu_3)} \right )^{\frac{1}{2}}$ and therefore, $corr[X_2,X_3] = \left (\frac{2 \times 4}{(10+2)(10+4)} \right )^{\frac{1}{2}} = 0.21821789023599242.$
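
The quantities in this example follow directly from the Negative Multinomial summary formulas. The sketch below evaluates them for an integer $k_0$; the function names are hypothetical, and list positions 0, 1, 2 stand for $X_1, X_2, X_3$.

```python
from math import factorial, prod, sqrt


def nmd_pmf(counts, k0, probs):
    """Negative multinomial mass at (k_1, ..., k_m) for an integer k0 >= 1."""
    p0 = 1.0 - sum(probs)
    n = k0 + sum(counts)
    # (n - 1)! / ((k0 - 1)! * k_1! * ... * k_m!) is always an integer
    coeff = factorial(n - 1) // (factorial(k0 - 1) * prod(factorial(k) for k in counts))
    return coeff * p0 ** k0 * prod(p ** k for p, k in zip(probs, counts))


def nmd_mean(i, k0, probs):
    return k0 * probs[i] / (1.0 - sum(probs))


def nmd_cov(i, j, k0, probs):
    p0 = 1.0 - sum(probs)
    if i == j:
        return k0 * probs[i] * (probs[i] + p0) / p0 ** 2
    return k0 * probs[i] * probs[j] / p0 ** 2


def nmd_corr(i, j, k0, probs):
    if i == j:
        return 1.0
    mu_i, mu_j = nmd_mean(i, k0, probs), nmd_mean(j, k0, probs)
    return sqrt(mu_i * mu_j / ((k0 + mu_i) * (k0 + mu_j)))


k0, probs, counts = 10, [0.2, 0.1, 0.2], [5, 1, 5]

pmf_value = nmd_pmf(counts, k0, probs)
cov_13 = nmd_cov(0, 2, k0, probs)
mu_2, mu_3 = nmd_mean(1, k0, probs), nmd_mean(2, k0, probs)
corr_23 = nmd_corr(1, 2, k0, probs)

print(pmf_value, cov_13, mu_2, mu_3, corr_23)
```

As a consistency check, the correlation can also be obtained as the covariance divided by the product of the two standard deviations, which gives the same value.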

You can also use the interactive SOCR negative multinomial distribution calculator to compute these quantities, as shown on the figure below.

There is no MLE estimate for the NMD $k_0$ parameter (see this reference). However, there are approximate protocols for estimating the $k_0$ parameter, see the example below.

Correlation: The correlation between any pair of Negative Multinomial counts ( $X_i$ and $X_j$ ) is:

$Corr[X_i,X_j] = \begin{cases} \left (\frac{\mu_i \times \mu_j}{(k_0+\mu_i)(k_0+\mu_j)} \right )^{\frac{1}{2}} = \left (\frac{p_i p_j}{(p_0+p_i)(p_0+p_j)} \right )^{\frac{1}{2}}, & i\not= j, \\ 1, & i=j.\end{cases}$ .

The marginal distribution of each of the $X_i$ variables is negative binomial, as the $X_i$ count (considered as success) is measured against all the other outcomes (failure). But jointly, the distribution of $X=\{X_1,\cdots,X_m\}$ is negative multinomial, i.e., $X \sim NMD(k_0,\{p_1,\cdots,p_m\})$ .

Notice that the pair-wise NMD correlations are always positive, whereas the correlations between multinomial counts are always negative. Also note that as the parameter $k_0$ increases, the paired correlations go to zero! Thus, for large $k_0$ , the Negative Multinomial counts $X_i$ behave as independent Poisson random variables with respect to their means $\left ( \mu_i= k_0\frac{p_i}{p_0}\right )$ .

Parameter estimation

Estimation of the mean (expected) frequency counts ( $\mu_j$ ) of each outcome ( $X_j$ ):

The MLE estimates of the NMD mean parameters $\mu_j$ are easy to compute.

If we have a single observation vector $\{x_1, \cdots,x_m\}$ , then $\hat{\mu}_i=x_i.$

If we have several observation vectors, as in this case, where we have the cancer type frequencies for 3 different sites, then the MLE estimates of the mean counts are $\hat{\mu}_j=\frac{x_{j,.}}{I}$ , where $0\leq j \leq J$ is the cancer-type index and the summation is over the number of observed (sampled) vectors (I).

For the cancer data above, we have the following MLE estimates for the expectations for the frequency counts:

Hutchinson's melanomic freckle type of cancer ( $X_0$ ) is $\hat{\mu}_0 = 34/3=11.33$ .

Superficial type of cancer ( $X_1$ ) is $\hat{\mu}_1 = 185/3=61.67$ .

Nodular type of cancer ( $X_2$ ) is $\hat{\mu}_2 = 125/3=41.67$ .

Indeterminant type of cancer ( $X_3$ ) is $\hat{\mu}_3 = 56/3=18.67$ .

Estimation of the $k_0$ (gamma) parameter:

There is no MLE for the $k_0$ parameter; however, there is a protocol for estimating $k_0$ using the chi-squared goodness of fit statistic. In the usual chi-squared statistic:

$\Chi^2 = \sum_i{\frac{(x_i-\mu_i)^2}{\mu_i}}$ , we can replace the expected-means ( $\mu_i$ ) by their estimates, $\hat{\mu_i}$ , and replace denominators by the corresponding negative multinomial variances. Then we get the following test statistic for negative multinomial distributed data:

$\Chi^2(k_0) = \sum_{i}{\frac{(x_i-\hat{\mu_i})^2}{\hat{\mu_i} \left (1+ \frac{\hat{\mu_i}}{k_0} \right )}}$ .

Now we can derive a simple method for estimating the $k_0$ parameter by varying the values of $k_0$ in the expression $\Chi^2(k_0)$ and matching the values of this statistic with the corresponding asymptotic chi-squared distribution. The following protocol summarizes these steps using the cancer data above:

DF: The degree of freedom for the Chi-square distribution in this case is:

df = (# rows – 1)(# columns – 1) = (3-1)*(4-1) = 6

Median: The median of a chi-squared random variable with 6 df is 5.261948.

Mean Counts Estimates: The mean counts estimates ( $\mu_j$ ) for the 4 different cancer types are:

$\hat{\mu}_0 = 34/3=11.33$ ; $\hat{\mu}_1 = 185/3=61.67$ ; $\hat{\mu}_2 = 125/3=41.67$ ; and $\hat{\mu}_3 = 56/3=18.67$ .

Thus, we can solve the equation above $\Chi^2(k_0) = 5.261948$ for the single variable of interest -- the unknown parameter $k_0$ . Suppose we are using the same example as before, $x = \{x_1 = 5, x_2 = 1, x_3 = 5\}$ . Then the solution is an asymptotic chi-squared distribution driven estimate of the parameter $k_0$ .

$\Chi^2(k_0) = \sum_{i=1}^3{\frac{(x_i-\hat{\mu_i})^2}{\hat{\mu_i} \left (1+ \frac{\hat{\mu_i}}{k_0} \right )}}$ . $\Chi^2(k_0) = \frac{(5-61.67)^2}{61.67(1+61.67/k_0)}+\frac{(1-41.67)^2}{41.67(1+41.67/k_0)}+\frac{(5-18.67)^2}{18.67(1+18.67/k_0)}=5.261948.$ Solving this equation for $k_0$ provides the desired estimate for the last parameter.

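Mathematica provides 3 distinct ( $k_0$ ) solutions to this equation: {-50.5466, -21.5204, 2.40461}. For $k_0 > 0$ each term of $\Chi^2(k_0)$ increases with $k_0$ , so the equation has at most one positive root. Since $k_0 > 0$ , there is a single admissible solution, $k_0 \approx 2.40461$ .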
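
The equation $\Chi^2(k_0) = 5.261948$ can also be solved numerically. The sketch below uses plain bisection on the positive half-line, taking the target value and the rounded mean estimates exactly as they appear in the equation above; the bracket [1e-6, 1000] and the function name `chi2_stat` are assumptions made for this illustration. For $k_0 > 0$ every term of the statistic increases with $k_0$, so bisection finds the only positive root.

```python
def chi2_stat(k0, x, mu):
    """Chi-squared statistic with negative multinomial variances mu_i * (1 + mu_i / k0)."""
    return sum((xi - mi) ** 2 / (mi * (1.0 + mi / k0)) for xi, mi in zip(x, mu))


x = [5, 1, 5]                    # observed sample used in the text
mu_hat = [61.67, 41.67, 18.67]   # rounded mean estimates used in the text
target = 5.261948                # target value used in the text

# Bisection: chi2_stat is increasing in k0 on (0, infinity)
lo, hi = 1e-6, 1000.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if chi2_stat(mid, x, mu_hat) < target:
        lo = mid
    else:
        hi = mid
k0_hat = 0.5 * (lo + hi)

print(k0_hat)
# Evaluating the statistic at other candidate values shows whether they solve the equation
print(chi2_stat(50.5466, x, mu_hat), chi2_stat(-50.5466, x, mu_hat))
```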

Estimates of Probabilities: Assume $k_0 = 2$ and $\frac{\mu_i}{k_0}p_0=p_i$ , we have:

$\frac{61.67}{k_0}p_0\approx 31p_0=p_1$

$20p_0 = p_2$

$9p_0 = p_3$

Hence, $1 - p_0 = p_1 + p_2 + p_3 = 60p_0$ . Therefore, $p_0=\frac{1}{61}$ , $p_1=\frac{31}{61}$ , $p_2=\frac{20}{61}$ and $p_3=\frac{9}{61}$ .
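
The probability estimates follow from the relation $p_i = \frac{\mu_i}{k_0}p_0$ together with the constraint that all probabilities sum to 1. The sketch below repeats this step with exact fractions, taking the rounded multipliers 31, 20 and 9 from the text as given (they are inputs here, not recomputed).

```python
from fractions import Fraction

# Rounded values of mu_i / k0 as used in the text for k0 = 2
ratios = [31, 20, 9]

# 1 - p0 = (31 + 20 + 9) * p0, hence p0 = 1 / (1 + 60)
p0 = Fraction(1, 1 + sum(ratios))
probs = [r * p0 for r in ratios]

print(p0, probs)
```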

Therefore, the best model distribution for the observed sample $x = \{x_1 = 5, x_2 = 1, x_3 = 5\}$ is $X \sim NMD\left (2, \left \{\frac{31}{61}, \frac{20}{61},\frac{9}{61}\right\} \right ).$

Notice that in this calculation, we explicitly used the complete cancer data table, not only the sample $x = \{x_1 = 5, x_2 = 1, x_3 = 5\}$ , as we need multiple samples (multiple sites or columns) to estimate the $k_0$ parameter.

SOCR Negative Multinomial Distribution Calculator

SOCR Negative Multinomial Distribution Calculator.

Problems

References

Negative-Binomial Activity
Le Gall, F. The modes of a negative multinomial distribution, Statistics & Probability Letters, 2005.
Johnson, N.L., Kotz, S., Balakrishnan, N. Discrete Multivariate Distributions, Wiley Series in Probability and Mathematical Statistics, 1997.
Kotz, S., Johnson, N.L. (Editors). Encyclopedia of Statistical Sciences, Wiley, New York, 1982.

SOCR Home page: http://www.socr.ucla.edu


@@ Line 6: / Line 6: @@
 *Mass Function: If the probability of successes on each trial is P(success)=p, then the probability that x trials are needed to get one success is <math>P(X = x) = (1 - p)^{x-1} \times p</math>, for x = 1, 2, 3, 4,....
-* Expectation: The [[AP_Statistics_Curriculum_2007_Distrib_MeanVar | Expected Value]] of a geometrically distributed random variable ''X'' is <math>{1\over p}.</math>
+* Expectation: The [[AP_Statistics_Curriculum_2007_Distrib_MeanVar | Expected Value]] of a geometrically distributed random variable ''X'' is <math>{1\over p}.</math> This is because [http://en.wikipedia.org/wiki/Geometric_progression geometric series have this property]:
+:<math>\sum_{k=0}^{n} p(1-p)^k = p(1-p)^0+p(1-p)^1+p(1-p)^2+p(1-p)^3+\cdots+p(1-p)^n.</math>
+: Let r=(1-p), then p=(1-r) and <math>\sum_{k=0}^{n} p(1-p)^k = \begin{align}
+(1-r) \sum_{k=0}^{n} r^k & = (1-r)(r^0 + r^1+r^2+r^3+\cdots+r^n) \\
+                          & = r^0 + r^1+r^2+r^3+\cdots+r^n \\
+                          & -( r^1+r^2+r^3+\cdots +r^n + r^{n+1}) \\
+                          & = r^0 - r^{n+1} = 1 - r^{n+1}.
+\end{align}</math>
+: Thus: <math>\sum_{k=0}^{n} p(1-p)^k = \frac{p - pr^{n+1}}{1-r} = 1-pr^{n+1},</math> which converges to 1, as <math>n\longrightarrow \infty,</math>, and hence the above geometric density is well defined.
+: Denote the geometric expectation by E = E(X) = <math>\sum_{k=0}^{\infty} kpr^k</math>, where r=1-p. Then <math>pE = E - (1-p)E = \sum_{k=0}^{\infty} kpr^k - (\sum_{k=0}^{\infty} kpr^{k+1})=</math> <math>\sum_{k=0}^{\infty} pr^k = 1</math>. Therefore, <math>E = \frac{1}{p}</math>.
 *Variance: The [[AP_Statistics_Curriculum_2007_Distrib_MeanVar | Variance]] is <math>{1-p\over p^2}.</math>
 *Example: See [[SOCR_EduMaterials_Activities_Binomial_Distributions | this SOCR Geometric distribution activity]].
+* The Geometric distribution gets its name because its probability mass function is a [http://en.wikipedia.org/wiki/Geometric_progression geometric progression]. It is the discrete analogue of the Exponential distribution and is also known as Furry distribution.
 ===HyperGeometric===
@@ Line 40: / Line 52: @@
 ====Examples====
-* SOCR Activity: The [[SOCR_EduMaterials_Activities_BallAndRunExperiment | SOCR Ball and Urn Experiment]] provides a hands-on demonstration of the utilization of Hypergeometric distribution in practice. This activity consists of selecting n balls at random from an urn with N balls, R of which are red and the other N - R green. The number of red balls Y in the sample is recorded on each update. The distribution and moments of Y are shown in blue in the distribution graph and are recorded in the distribution table. On each update, the empirical density and moments of Y are shown in red in the distribution graph and are recorded in the distribution table. Either of two sampling models can be selected with the list box: with replacement and without replacement. The parameters N, R, and n can be varied with scroll bars.
+* SOCR Activity: The [[SOCR_EduMaterials_Activities_BallAndRunExperiment | SOCR Ball and Urn Experiment]] provides a hands-on demonstration of the utilization of Hypergeometric distribution in practice. This activity consists of selecting n balls at random from an urn with N balls, R of which are red and the other N - R green. The number of red balls Y in the sample is recorded on each update. The distribution and moments of Y are shown in blue in the distribution graph and are recorded in the distribution table. On each update, the empirical density and moments of Y are shown in red in the distribution graph and are recorded in the distribution table. Either of two sampling models can be selected with the list box: with replacement and without replacement. The parameters N, R, and n can vary with scroll bars.
 <center>[[Image:SOCR_Activities_BallAndUrnExperiment_SubTopic_Chui_050307_Fig2.JPG|500px]]</center>
@@ Line 49: / Line 61: @@
 <center>[[Image:SOCR_EBook_Dinov_RV_HyperGeom_013008_Fig9.jpg|500px]]</center>
-* Hypergeometric distribution may also be used to estimate the population size: Suppose we are interested in determining the population size. Let N = number of fish in a particular isolated region. Suppose we catch, tag and release back M=200 fish. Several days later, when the fish are randomly mixed with the untagged fish, we take a sample of n=100 and observe m=5 tagged fish. Suppose p=200/N is the population proportion of tagged fish. Notice that when sampling fish we sample without replacement. Thus, hypergeometric is the exact model for this process. Assuming the sample-size (n) is < 5% of the population size(N), we can use [[AP_Statistics_Curriculum_2007_Limits_Bin2HyperG |binomial approximation to hypergeometric]]. Thus if the sample of n=100 fish had 5 tagged, the sample-proportion (estimate of the population proportion) will be <math>\hat{p}={5\over 100}=0.05</math>. Thus, we can estimate that <math>0.05=\hat{p}={200\over N}</math>, and <math>N\approx 4,000</math>, as shown on the figure below.
+* Hypergeometric distribution may also be used to estimate the population size: Suppose we are interested in determining the population size. Let N = number of fish in a particular isolated region. Suppose we catch, tag and release back M=200 fish. Several days later, when the fish are randomly mixed with the untagged fish, we take a sample of n=100 and observe m=5 tagged fish. Suppose p=200/N is the population proportion of tagged fish. Notice that when sampling fish, we sample without replacement. Thus, hypergeometric is the exact model for this process. Assuming the sample-size (n) is < 5% of the population size(N), we can use [[AP_Statistics_Curriculum_2007_Limits_Bin2HyperG |binomial approximation to hypergeometric]]. Thus if the sample of n=100 fish had 5 tagged, the sample-proportion (estimate of the population proportion) will be <math>\hat{p}={5\over 100}=0.05</math>. Thus, we can estimate that <math>0.05=\hat{p}={200\over N}</math>, and <math>N\approx 4,000</math>, as shown in the figure below.
 <center>[[Image:SOCR_EBook_Dinov_Prob_HyperG_041108_Fig9a.jpg|500px]]</center>
@@ Line 72: / Line 84: @@
 ====Application====
-Suppose Jane is promoting and fund-raising for a presidential candidate.  She wants to visit all 50 states and she's pledged to get all electoral votes of 6 states before she and the candidate she represents are satisfied.  In every state, there is a 30% chance that Jane will be able to secure all electoral votes and 70% chance that she'll fail.
+Suppose Jane is promoting and fund-raising for a presidential candidate.  She wants to visit all 50 states and she's pledged to get all electoral votes of 6 states before she and the candidate she represents are satisfied.  In every state, there is a 30% chance that Jane will be able to secure all electoral votes and a 70% chance that she'll fail.
 * ''What's the probability mass function of the number of failures (''k=n-r'') to get ''r=6'' successes''?''
-: In other words, ''What's the probability mass function that the last 6<sup>th</sup> state she succeeds to secure all electoral votes happens to be the at the ''n''<sup>th</sup> state she campaigns in?''
+: In other words, ''what's the probability mass function that the last 6<sup>th</sup> state she succeeds to secure all electoral votes happens to be at the ''n''<sup>th</sup> state she campaigns in?''
 NegBin(''r'', ''p'') distribution describes the probability of ''k'' failures and ''r'' successes in ''n''=''k''+''r'' Bernoulli(''p'') trials with success on the last trial.  Looking to secure the electoral votes for 6 states means Jane needs to get 6 successes before she (and her candidate) is happy.  The number of trials (i.e., states visited) needed is ''n''=''k+6''.  The random variable we are interested in is '''X={number of states visited to achieve 6 successes (secure all electoral votes within these states)}'''. So, ''n'' = ''k+6'', and <math>X\sim NegBin(r=6, p=0.3)</math>. Thus, for <math>n \geq 6</math>, the mass function (giving the probabilities that Jane will visit n states before her ultimate success is:
@@ Line 90: / Line 102: @@
 <center>[[Image:SOCR_EBook_Dinov_RV_NegBinomial_013008_Fig5.jpg|500px]]</center>
-* Suppose the success of getting all electoral votes within a state is reduced to only 10%, then '''X~NegBin(r=6, p=0.1)'''. Notice that the shape and domain the Negative-Binomial distribution significantly chance now (see image below)!
+* Suppose the success of getting all electoral votes within a state is reduced to only 10%, then '''X~NegBin(r=6, p=0.1)'''. Notice that the shape and domain the Negative-Binomial distribution significantly chance now (see image below).
 : ''What's the probability that Jane covers all 50 states but fails to get all electoral votes in any 6 states (as she had hoped for)?''
 :<math> P(X\geq 50) = 0.632391</math>
@@ Line 96: / Line 108: @@
 * SOCR Activity: If you want to see an interactive Negative-Binomial Graphical calculator you can go to [http://socr.ucla.edu/htmls/SOCR_Experiments.html this applet (select Negative Binomial)] and see [[SOCR_EduMaterials_Activities_NegativeBinomial |this activity]].
+====Normal approximation to Negative Binomial distribution====
+The [[AP_Statistics_Curriculum_2007_Limits_CLT|central limit theorem]] provides the foundation for approximation of negative binomial distribution by [[AP_Statistics_Curriculum_2007_Normal_Std| Normal distribution]]. Each negative binomial random variable, \(V_k \sim NB(k,p)\), may be expressed as a sum of '''k''' independent, identically distributed ([[AP_Statistics_Curriculum_2007_Distrib_Dists#Geometric|geometric]]) random variables \(\{X_i\}\), i.e., \( V_k = \sum_{i=1}^k{X_i}\), where [[AP_Statistics_Curriculum_2007_Distrib_Dists |\( X_i \sim Geometric(p)\)]]. In various scientific applications, given a large '''k''', the distribution of \(V_k\) is approximately normal with mean and variance given by \(\mu=k\frac{1}{p}\) and \(\sigma^2=k\frac{1-p}{p^2}\), as \(k \longrightarrow \infty\). Depending on the parameter '''p''', '''k''' may need to be rather large for the approximation to work well. Also, when using the normal approximation, we should remember to use the continuity correction, since the negative binomial and Normal distributions are discrete and continuous, respectively.
+In the above example, \(P(X\le 8)\), \(V_k \sim NegBin(k=r=6, p=0.3)\), the normal distribution approximation, \(N(\mu=\frac{k}{p}=20, \sigma=\sqrt{k\frac{1-p}{p^2}}=6.83)\), is shown it the following image and table:
+<center>[[Image:SOCR_EBook_Dinov_RV_NegBinomial_013008_Fig4a.png|500px]]</center>
+The probabilities of the real [http://socr.ucla.edu/htmls/dist/NegativeBinomial_Distribution.html Negative Binomial] and [http://socr.ucla.edu/htmls/dist/Normal_Distribution.html approximate Normal] distributions (on the range [2:4]) are not identical but are sufficiently close.
+<center>
+{| class="wikitable" style="text-align:center; width:75%" border="1"
+|-
+! Summary|| [http://socr.ucla.edu/htmls/dist/NegativeBinomial_Distribution.html \(NegativeBinomial(k=6,p=0.3)\) ] || [http://socr.ucla.edu/htmls/dist/Normal_Distribution.html \(Normal(\mu=20, \sigma=6.83)\) ]
+|-
+| Mean||20.0||20.0
+|-
+| Median||19.0||20.0
+|-
+| Variance||46.666667||46.6489
+|-
+| Standard Deviation||6.831301||6.83
+|-
+| Max Density|| 0.062439||0.058410
+|-
+! colspan=3|Probability Areas
+|-
+| \(\le 8\)|| .011292|| 0.039433
+|-
+| >8|| .988708||0.960537
+|}
+</center>
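The comparison in the table can be reproduced with a short calculation. The following is a minimal sketch using only the Python standard library; the function names are illustrative assumptions and are not part of SOCR. It computes the exact probability \(P(X\le 8)\) for the trial index of the 6th success and the normal approximation with and without the continuity correction.

```python
from math import comb, erf, sqrt

def negbin_trials_cdf(n, r, p):
    """P(X <= n), where X is the trial index of the r-th success."""
    return sum(comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)
               for x in range(r, n + 1))

def normal_cdf(x, mu, sigma):
    """CDF of Normal(mu, sigma) evaluated at x."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

r, p = 6, 0.3                        # parameters from the example above
mu = r / p                           # mean of the sum of r geometric variables
sigma = sqrt(r * (1 - p) / p**2)     # standard deviation of that sum

exact = negbin_trials_cdf(8, r, p)
approx_plain = normal_cdf(8, mu, sigma)     # no continuity correction
approx_cc = normal_cdf(8.5, mu, sigma)      # with continuity correction

print(f"mu={mu:.4f}, sigma={sigma:.6f}")
print(f"exact={exact:.6f}, normal={approx_plain:.6f}, normal+cc={approx_cc:.6f}")
```

With only k=6 summands the normal curve still overestimates this lower tail noticeably, which illustrates why '''k''' may need to be rather large.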
 ===Negative Multinomial Distribution (NMD)===
-The ''Negative Multinomial Distribution'' is a generalization of the two-parameter [[AP_Statistics_Curriculum_2007_Distrib_Dists#Negative_Binomial|Negative Binomial distribution]] (NB(r,p)) to <math>m\ge 1</math> outcomes. Suppose we have an experiment that generates <math>m\ge 1</math> possible outcomes, <math>\{X_0,\cdots,X_m\}</math>, each occurring with probability <math>\{p_0,\cdots,p_m\}</math>, respectively, where with <math>0<p_i<1</math> and <math>\sum_{i=0}^m{p_i}=1</math>. That is, <math>p_0 = 1-\sum_{i=1}^m{p_i}</math>. If the experiment proceeds to generate independent outcomes until <math>\{X_0, X_1, \cdots, X_m\}</math> occur exactly <math>\{k_0, k_1, \cdots, k_m\}</math> times, the distribution of the (m+1)-tuple <math>\{k_0, k_1, \cdots, k_m\}</math> is Negative Multinomial with parameter vector <math>(k_0,\{p_0,\cdots,p_m\})</math>. Contrast this with the combinatorial interpretation of Negative Binomial (special case with m=1):
+The ''Negative Multinomial Distribution'' is a generalization of the two-parameter [[AP_Statistics_Curriculum_2007_Distrib_Dists#Negative_Binomial|Negative Binomial distribution]] (NB(r,p)) to <math>m+1\ge 2</math> outcomes. Suppose we have an experiment that generates <math>m+1\ge 2</math> possible outcomes, <math>\{X_0,\cdots,X_m\}</math>, each occurring with probability <math>\{p_0,\cdots,p_m\}</math>, respectively, where <math>0<p_i<1</math> and <math>\sum_{i=0}^m{p_i}=1</math>. That is, <math>p_0 = 1-\sum_{i=1}^m{p_i}</math>. If the experiment proceeds to generate independent outcomes until <math>\{X_0, X_1, \cdots, X_m\}</math> occur exactly <math>\{k_0, k_1, \cdots, k_m\}</math> times, then the distribution of the m-tuple <math>\{X_1, \cdots, X_m\}</math> is Negative Multinomial with parameter vector <math>(k_0,\{p_1,\cdots,p_m\})</math>. Notice that the number of degrees of freedom here is actually m, not (m+1). That is why the probability parameter vector has size m, not (m+1): all probabilities add up to 1, which introduces one relation. Contrast this with the combinatorial interpretation of Negative Binomial (special case with m=1):
 : <math>X \sim NegativeBinomial(NumberOfSuccesses=r,ProbOfSuccess=p)</math>,
-::''X=Total # of experiments (n) to get r successes'';
+::''X=Total # of experiments (n) to get r successes'' (and therefore n-r failures);
 : <math>X \sim Negative Multinomial(k_0,\{p_0,p_1\})</math>,
-:: ''X=Total # of experiments (n) to get <math>k_0</math> and <math>n-k_0</math> outcomes of each of the m+1=2 possible outcomes''.
+:: ''X=Total # of experiments (n) to get <math>k_0</math> (default variable, <math>X_0</math>) and <math>n-k_0</math> outcomes of the other possible outcome (<math>X_1</math>)''.
 ====Negative Multinomial Summary====
-* Probability Mass Function: <math> P(k_0, \cdots, k_m) = \left (\sum_{i=0}^m{k_i}-1\right)!\frac{p_0^{k_0}}{(k_0-1)!} \prod_{i=1}^m{\frac{p_i^{k_i}}{k_i!}}</math>, or equivalently:
+* Probability Mass Function: <math> P(k_1, \cdots, k_m|k_0,\{p_1,\cdots,p_m\}) = \left (\sum_{i=0}^m{k_i}-1\right)!\frac{p_0^{k_0}}{(k_0-1)!} \prod_{i=1}^m{\frac{p_i^{k_i}}{k_i!}}</math>, or equivalently:
-: <math> P(k_0, \cdots, k_m) = \Gamma\left(\sum_{i=1}^m{k_i}\right)\frac{p_0^{k_0}}{\Gamma(k_0)} \prod_{i=1}^m{\frac{p_i^{k_i}}{k_i!}}</math>, where <math>\Gamma(x)</math> is the [http://en.wikipedia.org/wiki/Gamma_function Gamma function].
+: <math> P(k_1, \cdots, k_m|k_0,\{p_1,\cdots,p_m\}) = \Gamma\left(\sum_{i=0}^m{k_i}\right)\frac{p_0^{k_0}}{\Gamma(k_0)} \prod_{i=1}^m{\frac{p_i^{k_i}}{k_i!}}</math>, where <math>\Gamma(x)</math> is the [http://en.wikipedia.org/wiki/Gamma_function Gamma function].
-* Mean (vector): <math>E(k_0,\cdots,k_m)= (E(k_0), \cdots, E(k_m)) = (k_0*p_0, \cdots, k_m*p_m)</math>.
+* Mean (vector): <math>\mu=E(X_1,\cdots,X_m)= (\mu_1=E(X_1), \cdots, \mu_m=E(X_m)) = \left ( \frac{k_0p_1}{p_0}, \cdots, \frac{k_0p_m}{p_0} \right)</math>.
-* Variance-Covariance (matrix): <math>Cov(k_0,\cdots,k_m)= \{cov[i,j]\}</math>, where <math> cov[i,j] = \begin{cases} \frac{k_0 * p[i] * p[j]}{p_0 * p_0},& i\not= j,\\
+* Variance-Covariance (matrix): <math>Cov(X_i,X_j)= \{cov[i,j]\}</math>, where
-\frac{k_0* p[i] * (p[i] + p_0)}{p_0 * p_0},& i=j.\end{cases}</math>.
+: <math> cov[i,j] = \begin{cases} \frac{k_0 p_i p_j}{p_0^2},& i\not= j,\\
+\frac{k_0 p_i  (p_i + p_0)}{p_0^2},& i=j.\end{cases}</math>.
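The summary formulas can be checked numerically. The sketch below is an illustration only (the function names are assumptions, not the SOCR calculator's API). It evaluates the mass function through log-Gamma terms, taking the Gamma argument as <math>\sum_{i=0}^m k_i</math> so that it agrees with the factorial form, and uses the parameter values of the Cancer Example, <math>k_0=10</math>, <math>p=\{0.2, 0.1, 0.2\}</math> and <math>x=\{5,1,5\}</math>. It also checks that the m=1 case reduces to the Negative Binomial count of "other" outcomes.

```python
from math import comb, exp, lgamma, log

def nmd_pmf(x, k0, p):
    """Negative Multinomial mass at counts x=(x_1..x_m), given k0 and p=(p_1..p_m)."""
    p0 = 1.0 - sum(p)
    n = k0 + sum(x)                       # sum over i=0..m of k_i
    log_pmf = lgamma(n) - lgamma(k0) + k0 * log(p0)
    for xi, pi in zip(x, p):
        log_pmf += xi * log(pi) - lgamma(xi + 1)
    return exp(log_pmf)

def nmd_mean(k0, p):
    """Mean vector (k0*p_i/p0) for i=1..m."""
    p0 = 1.0 - sum(p)
    return [k0 * pi / p0 for pi in p]

def nmd_cov(k0, p):
    """Variance-covariance matrix with the i=j and i!=j cases of the summary."""
    p0 = 1.0 - sum(p)
    m = len(p)
    return [[k0 * p[i] * ((p[i] + p0) if i == j else p[j]) / p0**2
             for j in range(m)] for i in range(m)]

k0, p, x = 10, [0.2, 0.1, 0.2], [5, 1, 5]
pmf_value = nmd_pmf(x, k0, p)
means = nmd_mean(k0, p)
cov = nmd_cov(k0, p)
print(pmf_value, means, cov[0][2])

# Special case m=1: X_1 counts the "other" outcomes seen before the k0-th X_0 outcome.
nb_value = comb(6 + 3 - 1, 3) * 0.3**6 * 0.7**3
nmd_m1_value = nmd_pmf([3], 6, [0.7])
print(nb_value, nmd_m1_value)
```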
 ====Cancer Example====
-The [[AP_Statistics_Curriculum_2007_Prob_Rules| Probability Theory Chapter]] of the [[EBook]] shows the following example using 400 Melanoma (skin cancer) Patients where the Type and Site of the cancer are recorded for each subject, as in the Table below.
+The [[AP_Statistics_Curriculum_2007_Prob_Rules| Probability Theory Chapter]] of the [[EBook]] shows the following example using 400 Melanoma (skin cancer) Patients where the Type and Site of the cancer are recorded for each subject, as shown in the Table below.
 <center>
 {| class="wikitable" style="text-align:center; width:75%" border="1"
@@ Line 134: / Line 179: @@
 The sites (locations) of the cancer may be independent, but there may be positive dependencies of the type of cancer for a given location (site). For example, localized exposure to radiation implies that elevated level of one type of cancer (at a given location) may indicate higher level of another cancer type at the same location. We want to use the Negative Multinomial distribution to model the sites cancer rates and try to measure some of the cancer type dependencies within each location.
-Let's denote by <math>y_{i,j}</math> the cancer rates for each site (<math>0\leq i \leq 2</math>) and each type of cancer (<math>0\leq j \leq 3</math>). For each (fixed) site (<math>0\leq i \leq 2</math>), the cancer rates are independent Negative Multinomial distributed random variables. That is, for each column index (site) the column-vector X
+Let's denote by <math>x_{i,j}</math> the cancer rates for each site (<math>0\leq i \leq 2</math>) and each type of cancer (<math>0\leq j \leq 3</math>). For each (fixed) site (<math>0\leq i \leq 2</math>), the cancer rates are independent Negative Multinomial distributed random variables. That is, for each column index (site) the column-vector X has the following distribution:
-: <math>X=\{X_0, X_1, X_2, X_3\} \sim NMD(k_0,\{p_0,p_1,p_2,p_3\})</math>.
+: <math>X=\{X_1, X_2, X_3\} \sim NMD(k_0,\{p_1,p_2,p_3\})</math>.
-Different columns (sites) are considered to be different instances of the random multinomially distributed X vector. Then we have the following estimates:
+Different columns (sites) are considered to be different instances of the random negative-multinomially distributed vector, X. Then we have the following estimates:
 * [[AP_Statistics_Curriculum_2007_Estim_MOM_MLE |MLE estimate]] of the Mean: is given by:
-: <math>\hat{\mu}_{i,j} = \frac{x_{i,.}\times x_{.,j}}{x_{.,.}}</math>, where
+: <math>\hat{\mu}_{i,j} = \frac{x_{i,.}\times x_{.,j}}{x_{.,.}}</math>
 :: <math>x_{i,.} = \sum_{j=0}^{3}{x_{i,j}}</math>
 :: <math>x_{.,j} = \sum_{i=0}^{2}{x_{i,j}}</math>
@@ Line 146: / Line 191: @@
 :: Example: <math>\hat{\mu}_{1,1} = \frac{x_{1,.}\times x_{.,1}}{x_{.,.}}=\frac{34\times 68}{400}=5.78</math>
-* Variance-Covariance: For a single column vector, <math>X=\{X_0, X_1, X_2, X_3\} \sim NMD(k_0,\{p_0,p_1,p_2,p_3\})</math>, covariance between any pair of Negative Multinomial counts (<math>X_i</math> and <math>X_j</math>) is:
+* Variance-Covariance: For a single column vector, <math>X=\{X_1, X_2, X_3\} \sim NMD(k_0,\{p_1,p_2,p_3\})</math>, covariance between any pair of Negative Multinomial counts (<math>X_i</math> and <math>X_j</math>) is:
-: <math> cov[X_i,X_j] = \begin{cases}k_0 * p[X_i] * p[X_j] / (p_0 * p_0),& i\not= j,\\
+: <math> cov[X_i,X_j] = \begin{cases} \frac{k_0 p_i p_j}{p_0^2},& i\not= j,\\
-k_0* p[X_i] * (p[X_i] + p_0) / (p_0 * p_0),& i=j.\end{cases}</math>.
+\frac{k_0 p_i (p_i + p_0)}{p_0^2},& i=j.\end{cases}</math>.
-:: ''Example'': For the first site (Head and Neck, i=0), suppose that <math>X=\left \{X_0=3, X_1=5, X_2=7, X_3=10\right \}</math> and <math>X \sim NMD(k_0=20, \{p_0=0.3, p_1=0.2, p_2=0.1, p_3=0.2 \})</math>. Then:
+:: ''Example'': For the first site (Head and Neck, j=0), suppose that <math>X=\left \{X_1=5, X_2=1, X_3=5\right \}</math> and <math>X \sim NMD(k_0=10, \{p_1=0.2, p_2=0.1, p_3=0.2 \})</math>. Then:
-:: <math>NMD(X|k_0,\{p_0, p_1, p_2, p_3\})=1.5395445930162726E-9</math>
+:: <math>p_0 = 1 - \sum_{i=1}^3{p_i}=0.5</math>
-:: <math>cov[X_1,X_3] = \frac{20 * 0.2 * 0.2}{0.3^2}=8.89</math>
+:: <math>NMD(X|k_0,\{p_1, p_2, p_3\})= 0.00465585119998784 </math>
+:: <math>cov[X_1,X_3] = \frac{10 \times 0.2 \times 0.2}{0.5^2}=1.6</math>
+:: <math>\mu_2=\frac{k_0 p_2}{p_0} = \frac{10\times 0.1}{0.5}=2.0</math>
+:: <math>\mu_3=\frac{k_0 p_3}{p_0} = \frac{10\times 0.2}{0.5}=4.0</math>
+:: <math>corr[X_2,X_3] = \left (\frac{\mu_2 \times \mu_3}{(k_0+\mu_2)(k_0+\mu_3)} \right )^{\frac{1}{2}}</math> and therefore, <math>corr[X_2,X_3] = \left (\frac{2 \times 4}{(10+2)(10+4)} \right )^{\frac{1}{2}} = 0.21821789023599242. </math>
 :: You can also use the interactive [http://socr.ucla.edu/htmls/dist/NegativeMultinomial_Distribution.html SOCR negative multinomial distribution calculator] to compute these quantities, as shown on the figure below.
-<center>[[Image:SOCR_EBook_Dinov_RV_NegMultinomial_Fig7.png|500px]]</center>
+<center>[[Image:SOCR_EBook_Dinov_RV_NegMultinomial_Fig8.png|500px]]</center>
 * There is no [[AP_Statistics_Curriculum_2007_Estim_MOM_MLE|MLE estimate]] for the NMD <math>k_0</math> parameter ([http://books.google.com/books?id=V7w7dDEKfuoC&pg=PA12&lpg=PA12&dq=example+%22negative+multinomial%22&source=bl&ots=fFRF5X3Dug&sig=7-qbOZhv5ysHI_QVdVDxYZPL8Eg&hl=en&ei=g2jzStaMEo3QtgPF36QD&sa=X&oi=book_result&ct=result&resnum=4&ved=0CBQQ6AEwAw#v=onepage&q=example%20%22negative%20multinomial%22&f=false see this reference]). However, there are approximate protocols for estimating the <math>k_0</math> parameter, see the example below.
-* Correlation: correlation between any pair of Negative Binomial counts (<math>X_i</math> and <math>X_j</math>) is:
+* Correlation: correlation between any pair of Negative Multinomial counts (<math>X_i</math> and <math>X_j</math>) is:
-: <math> Corr[X_i,X_j] = \begin{cases} \left (\frac{\mu_i \times \mu_j}{(k_0+\mu_i)(k_0+\mu_j)} \right )^{\frac{1}{2}}, & i\not= j, \\
+: <math> Corr[X_i,X_j] = \begin{cases} \left (\frac{\mu_i \times \mu_j}{(k_0+\mu_i)(k_0+\mu_j)} \right )^{\frac{1}{2}} =
+ \left (\frac{p_i p_j}{(p_0+p_i)(p_0+p_j)} \right )^{\frac{1}{2}}, & i\not= j, \\
 1, & i=j.\end{cases}</math>.
-* The [http://en.wikipedia.org/wiki/Marginal_distribution marginal distribution] of each of the <math>X_i</math> variables is [[AP_Statistics_Curriculum_2007_Distrib_Dists#Negative_Binomial |negative binomial]], as the <math>X_i</math> count (considered as success) is measured against all the other outcomes (failure). But jointly, the distribution of <math>X=\{X_0,\cdots,X_m\}</math> is negative multinomial.
+* The [http://en.wikipedia.org/wiki/Marginal_distribution marginal distribution] of each of the <math>X_i</math> variables is [[AP_Statistics_Curriculum_2007_Distrib_Dists#Negative_Binomial |negative binomial]], as the <math>X_i</math> count (considered as success) is measured against all the other outcomes (failure). But jointly, the distribution of <math>X=\{X_1,\cdots,X_m\}</math> is negative multinomial, i.e., <math>X \sim NMD(k_0,\{p_1,\cdots,p_m\})</math> .
-Notice that the pair-wise NMD correlations are always positive, where as the correlations between [[AP_Statistics_Curriculum_2007_Distrib_Multinomial |multinomail counts]] are always negative. Also note that as the parameter <math>k_0</math> increases, the paired correlations go to zero! Thus, for large <math>k_0</math>, the Negative Multinomial counts <math>X_i</math> behave as ''independent'' [[AP_Statistics_Curriculum_2007_Distrib_Poisson |Poisson random variables]] with respect to their means (<math>\mu_i</math>).
+Notice that the pair-wise NMD correlations are always positive, whereas the correlations between [[AP_Statistics_Curriculum_2007_Distrib_Multinomial |multinomial counts]] are always negative. Also note that as the parameter <math>k_0</math> increases, the paired correlations go to zero! Thus, for large <math>k_0</math>, the Negative Multinomial counts <math>X_i</math> behave as ''independent'' [[AP_Statistics_Curriculum_2007_Distrib_Poisson |Poisson random variables]] with respect to their means <math>\left ( \mu_i= k_0\frac{p_i}{p_0}\right )</math>.
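The behavior of the correlations can be illustrated numerically. This is a small sketch with hypothetical helper names. It evaluates both forms of the correlation formula for the example values (<math>\mu_2=2</math>, <math>\mu_3=4</math>, <math>k_0=10</math>, i.e., <math>p_2=0.1</math>, <math>p_3=0.2</math>, <math>p_0=0.5</math>) and then keeps the means fixed while <math>k_0</math> grows.

```python
from math import sqrt

def nmd_corr_from_means(mu_i, mu_j, k0):
    """Correlation of two distinct NMD counts, expressed through their means."""
    return sqrt(mu_i * mu_j / ((k0 + mu_i) * (k0 + mu_j)))

def nmd_corr_from_probs(p_i, p_j, p0):
    """The same correlation, expressed through the outcome probabilities."""
    return sqrt(p_i * p_j / ((p0 + p_i) * (p0 + p_j)))

corr_means = nmd_corr_from_means(2.0, 4.0, 10)
corr_probs = nmd_corr_from_probs(0.1, 0.2, 0.5)
print(corr_means, corr_probs)

# Hold the means fixed and let k0 grow: the correlation shrinks toward zero.
corr_by_k0 = [nmd_corr_from_means(2.0, 4.0, k0) for k0 in (10, 100, 1000, 10000)]
print(corr_by_k0)
```

Both forms agree because <math>\mu_i/(k_0+\mu_i) = p_i/(p_0+p_i)</math> when <math>\mu_i = k_0 p_i/p_0</math>.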
-====<math>k_0</math> Parameter estimation====
+====Parameter estimation====
-We already discussed that for the NMD the MLE estimates of the mean parameters <math>\mu_i</math> are easy to compute. There is no MLE for the <math>k_0</math> parameter; however, there is a protocol for estimating <math>k_0</math> using the [[AP_Statistics_Curriculum_2007_Contingency_Fit |chi-squared goodness of fit statistic]]. In the usual chi-squared statistic:
+* Estimation of the mean (expected) frequency counts (<math>\mu_j</math>) of each outcome (<math>X_j</math>):
-: <math>\Chi^2 = \sum_i{\frac{(x_i-\mu_i)^2}{\mu_i}}</math>
+: The MLE estimates of the NMD mean parameters <math>\mu_j</math> are easy to compute.
-we can replace the expected-means (<math>\mu_i</math>) by sample-means ()<math>\hat{\mu_i}</math> and replace denominators by the corresponding negative multinomial variances. Then we get the following test statistic for negative multinomial distributed data:
+:: If we have a single observation vector <math>\{x_1, \cdots,x_m\}</math>, then <math>\hat{\mu}_i=x_i.</math>
+:: If we have several observation vectors, as in this case, where we have the cancer type frequencies for 3 different sites, then the MLE estimates of the mean counts are <math>\hat{\mu}_j=\frac{x_{j,.}}{I}</math>, where <math>0\leq j \leq J</math> is the cancer-type index and the summation is over the number of observed (sampled) vectors (I).
+:: For the cancer data above, we have the following MLE estimates for the expectations for the frequency counts:
+::: Hutchinson's melanomic freckle type of cancer (<math>X_0</math>) is <math>\hat{\mu}_0 = 34/3=11.33</math>.
+::: Superficial type of cancer (<math>X_1</math>) is <math>\hat{\mu}_1 = 185/3=61.67</math>.
+::: Nodular type of cancer (<math>X_2</math>) is <math>\hat{\mu}_2 = 125/3=41.67</math>.
+::: Indeterminant type of cancer (<math>X_3</math>) is <math>\hat{\mu}_3 = 56/3=18.67</math>.
+* Estimation of the <math>k_0</math> (gamma) parameter:
+: There is no MLE for the <math>k_0</math> parameter; however, there is a protocol for estimating <math>k_0</math> using the [[AP_Statistics_Curriculum_2007_Contingency_Fit |chi-squared goodness of fit statistic]]. In the usual chi-squared statistic:
+: <math>\Chi^2 = \sum_i{\frac{(x_i-\mu_i)^2}{\mu_i}}</math>, we can replace the expected-means (<math>\mu_i</math>) by their estimates, <math>\hat{\mu_i}</math>, and replace denominators by the corresponding negative multinomial variances. Then we get the following test statistic for negative multinomial distributed data:
 : <math>\Chi^2(k_0) = \sum_{i}{\frac{(x_i-\hat{\mu_i})^2}{\hat{\mu_i} \left (1+ \frac{\hat{\mu_i}}{k_0} \right )}}</math>.
-Now we can derive a simple method for estimating the <math>k_0</math> parameter by varying the
+: Now we can derive a simple method for estimating the <math>k_0</math> parameter by varying the values of <math>k_0</math> in the expression <math>\Chi^2(k_0)</math> and matching the values of this statistic with the corresponding asymptotic chi-squared distribution. The following protocol summarizes these steps using the cancer data above:
-values of <math>k_0</math> in the expression <math>\Chi^2(k_0)</math> and matching the values of this statistic with the corresponding asymptotic chi-squared distribution. The following protocol summarizes these steps using the cancer data above:
 * ''DF'': The [[AP_Statistics_Curriculum_2007_Contingency_Indep#Calculations |degree of freedom for the Chi-square distribution]] in this case is:
@@ Line 179: / Line 238: @@
 * ''Median'': The [http://socr.ucla.edu/htmls/dist/ChiSquare_Distribution.html median of a chi-squared random variable with 6 df] is 5.261948.
-* ''Mean Counts Estimates'': The mean counts estimates (<math>\mu_i</math>) for the 4 different cancer types are:
+* ''Mean Counts Estimates'': The mean counts estimates (<math>\mu_j</math>) for the 4 different cancer types are:
-::
+::<math>\hat{\mu}_1 = 185/3=61.67</math>; <math>\hat{\mu}_2 = 125/3=41.67</math>; and <math>\hat{\mu}_3 = 56/3=18.67</math>.
+* Thus, we can solve the equation above <math>\Chi^2(k_0) = 5.261948</math> for the single variable of interest -- the unknown parameter <math>k_0</math>. Suppose we are using the same example as before, <math>x=\{x_1=5,x_2=1,x_3=5\}</math>. Then the solution is an asymptotic chi-squared distribution driven estimate of the parameter <math>k_0</math>.
+<math>\Chi^2(k_0) = \sum_{i=1}^3{\frac{(x_i-\hat{\mu_i})^2}{\hat{\mu_i} \left (1+ \frac{\hat{\mu_i}}{k_0} \right )}}</math>.
+<math>\Chi^2(k_0) = \frac{(5-61.67)^2}{61.67(1+61.67/k_0)}+\frac{(1-41.67)^2}{41.67(1+41.67/k_0)}+\frac{(5-18.67)^2}{18.67(1+18.67/k_0)}=5.261948.</math> Solving this equation for <math>k_0</math> provides the desired estimate for the last parameter.
+:: [http://www.mathematica.com/ Mathematica] provides 3 distinct (<math>k_0</math>) solutions to this equation: {-50.5466, -21.5204, '''2.40461'''}. Since <math>k_0>0</math>, only the last one, <math>k_0=2.40461</math>, is a candidate solution.
+* '''Estimates of Probabilities''': Assuming <math>k_0=2</math> and <math>\frac{\mu_i}{k_0}p_0=p_i</math>, we have:
+: <math>\frac{61.67}{k_0}p_0\approx 31p_0=p_1</math>
+: <math>20p_0=p_2</math>
+: <math>9p_0=p_3</math>
+: Hence, <math>1-p_0=p_1+p_2+p_3=60p_0</math>. Therefore, <math>p_0=\frac{1}{61}</math>, <math>p_1=\frac{31}{61}</math>, <math>p_2=\frac{20}{61}</math> and <math>p_3=\frac{9}{61}</math>.
+: Therefore, the best model distribution for the observed sample <math>x=\{x_1=5,x_2=1,x_3=5\}</math> is <math>X \sim NMD\left (2, \left \{\frac{31}{61}, \frac{20}{61},\frac{9}{61}\right\} \right ).</math>
-* Thus we can solve the equation above <math>\Chi^2(k_0) = 5.261948</math> for the single variable of interest -- the unknown parameter <math>k_0</math>. This solution is an asymptotic chi-squared distribution driven estimate of the parameter <math>k_0</math>.
+: Notice that in this calculation, we explicitly used the complete cancer data table, not only the sample <math>x=\{x_1=5,x_2=1,x_3=5\}</math>, as we need multiple samples (multiple sites or columns) to estimate the <math>k_0</math> parameter.
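The estimation steps above can be sketched in a few lines. This is a minimal illustration, not the protocol's reference implementation: the helper names are assumptions, the mean estimates are the rounded values used in the equation above, and bisection is substituted for the Mathematica solver. Bisection applies because each term of <math>\Chi^2(k_0)</math> increases with <math>k_0</math> for <math>k_0>0</math>.

```python
x_obs = [5, 1, 5]                    # sample x_1, x_2, x_3 from the example
mu_hat = [61.67, 41.67, 18.67]       # rounded mean-count estimates from the text
chi2_median_df6 = 5.261948           # median of the chi-squared distribution, 6 df

def chi2_stat(k0):
    """Negative multinomial chi-squared statistic as a function of k0."""
    return sum((x - m) ** 2 / (m * (1.0 + m / k0))
               for x, m in zip(x_obs, mu_hat))

def solve_k0(target, lo=1e-6, hi=1e6, iterations=200):
    """Bisection for chi2_stat(k0) = target on a positive bracket [lo, hi]."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if chi2_stat(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def nmd_probabilities(k0, means):
    """Solve p_i = (mu_i / k0) * p0 together with p0 + sum(p_i) = 1."""
    ratios = [m / k0 for m in means]
    p0 = 1.0 / (1.0 + sum(ratios))
    return p0, [r * p0 for r in ratios]

k0_hat = solve_k0(chi2_median_df6)
p0_hat, p_hat = nmd_probabilities(2, mu_hat)   # k0 rounded to 2, as in the text
print(k0_hat, p0_hat, p_hat)
```

The probabilities printed here differ slightly from the fractions in the text, because the text rounds the ratios <math>\hat{\mu}_i/k_0</math> to whole numbers before normalizing, while this sketch does not.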
 ====SOCR Negative Multinomial Distribution Calculator====

Contents

General Advance-Placement (AP) Statistics Curriculum - Geometric, HyperGeometric, Negative Binomial Random Variables and Experiments

Geometric

HyperGeometric

Examples

Negative Binomial

X=Trial index (n) of the r^th success, or Total # of experiments (n) to get r successes

Y = Number of failures (k) to get r successes

SOCR Negative Binomial Experiment

Application

Normal approximation to Negative Binomial distribution

Negative Multinomial Distribution (NMD)

Negative Multinomial Summary

Cancer Example

Parameter estimation

SOCR Negative Multinomial Distribution Calculator

Problems

References
