# AP Statistics Curriculum 2007 MultivariateNormal

Current revision as of 00:02, 22 July 2012 (IvoDinov).

## EBook - Multivariate Normal Distribution

The multivariate normal distribution, or multivariate Gaussian distribution, is a generalization of the univariate (one-dimensional) normal distribution to higher dimensions. A random vector is said to be multivariate normally distributed if every linear combination of its components has a univariate normal distribution. The multivariate normal distribution may be used to study different associations (e.g., correlations) between real-valued random variables.

### Definition

In k-dimensions, a random vector $X = (X_1, \cdots, X_k)$ is multivariate normally distributed if it satisfies any one of the following equivalent conditions (Gut, 2009):

• Every linear combination of its components $Y = a_1X_1 + \cdots + a_kX_k$ is normally distributed. In other words, for any constant vector $a\in R^k$, the linear combination (which is a univariate random variable) $Y = a^TX = \sum_{i=1}^{k}{a_iX_i}$ has a univariate normal distribution.
• There exists a random ℓ-vector Z, whose components are independent normal random variables, a k-vector μ, and a k×ℓ matrix A, such that X = AZ + μ. Here ℓ is the rank of the variance-covariance matrix.
• There is a k-vector μ and a symmetric, nonnegative-definite k×k matrix Σ, such that the characteristic function of X is
$\varphi_X(u) = \exp\Big( iu^T\mu - \tfrac{1}{2} u^T\Sigma u \Big).$
• When the support of X is the entire space $R^k$, there exists a k-vector μ and a symmetric positive-definite k×k variance-covariance matrix Σ, such that the probability density function of X can be expressed as
$f_X(x) = \frac{1}{ (2\pi)^{k/2}|\Sigma|^{1/2} } \exp\!\Big( {-\tfrac{1}{2}}(x-\mu)^T\Sigma^{-1}(x-\mu) \Big)$, where |Σ| is the determinant of Σ, and where $(2\pi)^{k/2}|\Sigma|^{1/2} = |2\pi\Sigma|^{1/2}$. This formulation reduces to the density of the univariate normal distribution if Σ is a scalar (i.e., a 1×1 matrix).
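
As a quick check of the scalar case, take k = 1 and write Σ = σ² with σ > 0 (the symbol σ is introduced here only for illustration). Then |Σ| = σ² and Σ⁻¹ = 1/σ², so the density above becomes the familiar univariate normal density:

$f_X(x) = \frac{1}{(2\pi)^{1/2}\,\sigma} \exp\!\Big( {-\frac{(x-\mu)^2}{2\sigma^2}} \Big).$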

If the variance-covariance matrix is singular, the corresponding distribution has no density with respect to Lebesgue measure on $R^k$, because all of its probability mass lies on a lower-dimensional affine subspace. An example of this case is the distribution of the vector of residual errors in ordinary least squares regression. Note also that the $X_i$ are in general not independent; they can be seen as the result of applying the matrix A to a collection of independent Gaussian variables Z.
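
The representation X = AZ + μ can be illustrated with a small simulation. The sketch below is only one possible construction, under stated assumptions: it works in two dimensions, takes A to be the lower-triangular Cholesky factor of Σ (any A with AAᵀ = Σ would do), and uses standard normal components for Z. The function names and the example values of μ and Σ are hypothetical.

```python
import math
import random


def cholesky_2x2(sxx, sxy, syy):
    """Entries (a11, a21, a22) of lower-triangular A with A A^T = [[sxx, sxy], [sxy, syy]]."""
    a11 = math.sqrt(sxx)
    a21 = sxy / a11
    a22 = math.sqrt(syy - a21 * a21)
    return a11, a21, a22


def sample_bivariate(mu, sigma, n, seed=0):
    """Draw n samples of X = A Z + mu, where Z holds two independent N(0, 1) draws."""
    rng = random.Random(seed)
    a11, a21, a22 = cholesky_2x2(sigma[0][0], sigma[0][1], sigma[1][1])
    samples = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        samples.append((mu[0] + a11 * z1, mu[1] + a21 * z1 + a22 * z2))
    return samples


def sample_moments(samples):
    """Sample mean vector and (unbiased) sample covariance matrix of 2D points."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


# Illustrative (assumed) parameters; sigma is symmetric positive definite.
mu = (1.0, -2.0)
sigma = ((4.0, 1.5), (1.5, 1.0))

samples = sample_bivariate(mu, sigma, n=50_000)
mean_hat, cov_hat = sample_moments(samples)
print("sample mean:", mean_hat)
print("sample covariance:", cov_hat)
```

With enough samples, the printed mean and covariance should be close to μ and Σ. The second coordinate depends on both z1 and z2, which is how correlated components arise from independent ones.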

### Bivariate (2D) case

See the SOCR Bivariate Normal Distribution Activity and the corresponding [Webapp](http://socr.ucla.edu/htmls/HTML5/BivariateNormal/).

In two dimensions, for the nonsingular bivariate normal distribution ($k = rank(\Sigma) = 2$), the probability density function of the vector (X,Y) is

$f(x,y) = \frac{1}{2 \pi \sigma_x \sigma_y \sqrt{1-\rho^2}} \exp\left( -\frac{1}{2(1-\rho^2)}\left[ \frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x \sigma_y} \right] \right),$

where ρ is the correlation between X and Y. In this case,

$\mu = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_x^2 & \rho \sigma_x \sigma_y \\ \rho \sigma_x \sigma_y & \sigma_y^2 \end{pmatrix}.$
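
To see that the explicit bivariate formula agrees with the general matrix form of the density, the following sketch evaluates both at the same point. It is a minimal illustration in plain Python, not a reference implementation; the function names and the numerical values are assumptions chosen for the example.

```python
import math


def bivariate_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Explicit bivariate normal density in terms of sigma_x, sigma_y and rho."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx * zx + zy * zy - 2.0 * rho * zx * zy) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm


def covariance_matrix(sigma_x, sigma_y, rho):
    """Build Sigma from the standard deviations and the correlation."""
    off = rho * sigma_x * sigma_y
    return ((sigma_x ** 2, off), (off, sigma_y ** 2))


def matrix_pdf_2d(x, y, mu, sigma):
    """General density with k = 2, using the closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    dx, dy = x - mu[0], y - mu[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), with
    # Sigma^{-1} = (1 / det) * [[d, -b], [-c, a]].
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    # (2 pi)^{k/2} |Sigma|^{1/2} with k = 2.
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


# Illustrative (assumed) parameter values.
mu_x, mu_y, sigma_x, sigma_y, rho = 1.0, -2.0, 2.0, 1.0, 0.75
sigma = covariance_matrix(sigma_x, sigma_y, rho)

p_explicit = bivariate_pdf(0.3, -1.2, mu_x, mu_y, sigma_x, sigma_y, rho)
p_matrix = matrix_pdf_2d(0.3, -1.2, (mu_x, mu_y), sigma)
print(p_explicit, p_matrix)
```

The two printed values should agree up to floating-point rounding, since |Σ| = σ_x²σ_y²(1 − ρ²).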

In the bivariate case, the first equivalent condition for multivariate normality can be relaxed: it is sufficient to verify that a countably infinite set of distinct linear combinations of X and Y are normal in order to conclude that the vector $[X,Y]^T$ is bivariate normal.

### Properties

#### Normally distributed and independent

If X and Y are normally distributed and independent, then they are "jointly normally distributed", i.e., the pair (X, Y) has a bivariate normal distribution. The converse does not hold: a pair of jointly normally distributed variables need not be independent, since they could be correlated.
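
A short check of this claim, using the notation of the bivariate density from the previous section: independence means the joint density is the product of the two marginal normal densities, and that product is exactly the bivariate normal density with ρ = 0:

$f_X(x)\,f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_x} \exp\left( -\frac{(x-\mu_x)^2}{2\sigma_x^2} \right) \cdot \frac{1}{\sqrt{2\pi}\,\sigma_y} \exp\left( -\frac{(y-\mu_y)^2}{2\sigma_y^2} \right) = \frac{1}{2 \pi \sigma_x \sigma_y} \exp\left( -\frac{1}{2}\left[ \frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} \right] \right).$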

#### Two normally distributed random variables need not be jointly bivariate normal

The fact that two random variables X and Y both have a normal distribution does not imply that the pair (X, Y) has a joint normal distribution. A simple example is provided below:

Let X ~ N(0,1).
Let $Y = \begin{cases} X,& |X| > 1.33,\\ -X,& |X| \leq 1.33.\end{cases}$

Then both X and Y are individually normally distributed (Y is standard normal by the symmetry of X about 0); however, the pair (X,Y) is not jointly bivariate normal. The constant c = 1.33 is not special: any other non-trivial constant also works.

Furthermore, as X and Y are not independent, the sum Z = X+Y is not guaranteed to be a (univariate) normal variable. In this case, Z is clearly not normal, since it equals 0 with positive probability:

$Z = \begin{cases} 0,& |X| \leq 1.33,\\ 2X,& |X| > 1.33.\end{cases}$
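
The counterexample can also be explored numerically. The sketch below is a simulation under stated assumptions (the sample size, the seed, and all function and variable names are illustrative choices): it draws X from N(0,1), builds Y and Z = X + Y as defined above, and compares the observed fraction of exact zeros in Z with P(|X| ≤ 1.33), which is about 0.82.

```python
import math
import random

C = 1.33  # threshold from the example; any other non-trivial constant also works


def std_normal_cdf(t):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def flip_inside(x, c=C):
    """Y = X when |X| > c, and Y = -X when |X| <= c."""
    return x if abs(x) > c else -x


rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
ys = [flip_inside(x) for x in xs]
zs = [x + y for x, y in zip(xs, ys)]

# Y should look standard normal: compare an empirical CDF value with the exact one.
emp_cdf_y = sum(1 for y in ys if y <= 0.5) / len(ys)
print("P(Y <= 0.5): empirical", emp_cdf_y, "vs normal", std_normal_cdf(0.5))

# Z has a point mass at 0, which no normal distribution has.
frac_zero = sum(1 for z in zs if z == 0.0) / len(zs)
expected_zero = 2.0 * std_normal_cdf(C) - 1.0  # P(|X| <= C)
print("P(Z = 0): empirical", frac_zero, "vs exact", expected_zero)
```

Because x + (−x) is exactly 0.0 in floating point, the exact-zero count is a reliable estimate of the point mass of Z at 0.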