# SOCR BivariateNormal JS Activity

Revision as of 20:18, 30 July 2012 by IvoDinov.

## SOCR Educational Materials - Activities - SOCR Bivariate Normal Distribution Activity

This activity presents a 3D rendering of the bivariate Normal distribution. It is implemented in HTML5/JavaScript and should run on any computer, operating system, and web browser.

## Goals

The aims of this activity are to:

• Provide a visualization tool for better understanding of the bivariate normal distribution.
• Clarify the definitions and interplay between marginal, conditional and joint probability distributions (in the bivariate Normal case).
• Learn how to calculate Normal marginal, conditional and joint probabilities.
• Show how correlation influences the distribution of two normally-distributed variables. Demonstrate that when X and Y have joint bivariate normal distribution with zero correlation, then X and Y must be independent.
• Build a framework for generalizing the univariate normal distribution to higher dimensions.

## Background

• In general, when X and Y are jointly continuous random variables with a joint density $$f_{X,Y}(x,y)$$, if A and B are non-trivial subsets of the ranges of X and Y (e.g., intervals), then, with $$\Omega$$ denoting the entire range of X:
$$P(X \in A \mid Y \in B) = \frac{\int_{y\in B}\int_{x\in A} f_{X,Y}(x,y)\,dx\,dy}{\int_{y\in B}\int_{x\in\Omega} f_{X,Y}(x,y)\,dx\,dy}.$$
• In the special case where $$B=\{y_0\}$$ is a single point, the conditional probability is:
$$P(X \in A \mid Y = y_0) = \frac{\int_{x\in A} f_{X,Y}(x,y_0)\,dx}{\int_{x\in\Omega} f_{X,Y}(x,y_0)\,dx}$$. If the set (range) A is trivial, then the conditional probability is zero.
• Suppose that X has a normal distribution, the conditional mean of X given $$Y=y_0$$, $$E(X|Y=y_0)$$, is linear in $$y_0$$, and the conditional variance of X given $$Y=y_0$$, $$Var(X|Y=y_0)$$, is constant. Then the conditional probability distribution of X given $$Y=y_0$$, $$f_{X|Y=y_0}$$, is given by:
$$f_{X|Y=y_0} \sim N \left ( \mu_{X|y_0} = \mu_X +\rho \frac{\sigma_X}{\sigma_Y}(y_0-\mu_Y), \sigma_{X|y_0}^2 = \sigma_X^2(1-\rho^2) \right)$$, where
$$X \sim N (\mu_X, \sigma_X^2)$$,
$$E(Y)=\mu_Y$$, and $$Var(Y)=E(Y^2)-\mu_Y^2 = \sigma_Y^2$$ (this does not require Y itself to be normally distributed), and
$$\rho = Corr(X,Y)$$ is the correlation between X and Y.
This expression for the density relies on the two assumptions stated above: the conditional mean of X given $$y_0$$ is linear in $$y_0$$, and the conditional variance of X given $$y_0$$ is constant.
• The above makes no assumption about the distribution of Y. Now assume Y is also normally distributed, $$Y \sim N (\mu_Y, \sigma_Y^2)$$. We have three important observations:
1. The density of Y is:
$$f_Y(y) = \frac{1}{\sigma_Y \sqrt{2\pi}} e^{-\frac{(y-\mu_Y)^2}{2\sigma_Y^2}}$$,
2. The conditional distribution of $$X$$ given $$Y = y$$ is:
$$g_{X|Y}(x|y) = \frac{1}{\sigma_{X|Y} \sqrt{2\pi}} e^{-\frac{(x-\mu_{X|Y})^2}{2\sigma_{X|Y}^2}}$$
$$= \frac{1}{\sigma_X\sqrt{1-\rho^2} \sqrt{2\pi}} e^{-\frac{(x-\mu_X-\rho\frac{\sigma_X}{\sigma_Y}(y-\mu_Y))^2}{2\sigma_X^2(1-\rho^2)}}$$,
3. The joint probability density function of $$X$$ and $$Y$$ is:
$$f_{X,Y}(x,y) = g_{X|Y}(x|y)f_Y(y) = \frac{1}{\sigma_X\sigma_Y 2\pi\sqrt{1-\rho^2}} e^{-q(x,y)}$$, where
$$q(x,y) = \frac{1}{2} \frac{1}{1-\rho^2} \left ( \left ( \frac{x-\mu_X}{\sigma_X} \right )^2 -2\rho\frac{x-\mu_X}{\sigma_X}\frac{y-\mu_Y}{\sigma_Y} +\left ( \frac{y-\mu_Y}{\sigma_Y} \right )^2 \right )$$.
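The factorization in observation 3 can be checked numerically. The following is a minimal Python sketch, not part of the SOCR webapp: the function names and the parameter values are hypothetical and chosen only for illustration. It evaluates the conditional and marginal densities, multiplies them, and compares the product with the joint density. It also checks that with $$\rho=0$$ the joint density equals the product of the two marginal densities, which is the independence statement from the Goals.

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def conditional_params(y0, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Mean and SD of X given Y = y0, from the conditional formula above."""
    mean = mu_x + rho * (sigma_x / sigma_y) * (y0 - mu_y)
    sd = sigma_x * math.sqrt(1.0 - rho * rho)
    return mean, sd

def joint_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Bivariate normal density exp(-q(x, y)) / (2 pi sigma_x sigma_y sqrt(1 - rho^2))."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (2.0 * (1.0 - rho * rho))
    norm = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-q) / norm

# Hypothetical parameter values and evaluation point (illustration only).
params = dict(mu_x=1.0, mu_y=-2.0, sigma_x=2.0, sigma_y=0.5, rho=0.6)
x, y = 1.7, -1.6

# Conditional density of X given Y = y, times the marginal density of Y.
cond_mean, cond_sd = conditional_params(y, **params)
product = normal_pdf(x, cond_mean, cond_sd) * normal_pdf(y, params["mu_y"], params["sigma_y"])
joint = joint_pdf(x, y, **params)

# With rho = 0 the joint density factors into the two marginal densities.
joint_rho0 = joint_pdf(x, y, **dict(params, rho=0.0))
marginals = normal_pdf(x, params["mu_x"], params["sigma_x"]) * normal_pdf(
    y, params["mu_y"], params["sigma_y"]
)

print(f"g(x|y) * f_Y(y) = {product:.12f}, f_XY(x, y) = {joint:.12f}")
print(f"rho = 0: f_XY = {joint_rho0:.12f}, f_X * f_Y = {marginals:.12f}")
```

The two numbers on each printed line should agree up to floating-point rounding; the check is pointwise, so it illustrates the identity rather than proving it.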

## Requirements & usability

A modern web browser with HTML and JavaScript support is required (mobile devices should be fine). The 3D view of the bivariate Normal distribution uses WebGL, but WebGL is not strictly necessary: if you toggle off the "Use WebGL" check-box in the Settings panel, you can still view the 3D grid/mesh representation of the 2D Normal/Gaussian distribution.

1. Go to the SOCR Bivariate Normal Distribution Webapp.
2. Use the Settings to initialize the web-app.
3. In the Control panel:
    1. Select the appropriate bivariate limits for the X and Y variables.
    2. Choose the desired marginal or conditional probability function.
    3. The 1D Normal distribution graph is shown to the right.
4. You can rotate and manipulate the bivariate normal distribution in 3D by clicking and dragging on the graph below.
5. Probability Results are reported in the bottom text area.

## Learning Activity: Human Height and Weight

In this interactive activity, we will use Height vs. Weight data for a random sample of 200 adolescents.

### Parameter Estimation

Use the SOCR Modeler & the modeler activity to estimate the mean and standard deviation (SD) of the Height and Weight variables. Also, investigate the distribution of the two variables.
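For readers who prefer to check the estimates outside the SOCR Modeler, the following Python sketch computes the sample mean and SD of each variable, plus the sample correlation that the webapp also needs. The six (height, weight) pairs are invented for illustration and are not taken from the SOCR dataset; replace them with the actual data.

```python
import statistics

# Hypothetical (height in inches, weight in pounds) pairs, invented for
# illustration only; they are NOT values from the SOCR dataset.
pairs = [(60.0, 100.0), (62.0, 110.0), (64.0, 118.0),
         (66.0, 130.0), (68.0, 135.0), (70.0, 150.0)]

heights = [h for h, _ in pairs]
weights = [w for _, w in pairs]

# Marginal parameter estimates: sample mean and sample SD (n - 1 denominator).
mu_h, sd_h = statistics.mean(heights), statistics.stdev(heights)
mu_w, sd_w = statistics.mean(weights), statistics.stdev(weights)

# Pearson correlation: sample covariance divided by the product of the SDs.
cov = sum((h - mu_h) * (w - mu_w) for h, w in pairs) / (len(pairs) - 1)
rho = cov / (sd_h * sd_w)

print(f"Height: mean = {mu_h:.2f}, SD = {sd_h:.2f}")
print(f"Weight: mean = {mu_w:.2f}, SD = {sd_w:.2f}")
print(f"Correlation = {rho:.3f}")
```

These are plain moment estimates; the SOCR Modeler fits a distribution model to the data, so its reported values may differ slightly.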

For the Height variable: [Figure: SOCR_BivariateNormal_JS_Activity_Fig1.png]

## Practice experiments

### Height vs. Weight

Use the SOCR Height vs. Weight dataset.

• Motivation: Human heights and weights are correlated. How do the marginal parameters of the height and weight distributions, and their correlation, affect the joint and conditional probabilities?
• Use the SOCR Modeler and the SOCR Modeler activity to estimate the mean and standard deviation of each of the 2 variables (people's heights and weights).
• Use the SOCR Simple Linear Regression applet, and the corresponding activity, to estimate the correlation ($$\rho=Corr(Height, Weight)$$).
• Use these 5 estimated quantities to apply the SOCR BVN Webapp to compute various probabilities of interest (phrased in the context of the data itself!):
    • Marginal (e.g., $$P(Weight<150)$$),
    • Conditional (e.g., $$P(Weight<150 \vert Height<63)$$),
    • Joint (e.g., $$P(Height>60 \cap Weight<160)$$).
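The three kinds of probabilities listed above can also be approximated outside the webapp. The sketch below is one possible approach, not the webapp's implementation: it integrates the marginal density of Height times the conditional probability for Weight (using the conditional mean and variance from the Background section) with a simple midpoint rule. The five parameter values are hypothetical placeholders; substitute the estimates obtained from the data. The helper names `norm_cdf`, `norm_pdf` and `joint_prob` are also assumptions made for this example.

```python
import math

def norm_cdf(x, mu, sd):
    """Normal CDF via the error function; also works for +/- infinity."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

def norm_pdf(x, mu, sd):
    """Normal density N(mu, sd^2) evaluated at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def joint_prob(x_lo, x_hi, y_lo, y_hi, mu_x, mu_y, sd_x, sd_y, rho, steps=2000):
    """P(x_lo < X < x_hi and y_lo < Y < y_hi) for a bivariate normal, |rho| < 1.

    Integrates f_Y(y) * P(x_lo < X < x_hi | Y = y) over y with the midpoint
    rule. Infinite Y limits are truncated at 8 SDs from the mean of Y.
    """
    y_lo = max(y_lo, mu_y - 8.0 * sd_y)
    y_hi = min(y_hi, mu_y + 8.0 * sd_y)
    if y_hi <= y_lo:
        return 0.0
    cond_sd = sd_x * math.sqrt(1.0 - rho * rho)
    h = (y_hi - y_lo) / steps
    total = 0.0
    for i in range(steps):
        y = y_lo + (i + 0.5) * h
        cond_mean = mu_x + rho * (sd_x / sd_y) * (y - mu_y)
        p_x = norm_cdf(x_hi, cond_mean, cond_sd) - norm_cdf(x_lo, cond_mean, cond_sd)
        total += norm_pdf(y, mu_y, sd_y) * p_x
    return total * h

INF = math.inf

# Hypothetical parameter values (placeholders, not estimates from the data).
mu_w, sd_w = 140.0, 20.0   # Weight plays the role of X
mu_h, sd_h = 65.0, 3.0     # Height plays the role of Y
rho = 0.5
bvn = (mu_w, mu_h, sd_w, sd_h, rho)

# Marginal: P(Weight < 150)
p_marginal = norm_cdf(150.0, mu_w, sd_w)

# Conditional: P(Weight < 150 | Height < 63) = joint / P(Height < 63)
p_conditional = joint_prob(-INF, 150.0, -INF, 63.0, *bvn) / norm_cdf(63.0, mu_h, sd_h)

# Joint: P(Height > 60 and Weight < 160)
p_joint = joint_prob(-INF, 160.0, 60.0, INF, *bvn)

print(f"Marginal    = {p_marginal:.4f}")
print(f"Conditional = {p_conditional:.4f}")
print(f"Joint       = {p_joint:.4f}")
```

With a positive correlation, conditioning on a small height should raise the probability of a small weight, so the conditional value is expected to exceed the marginal one. Comparing these numbers with the webapp's reported results is a useful sanity check on both.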

### Inflation vs. HPI

• Motivation: There are intricate associations between different social and economic factors like inflation, interest rate, consumer price index and housing price index. We can explore how the marginal parameters of the Inflation and HPI distributions, and their correlation, affect their joint and conditional probabilities.
• Caution: This example is a little different from the human height and weight experiment above. In general, HPI and inflation may not follow normal distributions and may be skewed. Use the SOCR Histogram Chart to plot their distributions. Can the Bivariate Normal Distribution be used as an approximate model of the bivariate relation/probabilities of inflation and HPI? How about if we apply a data transformation? For example, the figure below shows the result of applying a square-root-transformation to the inflation variable ($$\lambda=0.5$$). The blue distribution of the transformed data is closer to Normal (note the skewness and kurtosis) compared to the red histogram of the raw inflation values.
• Use the SOCR Modeler and the SOCR Modeler activity to estimate the mean and standard deviation of each of the 2 variables (inflation and HPI).
• Use the SOCR Simple Linear Regression applet, and the corresponding activity, to estimate the correlation ($$\rho=Corr(Inflation,HPI)$$).
• Use these 5 estimated quantities to apply the SOCR BVN Webapp to compute various probabilities of interest (phrased in the context of the data itself!):
    • Marginal (e.g., $$P(Inflation<2.0)$$),
    • Conditional (e.g., $$P(Inflation>5.0 \vert HPI <108)$$),
    • Joint (e.g., $$P(Inflation>4.0 \cap HPI<110)$$).
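The effect of the data transformation mentioned in the caution above can be illustrated with a short Python sketch. It assumes that $$\lambda$$ is the exponent of a Box-Cox-style power transform, and it uses a synthetic right-skewed sample (gamma-distributed random numbers with a fixed seed) in place of the real inflation values, so the numbers it prints say nothing about the actual dataset. Power transforms of this kind require positive values, so negative inflation readings would need to be shifted first.

```python
import math
import random

def skewness(values):
    """Sample skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def power_transform(values, lam):
    """Box-Cox-style transform for positive data: (x**lam - 1) / lam, log(x) if lam == 0."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

# Synthetic right-skewed sample standing in for the raw data (NOT real inflation values).
random.seed(0)
raw = [random.gammavariate(2.0, 2.0) for _ in range(2000)]

# lambda = 0.5 is a shifted and rescaled square root, so it has the same
# skewness as the plain square-root transform.
transformed = power_transform(raw, 0.5)

skew_raw = skewness(raw)
skew_transformed = skewness(transformed)
print(f"Skewness of raw sample:         {skew_raw:.3f}")
print(f"Skewness of transformed sample: {skew_transformed:.3f}")
```

A skewness closer to zero after the transform suggests that the bivariate Normal model is a better approximation on the transformed scale; probabilities computed there must be translated back by applying the same transform to the limits of interest.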