SOCR EduMaterials ModelerActivities NormalBetaModelFit

=== Summary===
This activity describes the process of [[SOCR]] model fitting using Normal or Beta distribution models. ''Model fitting'' is the process of determining the parameters of an analytical model so that the parameter estimates are optimal according to some criterion. There are many strategies for [http://en.wikipedia.org/wiki/Estimating_Parameters parameter estimation]. Most of them differ in the underlying cost function and in the optimization strategy used to maximize or minimize that cost function.
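To make the role of the cost function concrete, here is a toy Python sketch (not SOCR code; all names are hypothetical) that fits a single location parameter ''c'' to the sample used later in the Background section. It uses one optimizer (a simple grid search) with two different cost functions. Squared error is minimized at the sample mean and absolute error at the sample median, so the same data and the same optimizer give two different "best" estimates.

```python
# Toy illustration (not SOCR code): fit one location parameter c to a sample
# by grid search, under two different cost functions.
sample = [1.2, 1.7, 3.4, 1.5, 1.1, 1.7, 3.5, 2.5]

def squared_cost(c, xs):
    """Sum of squared deviations; minimized by the sample mean."""
    return sum((x - c) ** 2 for x in xs)

def absolute_cost(c, xs):
    """Sum of absolute deviations; minimized by a sample median."""
    return sum(abs(x - c) for x in xs)

def grid_minimize(cost, xs, lo=0.0, hi=5.0, step=0.001):
    """Return the grid point in [lo, hi] with the smallest cost."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    return min(grid, key=lambda c: cost(c, xs))

est_squared = grid_minimize(squared_cost, sample)    # close to 2.075 (the mean)
est_absolute = grid_minimize(absolute_cost, sample)  # close to 1.7 (the median)
print(f"squared-error fit: {est_squared:.3f}, absolute-error fit: {est_absolute:.3f}")
```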
===Goals===
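The aims of this activity are to:
* motivate the need for (analytical) modeling of natural processes
* illustrate how to use the SOCR Modeler to fit models to real data
* present applications of model fitting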
===Background & Motivation===
Suppose we are given the sequence of numbers {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} and asked to find the best [[About_pages_for_SOCR_Distributions | (Continuous) Uniform Distribution]] that fits these data. In this case, two parameters need to be estimated: the minimum (''m'') and the maximum (''M'') of the data. These parameters completely determine the support (domain) of the continuous distribution, and we can explicitly write the density of the (best-fit) continuous uniform distribution as:
<center><math>f(x) = {{1}\over{M-m}}</math>, for <math>m \le x \le M</math> and <math>f(x)=0</math>, for <math>x \notin [m:M]</math>.</center>
Having this model distribution, we can use its analytical form, <math>f(x)</math>, to compute probabilities of events and critical values and, in general, to make inferences about the native process without acquiring additional data. Hence, a good model-fitting strategy is extremely useful in data analysis and statistical inference. Of course, any model-based inference is only as good as the data and the optimization strategy used to generate the model.
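The best-fit uniform model above takes only a few lines of Python. This is a minimal sketch assuming, as the text does, that ''m'' and ''M'' are the sample minimum and maximum (for a continuous uniform these are also the maximum likelihood estimates). The function names are hypothetical and not part of SOCR.

```python
def fit_uniform(data):
    """Best-fit continuous uniform: m = min(data), M = max(data)."""
    return min(data), max(data)

def uniform_pdf(x, m, M):
    """f(x) = 1/(M - m) on [m, M], and 0 outside."""
    return 1.0 / (M - m) if m <= x <= M else 0.0

def uniform_prob(a, b, m, M):
    """P(a <= X <= b) under the fitted model: overlap of [a, b] with [m, M], divided by M - m."""
    lo, hi = max(a, m), min(b, M)
    return max(0.0, hi - lo) / (M - m)

data = list(range(1, 11))        # {1, 2, ..., 10}
m, M = fit_uniform(data)         # m = 1, M = 10
print(uniform_pdf(5, m, M))      # density 1/9 inside the support
print(uniform_prob(2, 5, m, M))  # P(2 <= X <= 5) = 3/9
```

Once ''m'' and ''M'' are fixed, every probability under the model is just a ratio of interval lengths, which is what makes the analytical form convenient.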
Let's look at another motivating example. This time, suppose we have recorded the following (sample) measurements from some process: {1.2, 1.7, 3.4, 1.5, 1.1, 1.7, 3.5, 2.5}. Taking a bin size of 1, we can easily calculate a frequency histogram over the intervals [1:2), [2:3) and [3:4). The histogram used throughout this activity is {6, 1, 2}: 6 observations in the interval [1:2), 1 measurement in the interval [2:3) and 2 measurements in the interval [3:4). (Binning the eight values exactly as listed gives {5, 1, 2}, so this histogram includes one further observation in [1:2).)
<center>[[Image:SOCR_Activities_NormalBetaModelFit_Dinov_070507_Fig1.png|400px]]</center>
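Binning with a width of 1 can be sketched in Python as follows (a hypothetical helper, not SOCR code). Run on the eight values exactly as listed, it returns [5, 1, 2]; the {6, 1, 2} histogram used in the exercises has one additional count in [1:2).

```python
import math
from collections import Counter

def histogram_counts(sample, start=1, stop=4, width=1):
    """Frequency counts over the half-open bins [start, start+width), ..., ending at stop.
    Values outside [start, stop) are ignored."""
    counts = Counter(math.floor((x - start) / width) for x in sample)
    n_bins = (stop - start) // width
    return [counts.get(k, 0) for k in range(n_bins)]

sample = [1.2, 1.7, 3.4, 1.5, 1.1, 1.7, 3.5, 2.5]
print(histogram_counts(sample))  # counts for [1:2), [2:3), [3:4)
```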
We can now ask about the ''best Beta distribution model fit to the histogram of the data''!
Most of the time, when we study natural processes using [http://en.wikipedia.org/wiki/Probability_distribution probability distributions], it makes sense to fit distribution models to the frequency histogram of a sample rather than to the raw sample. This is because our general goals are to model the behavior of the native process, to understand its distribution and to quantify the likelihoods of various events of interest (e.g., in the example above, we may be interested in the probability of observing an outcome in the interval [1.50:2.15) or the chance that an observation exceeds 2.8).
===Exercises===
====Exercise 1====
Let's first solve the challenge presented in the background section, using the frequency histogram {6, 1, 2}. Go to the [http://www.socr.ucla.edu/htmls/SOCR_Modeler.html SOCR Modeler] and click on the '''Data''' tab. Paste in two columns of data. Column 1, {1, 2, 3}, contains the bin labels of the sample values, corresponding to measurements in the intervals [1:2), [2:3) and [3:4). Column 2, {6, 1, 2}, contains the frequency counts of measurements within each of these 3 histogram bins. Now press the '''Graphs''' tab; you should see an image like the one below. Then choose '''Beta_Fit_Modeler''' from the drop-down list of models in the top-left and select the '''Estimate Parameters''' check-box, also in the top-left. The graph now shows the best Beta distribution model fit to the frequency histogram {6, 1, 2}. Click the '''Results''' tab to see the estimates of the four parameters of the corresponding (generalized) Beta distribution (''Left Parameter = 0.0446428571428572; Right Parameter = 0.11607142857142871; Left Limit = 1.0; Right Limit = 3.0'').
<center>[[Image:SOCR_Activities_NormalBetaModelFit_Dinov_070507_Fig2.png|400px]]</center>
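The Results-tab values can be reproduced by a method-of-moments fit. This fit treats each bin label as a point value weighted by its count and takes the limits to be the smallest and largest bin labels. We are assuming this is what Beta_Fit_Modeler does; we have not checked the SOCR source. For the histogram {6, 1, 2}, the Python sketch below (hypothetical function name) returns the same four numbers up to floating-point rounding.

```python
def fit_beta_moments(values, freqs):
    """Method-of-moments fit of a four-parameter (generalized) Beta distribution
    to a frequency histogram. Each bin label in `values` is treated as a point
    with weight given by `freqs`; the limits are min(values) and max(values)."""
    left, right = min(values), max(values)
    n = sum(freqs)
    # Rescale the bin labels to [0, 1].
    t = [(v - left) / (right - left) for v in values]
    mean = sum(f * x for f, x in zip(freqs, t)) / n
    var = sum(f * (x - mean) ** 2 for f, x in zip(freqs, t)) / n  # population variance
    # Match the Beta(alpha, beta) mean and variance on [0, 1].
    common = mean * (1 - mean) / var - 1
    alpha = mean * common
    beta = (1 - mean) * common
    return alpha, beta, left, right

alpha, beta, left, right = fit_beta_moments([1, 2, 3], [6, 1, 2])
print(alpha, beta, left, right)
```

Both shape parameters come out well below 1, which is why the fitted density is U-shaped, with mass piled up at both limits.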
You can also see how the (generalized) Beta distribution degenerates to this shape by going to [http://www.socr.ucla.edu/htmls/SOCR_Distributions.html SOCR Distributions], selecting the '''(Generalized) Beta Distribution''' from the top-left and setting its 4 parameters to the 4 values computed above. Notice how the shape of the Beta distribution changes with each change of the parameters. This also demonstrates why we fit a distribution model to the frequency histogram in the first place: to obtain an analytic model for studying the general process without acquiring more data. Once we have an analytical model for the distribution of the process, we can compute the probability of any event of interest. For example, this figure depicts the probability that a random observation from this process exceeds 2.8 (the right limit). This probability is computed to be 0.756.
<center>[[Image:SOCR_Activities_NormalBetaModelFit_Dinov_070507_Fig3.png|400px]]</center>
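Once the four parameters are known, tail probabilities such as P(X > 2.8) can also be computed directly. The Python sketch below uses only the standard library. It evaluates the regularized incomplete beta function (the Beta CDF) with the standard continued-fraction expansion evaluated by Lentz's method, then rescales to the fitted limits. The helper names are hypothetical. We make no claim that this follows the internals or display conventions of the SOCR Distributions applet, so treat its output as an independent cross-check.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),           # even step
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd step
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b), i.e. the Beta(a, b) CDF at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where that converges faster.
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def gen_beta_sf(x, alpha, beta, left, right):
    """P(X > x) for a Beta(alpha, beta) distribution rescaled to [left, right]."""
    t = (x - left) / (right - left)
    return 1.0 - reg_inc_beta(alpha, beta, t)

# Fitted parameters from the Results tab in Exercise 1.
p = gen_beta_sf(2.8, 0.0446428571428572, 0.11607142857142871, 1.0, 3.0)
print(f"P(X > 2.8) under the fitted model: {p:.4f}")
```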
====Exercise 2====
Go to the [http://www.socr.ucla.edu/htmls/SOCR_Modeler.html SOCR Modeler], select the '''Graphs''' tab and click the "Scale Up" check-box. Then select '''Normal_Model_Fit''' from the drop-down list of models and begin clicking on the graph panel; this lets you construct a histogram of interest manually. Note that you are not entering random measurements: you are directly specifying the frequency counts (bin heights) of the histogram you are constructing. Try to make the histogram bins form a unimodal, bell-shaped and symmetric graph. Observe that as you click, new histogram bins appear and the model fit updates. Now click the '''Estimate Parameters''' check-box in the top-left and see the best-fit Normal curve superimposed on the manually constructed histogram. Under the '''Results''' tab you can find the maximum likelihood estimates of the mean and the standard deviation of the best Normal distribution fit to this specific frequency histogram.
<center>[[Image:SOCR_Activities_NormalBetaModelFit_Dinov_070507_Fig4.png|400px]]</center>
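For reference, the maximum likelihood estimates for a Normal fit to a frequency histogram have a closed form if each bin position is treated as a value observed as many times as its count. The Python sketch below assumes that this is how the Modeler reads the histogram you construct (we have not verified its source). The function name and the histogram in it are made up for illustration.

```python
import math

def normal_mle_from_histogram(values, freqs):
    """Maximum likelihood estimates (mean, standard deviation) of a Normal model,
    treating each bin position in `values` as observed `freqs` times."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, values)) / n
    # The MLE of the variance divides by n, not n - 1.
    var = sum(f * (x - mean) ** 2 for f, x in zip(freqs, values)) / n
    return mean, math.sqrt(var)

# A hypothetical hand-built, unimodal and symmetric histogram.
positions = [1, 2, 3, 4, 5]
counts = [1, 3, 5, 3, 1]
mu, sigma = normal_mle_from_histogram(positions, counts)
print(f"mean = {mu:.3f}, sd = {sigma:.3f}")
```

For a symmetric histogram like this one, the estimated mean sits at the center bin, and the standard deviation reflects how far the counts spread from it.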
===Applications===
* [[SOCR_EduMaterials_Activities_RNG | Here you can see more instances]] of using the [http://www.socr.ucla.edu/htmls/SOCR_Modeler.html SOCR Modeler] to fit distribution models to real data.
* [http://www.socr.ucla.edu/htmls/SOCR_Modeler.html SOCR Modeler] allows one to fit distribution, polynomial or spectral models to real data; more information about these is available in the [[SOCR_EduMaterials_ModelerActivities | SOCR Modeler Activities]].
<hr>

Revision as of 23:29, 5 July 2007
