AP Statistics Curriculum 2007 ANOVA 1Way

From Socr


Revision as of 05:02, 19 February 2008


General Advanced Placement (AP) Statistics Curriculum - One-Way Analysis of Variance (ANOVA)

In the two-sample inference chapter we compared the means of two independent groups using the independent T-test. Now we extend these inference methods to compare k independent samples (groups). To do this, we decompose the total variation in the data into independent (orthogonal) components, i.e., we analyze the variance of the data. Hence, this procedure is called Analysis of Variance (ANOVA).

Motivational Example

Suppose 5 varieties of peas are being tested by a large agribusiness cooperative to determine which is best suited for production. A field was divided into 20 plots, and each variety was planted in four plots. The yields (in bushels of peas) from each plot are shown below in two equivalent layouts.

Yield (bushels) by variety of pea

   A      B      C      D      E
 26.2   29.2   29.1   21.3   20.1
 24.3   28.1   30.8   22.4   19.3
 21.8   27.3   33.9   24.3   19.9
 28.1   31.2   32.8   21.8   22.1


Variety   Yields (bushels)
A         26.2, 24.3, 21.8, 28.1
B         29.2, 28.1, 27.3, 31.2
C         29.1, 30.8, 33.9, 32.8
D         21.3, 22.4, 24.3, 21.8
E         20.1, 19.3, 19.9, 22.1
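As a minimal Python sketch (standard library only), the yields can be stored as a dictionary keyed by variety and summarized by each variety's sample mean and sample standard deviation. These are estimates of the population means and standard deviations discussed below. The dictionary layout and the names `yields` and `summary` are illustrative choices, not part of SOCR.

```python
from statistics import mean, stdev

# Pea yields (bushels) for the four plots of each variety, copied from the table above.
yields = {
    "A": [26.2, 24.3, 21.8, 28.1],
    "B": [29.2, 28.1, 27.3, 31.2],
    "C": [29.1, 30.8, 33.9, 32.8],
    "D": [21.3, 22.4, 24.3, 21.8],
    "E": [20.1, 19.3, 19.9, 22.1],
}

# Sample mean and sample standard deviation (divisor n_i - 1) for each variety;
# these estimate the unknown population mean mu_i and standard deviation sigma_i.
summary = {variety: (mean(ys), stdev(ys)) for variety, ys in yields.items()}

for variety, (m, s) in summary.items():
    print(f"{variety}: mean = {m:.2f}, sd = {s:.2f}")
```

Variety C has the largest sample mean (31.65 bushels) and variety E the smallest (20.35 bushels).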

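Using the SOCR Charts (see SOCR Box-and-Whisker Plot Activity and Dot Plot Activity) we can generate plots that let us visually compare the yields of the 5 pea varieties.

[Figure: plots comparing the yields of the five pea varieties, generated with SOCR Charts (SOCR_EBook_Dinov_ANOVA1_021708_Fig1.jpg)]
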
Using ANOVA, the data are regarded as random samples from k populations (here k = 5). Let the population means of these groups be \mu_1, \mu_2, \mu_3, \mu_4, \mu_5 and their population standard deviations be \sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5. We have 5 group means to compare. Why not just carry out \binom{5}{2} = 10 T-tests comparing all pairs of (independent) groups?

Repeated T-tests would mean testing null hypotheses of the form H_0: \mu_i = \mu_j, one for each pair i \neq j. What is the problem with this approach? Suppose each test is carried out at α = 0.05, so the probability of a type I error is 5% for each test. Then the overall risk of making at least one type I error across all the tests is larger than 0.05, and it grows quickly as the number of groups (k), and hence the number of pairwise tests, increases. To solve this problem, we need to make the multiple comparisons while keeping the overall error rate at α = 0.05 (or whichever level is specified initially).
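The growth of the overall risk can be illustrated with a short Python sketch. It makes a simplifying assumption, stated in the code: the pairwise tests are treated as independent, so the chance of at least one type I error among m tests is 1 - (1 - α)^m. Real pairwise comparisons share groups and are therefore correlated, so the numbers are only indicative. The function name `familywise_error` is our own.

```python
from math import comb

ALPHA = 0.05  # per-test significance level


def familywise_error(k: int, alpha: float = ALPHA) -> float:
    """Approximate chance of at least one type I error among all pairwise
    T-tests of k groups when all population means are equal.
    ASSUMPTION: the tests are treated as independent, which they are not
    exactly, since different pairs share groups."""
    m = comb(k, 2)  # number of pairwise tests, k choose 2
    return 1 - (1 - alpha) ** m


for k in range(2, 11):
    print(f"k = {k:2d}: {comb(k, 2):2d} tests, overall risk ~ {familywise_error(k):.3f}")
```

For k = 5 groups (10 tests) this approximation gives an overall risk of about 0.40 rather than 0.05.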


The main idea behind ANOVA is that we need to know how much inherent variability there is in the data before we can judge whether the differences among the sample means reflect real differences among the population means, i.e., the presence of a grouping effect. To make an inference about the means we compare two types of variability:

variability between sample means
variability within each group

It is very important to keep these two types of variability in mind as we work through the formulas below. Our goal is a numerical recipe that quantifies each of them.
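One common way to turn the two kinds of variability into numbers is to use sums of squares. The Python sketch below (standard library only; the names `ss_between`, `ss_within` and `ss_total` are our own) computes, for the pea data, the spread of the group means around the grand mean (the mean of all 20 observations) and the spread of the observations around their own group means, and checks that together they account for the total variation.

```python
from statistics import mean

# Pea yields (bushels) by variety, copied from the table above.
yields = {
    "A": [26.2, 24.3, 21.8, 28.1],
    "B": [29.2, 28.1, 27.3, 31.2],
    "C": [29.1, 30.8, 33.9, 32.8],
    "D": [21.3, 22.4, 24.3, 21.8],
    "E": [20.1, 19.3, 19.9, 22.1],
}

all_obs = [y for ys in yields.values() for y in ys]
grand_mean = mean(all_obs)
group_means = {v: mean(ys) for v, ys in yields.items()}

# Between-group variability: squared distance of each group mean from the
# grand mean, weighted by the group size n_i.
ss_between = sum(len(ys) * (group_means[v] - grand_mean) ** 2 for v, ys in yields.items())

# Within-group variability: squared distance of each observation from its own group mean.
ss_within = sum((y - group_means[v]) ** 2 for v, ys in yields.items() for y in ys)

# Total variability: squared distance of each observation from the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_obs)

print(f"between = {ss_between:.2f}, within = {ss_within:.2f}, total = {ss_total:.2f}")
```

For these data the between-group sum of squares is 342.04 and the within-group sum is 53.52; they add up to the total sum of squares, 395.56, which is the decomposition of variance that gives ANOVA its name.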


One-Way ANOVA Calculations

We use the following notation:

y_{i,j} = the measurement from group i, observation index j
k = number of groups
n_i = number of observations in group i
n = total number of observations, n = n_1 + n_2 + \cdots + n_k
The group mean for group i is: \bar{y}_{i,.} = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{i,j}
The grand mean is: \bar{y} = \bar{y}_{.,.} = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{i,j}
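The notation maps directly onto code. In this Python sketch (standard library only), the data are stored as a list of groups, so `y[i][j]` plays the role of y_{i,j}; note that Python indexes from 0 while the formulas count from 1. The sketch also checks a useful identity: the grand mean equals the average of the group means weighted by the group sizes n_i, so with unequal group sizes it generally differs from the plain average of the group means. The variable names are illustrative.

```python
# y[i][j] holds the measurement from group i, observation j (0-based here,
# whereas the formulas count from 1). Groups may have different sizes n_i.
y = [
    [26.2, 24.3, 21.8, 28.1],  # variety A
    [29.2, 28.1, 27.3, 31.2],  # variety B
    [29.1, 30.8, 33.9, 32.8],  # variety C
    [21.3, 22.4, 24.3, 21.8],  # variety D
    [20.1, 19.3, 19.9, 22.1],  # variety E
]

k = len(y)                               # number of groups
group_sizes = [len(group) for group in y]  # n_i for each group
n = sum(group_sizes)                     # total number of observations

# Group means: ybar_{i,.} = (1 / n_i) * sum over j of y_{i,j}
group_means = [sum(group) / len(group) for group in y]

# Grand mean: ybar = (1 / n) * sum over i and j of y_{i,j}
grand_mean = sum(sum(group) for group in y) / n

# The grand mean is also the n_i-weighted average of the group means.
weighted_average = sum(n_i * m for n_i, m in zip(group_sizes, group_means)) / n

print(group_means, grand_mean, weighted_average)
```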


To measure how much the group means differ from one another, we compare each group mean to the grand mean.
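A minimal Python sketch of this comparison for the pea data (standard library only; the names are our own) computes the deviation of each group mean from the grand mean. It also shows why the deviations cannot simply be added up: weighted by the group sizes n_i, they always sum to zero, so a summary of their size has to square them (or otherwise make them positive) first.

```python
# Pea yields (bushels) by variety, copied from the table above.
yields = {
    "A": [26.2, 24.3, 21.8, 28.1],
    "B": [29.2, 28.1, 27.3, 31.2],
    "C": [29.1, 30.8, 33.9, 32.8],
    "D": [21.3, 22.4, 24.3, 21.8],
    "E": [20.1, 19.3, 19.9, 22.1],
}

group_sizes = {v: len(ys) for v, ys in yields.items()}
group_means = {v: sum(ys) / len(ys) for v, ys in yields.items()}
n = sum(group_sizes.values())
grand_mean = sum(sum(ys) for ys in yields.values()) / n

# Deviation of each group mean from the grand mean: ybar_{i,.} - ybar
deviations = {v: group_means[v] - grand_mean for v in yields}

for v, d in deviations.items():
    print(f"{v}: {d:+.2f}")

# Weighted by the group sizes n_i, the deviations sum to zero (up to rounding).
weighted_sum = sum(group_sizes[v] * d for v, d in deviations.items())
```

Around a grand mean of 25.7 bushels, the deviations are -0.60 (A), +3.25 (B), +5.95 (C), -3.25 (D) and -5.35 (E).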


Approach

Models & strategies for solving the problem, data understanding & inference.

  • TBD

Model Validation

Checking/affirming underlying assumptions.

  • TBD

Computational Resources: Internet-based SOCR Tools

  • TBD

Examples

Computer simulations and real observed data.

  • TBD

Hands-on activities

Step-by-step practice problems.

  • TBD

References

  • TBD


