# AP Statistics Curriculum 2007 EDA Center

Revision as of 04:09, 7 April 2008 by IvoDinov (revised the Geometric/Harmonic mean sections).

## General Advance-Placement (AP) Statistics Curriculum - Central Tendency

### Measurements of Central Tendency

There are three main features of all populations (or data samples) that are always critical in understanding and interpreting their distributions. These characteristics are Center, Spread and Shape. The main measures of centrality are the mean, median and mode.

Suppose we are interested in the long-jump performance of some students. We can carry out an experiment by randomly selecting 8 male statistics students and asking them to perform the standing long jump. In reality every student participated, but for ease of calculation we will focus on these eight students. The long jumps (in inches) were as follows:

 74 78 106 80 68 64 60 76

### Mean

The sample-mean is the arithmetic average of a finite sample of numbers. In the long-jump example, the sample-mean is calculated as follows:

$\overline{y} = {1 \over 8} (74+78+106+80+68+64+60+76)=75.75 \text{ in.}$

### Median

The sample-median can be thought of as the point that divides a distribution in half (50/50). The following steps are used to find the sample-median:

• Arrange the data in ascending order
• If the sample size is odd, the median is the middle value of the ordered collection
• If the sample size is even, the median is the average of the middle two values in the ordered collection.

For the long-jump data above we have:

• Ordered data:
 60 64 68 74 76 78 80 106
• $Median = {74+76 \over 2} = 75$.
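The mean and median calculations above can be checked with a short Python sketch. The function names `sample_mean` and `sample_median` are illustrative choices, not part of the original page; the median function simply follows the three steps listed above.

```python
# Long-jump distances (inches) from the example above.
jumps = [74, 78, 106, 80, 68, 64, 60, 76]


def sample_mean(values):
    """Arithmetic average of a finite sample."""
    return sum(values) / len(values)


def sample_median(values):
    """Median found by sorting and taking the middle value(s)."""
    ordered = sorted(values)          # step 1: ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # step 2: odd size, middle value
        return ordered[mid]
    # step 3: even size, average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sample_mean(jumps))    # 75.75
print(sample_median(jumps))  # 75.0
```

Python's standard library offers `statistics.mean` and `statistics.median`, which give the same results for these data.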

### Mode(s)

The modes are the most frequently occurring values (the numbers that appear most often). The term mode is applied both to probability distributions and to collections of experimental data.

For instance, for the Hot dogs data file, there appear to be 3 modes for the calorie variable! This is evident from the histogram of the calorie content of all hot dogs, shown in the image below. Note the clear separation of the calories into 3 distinct sub-populations; the highest points in these three sub-populations are the three modes for these data.
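For a small discrete sample, the modes can be found by counting how often each value occurs. The sketch below uses a made-up sample (not the hot dog data, which is not reproduced here) and the standard-library function `statistics.multimode`, available in Python 3.8 and later. For continuous measurements such as calories, modes are usually read off the peaks of a histogram instead.

```python
from statistics import multimode

# Hypothetical sample, chosen only to illustrate a data set with two modes.
sample = [2, 3, 3, 5, 7, 7, 9]

# multimode returns every value tied for the highest frequency,
# in the order in which the values first appear in the data.
modes = multimode(sample)
print(modes)  # [3, 7]
```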

### Resistance

A statistic is said to be resistant if its value is relatively unchanged by changes in a small portion of the data. Looking at the formulas for the median, mean and mode, which statistic seems to be the most resistant?

If you remove the student with the long jump distance of 106 and recalculate the median and mean, which one is altered less (therefore is more resistant)? Notice that the mean is very sensitive to outliers and atypical observations, and hence less resistant than the median.
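The comparison can be carried out directly. The following sketch removes the observation 106 from the long-jump data and reports how far each statistic moves; the variable names are illustrative.

```python
from statistics import mean, median

jumps = [74, 78, 106, 80, 68, 64, 60, 76]

# Drop the atypical observation (106 in.) and recompute both statistics.
without_outlier = [y for y in jumps if y != 106]

mean_shift = abs(mean(jumps) - mean(without_outlier))
median_shift = abs(median(jumps) - median(without_outlier))

print(round(mean_shift, 2))  # 4.32 (mean moves from 75.75 to about 71.43)
print(median_shift)          # 1.0  (median moves from 75 to 74)
```

The mean shifts by more than four inches while the median shifts by one, which illustrates why the median is considered the more resistant of the two.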

### Resistant Mean-related Measures of Centrality

The following two sample estimates of population centrality resemble the calculation of the mean; however, they are much more resistant to change in the presence of outliers.

#### K-times trimmed mean

$\bar{y}_{t,k}={1\over n-2k}\sum_{i=k+1}^{n-k}{y_{(i)}}$, where $0\leq k < n/2$ is the trim-factor (larger k yields less variable estimates of the center), and $y_{(i)}$ are the order statistics (small to large). That is, we remove the k smallest and the k largest observations from the sample before we compute the arithmetic average.
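A minimal Python sketch of the k-times trimmed mean, applied to the long-jump data, is shown below. The function name `trimmed_mean` and the input check are illustrative assumptions rather than part of the original page.

```python
def trimmed_mean(values, k):
    """Average of the sample after dropping the k smallest and k largest values."""
    n = len(values)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < n/2")
    ordered = sorted(values)      # order statistics y_(1) <= ... <= y_(n)
    kept = ordered[k:n - k]       # y_(k+1), ..., y_(n-k)
    return sum(kept) / len(kept)  # divide by n - 2k


jumps = [74, 78, 106, 80, 68, 64, 60, 76]
print(trimmed_mean(jumps, 0))  # 75.75, the ordinary mean
print(trimmed_mean(jumps, 2))  # 74.0
```

With k = 1 the observations 60 and 106 are dropped and the result is 440/6, roughly 73.33.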

#### Winsorized k-times mean

The Winsorized k-times mean is defined similarly by $\bar{y}_{w,k}={1\over n}\left( k\times y_{(k+1)}+\sum_{i=k+1}^{n-k}{y_{(i)}}+k\times y_{(n-k)}\right)$, where $k\geq 0$ is the trim-factor and $y_{(i)}$ are the order statistics (small to large). In this case, before we compute the arithmetic average, we replace the k smallest observations with the (k+1)st smallest observation, $y_{(k+1)}$, and the k largest observations with the (k+1)st largest observation, $y_{(n-k)}$, so that all n values still contribute to the average.
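The sketch below illustrates this estimator in Python. It assumes the usual convention in which the k smallest values are replaced by the (k+1)st smallest value and the k largest values by the (k+1)st largest value, so that exactly n values enter the average. The function name `winsorized_mean` is an illustrative choice.

```python
def winsorized_mean(values, k):
    """Mean after replacing the k lowest and k highest values by their nearest retained neighbours."""
    n = len(values)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < n/2")
    ordered = sorted(values)
    low = ordered[k]           # y_(k+1), the (k+1)st smallest value
    high = ordered[n - k - 1]  # y_(n-k), the (k+1)st largest value
    winsorized = [low] * k + ordered[k:n - k] + [high] * k
    return sum(winsorized) / n  # still divide by the full sample size n


jumps = [74, 78, 106, 80, 68, 64, 60, 76]
print(winsorized_mean(jumps, 1))  # 73.0
```

For k = 1 the value 60 is replaced by 64 and 106 is replaced by 80, giving 584/8 = 73.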

### Other Centrality Measures

The arithmetic mean answers the question: if all observations were equal, what would that value (the center) have to be in order to achieve the same total?

$n\times \bar{x}=\sum_{i=1}^n{x_i}$

In some situations, the arithmetic average is not the appropriate notion of "average", and the center has to be defined in different terms.

#### Harmonic Mean

When we average speeds (velocities) over equal distances, the arithmetic mean is inappropriate; the harmonic mean (computed differently) gives the most intuitive answer to what is the "middle" for such a process. The harmonic mean answers the question: if all the observations were equal, what would that value have to be in order to achieve the same sample sum of reciprocals?

Harmonic mean: $\hat{\hat{x}}= \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \ldots + \frac{1}{x_n}}$
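As an illustration, consider a hypothetical trip in which the same distance is covered once at 30 mph and once at 60 mph. The overall average speed is the harmonic mean of the two speeds, not their arithmetic mean of 45 mph. The function name and the numbers below are assumptions made for this sketch.

```python
def harmonic_mean(values):
    """n divided by the sum of reciprocals; defined here for positive values only."""
    if any(x <= 0 for x in values):
        raise ValueError("all observations must be positive")
    return len(values) / sum(1 / x for x in values)


# Hypothetical speeds (mph) over two legs of equal distance.
speeds = [30, 60]
average_speed = harmonic_mean(speeds)
print(round(average_speed, 6))  # 40.0
```

A check on the result: over two 60-mile legs the trip takes 2 hours plus 1 hour, so 120 miles in 3 hours, which is 40 mph. The standard library also provides `statistics.harmonic_mean`.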

#### Geometric Mean

In contrast, the geometric mean answers the question, if all the observations were equal, what would that value have to be in order to achieve the same sample product?

Geometric mean: $\tilde{x}^n={\prod_{i=1}^n x_i}$

Alternatively, for positive observations: $\tilde{x}= \exp \left( \frac{1}{n} \sum_{i=1}^n\log(x_i) \right)$
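The logarithmic form is convenient in practice because it avoids forming a very large or very small product. A minimal Python sketch is given below; the function name and the example growth factors are hypothetical.

```python
import math


def geometric_mean(values):
    """exp of the average log; requires strictly positive observations."""
    if any(x <= 0 for x in values):
        raise ValueError("all observations must be positive")
    return math.exp(sum(math.log(x) for x in values) / len(values))


# Hypothetical growth factors: a quantity doubles, then grows eightfold.
factors = [2, 8]
g = geometric_mean(factors)
print(round(g, 6))  # 4.0
```

Here two equal factors of 4 give the same product (16) as the factors 2 and 8, which is exactly the "same sample product" property described above. Python 3.8 and later also provide `statistics.geometric_mean`.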