AP Statistics Curriculum 2007 EDA Var

From Socr

Revision as of 23:52, 13 May 2010

General Advanced-Placement (AP) Statistics Curriculum - Measures of Variation

Measures of Variation and Dispersion

There are many measures of (population or sample) variation, e.g., the range, the variance, the standard deviation, mean absolute deviation, etc. These are used to assess the dispersion or spread of the population.

Suppose we are interested in the long-jump performance of some students. We can carry out an experiment by randomly selecting 8 male statistics students and asking them to perform the standing long jump. In reality every student participated, but for ease of calculation we will focus on these eight students. The long jumps were as follows:

Long-Jump (inches) Sample Data
60	64	68	74	76	78	80	106

Range

The range is the easiest measure of dispersion to calculate, though perhaps not the most informative. It is defined as Range = max - min. For example, for the Long-Jump data, the range is:

Range = 106 – 60 = 46.

Note that the range is only sensitive to the extreme values of a sample and ignores all other information. So, two completely different distributions may have the same range.
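As a minimal sketch of this point, the snippet below computes the range of the Long-Jump data and of a second sample that shares the same extremes. The second sample and the helper name `sample_range` are made up for illustration and do not come from the original data.

```python
# Long-jump sample (inches) from the text.
long_jumps = [60, 64, 68, 74, 76, 78, 80, 106]

# Hypothetical sample (an assumption, not from the text): same minimum and
# maximum, but almost all values are bunched near the top.
clustered = [60, 105, 105, 105, 105, 105, 105, 106]


def sample_range(values):
    """Return max - min for a non-empty sequence of numbers."""
    return max(values) - min(values)


print(sample_range(long_jumps))  # 46
print(sample_range(clustered))   # 46
```

Both samples have a range of 46 inches even though their distributions look nothing alike.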

Quartiles and IQR

The first quartile ($Q_1$) and the third quartile ($Q_3$) are defined as the values that split the dataset into bottom-25% vs. top-75% and bottom-75% vs. top-25%, respectively. Thus the inter-quartile range (IQR), which is the difference $Q_3 - Q_1$, represents the central 50% of the data and can be considered a measure of data dispersion or variation. The wider the IQR, the more variable the data.

For example, $Q_1 = (64 + 68) / 2 = 66$, $Q_3 = (78 + 80) / 2 = 79$ and $IQR = Q_3 - Q_1 = 13$, for the Long-Jump data shown above. Thus we expect the middle half of all long jumps (for that population) to be between 66 and 79 inches.
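The quartile calculation can be sketched in Python as follows. The sketch assumes the convention used in the example (each quartile is the median of the lower or upper half of the sorted data). Other conventions exist, and software packages may report slightly different quartiles. The function name `quartiles` is illustrative.

```python
from statistics import median

# Long-jump sample (inches) from the text.
long_jumps = [60, 64, 68, 74, 76, 78, 80, 106]


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumed convention: for an odd number of observations the middle
    value is excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]        # smallest half of the data
    upper = ordered[n - half:]    # largest half of the data
    return median(lower), median(upper)


q1, q3 = quartiles(long_jumps)
iqr = q3 - q1
print(q1, q3, iqr)  # 66.0 79.0 13.0
```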

Five-number summary

The five-number summary for a dataset is the 5-tuple $\{min, Q_1, Q_2, Q_3, max\}$, containing the sample minimum, first-quartile, second-quartile (median), third-quartile, and maximum.
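A minimal sketch of the five-number summary for the Long-Jump data is shown below. It assumes the quartiles are taken as medians of the lower and upper halves of the sorted data, which is only one of several conventions. The function name `five_number_summary` is illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using medians of the two halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    q1 = median(ordered[:half])       # median of the lower half
    q2 = median(ordered)              # overall median
    q3 = median(ordered[n - half:])   # median of the upper half
    return ordered[0], q1, q2, q3, ordered[-1]


# Long-jump sample (inches) from the text.
summary = five_number_summary([60, 64, 68, 74, 76, 78, 80, 106])
print(summary)  # (60, 66.0, 75.0, 79.0, 106)
```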

Variance and Standard Deviation

The logic behind the variance and standard deviation measures is to measure the difference between each observation and the mean (i.e., dispersion). Suppose we have $n > 1$ observations, $\left \{ y_1, y_2, y_3, ..., y_n \right \}$. The deviation of the $i^{th}$ measurement, $y_i$, from the mean ($\overline{y}$) is defined by $(y_i - \overline{y})$.

Does the average of these deviations seem like a reasonable way to find an average deviation for the sample or the population? No, because the positive and negative deviations cancel, so the sum of all deviations is always zero:

$\sum_{i=1}^n{(y_i - \overline{y})}=0.$
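This identity follows directly from the definition of the sample mean, $\overline{y} = {1 \over n}\sum_{i=1}^n{y_i}$, which gives $\sum_{i=1}^n{y_i} = n\overline{y}$:

$\sum_{i=1}^n{(y_i - \overline{y})} = \sum_{i=1}^n{y_i} - n\overline{y} = n\overline{y} - n\overline{y} = 0.$

For the Long-Jump data, with $\overline{y} = 75.75$, the negative deviations add up to $-(15.75 + 11.75 + 7.75 + 1.75) = -37$ and the positive deviations add up to $0.25 + 2.25 + 4.25 + 30.25 = 37$, so they cancel exactly.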

To solve this problem we employ different versions of the mean absolute deviation:

${1 \over n-1}\sum_{i=1}^n{|y_i - \overline{y}|}.$
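As a rough illustration, the formula above can be evaluated for the Long-Jump data as follows. The function name `mean_absolute_deviation` and the `ddof` parameter are introduced only for this sketch. `ddof=1` reproduces the ${1 \over n-1}$ factor used above, and `ddof=0` gives the ${1 \over n}$ version.

```python
from statistics import mean

# Long-jump sample (inches) from the text.
long_jumps = [60, 64, 68, 74, 76, 78, 80, 106]


def mean_absolute_deviation(values, ddof=1):
    """Average absolute deviation from the mean, dividing by n - ddof."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("not enough observations for this divisor")
    center = mean(values)
    return sum(abs(y - center) for y in values) / (n - ddof)


print(round(mean_absolute_deviation(long_jumps), 3))          # 10.571
print(round(mean_absolute_deviation(long_jumps, ddof=0), 3))  # 9.25
```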

Squaring the deviations, instead of taking their absolute values, gives the variance, which is defined as:

${1 \over n-1}\sum_{i=1}^n{(y_i - \overline{y})^2}.$

And the standard deviation is defined as:

$\sqrt{{1 \over n-1}\sum_{i=1}^n{(y_i - \overline{y})^2}}.$

For the long-jump sample of 8 measurements, the standard deviation is:

$\sqrt{{1 \over 8-1} \left \{(60-75.75)^2 + (64-75.75)^2 + (68-75.75)^2 + (74-75.75)^2 + (76-75.75)^2 + (78-75.75)^2 + (80-75.75)^2 + (106-75.75)^2 \right \} } = 14.079.$
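The same calculation can be checked step by step in Python. This is a sketch using only the standard library. The variable names are illustrative, and `statistics.variance` and `statistics.stdev` are used as a cross-check because they also divide by $n-1$.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Long-jump sample (inches) from the text.
long_jumps = [60, 64, 68, 74, 76, 78, 80, 106]

y_bar = mean(long_jumps)                                   # sample mean
squared_deviations = [(y - y_bar) ** 2 for y in long_jumps]
sample_variance = sum(squared_deviations) / (len(long_jumps) - 1)
sample_sd = sqrt(sample_variance)

print(y_bar)                      # 75.75
print(round(sample_variance, 3))  # 198.214
print(round(sample_sd, 3))        # 14.079

# Cross-check against the library routines (both use the n - 1 divisor).
print(round(variance(long_jumps), 3), round(stdev(long_jumps), 3))
```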

Activities

Try to pair each of the 4 samples whose numerical summaries are reported below with one of the 4 frequency plots below. Explain your answers.

Long-Jump (inches) Sample Data
Sample	Mean	Median	StdDev
A	4.688	5.000	1.493
B	4.000	4.000	1.633
C	3.933	4.000	1.387
D	4.000	4.000	2.075

Notes

Some software packages may use ${1 \over n}$ instead of the ${1 \over n-1}$ factor used above. For large sample sizes, the difference between the two becomes negligible. The ${1 \over n-1}$ version also has useful theoretical properties (e.g., the sample variance, as defined above, is an unbiased estimate of the population variance).
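To give a sense of how much the choice of divisor matters, the sketch below compares the two versions of the standard deviation for the Long-Jump data, using Python's `statistics.stdev` (divides by $n-1$) and `statistics.pstdev` (divides by $n$). The sample sizes in the loop are arbitrary illustrations.

```python
from math import sqrt
from statistics import pstdev, stdev

# Long-jump sample (inches) from the text.
long_jumps = [60, 64, 68, 74, 76, 78, 80, 106]

sd_n_minus_1 = stdev(long_jumps)   # divisor n - 1, as used above
sd_n = pstdev(long_jumps)          # divisor n
ratio = sd_n / sd_n_minus_1        # always equals sqrt((n - 1) / n)

print(round(sd_n_minus_1, 3), round(sd_n, 3), round(ratio, 4))

# The ratio depends only on n and approaches 1 as n grows.
for n in (8, 100, 10000):
    print(n, round(sqrt((n - 1) / n), 5))
```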

Most of the SOCR Charts and SOCR Analyses compute the variance or standard deviation for the sample. See the SOCR Charts Activities and Analyses Activities for examples; you can test these using the hotdogs dataset.

Problems

References

Lecture notes on EDA

SOCR Home page: http://www.socr.ucla.edu


