AP Statistics Curriculum 2007 NonParam 2MedianIndep

Revision as of 19:33, 24 February 2008 by IvoDinov (previous revision: 23:56, 20 June 2007)
References

* Whitley, E. and Ball, J. (2002) [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=153434 Statistics review 6: Nonparametric methods]. Critical Care, 6(6): 509–513.


General Advance-Placement (AP) Statistics Curriculum - Difference of Medians of Two Independent Samples

As we discussed in the paired case, non-parametric statistical methods provide alternatives to the (standard) parametric tests that we saw earlier, and they are applicable when the distribution of the data is unknown.

Motivational Example

Nine observations of surface soil pH were made at each of two different (independent) locations. Do the data suggest that the true mean soil pH values differ for the two locations? Note that there is no pairing in this design, even though it is a balanced design with 9 observations in each (independent) group. Test using $\alpha = 0.05$, and be sure to check any necessary assumptions for the validity of your test.

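{| class="wikitable" style="text-align:center; width:35%" border="1"
|-
| '''Location 1''' || '''Location 2'''
|-
| 8.10 || 7.85
|-
| 7.89 || 7.30
|-
| 8.00 || 7.73
|-
| 7.85 || 7.27
|-
| 8.01 || 7.58
|-
| 7.82 || 7.27
|-
| 7.99 || 7.50
|-
| 7.80 || 7.23
|-
| 7.93 || 7.41
|}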

This study design is clearly analogous to the independent 2-sample designs we saw before. However, a plot of these data suggests that the two distributions may differ in shape and may not be symmetric, unimodal and bell-shaped (i.e., not Normal). Therefore, we cannot rely on the parametric independent T-test to test the null hypothesis that the centers of the two distributions (from which the 2 samples came) are identical.
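As a quick numerical companion to the soil pH table, the two sample medians can be computed directly. The following is a minimal Python sketch using only the standard library; the variable names are illustrative and are not part of any SOCR tool.

```python
from statistics import median

# Soil pH observations copied from the table above
location_1 = [8.10, 7.89, 8.00, 7.85, 8.01, 7.82, 7.99, 7.80, 7.93]
location_2 = [7.85, 7.30, 7.73, 7.27, 7.58, 7.27, 7.50, 7.23, 7.41]

# With 9 observations per group, the median is the 5th ordered value
median_1 = median(location_1)
median_2 = median(location_2)

# Round to the precision of the data to hide floating-point noise
median_diff = round(median_1 - median_2, 2)

print("Location 1 median:", median_1)
print("Location 2 median:", median_2)
print("Difference of medians:", median_diff)
```

This only summarizes the two samples; it is a descriptive check, not a significance test.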

The Sign-Test

The sign test is a non-parametric alternative to the one-sample and paired T-test. It does not require the data to be Normally distributed. It assigns a positive (+) or negative (-) sign to each observation according to whether it is greater or less than some hypothesized value, and then measures how far the observed split between "+" and "-" signs is from what we would expect to observe by chance alone. For example, if developing acute renal failure had no effect on the outcome from sepsis, about half of the 16 studies (8) would be expected to have a relative risk less than 1.0 (a "-" sign) and the remaining 8 would be expected to have a relative risk greater than 1.0 (a "+" sign). In the actual data, 3 studies had "-" signs and the remaining 13 studies had "+" signs. Intuitively, this difference of 10 appears too large to be due to random variation alone. If so, the effect of developing acute renal failure on the outcome from sepsis would be significant.

Calculations

Suppose $N_+$ is the number of "+" signs and we fix a significance level of $\alpha = 0.05$. Consider the following two hypotheses:

$H_o: N_+ = 8$ (equivalent to $N_- = 8$): The effect of developing acute renal failure is not significant on the outcome from sepsis.
$H_1: N_+ \not= 8$: The effect of developing acute renal failure is significant on the outcome from sepsis.

Define the following test statistic:

$B_s = \max{(N_+ , N_-)}$, where $N_+$ and $N_-$ are the number of positive and negative signs, respectively.

Under $H_o$, the number of positive signs follows $N_+ \sim Binomial(n=16, p=8/16=0.5)$, and the tail probabilities of $B_s$ are computed from this distribution.

For our data, $B_s = \max{(N_+ , N_-)} = \max{(13, 3)} = 13$, and the probability that such a binomial variable is at least 13 is $P(X \geq 13) = 0.010635$. Even when doubled for the two-sided alternative (about 0.021), this is below $\alpha = 0.05$. Therefore, we can reject the null hypothesis $H_o$ and regard as significant the effect of developing acute renal failure on the outcome from sepsis.
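The binomial tail probability used in this calculation can be reproduced with a short Python sketch. It assumes only the counts given in the text (16 signs, 13 of them positive) and uses the standard-library function math.comb; the variable names are illustrative.

```python
from math import comb

n = 16     # total number of signs (studies)
b_s = 13   # observed test statistic, max(N+, N-)

# Upper tail P(X >= b_s) for X ~ Binomial(n, 0.5):
# each outcome has probability 1 / 2**n, so count the favourable ones
p_upper = sum(comb(n, k) for k in range(b_s, n + 1)) / 2 ** n

# Doubling the tail gives a two-sided p-value for the symmetric null
p_two_sided = min(1.0, 2 * p_upper)

print(f"P(X >= {b_s}) = {p_upper:.6f}")
print(f"two-sided p-value = {p_two_sided:.6f}")
```

The upper tail agrees with the value 0.010635 quoted above; the doubled value is the one to compare against $\alpha$ for the two-sided alternative $H_1$.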

The Sign test using SOCR Analyses

It is much quicker to use [http://socr.ucla.edu/htmls/SOCR_Analyses.html SOCR Analyses] to compute the statistical significance of the sign test. The SOCR Sign test activity may also be helpful in understanding how to use the sign test method in SOCR.

Example

Twelve pairs of identical twins are given psychological tests to determine whether the first born of each pair tends to be more aggressive than the second born. Each twin is scored according to aggressiveness; a higher score indicates greater aggressiveness. Because of the natural pairing within a set of twins, these data can be considered paired.

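{| class="wikitable" style="text-align:center; width:40%" border="1"
|-
| Twin-Index || 1st Born || 2nd Born || Sign
|-
| 1 || 86 || 88 || -
|-
| 2 || 71 || 77 || -
|-
| 3 || 77 || 76 || +
|-
| 4 || 68 || 64 || +
|-
| 5 || 91 || 96 || -
|-
| 6 || 72 || 72 || 0 (Drop)
|-
| 7 || 77 || 65 || +
|-
| 8 || 91 || 90 || +
|-
| 9 || 70 || 65 || +
|-
| 10 || 71 || 80 || -
|-
| 11 || 88 || 81 || +
|-
| 12 || 87 || 72 || +
|}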

We first plot the data using the SOCR Line Chart. Visually, there does not seem to be a strong effect of birth order on the twins' aggressiveness.

Next, we can use the SOCR Sign Test Analysis to quantify the evidence against the null hypothesis that there is no birth-order effect on aggressiveness.

The reported p-value is 0.274, which is well above conventional significance levels, so these data do not allow us to reject the null hypothesis.
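The sign test for the twin data can also be sketched by hand in Python. The scores are copied from the table above; ties (twin pair 6) are dropped, as indicated in the table. That the reported 0.274 corresponds to the one-sided binomial tail is an observation from this calculation, not a documented statement about how SOCR computes it.

```python
from math import comb

# Aggressiveness scores copied from the twin table (pairs 1..12)
first_born = [86, 71, 77, 68, 91, 72, 77, 91, 70, 71, 88, 87]
second_born = [88, 77, 76, 64, 96, 72, 65, 90, 65, 80, 81, 72]

diffs = [a - b for a, b in zip(first_born, second_born)]

n_plus = sum(d > 0 for d in diffs)    # first born scored higher
n_minus = sum(d < 0 for d in diffs)   # second born scored higher
n = n_plus + n_minus                  # zero differences are dropped

b_s = max(n_plus, n_minus)

# One-sided tail P(X >= b_s) for X ~ Binomial(n, 0.5)
p_upper = sum(comb(n, k) for k in range(b_s, n + 1)) / 2 ** n

print(f"N+ = {n_plus}, N- = {n_minus}, n = {n}")
print(f"P(X >= {b_s}) = {p_upper:.3f}")
```

With 7 positive and 4 negative signs out of 11 informative pairs, the one-sided tail rounds to 0.274; a two-sided p-value would be about twice that, which leads to the same conclusion.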

The Wilcoxon signed rank test

Like the sign test and the T-test, the Wilcoxon signed rank test involves comparisons of differences between measurements. It requires that the data are measured at an interval level of measurement, but does not require assumptions about the form of the distribution of the measurements. It should therefore be used whenever the distributional assumptions of the T-test are not satisfied.

Example

Whitley and Ball reported data on the central venous oxygen saturation (SvO2 (%)) from 10 consecutive patients at 2 time points: at admission and 6 hours after admission to the intensive care unit (ICU). The null hypothesis is that there is no effect of 6 hours of ICU treatment on SvO2. Under the null hypothesis, the mean of the differences between SvO2 at admission and that at 6 hours after admission should be zero.

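{| class="wikitable" style="text-align:center; width:40%" border="1"
|-
| '''Patient''' || '''On Admission''' || '''At 6 Hours''' || '''Difference''' || '''Rank'''
|-
| 2 || 59.1 || 56.7 || -2.4 || 1
|-
| 7 || 58.2 || 60.7 || 2.5 || 2
|-
| 9 || 56.0 || 59.5 || 3.5 || 3
|-
| 10 || 65.3 || 59.8 || -5.5 || 4
|-
| 3 || 56.1 || 61.9 || 5.8 || 5
|-
| 5 || 60.6 || 67.7 || 7.1 || 6
|-
| 6 || 37.8 || 50.0 || 12.2 || 7
|-
| 1 || 39.7 || 52.9 || 13.2 || 8
|-
| 4 || 57.7 || 71.4 || 13.7 || 9
|-
| 8 || 33.6 || 51.3 || 17.7 || 10
|}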

We can reject the null hypothesis at $\alpha = 0.05$, as the p-values for the one- and two-sided alternative hypotheses of the Wilcoxon signed rank test reported by the SOCR Analysis are, respectively:

One-Sided p-value = 0.011
Two-Sided p-value = 0.022
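The signed-rank calculation for the SvO2 table can be sketched in Python as follows. The sketch ranks the absolute differences (there are no ties or zero differences in these data, so no tie handling is included) and uses the large-sample Normal approximation without a continuity correction. The choice of approximation is an assumption made here because it reproduces the reported p-values; an exact test would give slightly different numbers.

```python
from math import erfc, sqrt

# SvO2 (%) values copied from the table, in table order
admission = [59.1, 58.2, 56.0, 65.3, 56.1, 60.6, 37.8, 39.7, 57.7, 33.6]
six_hours = [56.7, 60.7, 59.5, 59.8, 61.9, 67.7, 50.0, 52.9, 71.4, 51.3]

diffs = [b - a for a, b in zip(admission, six_hours)]

# Rank the absolute differences from smallest (rank 1) to largest
order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
ranks = [0] * len(diffs)
for rank, i in enumerate(order, start=1):
    ranks[i] = rank

# Sums of ranks attached to positive and negative differences
w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

# Normal approximation to the null distribution of the rank sum
n = len(diffs)
mean_w = n * (n + 1) / 4
sd_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (min(w_plus, w_minus) - mean_w) / sd_w

# Standard Normal CDF at z, written via the complementary error function
p_one_sided = 0.5 * erfc(-z / sqrt(2))
p_two_sided = 2 * p_one_sided

print(f"W+ = {w_plus}, W- = {w_minus}, z = {z:.3f}")
print(f"one-sided p = {p_one_sided:.3f}, two-sided p = {p_two_sided:.3f}")
```

The rank sums are 50 (positive differences) and 5 (negative differences), and the approximate p-values round to the 0.011 and 0.022 listed above.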

Practice Problems

Suppose 10 rats were randomly selected to see if they could be trained to escape a maze. The rats were released and timed (in seconds) before and after 2 weeks of training (N means the rat did not complete the maze test). Do the data provide evidence to suggest that the escape time of rats is different after 2 weeks of training? Test using $\alpha = 0.05$.

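{| class="wikitable" style="text-align:center; width:40%" border="1"
|-
| '''Rat''' || '''Before''' || '''After''' || '''Sign'''
|-
| 1 || 100 || 50 || +
|-
| 2 || 38 || 12 || +
|-
| 3 || N || 45 || +
|-
| 4 || 122 || 62 || +
|-
| 5 || 95 || 90 || +
|-
| 6 || 116 || 100 || +
|-
| 7 || 56 || 75 || -
|-
| 8 || 135 || 52 || +
|-
| 9 || 104 || 44 || +
|-
| 10 || N || 50 || +
|}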