Multiple Comparisons
In statistics, the multiple comparisons problem occurs when one subjects a number of independent observations to the same acceptance criterion that would be used when considering a single event. Typically, an acceptance criterion for a single event takes the form of a requirement that the observed data be highly unlikely under a default assumption (the null hypothesis). As the number of independent applications of the acceptance criterion begins to outweigh the high unlikelihood associated with each individual test, it becomes increasingly likely that one will observe data that satisfy the acceptance criterion by chance alone (even if the default assumption is true in all cases). These errors are called false positives because they positively identify a set of observations as satisfying the acceptance criterion when those data are in fact consistent with the null hypothesis. Many mathematical techniques have been developed to control the false positive error rate associated with making multiple statistical comparisons.

FLIPPING COINS

For example, one might declare that a coin is biased if in 10 flips it lands heads at least 9 times. Indeed, if one assumes as a null hypothesis that the coin is fair, then the probability that a fair coin would come up heads at least 9 out of 10 times is 11/2^10 = 11/1024 ≈ 0.0107. This is relatively unlikely, and under most statistical criteria (such as a p-value < 0.05) one would declare that the null hypothesis should be rejected, i.e. that the coin is unfair.

A multiple comparisons problem arises if one wants to use this test (which is appropriate for testing the fairness of a single coin) to test the fairness of many coins. Suppose one were to test 100 fair coins by this method. Given that the probability of a fair coin coming up heads 9 or 10 times in 10 flips is 0.0107, when flipping 100 fair coins ten times each it is relatively likely that at least one of them will come up heads 9 or 10 times.
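The single-coin probability quoted above can be checked with a short calculation. The following is a minimal Python sketch, assuming independent flips of a fair coin; the function name `tail_probability` is introduced here purely for illustration. It also computes the expected number of fair coins, out of 100, that the criterion would flag as biased by chance.

```python
from math import comb

def tail_probability(flips: int, min_heads: int, p_heads: float = 0.5) -> float:
    """Probability of at least `min_heads` heads in `flips` independent flips."""
    return sum(
        comb(flips, k) * p_heads**k * (1 - p_heads) ** (flips - k)
        for k in range(min_heads, flips + 1)
    )

# Chance that one fair coin shows 9 or 10 heads in 10 flips: (10 + 1) / 2^10.
p_single = tail_probability(10, 9)

# Expected number of fair coins flagged as "biased" when 100 coins are tested.
expected_flagged = 100 * p_single

print(f"P(>= 9 heads in 10 flips) = {p_single:.4f}")
print(f"Expected fair coins flagged out of 100 = {expected_flagged:.2f}")
```

On average, then, about one fair coin in a batch of 100 would be flagged, which is why a criterion that is reasonable for one coin becomes unreliable when applied to many.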
Precisely, the probability that all 100 fair coins are identified as fair by this criterion is (1 - 0.0107)^100 ≈ 0.34. Therefore, applying our single-test coin-fairness criterion to multiple comparisons would more likely than not falsely identify at least one fair coin as unfair.

FORMALISM

Technically, the problem of multiple comparisons (also known as the multiple testing problem) can be described as the potential increase in the false positive rate that occurs when a statistical test is applied repeatedly. If n independent comparisons are each performed at a per-comparison level α (alpha), the overall α is given by:

\alpha_{\text{overall}} = 1 - (1 - \alpha_{\text{per comparison}})^{n}

and it increases as the number of comparisons increases.

METHODS

In order to retain the same overall rate of false positives (rather than a higher rate) in a test involving more than one comparison, the standards for each comparison must be more stringent. Intuitively, dividing the allowable error (alpha) for each comparison by the number of comparisons will result in an overall alpha which does not exceed the desired limit, and this can be proved mathematically. For instance, to obtain the usual overall alpha of 0.05 with ten comparisons, each comparison must be tested at an alpha of 0.005.

However, it can be demonstrated that this technique is conservative, i.e. it results in a true overall alpha below 0.05, and sometimes well below it when the comparisons are positively correlated. It thereby raises the proportion of false negatives, failing to identify an unnecessarily high percentage of actual significant differences in the data. This can have important real-world consequences; for instance, it may result in failure to approve a drug which is in fact superior to existing drugs, thereby both depriving the world of an improved therapy and causing the drug company to lose its substantial investment in research and development up to that point. Similarly, in fMRI the adjustment is extremely conservative, since tests are performed over more than 100,000 voxels in the brain, which demands unrealistically low significance thresholds for each voxel.
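The arithmetic in this section can be sketched in Python. This is a minimal illustration that assumes the comparisons are independent; the function names `familywise_error_rate` and `per_comparison_alpha` are hypothetical and introduced only for this example. Dividing alpha by the number of comparisons, as described above, is commonly known as the Bonferroni correction.

```python
def familywise_error_rate(alpha_per_comparison: float, n: int) -> float:
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha_per_comparison) ** n

def per_comparison_alpha(alpha_overall: float, n: int) -> float:
    """Split the overall alpha evenly across n comparisons."""
    return alpha_overall / n

# Unadjusted: each of 10 comparisons is tested at 0.05.
unadjusted = familywise_error_rate(0.05, 10)

# Adjusted: each of 10 comparisons is tested at 0.05 / 10 = 0.005.
adjusted = familywise_error_rate(per_comparison_alpha(0.05, 10), 10)

# The 100-coin example: each coin is tested at 11/1024.
coins = familywise_error_rate(11 / 1024, 100)

print(f"10 tests at 0.05 each:   overall alpha = {unadjusted:.3f}")
print(f"10 tests at 0.005 each:  overall alpha = {adjusted:.4f}")
print(f"100 coins at 11/1024:    overall alpha = {coins:.2f}")
```

Under the independence assumption the adjusted overall alpha comes out just under 0.05 (about 0.049), so the loss of power in that case comes mainly from the stricter per-comparison threshold. When the comparisons are positively correlated, the true overall rate can fall further below the target.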
For this reason, there has been a great deal of attention paid to developing better techniques for multiple comparisons, such that the overall rate of false positives can be maintained without inflating the rate of false negatives unnecessarily. Such methods can be divided into general categories:
The advent of computerized resampling methods, such as bootstrapping and Monte Carlo simulations, has given rise to many techniques in the latter category. In some cases where exhaustive permutation resampling is performed, these tests provide exact, strong control of Type I error rates; in other cases, such as bootstrap sampling, they provide only approximate control.

POST HOC TESTING OF ANOVAS

Multiple comparison procedures are commonly used after obtaining a significant ANOVA F-test. The significant ANOVA result suggests rejecting the global null hypothesis H0: "the means are the same". Multiple comparison procedures are then used to determine which means differ from each other. Comparing K means involves K(K - 1)/2 pairwise comparisons. The nonparametric Friedman test is useful when performing multiple tests on a hypothesis.
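The count of pairwise comparisons can be checked by enumeration. The following is a minimal Python sketch; the group labels are hypothetical and chosen only for illustration.

```python
from itertools import combinations

groups = ["A", "B", "C", "D", "E"]  # hypothetical group labels, K = 5

# Every unordered pair of groups corresponds to one pairwise comparison.
pairs = list(combinations(groups, 2))

k = len(groups)
n_comparisons = k * (k - 1) // 2  # K(K - 1)/2

print(f"K = {k}: {len(pairs)} pairs enumerated, formula gives {n_comparisons}")
for first, second in pairs:
    print(f"compare mean of {first} with mean of {second}")
```

With K = 5 groups there are 10 pairwise comparisons, so dividing an overall alpha of 0.05 evenly among them gives 0.005 per comparison.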