
Resampling-based multiple testing for microarray data analysis

Published in: Test

Abstract

The burgeoning field of genomics has revived interest in multiple testing procedures by raising new methodological and computational challenges. For example, microarray experiments generate large multiplicity problems in which thousands of hypotheses are tested simultaneously. Westfall and Young (1993) propose resampling-based p-value adjustment procedures that are highly relevant to microarray experiments. This article discusses different criteria for error control in resampling-based multiple testing, including (a) the family-wise error rate of Westfall and Young (1993) and (b) the false discovery rate developed by Benjamini and Hochberg (1995), both from a frequentist viewpoint; and (c) the positive false discovery rate of Storey (2002a), which has a Bayesian motivation. We also introduce our recently developed fast algorithm for implementing the minP adjustment to control the family-wise error rate. Adjusted p-values for the different approaches are applied to gene expression data from two recently published microarray studies, and the properties of these multiple testing procedures are compared.
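The two adjustments named above can be sketched in a few lines. The block below is a minimal, naive illustration, not the authors' fast algorithm. It assumes a genes-by-samples matrix, two groups labelled 1 and 0, Welch (1938) t-statistics, and B random (rather than completely enumerated) permutations with the observed labelling counted as one of them. It computes Westfall and Young (1993) step-down minP adjusted p-values by brute force, plus Benjamini and Hochberg (1995) step-up adjusted p-values. All function names and the simulated data generator are hypothetical.

```python
import bisect
import random


def welch_t(x, y):
    """Welch (1938) two-sample t-statistic for group x versus group y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    return (mx - my) / (vx / len(x) + vy / len(y)) ** 0.5


def abs_stats(data, labels):
    """|t| for every gene (row of data) under one 0/1 labelling of the columns."""
    out = []
    for row in data:
        x = [v for v, g in zip(row, labels) if g == 1]
        y = [v for v, g in zip(row, labels) if g == 0]
        out.append(abs(welch_t(x, y)))
    return out


def minp_adjusted(data, labels, B=1000, seed=0):
    """Naive step-down minP adjustment from B random permutations (hypothetical helper).

    Permutation 0 is the observed labelling. Returns (raw, adjusted) p-values.
    """
    rng = random.Random(seed)
    perms = [list(labels)]
    for _ in range(B - 1):
        lab = list(labels)
        rng.shuffle(lab)
        perms.append(lab)
    T = [abs_stats(data, lab) for lab in perms]  # B x m matrix of |t|
    m = len(data)
    # Raw p-value of each |t| within its own gene's permutation distribution:
    # P[b][j] = #{c : T[c][j] >= T[b][j]} / B, counted via a sorted column.
    cols = [sorted(T[b][j] for b in range(B)) for j in range(m)]
    P = [[(B - bisect.bisect_left(cols[j], T[b][j])) / B for j in range(m)]
         for b in range(B)]
    raw = P[0]
    order = sorted(range(m), key=lambda j: raw[j])  # most significant first
    counts = [0] * m
    for b in range(B):
        q = 1.0
        # Successive minima over hypotheses ranked at position i or later.
        for i in range(m - 1, -1, -1):
            q = min(q, P[b][order[i]])
            if q <= raw[order[i]]:
                counts[i] += 1
    adjusted = [0.0] * m
    running = 0.0
    for i in range(m):  # enforce step-down monotonicity
        running = max(running, counts[i] / B)
        adjusted[order[i]] = running
    return raw, adjusted


def bh_adjusted(p):
    """Benjamini-Hochberg (1995) step-up adjusted p-values."""
    m = len(p)
    order = sorted(range(m), key=lambda j: p[j])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        j = order[rank - 1]
        running = min(running, p[j] * m / rank)
        adjusted[j] = running
    return adjusted


def make_example_data(m=20, n_per_group=6, n_shifted=3, shift=3.0, seed=1):
    """Hypothetical simulated data: m genes, the first n_shifted truly changed."""
    rng = random.Random(seed)
    labels = [1] * n_per_group + [0] * n_per_group
    data = []
    for j in range(m):
        delta = shift if j < n_shifted else 0.0
        data.append([rng.gauss(delta if g == 1 else 0.0, 1.0) for g in labels])
    return data, labels


if __name__ == "__main__":
    data, labels = make_example_data()
    raw, adj = minp_adjusted(data, labels)
    fdr = bh_adjusted(raw)
    for j in range(len(data)):
        print(f"gene {j:2d}  raw={raw[j]:.3f}  minP={adj[j]:.3f}  BH={fdr[j]:.3f}")
```

Running the file prints raw, minP-adjusted, and BH-adjusted p-values for each simulated gene; for instance, `bh_adjusted([0.01, 0.04, 0.03, 0.5])` returns approximately `[0.04, 0.0533, 0.0533, 0.5]`. Because every permutation p-value is a multiple of 1/B, and this naive version stores the full B × m matrix of statistics, it becomes expensive when thousands of genes and many permutations are involved; that cost is the kind of burden a faster minP algorithm is meant to reduce.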

References

  • Alizadeh, A. A., Eisen, M. B., Davis, R. E., Ma, C., Lossos, I. S., Rosenwald, A., Boldrick, J. C., Sabet, H., Tran, T., Yu, X., Powell, J. I., Yang, L., Marti, G. E., Moore, T., Hudson Jr., J., Lu, L., Lewis, D. B., Tibshirani, R., Sherlock, G., Chan, W. C., Greiner, T. C., Weisenburger, D. D., Armitage, J. O., Warnke, R., Levy, R., Wilson, W., Grever, M. R., Byrd, J. C., Botstein, D., Brown, P. O., and Staudt, L. M. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403:503–511.

  • Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., and Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences, 96:6745–6750.

  • Benjamini, Y. and Braun, H. (2002). John W. Tukey's contributions to multiple comparisons. The Annals of Statistics, 30(6):1576–1594.

  • Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57:289–300.

  • Benjamini, Y. and Hochberg, Y. (2000). The adaptive control of the false discovery rate in multiple hypotheses testing with independent statistics. Journal of Educational and Behavioral Statistics, 25(1):60–83.

  • Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple hypothesis testing under dependency. The Annals of Statistics, 29(4):1165–1188.

  • Beran, R. (1988). Balanced simultaneous confidence sets. Journal of the American Statistical Association, 83(403):679–686.

  • Berry, D. (1988). Multiple comparisons, multiple tests, and data dredging: A Bayesian perspective. In J. Bernardo, M. DeGroot, D. Lindley, and A. Smith, eds., Bayesian Statistics, vol. 3, pp. 79–94. Oxford University Press.

  • Boldrick, J. C., Alizadeh, A. A., Diehn, M., Dudoit, S., Liu, C. L., Belcher, C. E., Botstein, D., Staudt, L. M., Brown, P. O., and Relman, D. A. (2002). Stereotyped and specific gene expression programs in human innate immune responses to bacteria. Proceedings of the National Academy of Sciences, 99(2):972–977.

  • Buckley, M. J. (2000). The Spot user's guide. CSIRO Mathematical and Information Sciences. http://www.cmis.csiro.au/IAP/Spot/spotmanual.htm.

  • Callow, M. J., Dudoit, S., Gong, E. L., Speed, T. P., and Rubin, E. M. (2000). Microarray expression profiling identifies genes with altered expression in HDL-deficient mice. Genome Research, 10(12):2022–2029.

  • DeRisi, J. L., Iyer, V. R., and Brown, P. O. (1997). Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 278:680–685.

  • Dudoit, S., Shaffer, J. P., and Boldrick, J. C. (2002a). Multiple hypothesis testing in microarray experiments. Submitted; available as U.C. Berkeley Division of Biostatistics Working Paper Series: 2002-110, http://www.bepress.com/ucbbiostat/paper110.

  • Dudoit, S., Yang, Y. H., Callow, M. J., and Speed, T. P. (2002b). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica, 12(1):111–139.

  • Dunn, O. J. (1958). Estimation of the means of dependent variables. The Annals of Mathematical Statistics, 29:1095–1111.

  • Efron, B. and Tibshirani, R. (2002). Empirical Bayes methods and false discovery rates for microarrays. Genetic Epidemiology, 23:70–86.

  • Efron, B., Tibshirani, R., Goss, V., and Chu, G. (2000). Microarrays and their use in a comparative experiment. Tech. Rep. 37B/213, Department of Statistics, Stanford University.

  • Efron, B., Tibshirani, R., Storey, J. D., and Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association, 96(456):1151–1160.

  • Finner, H. and Roters, M. (2001). On the false discovery rate and expected type I errors. Biometrical Journal, 43(8):985–1005.

  • Genovese, C. and Wasserman, L. (2001). Operating characteristics and extensions of the FDR procedure. Journal of the Royal Statistical Society, Series B, 64:499–517.

  • Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., and Lander, E. S. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286:531–537.

  • Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6:65–70.

  • Ihaka, R. and Gentleman, R. (1996). R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3):299–314.

  • Jogdeo, K. (1977). Association and probability inequalities. The Annals of Statistics, 5(3):495–504.

  • Kendziorski, C., Newton, M., Lan, H., and Gould, M. (2003). On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. In press.

  • Kerr, M. K., Martin, M., and Churchill, G. A. (2000). Analysis of variance for gene expression microarray data. Journal of Computational Biology, 7(6):819–837.

  • Korn, E. L., Troendle, J. F., McShane, L. M., and Simon, R. (2001). Controlling the number of false discoveries: Application to high-dimensional genomic data. Tech. Rep. 003, National Cancer Institute, Division of Cancer Treatment and Diagnosis. http://linus.nci.nih.gov/~brb/TechReport.htm.

  • Lehmann, E. L. (1986). Testing Statistical Hypotheses. Springer-Verlag, New York, 2nd ed.

  • Lockhart, D. J., Dong, H. L., Byrne, M. C., Follettie, M. T., Gallo, M. V., Chee, M. S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H., and Brown, E. L. (1996). Expression monitoring by hybridization to high-density oligonucleotide arrays. Nature Biotechnology, 14:1675–1680.

  • Manduchi, E., Grant, G. R., McKenzie, S. E., Overton, G. C., Surrey, S., and Stoeckert Jr., C. J. (2000). Generation of patterns from gene expression data by assigning confidence to differentially expressed genes. Bioinformatics, 16:685–698.

  • Marcus, R., Peritz, E., and Gabriel, K. R. (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika, 63:655–660.

  • Morton, N. E. (1955). Sequential tests for the detection of linkage. American Journal of Human Genetics, 7:277–318.

  • Müller, P., Parmigiani, G., Robert, C., and Rousseau, J. (2003). Optimal sample size for multiple testing: the case of gene expression microarrays. Tech. Rep., Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center.

  • Perou, C. M., Jeffrey, S. S., van de Rijn, M., Rees, C. A., Eisen, M. B., Ross, D. T., Pergamenschikov, A., Williams, C. F., Zhu, S. X., Lee, J. C. F., Lashkari, D., Shalon, D., Brown, P. O., and Botstein, D. (1999). Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proceedings of the National Academy of Sciences, 96:9212–9217.

  • Pesarin, F. (2001). Multivariate Permutation Tests with Applications in Biostatistics. John Wiley and Sons, Chichester.

  • Pollack, J. R., Perou, C. M., Alizadeh, A. A., Eisen, M. B., Pergamenschikov, A., Williams, C. F., Jeffrey, S. S., Botstein, D., and Brown, P. O. (1999). Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nature Genetics, 23:41–46.

  • Pollard, K. and van der Laan, M. (2002). Resampling-based methods for identification of significant subsets of genes in expression data. U.C. Berkeley Division of Biostatistics Working Paper Series, Working Paper 121, http://www.bepress.com/ucbbiostat.

  • Pollard, K. and van der Laan, M. (2003). Parametric and nonparametric methods to identify significantly differentially expressed genes. Manuscript.

  • Puri, M. and Sen, P. (1971). Nonparametric Methods in Multivariate Analysis. Wiley, New York.

  • Ross, D. T., Scherf, U., Eisen, M. B., Perou, C. M., Rees, C., Spellman, P., Iyer, V., Jeffrey, S. S., van de Rijn, M., Waltham, M., Pergamenschikov, A., Lee, J. C. F., Lashkari, D., Shalon, D., Myers, T. G., Weinstein, J. N., Botstein, D., and Brown, P. O. (2000). Systematic variation in gene expression patterns in human cancer cell lines. Nature Genetics, 24:227–234.

  • Seeger, P. (1968). A note on a method for the analysis of significance en masse. Technometrics, 10(3):586–593.

  • Shaffer, J. P. (1995). Multiple hypothesis testing. Annual Review of Psychology, 46:561–584.

  • Šidák, Z. (1967). Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association, 62:626–633.

  • Simes, R. J. (1986). An improved Bonferroni procedure for multiple tests of significance. Biometrika, 73(3):751–754.

  • Sorić, B. (1989). Statistical “discoveries” and effect-size estimation. Journal of the American Statistical Association, 84(406):608–610.

  • Storey, J. D. (2001). The positive false discovery rate: A Bayesian interpretation and the q-value. The Annals of Statistics. In press.

  • Storey, J. D. (2002a). A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B, 64:479–498.

  • Storey, J. D. (2002b). False Discovery Rates: Theory and Applications to DNA Microarrays. Ph.D. thesis, Department of Statistics, Stanford University.

  • Storey, J. D., Taylor, J. E., and Siegmund, D. (2002). Strong control, conservative point estimation, and simultaneous conservative consistency of false discovery rates: A unified approach. Journal of the Royal Statistical Society, Series B. In press.

  • Storey, J. D. and Tibshirani, R. (2001). Estimating false discovery rates under dependence, with applications to DNA microarrays. Tech. Rep. 2001-28, Department of Statistics, Stanford University.

  • Tusher, V. G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to ionizing radiation response. Proceedings of the National Academy of Sciences, 98:5116–5121.

  • Welch, B. L. (1938). The significance of the difference between two means when the population variances are unequal. Biometrika, 29:350–362.

  • Westfall, P., Krishen, A., and Young, S. (1998). Using prior information to allocate significance levels for multiple endpoints. Statistics in Medicine, 17:2107–2119.

  • Westfall, P., Kropf, S., and Finos, L. (2003). Weighted FWE-controlling methods in high-dimensional situations. Manuscript.

  • Westfall, P., Lin, Y., and Young, S. (1989). A procedure for the analysis of multivariate binomial data with adjustments for multiplicity. In Proceedings of the 14th Annual SAS® Users Group International Conference, pp. 1385–1392.

  • Westfall, P. and Soper, K. (2001). Using priors to improve multiple animal carcinogenicity tests. Journal of the American Statistical Association, 96:827–834.

  • Westfall, P. and Wolfinger, R. (1997). Multiple tests with discrete distributions. The American Statistician, 51:3–8.

  • Westfall, P. H. and Young, S. S. (1993). Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. John Wiley & Sons, New York.

  • Westfall, P. H., Zaykin, D. V., and Young, S. S. (2001). Multiple tests for genetic effects in association studies. In S. Looney, ed., Methods in Molecular Biology, vol. 184: Biostatistical Methods, pp. 143–168. Humana Press, Totowa, NJ.

  • Yekutieli, D. and Benjamini, Y. (1999). Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. Journal of Statistical Planning and Inference, 82:171–196.

Author information

Corresponding author

Correspondence to Yongchao Ge.

About this article

Cite this article

Ge, Y., Dudoit, S. & Speed, T.P. Resampling-based multiple testing for microarray data analysis. Test 12, 1–77 (2003). https://doi.org/10.1007/BF02595811

  • DOI: https://doi.org/10.1007/BF02595811
