Statistical Analysis of Metabolomics Data

De Livera, Alysha M.; Olshansky, Moshe; Speed, Terence P.

doi:10.1007/978-1-62703-577-4_20

Part of the book series: Methods in Molecular Biology (MIMB, volume 1055)

Abstract

Statistical matters form an integral part of a metabolomics experiment. In this chapter we describe several important aspects of the analysis of metabolomics data, such as the removal of unwanted variation and the identification of differentially abundant metabolites, along with a number of other essential statistical considerations.

Author information

Authors and Affiliations

Metabolomics Australia, Bio21 Institute (Molecular Science and Biotechnology Institute), The University of Melbourne, Melbourne, Australia
Alysha M. De Livera
Bioinformatics Division, Walter and Eliza Hall Institute, Parkville, VIC, Australia
Moshe Olshansky & Terence P. Speed

Editor information

Editors and Affiliations

Parkville, Australia
Ute Roessner
School of Botany, The University of Melbourne, Parkville, Victoria, Australia
Daniel Anthony Dias

Cite this protocol

De Livera, A.M., Olshansky, M., Speed, T.P. (2013). Statistical Analysis of Metabolomics Data. In: Roessner, U., Dias, D. (eds) Metabolomics Tools for Natural Product Discovery. Methods in Molecular Biology, vol 1055. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-577-4_20

DOI: https://doi.org/10.1007/978-1-62703-577-4_20
Published: 22 July 2013
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-576-7
Online ISBN: 978-1-62703-577-4
eBook Packages: Springer Protocols
