  • Protocol

Chemometric analysis in Raman spectroscopy from experimental design to machine learning–based modeling

Abstract

Raman spectroscopy is increasingly being used in biology, forensics, diagnostics, pharmaceutics and food science applications. This growth is driven not only by improvements in computational and experimental setups but also by the development of chemometric techniques. Chemometric techniques are the analytical processes used to detect and extract information from subtle differences in Raman spectra obtained from related samples. This information could be used to find out, for example, whether a mixture of bacterial cells contains different species, or whether a mammalian cell is healthy or not. Chemometric techniques include spectral processing (ensuring that the spectra used for the subsequent computational processes are as clean as possible) as well as the statistical analysis of the data required to find the spectral differences that are most useful for differentiating between, for example, different cell types. For Raman spectra, this analysis process is not yet standardized, and there are many pitfalls. This protocol provides guidance on how to perform a Raman spectral analysis, how to avoid these pitfalls and how to circumvent problematic issues. The protocol is divided into four parts: experimental design, data preprocessing, data learning and model transfer. We illustrate our workflow using three datasets in which spectra from individual cells were collected in single-cell mode, and one dataset in which the data were collected from a raster scanning–based Raman spectral imaging experiment of mouse tissue. Our aim is to help move Raman-based technologies from proof-of-concept studies toward real-world applications.

Fig. 1: Overview of the Raman spectroscopic analysis protocol.
Fig. 2: Detailed overview of the Raman spectroscopic analysis protocol.
Fig. 3: Workflow of the experimental design.
Fig. 4: Workflow of spectral preprocessing.
Fig. 5: Illustrative examples of spectra corrupted by different effects/artifacts.
Fig. 6: Results of baseline correction and the corresponding mean sensitivities from a three-group classification.
Fig. 7: Workflow of data learning. It starts from statistical sampling, which splits the whole dataset into training, validation and testing data.
Fig. 8: Workflow of data learning based on two-layer validation.
Fig. 9: Validation and testing results of the mice data.
Fig. 10: Mean spectra of single cell spectra from the group B. mycoides measured using the four devices.
Fig. 11: PCA score plot of bacteria spore spectra.
Fig. 12: Prediction of data from different devices in a leave-one-device-out CV with different model transfer methods.
Fig. 13: Prediction on the data of the first device based on the PLS regression or SVM classifiers.
Fig. 14: Results of relative Pearson’s correlation coefficients.

Data availability

One example dataset used to demonstrate the protocol has been made openly accessible within the GitHub repository: https://github.com/Bocklitz-Lab/Example-Raman-spectral-analysis. Other data can be found with this protocol and the supporting primary research papers. Source data are provided with this paper.

Code availability

Code can be found in various open-source packages. One example analysis code can be found in GitHub: https://github.com/Bocklitz-Lab/Example-Raman-spectral-analysis.

Acknowledgements

The research in this contribution was supported by the Free State of Thuringia under the number 2019 FGR 0083 and cofinanced by European Union funds within the framework of the European Social Fund (ESF) via the TAB-FG MorphoTox. The authors gratefully acknowledge the financial support from the BMBF for the project LPI-BT1 (FKZ 13N15466) and the scholarship from the China Scholarship Council (CSC) for S.G. Part of the protocol relates to the NFDI4Chem project (441958208) funded by the German Research Foundation (DFG).

Author information

Contributions

T.B. conceived the project. S.G., T.B. and J.P. performed the conception and design of the protocol. T.B. and J.P. oversaw the overall planning of the project. J.P. supervised the experimental part, while T.B. supervised the computational part. S.G. performed the computations and the data analysis. S.G. and T.B. wrote the first draft of the protocol. All authors discussed the results and contributed to the manuscript review.

Corresponding author

Correspondence to Thomas Bocklitz.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Protocols thanks Luiz Fernando Cappa De Oliveira, Igor Lednev and Alejandro Olivieri for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Ali, N. et al. Anal. Chem. 90, 12485–12492 (2018): https://pubs.acs.org/doi/10.1021/acs.analchem.8b02167

Neugebauer, U. et al. Analyst 135, 3178–3182 (2010): https://pubs.rsc.org/en/content/articlelanding/2010/AN/c0an00608d

Stöckel, S. et al. Anal. Chem. 84, 9873–9880 (2012): https://pubs.acs.org/doi/abs/10.1021/ac302250t

Vogler, N. et al. J. Biophoton. 9, 533–541 (2016): https://onlinelibrary.wiley.com/doi/10.1002/jbio.201500237

Butler, H. J. et al. Nat. Protoc. 11, 664–687 (2016): https://doi.org/10.1038/nprot.2016.036

Extended data

Extended Data Fig. 1 An example of data structure.

The data are structured hierarchically in the order device-replicate-group. The calibration files are saved along with the sample spectra in the folder of each group. The date and time of each measurement are encoded in the file names in the format ‘ddmmyy_hhmmss’. The ‘Info’ files in each folder contain the necessary records of the measurement.
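The naming convention in this caption lends itself to automatic metadata extraction. The following is a minimal Python sketch of that idea, not code from the protocol: the folder names (`device1`, `replicate2`, `groupA`), the file name `spectrum_240619_153045.txt` and both function names are hypothetical. Only the device-replicate-group nesting and the ‘ddmmyy_hhmmss’ stamp come from the caption.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Six digits, underscore, six digits: the 'ddmmyy_hhmmss' stamp from the caption.
STAMP = re.compile(r"(?<!\d)(\d{6}_\d{6})(?!\d)")


def parse_measurement_time(file_name):
    """Return the measurement time encoded in a file name, or None if absent."""
    match = STAMP.search(file_name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%d%m%y_%H%M%S")
    except ValueError:  # digits are present but do not form a valid date/time
        return None


def describe_spectrum(path):
    """Split a device/replicate/group/file path into its metadata fields.

    The four-level layout is an assumption based on the caption; adapt the
    slicing if the real folder tree has extra levels.
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 4:
        raise ValueError("expected device/replicate/group/file, got: " + path)
    device, replicate, group, file_name = parts[-4:]
    return {
        "device": device,
        "replicate": replicate,
        "group": group,
        "measured_at": parse_measurement_time(file_name),
    }


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    print(describe_spectrum("device1/replicate2/groupA/spectrum_240619_153045.txt"))
```

Keeping the replicate and device labels attached to every spectrum in this way is what later makes replicate-wise or device-wise data splitting possible.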

Extended Data Fig. 2 Results of model validation and evaluation based on two dimension reduction methods and different sampling mechanisms for the bacterial dataset (Dataset 2).

The classification was performed using two dimension reduction methods and four classifiers under different sampling schemes. Each box contains 9 values representing the mean sensitivity of the validation and testing results produced during the 9 iterations of the 9-fold/9-replicate external validation. The internal validation is considered unbiased if the testing and validation results are comparable; otherwise, it is biased.

Extended Data Fig. 3 Results of model validation and evaluation based on two dimension reduction methods and different sampling mechanisms for the cell dataset (Dataset 1).

Each box contains 9 values representing the mean sensitivity of the validation and testing results produced during the 9 iterations of the 9-fold/9-replicate external validation. The internal validation is considered unbiased if the testing and validation results are comparable, otherwise it is biased.
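The two quantities behind these box plots can be sketched in a few lines of Python. This is an illustrative sketch under two assumptions that are not confirmed by the captions: that "mean sensitivity" denotes the unweighted average of the per-class sensitivities, and that the replicate-wise external validation holds out all spectra of one replicate per iteration. The function names are hypothetical.

```python
from collections import defaultdict


def mean_sensitivity(y_true, y_pred):
    """Unweighted average of per-class sensitivities (per-class recall)."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    hits = defaultdict(int)    # correctly predicted spectra per true class
    totals = defaultdict(int)  # number of spectra per true class
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def leave_one_replicate_out(replicates):
    """Yield (held_out, train_idx, test_idx) with one replicate held out per
    iteration, so spectra of the same replicate never sit on both sides."""
    for held_out in sorted(set(replicates)):
        test_idx = [i for i, r in enumerate(replicates) if r == held_out]
        train_idx = [i for i, r in enumerate(replicates) if r != held_out]
        yield held_out, train_idx, test_idx


if __name__ == "__main__":
    # Toy labels: class "a" has sensitivity 1/2, class "b" has 2/2.
    print(mean_sensitivity(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
    for split in leave_one_replicate_out([1, 1, 2, 3]):
        print(split)
```

With nine replicates, the generator produces nine splits and therefore nine mean-sensitivity values, which would match the nine values per box described above. A split that ignores the replicate labels can place spectra from one replicate in both training and test data, which is the kind of bias the comparison between validation and testing results is meant to expose.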

Cite this article

Guo, S., Popp, J. & Bocklitz, T. Chemometric analysis in Raman spectroscopy from experimental design to machine learning–based modeling. Nat Protoc 16, 5426–5459 (2021). https://doi.org/10.1038/s41596-021-00620-3

  • DOI: https://doi.org/10.1038/s41596-021-00620-3
