Abstract
Raman spectroscopy is increasingly being used in biology, forensics, diagnostics, pharmaceutics and food science applications. This growth is driven not only by improvements in computational and experimental setups but also by the development of chemometric techniques. Chemometric techniques are the analytical processes used to detect and extract information from subtle differences in Raman spectra obtained from related samples. This information can be used to determine, for example, whether a mixture of bacterial cells contains different species, or whether a mammalian cell is healthy. Chemometric techniques include spectral processing (ensuring that the spectra used in subsequent computations are as clean as possible) as well as the statistical analysis needed to find the spectral differences that are most useful for differentiating between, for example, different cell types. For Raman spectra, this analysis process is not yet standardized, and there are many pitfalls. This protocol provides guidance on how to perform a Raman spectral analysis, including how to avoid these pitfalls and strategies to circumvent problematic issues. The protocol is divided into four parts: experimental design, data preprocessing, data learning and model transfer. We illustrate the workflow using three datasets in which spectra from individual cells were collected in single-cell mode, and one dataset collected in a raster scanning–based Raman spectral imaging experiment on mouse tissue. Our aim is to help move Raman-based technologies from proof-of-concept studies toward real-world applications.
Data availability
One example dataset used to demonstrate the protocol has been made openly accessible within the GitHub repository: https://github.com/Bocklitz-Lab/Example-Raman-spectral-analysis. Other data can be found with this protocol and the supporting primary research papers. Source data are provided with this paper.
Code availability
Code can be found in various open-source packages. One example analysis code can be found in GitHub: https://github.com/Bocklitz-Lab/Example-Raman-spectral-analysis.
Acknowledgements
The research in this contribution was supported by the Free State of Thuringia under the number 2019 FGR 0083 and cofinanced by European Union funds within the framework of the European Social Fund (ESF) via the TAB-FG MorphoTox. The authors gratefully acknowledge financial support from the BMBF for the project LPI-BT1 (FKZ 13N15466) and the scholarship from the China Scholarship Council (CSC) for S.G. Part of the protocol relates to the NFDI4Chem project (441958208) funded by the German Research Foundation (DFG).
Author information
Contributions
T.B. conceived the project. S.G., T.B. and J.P. conceived and designed the protocol. T.B. and J.P. oversaw the overall planning of the project. J.P. supervised the experimental part, and T.B. supervised the computational part. S.G. performed the computations and the data analysis. S.G. and T.B. wrote the first draft of the protocol. All authors discussed the results and contributed to the manuscript review.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Protocols thanks Luiz Fernando Cappa De Oliveira, Igor Lednev and Alejandro Olivieri for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Key references using this protocol
Ali, N. et al. Anal. Chem. 90, 12485–12492 (2018): https://pubs.acs.org/doi/10.1021/acs.analchem.8b02167
Neugebauer, U. et al. Analyst 135, 3178–3182 (2010): https://pubs.rsc.org/en/content/articlelanding/2010/AN/c0an00608d
Stöckel, S. et al. Anal. Chem. 84, 9873–9880 (2012): https://pubs.acs.org/doi/abs/10.1021/ac302250t
Vogler, N. et al. J. Biophoton. 9, 533–541 (2016): https://onlinelibrary.wiley.com/doi/10.1002/jbio.201500237
Butler, H. J. et al. Nat. Prot. 11, 664–687 (2016): https://doi.org/10.1038/nprot.2016.036
Extended data
Extended Data Fig. 1 An example of data structure.
The data are structured hierarchically in the order device, replicate, group. The calibration files are saved along with the sample spectra in the folder of each group. The date and time of each measurement are recorded in the file name in the format ‘ddmmyy_hhmmss’. The ‘Info’ file in each folder contains the necessary records of the measurement.
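As a rough illustration of how such a layout could be read programmatically, the following Python sketch parses the ‘ddmmyy_hhmmss’ stamp from a file name and splits a path into its device, replicate and group levels. The example path, the file prefix `spectrum_`, the `.txt` extension and the function names are hypothetical; only the hierarchy and the timestamp format are taken from the caption.

```python
import re
from datetime import datetime

# Assumption: the stamp appears somewhere in the file name as ddmmyy_hhmmss.
_STAMP = re.compile(r"(\d{6}_\d{6})")


def measurement_time(file_name: str) -> datetime:
    """Extract the 'ddmmyy_hhmmss' stamp from a file name and parse it."""
    match = _STAMP.search(file_name)
    if match is None:
        raise ValueError(f"no ddmmyy_hhmmss stamp in {file_name!r}")
    return datetime.strptime(match.group(1), "%d%m%y_%H%M%S")


def describe(path: str) -> dict:
    """Split a hypothetical 'device/replicate/group/file' path into its levels."""
    device, replicate, group, file_name = path.split("/")[-4:]
    return {
        "device": device,
        "replicate": replicate,
        "group": group,
        "time": measurement_time(file_name),
    }


# Hypothetical path, used only to show the expected shape of the result.
example = describe("device1/replicate2/groupA/spectrum_070319_142530.txt")
print(example)
```

Keeping the replicate level explicit in the parsed record makes it straightforward to split data by replicate later, rather than by individual spectrum.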
Extended Data Fig. 2 Results of model validation and evaluation based on two dimension reduction methods and different sampling mechanisms for the bacterial dataset (Dataset 2).
The classification was performed using two dimension reduction methods and four classifiers under different sampling mechanisms. Each box contains 9 values representing the mean sensitivity of the validation and testing results produced during the 9 iterations of the 9-fold/9-replicate external validation. The internal validation is considered unbiased if the testing and validation results are comparable, otherwise it is biased.
Extended Data Fig. 3 Results of model validation and evaluation based on two dimension reduction methods and different sampling mechanisms for the cell dataset (Dataset 1).
Each box contains 9 values representing the mean sensitivity of the validation and testing results produced during the 9 iterations of the 9-fold/9-replicate external validation. The internal validation is considered unbiased if the testing and validation results are comparable, otherwise it is biased.
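To make the idea of replicate-wise validation and mean sensitivity concrete, here is a minimal Python sketch. It is a simplified, single-level illustration, not the authors' analysis: it holds out one replicate at a time, uses a nearest-centroid classifier as a stand-in for the classifiers in the figures, omits the dimension reduction step and the internal validation loop, and runs on made-up toy data. All function names are hypothetical.

```python
from collections import defaultdict


def centroid_fit(X, y):
    """Return the mean spectrum (centroid) of each class."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def centroid_predict(model, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(model[c], row))
    return min(model, key=dist2)


def mean_sensitivity(y_true, y_pred):
    """Average the per-class sensitivity (recall) over all classes present."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def leave_one_replicate_out(X, y, replicates):
    """Hold out each replicate once; return one mean sensitivity per iteration."""
    scores = []
    for held_out in sorted(set(replicates)):
        train = [i for i, r in enumerate(replicates) if r != held_out]
        test = [i for i, r in enumerate(replicates) if r == held_out]
        model = centroid_fit([X[i] for i in train], [y[i] for i in train])
        preds = [centroid_predict(model, X[i]) for i in test]
        scores.append(mean_sensitivity([y[i] for i in test], preds))
    return scores


# Toy data (invented): three replicates, two classes, two "wavenumber" features.
X = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9], [1.1, 0.0], [0.0, 1.1]]
y = ["A", "B", "A", "B", "A", "B"]
replicates = [1, 1, 2, 2, 3, 3]
scores = leave_one_replicate_out(X, y, replicates)
print(scores)
```

Splitting by replicate rather than by spectrum keeps spectra from the same replicate out of both training and test sets at once, which is the kind of sampling choice the figures compare.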
Source data
Source data are provided for Figs. 5, 6, 9, 12, 13 and 14 and for Extended Data Figs. 2 and 3.
About this article
Cite this article
Guo, S., Popp, J. & Bocklitz, T. Chemometric analysis in Raman spectroscopy from experimental design to machine learning–based modeling. Nat Protoc 16, 5426–5459 (2021). https://doi.org/10.1038/s41596-021-00620-3
DOI: https://doi.org/10.1038/s41596-021-00620-3