Abstract
The execution and analysis of complex experiments are challenged by the vast dimensionality of the underlying parameter spaces. Although an increase in data-acquisition rates should allow broader querying of the parameter space, the complexity of experiments and the subtle dependence of the model function on input parameters remain daunting owing to the sheer number of variables. New strategies for autonomous data acquisition are being developed, with one promising direction being the use of Gaussian process regression (GPR). GPR is a quick, non-parametric and robust approximation and uncertainty quantification method that can be applied directly to autonomous data acquisition. We review GPR-driven autonomous experimentation and illustrate its functionality using real-world examples from large experimental facilities in the USA and France. We introduce the basics of a GPR-driven autonomous loop with a focus on Gaussian processes, and then shift the focus to the infrastructure that needs to be built around GPR to create a closed loop. Finally, the case studies we discuss show that Gaussian-process-based autonomous data acquisition is a widely applicable method that can facilitate the optimal use of instruments and facilities by enabling the efficient acquisition of high-value datasets.
Key points
- Gaussian process regression (GPR) is a robust statistical, non-parametric technique for uncertainty quantification and function approximation.
- GPR can directly be applied to autonomous and optimal data acquisition.
- GPR provides straightforward ways to inject domain knowledge and can easily be customized for feature finding.
- The gpCAM software tool provides a simple way for practitioners to use GPR for autonomous experimentation.
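The closed loop described above (fit a GPR surrogate, quantify its uncertainty, measure where the uncertainty is largest, repeat) can be illustrated with a minimal sketch. This is not the gpCAM implementation or its API; it uses only NumPy, and the squared-exponential kernel, its fixed hyperparameters, the one-dimensional parameter space and the `instrument` function standing in for a real measurement are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2, signal_var=1.0):
    """Squared-exponential kernel between two 1D arrays of positions."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise_var=1e-4):
    """Posterior mean and variance of a zero-mean GP at x_query."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.diag(rbf_kernel(x_query, x_query)) - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)              # clip round-off negatives

def instrument(x):
    """Hypothetical stand-in for a measurement at position x."""
    return np.sin(6.0 * x) + 0.5 * x

def autonomous_loop(n_steps=10, seed=0):
    """Repeatedly measure where the posterior variance is largest."""
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 201)        # discretized parameter space
    x_train = rng.uniform(0.0, 1.0, size=2)        # two random initial points
    y_train = instrument(x_train)
    for _ in range(n_steps):
        _, var = gp_posterior(x_train, y_train, candidates)
        x_next = candidates[np.argmax(var)]        # most uncertain candidate
        x_train = np.append(x_train, x_next)
        y_train = np.append(y_train, instrument(x_next))
    return x_train, y_train, candidates

x_train, y_train, candidates = autonomous_loop()
mean, var = gp_posterior(x_train, y_train, candidates)
max_error = np.max(np.abs(mean - instrument(candidates)))
```

Choosing the point of maximum posterior variance is only the simplest acquisition rule; in practice the rule can be replaced by one that also weighs the posterior mean, for example to search for features, and the kernel hyperparameters would normally be trained on the data rather than fixed.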
Code availability
The gpCAM code for autonomous steering associated with this Review is available at https://doi.org/10.11578/dc.20210217.5 and https://bitbucket.org/MarcusMichaelNoack/gpcam and via pip install gpCAM. Any updates will be published in the repository and on the Python Package Index (PyPI). The Takin software is available at https://doi.org/10.1016/j.softx.2021.100667.
References
Peirce, C. S. The fixation of belief. Pop. Sci. Mon. 12, 1–15 (1877).
Peirce, C. S. & Menand, L. How to make our ideas clear. Pop. Sci. Mon. 12, 286–302 (1878).
McKay, M. D., Beckman, R. J. & Conover, W. J. Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21, 239–245 (1979).
Fisher, R. A. The arrangement of field experiments. In Breakthroughs in Statistics 82–91 (Springer, 1992).
Settles, B. Active learning literature survey. Technical Reports (University of Wisconsin-Madison, Department of Computer Sciences, 2009).
Krishnakumar, A. Active learning literature survey. Technical Reports 42 (University of California Santa Cruz, 2007).
van de Schoot, R. et al. Bayesian statistics and modelling. Nat. Rev. Methods Primers 1, 1–26 (2021).
Noack, M. M. et al. A Kriging-based approach to autonomous experimentation with applications to X-ray scattering. Sci. Rep. 9, 11809 (2019).
Noack, M. M., Doerk, G. S., Li, R., Fukuto, M. & Yager, K. G. Advances in Kriging-based autonomous X-ray scattering experiments. Sci. Rep. 10, 1325 (2020).
Noack, M. & Zwart, P. Computational strategies to increase efficiency of Gaussian-process-driven autonomous experiments. In 2019 IEEE/ACM 1st Annual Workshop on Large-scale Experiment-in-the-Loop Computing (XLOOP) 1–7 (IEEE, 2019).
Noack, M. M. et al. Autonomous materials discovery driven by Gaussian process regression with inhomogeneous measurement noise and anisotropic kernels. Sci. Rep. 10, 17663 (2020).
Wiegart, L. et al. Instrumentation for in situ/operando X-ray scattering studies of polymer additive manufacturing processes. Synchrotron Radiat. News 32, 20–27 (2019).
Frazier, P. I. Bayesian optimization. Recent Adv. Optim. Model. Contemp. Probl. https://doi.org/10.1287/educ.2018.0188 (2018).
Noack, M. gpcam version 6. bitbucket https://bitbucket.org/MarcusMichaelNoack/gpcam (2021).
Noack, M. M. & Funke, S. W. Hybrid genetic deflated Newton method for global optimisation. J. Comput. Appl. Math. 325, 97–112 (2017).
Hobson, A. & Cheng, B.-K. A comparison of the Shannon and Kullback information measures. J. Stat. Phys. 7, 301–310 (1973).
Noack, M. M. & Sethian, J. A. Advanced stationary and non-stationary Kernel designs for domain-aware Gaussian processes. Preprint at https://arxiv.org/abs/2102.03432 (2021).
Fratzl, P. Small-angle scattering in materials science — a short review of applications in alloys, ceramics and composite materials. J. Appl. Crystallogr. 36, 397–404 (2003).
Dubcek, P. Nanostructures as seen by the SAXS. Vacuum 80, 92–97 (2005).
Yager, K. G., Zhang, Y., Lu, F. & Gang, O. Periodic lattices of arbitrary nano-objects: modeling and applications for self-assembled systems. J. Appl. Crystallogr. 47, 118–129 (2014).
Liu, J. et al. The impact of alterations in lignin deposition on cellulose organization of the plant cell wall. Biotechnol. Biofuels 9, 126 (2016).
Paris, O. From diffraction to imaging: new avenues in studying hierarchical biological tissues with X-ray microbeams (review). Biointerphases 3, FB16 (2008).
Aghamohammadzadeh, H., Newton, R. H. & Meek, K. M. X-ray scattering used to map the preferred collagen orientation in the human cornea and limbus. Structure 12, 249–256 (2004).
Liu, J. et al. Amyloid structure exhibits polymorphism on multiple length scales in human brain tissue. Sci. Rep. 6, 33079 (2016).
Weaver, J. C. et al. The stomatopod dactyl club: a formidable damage-tolerant biological hammer. Science 336, 1275–1280 (2012).
Wang, Q. et al. Phase transformations and structural developments in the radular teeth of Cryptochiton stelleri. Adv. Funct. Mater. 23, 2908–2917 (2013).
Meredith, J. C., Smith, A. P., Karim, A. & Amis, E. J. Combinatorial materials science for polymer thin-film dewetting. Macromolecules 33, 9747–9756 (2000).
Stafford, C. M., Roskov, K. E., Epps III, T. H. & Fasolka, M. J. Generating thickness gradients of thin polymer films via flow coating. Rev. Sci. Instrum. 77, 023908 (2006).
Smith, A. P., Douglas, J. F., Meredith, J. C., Amis, E. J. & Karim, A. High-throughput characterization of pattern formation in symmetric diblock copolymer films. J. Polym. Sci. B 39, 2141–2158 (2001).
Davis, R. L., Jayaraman, S., Chaikin, P. M. & Register, R. A. Creating controlled thickness gradients in polymer thin films via flowcoating. Langmuir 30, 5637–5644 (2014).
Meredith, J. C., Karim, A. & Amis, E. J. High-throughput measurement of polymer blend phase behavior. Macromolecules 33, 5760–5762 (2000).
Roberson, S. V., Fahey, A. J., Sehgal, A. & Karim, A. Multifunctional ToF-SIMS: combinatorial mapping of gradient energy substrates. Appl. Surf. Sci. 200, 150–164 (2002).
Berry, B. C. et al. Versatile platform for creating gradient combinatorial libraries via modulated light exposure. Rev. Sci. Instrum. 78, 072202 (2007).
Smith, A. P., Sehgal, A., Douglas, J. F., Karim, A. & Amis, E. J. Combinatorial mapping of surface energy effects on diblock copolymer thin film ordering. Macromol. Rapid Commun. 24, 131–135 (2003).
Toth, K., Osuji, C. O., Yager, K. G. & Doerk, G. S. Electrospray deposition tool: creating compositionally gradient libraries of nanomaterials. Rev. Sci. Instrum. 91, 013701 (2020).
Holman, H.-Y. N., Bechtel, H. A., Hao, Z. & Martin, M. C. Synchrotron IR spectromicroscopy: chemistry of living cells. Anal. Chem. 82, 8757–8765 (2010).
Holman, H.-Y. N. et al. Real-time characterization of biogeochemical reduction of Cr (VI) on basalt surfaces by SR-FTIR imaging. Geomicrobiol. J. 16, 307–324 (1999).
Holman, H.-Y. N. et al. Catalysis of PAH biodegradation by humic acid shown in synchrotron infrared studies. Environ. Sci. Technol. 36, 1276–1280 (2002).
Mason, O. U. et al. Metagenome, metatranscriptome and single-cell sequencing reveal microbial response to Deepwater Horizon oil spill. ISME J. 6, 1715–1727 (2012).
Holman, H.-Y. N. et al. Real-time molecular monitoring of chemical environment in obligate anaerobes during oxygen adaptive response. Proc. Natl Acad. Sci. USA 106, 12599–12604 (2009).
Hazen, T. C. et al. Deep-sea oil plume enriches indigenous oil-degrading bacteria. Science 330, 204–208 (2010).
Bælum, J. et al. Deep-sea bacteria enriched by oil and dispersant from the Deepwater Horizon spill. Environ. Microbiol. 14, 2405–2416 (2012).
Benning, L. G., Phoenix, V., Yee, N. & Konhauser, K. The dynamics of cyanobacterial silicification: an infrared micro-spectroscopic investigation. Geochim. Cosmochim. Acta 68, 743–757 (2004).
Benning, L. G., Phoenix, V., Yee, N. & Tobin, M. Molecular characterization of cyanobacterial silicification using synchrotron infrared micro-spectroscopy. Geochim. Cosmochim. Acta 68, 729–741 (2004).
Yee, N., Benning, L. G., Phoenix, V. R. & Ferris, F. G. Characterization of metal-cyanobacteria sorption reactions: a combined macroscopic and infrared spectroscopic investigation. Environ. Sci. Technol. 38, 775–782 (2004).
Probst, A. J. et al. Tackling the minority: sulfate-reducing bacteria in an archaea-dominated subsurface biofilm. ISME J. 7, 635–651 (2013).
Valdespino-Castillo, P. M. et al. Exploring biogeochemistry and microbial diversity of extant microbialites in Mexico and Cuba. Front. Microbiol. 9, 510 (2018).
Valdespino-Castillo, P. M. et al. Interplay of microbial communities with mineral environments in coralline algae. Sci. Total Environ. 757, 143877 (2021).
Holman, E. et al. Autonomous adaptive data acquisition for scanning hyperspectral imaging. Commun. Biol. 3, 684 (2020).
Davies, T. & Fearn, T. Back to basics: the principles of principal component analysis. Spectrosc. Eur. 16, 20 (2004).
Melton, C. N. et al. K-means-driven Gaussian process data collection for angle-resolved photoemission spectroscopy. Mach. Learn. Sci. Technol. 1, 045015 (2020).
Cao, Y. et al. Unconventional superconductivity in magic-angle graphene superlattices. Nature 556, 43–50 (2018).
Squires, G. L. Introduction to the Theory of Thermal Neutron Scattering (Cambridge Univ. Press, 2012).
Weber, T. Takin 2 (software). GitLab https://code.ill.fr/scientific-software/takin (2021).
Weber, T. Update 2.0 to “Takin: an open-source software for experiment planning, visualisation, and data analysis”, (PII: S2352711016300152). SoftwareX 14, 100667 (2021).
Bostwick, A. et al. Band structure and many body effects in graphene. Eur. Phys. J. Spec. Top. 148, 5–13 (2007).
Boehm, M. et al. ThALES – Three Axis Low Energy Spectroscopy for highly correlated electron systems. Neutron News 26, 18–21 (2015).
Acknowledgements
The work was partly funded through the Center for Advanced Mathematics for Energy Research Applications (CAMERA), which is jointly funded by the Advanced Scientific Computing Research (ASCR) and Basic Energy Sciences (BES) within the Department of Energy’s Office of Science, as well as by the Laboratory Directed Research and Development Program of Lawrence Berkeley National Laboratory, under US Department of Energy contract no. DE-AC02-05CH11231. This research used resources of the Center for Functional Nanomaterials and the National Synchrotron Light Source II, which are US DOE Office of Science facilities, at Brookhaven National Laboratory under contract no. DE-SC0012704. This research also used resources of the Berkeley Synchrotron Infrared Structural Biology (BSISB) Imaging Program, funded by the US Department of Energy, Office of Biological and Environmental Research, under contract no. DE-AC02-05CH11231. The Advanced Light Source is supported by the Director, Office of Science, and the Office of Basic Energy Sciences. Both the ALS and BSISB were supported through contract no. DE-AC02-05CH11231. K.C.E. and C.B.M. acknowledge support from the Office of Naval Research Multidisciplinary University Research Initiative Award ONR N00014-18-1-2497. K.C.E. acknowledges support from the NSF Graduate Research Fellowship Program under grant no. DGE-1321851. This work is based on experiments performed at the Institut Laue-Langevin (ILL) in Grenoble, France. The collected datasets have the DOIs 10.5291/ILL-DATA.TEST-3123 and in part 10.5291/ILL-DATA.4-01-1643. The authors thank E. Villard, P. Chevalier and J. Locatelli for technical support at the ThALES spectrometer. C. N. Melton (author of ref. 51) performed the K-means cluster-based GP collection simulations.
Author information
Contributions
M.M.N. wrote the initial drafts of the introduction and the technical sections, devised the algorithm used, formulated the required mathematics, and implemented the computer codes (gpCAM). P.H.Z. designed, coordinated and collaborated on the development of basic computational strategies in gpCAM and on its use in SR-FTIR microscopy and ARPES experiments and took part in writing and editing this manuscript. D.M.U. designed, configured and implemented codes associated with convnets for reverse image search and wrote the related section. M.F. and K.G.Y. planned, supervised and coordinated experiments at Brookhaven National Laboratory’s National Synchrotron Light Source II, and wrote the related section. M.F., K.G.Y., E.H.R.T., R.L., G.F. and M.Z. performed X-ray scattering experiments at National Synchrotron Light Source II, including beamline operation and data analytics. K.C.E. and C.B.M. prepared nanoplatelet materials. A.S. and G.S.D. prepared chemical templates and self-assembled films. E.R. planned and led the ARPES measurements at the Advanced Light Source, and wrote the related section. H.-Y.N.H. led the SR-FTIR measurements, coordinated the simulations and wrote the initial draft of the related section. S.L. designed and performed the PCA-based GP collection simulations and wrote the related section. L.C. designed the simulations and wrote the related section. Y.L.G. and T.W. customized gpCAM for use at the ThALES spectrometer. T.W. developed and performed preparatory simulations with gpCAM using theoretical dynamical structure factor models for neutron scattering. T.W. planned and T.W., M.B., P.S. and P.M. performed the first autonomous commissioning experiment at ThALES measuring the magnons in the chiral magnet MnSi. M.B. proposed and M.B., T.W., P.S. and P.M. performed the second autonomous commissioning experiment at ThALES, the results of which are shown in Fig. 4. The sample for the first experiment (MnSi) was provided by A. Bauer; the sample for the second autonomous commissioning experiment was provided by M.B. T.W. analysed the data of the first experiment (MnSi, not shown); M.B. analysed the data of the second experiment (Fig. 4). M.B. and T.W. wrote the text of the corresponding section in equal parts. J.A.S. supervised the development of the mathematics and the implementation of the code, and revised and improved the manuscript. All authors commented on the manuscript and revised it repeatedly.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information
Nature Reviews Physics thanks the anonymous reviewers for their contribution to the peer review of this work.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Glossary
- Uncertainty quantification: The quantitative characterization of uncertainties in computational and real-world applications.
- Himmelblau’s function: A common test function in the optimization community.
- Principal component analysis: A dimensionality reduction technique that finds an orthonormal basis; typically retaining only the first few basis vectors preserves the majority of the variance of the dataset while substantially reducing data dimensionality.
- Non-negative matrix factorization: A computational linear algebra technique to factorize a matrix into two matrices without negative elements.
- Bump function: A function that is both smooth and compactly supported.
- Surrogate model: An approximate model used when the actual model is difficult or costly to evaluate.
- Linear interpolation with Voronoi tessellation: A technique for function approximation and automated data acquisition.
- Triple-axis spectrometers: Spectrometers that select the wavelengths of neutrons before and after they hit the sample, thereby directly probing the energy and momentum response of various materials.
- Delaunay triangulation: A triangulation technique such that no point in the set is inside the circumcircle of any of the triangles connecting the points.
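Two of the glossary entries have short closed forms that may help make the definitions concrete. The sketch below uses the standard textbook form of Himmelblau's function and one common example of a bump function, supported on the interval (-1, 1); the function names and the choice of this particular bump function are assumptions made for illustration and are not taken from the Review.

```python
import numpy as np

def himmelblau(x, y):
    """Standard form of Himmelblau's function, a multimodal 2D test function."""
    return (x ** 2 + y - 11.0) ** 2 + (x + y ** 2 - 7.0) ** 2

def bump(x):
    """One common bump function: smooth everywhere, zero outside (-1, 1)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    # exp(-1 / (1 - x^2)) tends smoothly to zero as |x| approaches 1
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

# (3, 2) is one of the four global minima of Himmelblau's function
minimum_value = himmelblau(3.0, 2.0)
bump_values = bump([0.0, 0.5, 1.0, -2.0])
```

Because a bump function vanishes identically outside a bounded interval while remaining smooth, it is a natural building block for compactly supported kernels.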
About this article
Cite this article
Noack, M.M., Zwart, P.H., Ushizima, D.M. et al. Gaussian processes for autonomous data acquisition at large-scale synchrotron and neutron facilities. Nat Rev Phys 3, 685–697 (2021). https://doi.org/10.1038/s42254-021-00345-y
DOI: https://doi.org/10.1038/s42254-021-00345-y