ABSTRACT
Gaussian Process Models (GPMs) are widely regarded as a prominent tool for capturing the inherent characteristics of data. These Bayesian machine learning models support data analysis tasks such as regression and classification. Finding an optimal model for a given dataset usually requires a process of automatic GPM retrieval, even though default instantiations and, in some scenarios, existing prior knowledge can shortcut the way to an optimal GPM. Since non-approximative Gaussian Processes can only process small datasets with low statistical versatility, we propose a new approach that efficiently and automatically retrieves GPMs for large-scale data. The resulting model is composed of independent statistical representations for non-overlapping segments of the given data. Our performance evaluation of the new approach demonstrates the quality of the resulting models, which clearly outperform default GPM instantiations while maintaining reasonable model training time.
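The idea of composing a model from independent representations of non-overlapping segments can be illustrated with a minimal sketch. It is not the retrieval algorithm of the paper: the equal-size segmentation, the fixed squared-exponential kernel, the fixed hyperparameters, and all function names (`fit_segmented_gp`, `predict`, and so on) are assumptions made for illustration only. The sketch shows why segmenting helps: an exact GP on n points costs O(n³), whereas n/m independent segments of size m cost O(n·m²) in total.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def fit_segment(x, y, noise=1e-2):
    """Exact zero-mean GP fit for one segment (hyperparameters are fixed assumptions)."""
    K = rbf_kernel(x, x) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    # alpha = (K + noise * I)^{-1} y, computed via the Cholesky factor
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return {"x": x, "alpha": alpha, "lo": x[0]}

def fit_segmented_gp(x, y, segment_size):
    """Sort the data, cut it into non-overlapping segments, fit one independent GP each."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    return [fit_segment(x[i:i + segment_size], y[i:i + segment_size])
            for i in range(0, len(x), segment_size)]

def predict(models, x_new):
    """Posterior mean; each query point is answered only by the segment covering it."""
    starts = np.array([m["lo"] for m in models])
    idx = np.clip(np.searchsorted(starts, x_new, side="right") - 1, 0, len(models) - 1)
    out = np.empty(len(x_new))
    for k, m in enumerate(models):
        mask = idx == k
        if mask.any():
            out[mask] = rbf_kernel(x_new[mask], m["x"]) @ m["alpha"]
    return out

# Usage on synthetic data: 400 points split into 4 independent segments of 100.
x = np.linspace(0.0, 20.0, 400)
y = np.sin(x)
models = fit_segmented_gp(x, y, segment_size=100)
y_hat = predict(models, x)
```

Because the segments are independent, they can be fitted in parallel, and each segment could in principle receive its own kernel structure; how segments and kernels are actually chosen is left to the full paper.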
Index Terms
- Automatic Gaussian Process Model Retrieval for Big Data