short-paper

Automatic Gaussian Process Model Retrieval for Big Data

Authors:
Fabian Berns
University of Münster, Münster, Germany

Christian Beecks
University of Münster & Fraunhofer Institute for Applied Information Technology, Münster & Sankt Augustin, Germany

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, October 2020, Pages 1965–1968. https://doi.org/10.1145/3340531.3412182

Published: 19 October 2020

ABSTRACT

Gaussian Process Models (GPMs) are widely regarded as a prominent tool for capturing the inherent characteristics of data. These Bayesian machine learning models support data analysis tasks such as regression and classification. Finding an optimal model for a given dataset usually requires a process of automatic GPM retrieval, even though default instantiations and, in some scenarios, existing prior knowledge can shortcut the way to an optimal GPM. Since non-approximative Gaussian Processes can only process small datasets with low statistical versatility, we propose a new approach that efficiently and automatically retrieves GPMs for large-scale data. The resulting model is composed of independent statistical representations for non-overlapping segments of the given data. Our performance evaluation demonstrates the quality of the resulting models, which clearly outperform default GPM instantiations while maintaining reasonable model training time.
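
The idea of composing a model from independent representations of non-overlapping data segments can be illustrated with a minimal sketch. It is not the authors' retrieval algorithm: it assumes one-dimensional inputs, equally sized contiguous segments, and a fixed squared-exponential kernel with hand-picked hyperparameters, where the paper's approach searches for a suitable model automatically. All function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def fit_segment(x, y, noise=1e-2):
    """Exact zero-mean GP regression on one segment (assumed fixed kernel)."""
    K = rbf_kernel(x, x) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} y, computed via two solves with the Cholesky factor.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return {"x": x, "alpha": alpha, "hi": x[-1]}


def fit_local_gps(x, y, n_segments):
    """Split sorted data into non-overlapping contiguous segments, one GP each."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    return [fit_segment(xs, ys)
            for xs, ys in zip(np.array_split(x, n_segments),
                              np.array_split(y, n_segments))]


def predict(models, x_new):
    """Route each query point to its segment and return that GP's posterior mean."""
    # Boundaries are the upper ends of all segments except the last one.
    edges = np.array([m["hi"] for m in models[:-1]])
    idx = np.searchsorted(edges, x_new)
    out = np.empty(len(x_new), dtype=float)
    for i, m in enumerate(models):
        mask = idx == i
        if mask.any():
            out[mask] = rbf_kernel(x_new[mask], m["x"]) @ m["alpha"]
    return out


# Synthetic example data (an assumption for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 20.0, 400)
y = np.sin(x) + 0.05 * rng.standard_normal(400)
models = fit_local_gps(x, y, n_segments=4)
y_hat = predict(models, x)
```

Because each segment is fitted independently, the cubic cost of exact GP inference applies to the segment size rather than to the full dataset, which is what makes this kind of decomposition attractive for large-scale data.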

Index Terms

Automatic Gaussian Process Model Retrieval for Big Data
1. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
    2. Machine learning approaches
      1. Kernel methods
        Gaussian processes

Published in
CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
October 2020
3619 pages
ISBN: 9781450368599
DOI: 10.1145/3340531
General Chairs:
Mathieu d'Aquin (DSI, Insight, NUI Galway, Ireland)
Stefan Dietze (GESIS, Cologne, Germany; Heinrich-Heine-University Düsseldorf, Germany; L3S Research Center, Germany)
Program Chairs:
Claudia Hauff (TU Delft, The Netherlands)
Edward Curry (DSI, Insight, NUI Galway, Ireland)
Philippe Cudre Mauroux (eXascale, University of Fribourg, Switzerland)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
bayesian machine learning
gaussian processes
information retrieval
performance evaluation
regression
Qualifiers
- short-paper
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%