DOI: 10.1145/3340531.3412182
short-paper

Automatic Gaussian Process Model Retrieval for Big Data

Published: 19 October 2020

ABSTRACT

Gaussian Process Models (GPMs) are widely regarded as a prominent tool for capturing the inherent characteristics of data. These Bayesian machine learning models support data analysis tasks such as regression and classification. Default instantiations and, in some scenarios, existing prior knowledge can shortcut the way to an optimal GPM; usually, however, a process of automatic GPM retrieval is still needed to find an optimal model for a given dataset. Since non-approximative Gaussian Processes can only process small datasets with low statistical versatility, we propose a new approach that efficiently and automatically retrieves GPMs for large-scale data. The resulting model is composed of independent statistical representations for non-overlapping segments of the given data. Our performance evaluation demonstrates the quality of the resulting models, which clearly outperform default GPM instantiations while maintaining reasonable model training time.
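
The idea of composing a model from independent representations of non-overlapping segments can be illustrated with a minimal sketch. This is not the authors' retrieval algorithm: the abstract does not say how segments are chosen or how kernels are searched. The sketch assumes fixed equal-size segments, a fixed squared-exponential kernel with hand-picked hyperparameters, and one-dimensional sorted inputs. All names (`SegmentGP`, `fit_segmented`, `predict`) are hypothetical.

```python
import math

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel for scalar inputs (assumed kernel choice)."""
    return variance * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

class SegmentGP:
    """Exact zero-mean GP regression on one segment, fixed hyperparameters."""
    def __init__(self, xs, ys, noise=1e-2):
        self.xs = list(xs)
        K = [[rbf(a, b) + (noise if i == j else 0.0)
              for j, b in enumerate(self.xs)]
             for i, a in enumerate(self.xs)]
        # alpha = (K + noise * I)^{-1} y, reused for every prediction
        self.alpha = solve_cholesky(cholesky(K), list(ys))

    def predict_mean(self, x):
        return sum(rbf(x, xi) * a for xi, a in zip(self.xs, self.alpha))

def fit_segmented(xs, ys, segment_size):
    """Fit one independent GP per consecutive, non-overlapping segment.

    Assumes xs is sorted in ascending order.
    """
    models = []
    for start in range(0, len(xs), segment_size):
        seg_x = xs[start:start + segment_size]
        seg_y = ys[start:start + segment_size]
        models.append((seg_x[0], seg_x[-1], SegmentGP(seg_x, seg_y)))
    return models

def predict(models, x):
    """Route x to the segment covering it, or to the nearest segment."""
    def distance(model):
        lo, hi, _ = model
        return 0.0 if lo <= x <= hi else min(abs(x - lo), abs(x - hi))
    return min(models, key=distance)[2].predict_mean(x)

# Toy data: 200 samples of sin(x), split into 4 segments of 50 points each.
xs = [i * 0.1 for i in range(200)]
ys = [math.sin(x) for x in xs]
models = fit_segmented(xs, ys, segment_size=50)
```

Exact GP inference costs time cubic in the number of training points, so fitting several small independent models is cheaper than fitting one model on all points. The trade-off in this sketch is that predictions are not continuous across segment boundaries.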

Supplemental Material

3340531.3412182.mp4 (MP4, 23.9 MB)

      • Published in

        CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
        October 2020
        3619 pages
        ISBN: 9781450368599
        DOI: 10.1145/3340531

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
