
Neurocomputing

Volume 459, 12 October 2021, Pages 249-289

Online learning: A comprehensive survey

https://doi.org/10.1016/j.neucom.2021.04.112

Abstract

Online learning represents a family of machine learning methods in which a learner attempts to tackle some predictive (or any type of decision-making) task by learning from a sequence of data instances one at a time. The goal of online learning is to maximize the accuracy/correctness of the sequence of predictions/decisions made by the online learner, given knowledge of the correct answers to previous prediction/learning tasks and possibly additional information. This is in contrast to traditional batch or offline machine learning methods, which are often designed to learn a model from the entire training data set at once. Online learning has become a promising technique for learning from continuous streams of data in many real-world applications. This work aims to provide a comprehensive survey of the online machine learning literature through a systematic review of basic ideas and key principles and a proper categorization of different algorithms and techniques. Generally speaking, according to the types of learning tasks and the forms of feedback information, existing online learning works can be classified into three major categories: (i) online supervised learning, where full feedback information is always available; (ii) online learning with limited feedback; and (iii) online unsupervised learning, where no feedback is available. Due to space limitations, the survey mainly focuses on the first category, but also briefly covers some basics of the other two. Finally, we discuss some open issues and attempt to shed light on potential future research directions in this field.

Introduction

Machine learning plays a crucial role in modern data analytics and artificial intelligence (AI) applications. Traditional machine learning paradigms often work in a batch learning or offline learning fashion (especially for supervised learning), where a model is trained by some learning algorithm on an entire training data set at once, and then deployed for inference without (or only seldom) performing any updates afterwards. Such learning methods suffer from expensive re-training costs when dealing with new training data, and thus scale poorly for real-world applications. In the era of big data, traditional batch learning paradigms become increasingly restrictive, especially when live data grows and evolves rapidly. Making machine learning scalable and practical, especially for learning from continuous data streams, has become an open grand challenge in machine learning and AI.

Unlike traditional machine learning, online learning is a subfield of machine learning comprising an important family of techniques devised to learn models incrementally from data in a sequential manner. Online learning overcomes the drawbacks of traditional batch learning in that the model can be updated instantly and efficiently by an online learner when new training data arrives. Besides, online learning algorithms are often easy to understand, simple to implement, and founded on solid theory with rigorous regret bounds. Along with the urgent need to make machine learning practical for real big data analytics, online learning has attracted increasing interest in recent years.

This survey aims to give a comprehensive account of the online learning literature. Online learning has been extensively studied across different fields, ranging from machine learning, data mining, statistics, optimization and applied math, to artificial intelligence and data science. This survey aims to distill the core ideas of online learning methodologies and applications in the literature. It is written mainly for machine learning audiences, and assumes readers have basic knowledge of machine learning. While trying our best to make the survey as comprehensive as possible, it is very difficult to cover every detail since online learning research has been evolving rapidly in recent years. We apologize in advance for any missing papers or inaccuracies in description, and encourage readers to provide feedback, comments or suggestions. Finally, as a supplemental document to this survey, readers may check our updated version online at: http://libol.stevenhoi.org/survey.

The traditional machine learning paradigm often runs in a batch learning fashion: for example, in a supervised learning task, a collection of training data is given in advance to train a model by following some learning algorithm. Such a paradigm requires the entire training data set to be available before the learning task begins, and the training process is often done in an offline environment due to the expensive training cost. Traditional batch learning methods suffer from some critical drawbacks: (i) low efficiency in both time and space costs; and (ii) poor scalability for large-scale applications, because the model often has to be re-trained from scratch when new training data arrives.

In contrast to batch learning algorithms, online learning is a method of machine learning for data arriving in a sequential order, where a learner aims to learn and update the best predictor for future data at every step. Online learning is able to overcome the drawbacks of batch learning in that the predictive model can be updated instantly for any new data instances. Thus, online learning algorithms are far more efficient and scalable for large-scale machine learning tasks in real-world data analytics applications where data are not only large in size, but also arriving at a high velocity.

Similar to traditional (batch) machine learning methods, online learning techniques can be applied to solve a variety of tasks in a wide range of real-world application domains. Examples of online learning tasks include the following:

Supervised learning tasks: Online learning algorithms can be derived for supervised learning tasks. One of the most common tasks is classification, which aims to predict the category that a new data instance belongs to, on the basis of past training instances whose category labels are given. For example, a commonly studied task in online learning is online binary classification (e.g., spam email filtering), which involves only two categories (“spam” vs “benign” emails); other types of supervised classification tasks include multi-class classification, multi-label classification, and multiple-instance classification.
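The classic Perceptron algorithm makes this predict-then-update protocol concrete. The sketch below is an illustrative implementation (the toy data stream and its labeling rule are our own assumptions, not taken from any surveyed work): at each round the learner predicts with a linear score, then receives the true label and updates only when it made a mistake.

```python
import numpy as np

def online_perceptron(stream, dim):
    """Classic Perceptron: predict, receive the true label, update on mistakes."""
    w = np.zeros(dim)
    mistakes = 0
    for x, y in stream:                           # y in {-1, +1}
        y_hat = 1 if np.dot(w, x) >= 0 else -1    # predict before seeing y
        if y_hat != y:                            # full feedback: true label revealed
            w += y * x                            # mistake-driven update
            mistakes += 1
    return w, mistakes

# Toy linearly separable stream: label is the sign of the first feature
rng = np.random.default_rng(0)
data = [(x, 1 if x[0] > 0 else -1) for x in rng.normal(size=(200, 3))]
w, m = online_perceptron(data, dim=3)
```

Because updates are applied only on mistakes, the number of updates is exactly the mistake count, which for linearly separable data is bounded independently of the stream length.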

In addition to classification tasks, another common supervised learning task is regression analysis, which refers to the learning process for estimating the relationships among variables (typically between a dependent variable and one or more independent variables). Online learning techniques are naturally suited to regression analysis tasks, e.g., time series analysis in financial markets, where data instances naturally arrive in a sequential way. Another application of online learning with financial time series data is online portfolio selection, where an online learner aims to find a good (e.g., profitable and low-risk) strategy for making a sequence of portfolio selection decisions.
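The same predict-then-update protocol applies to regression. Below is a minimal illustrative sketch of online least-squares regression via stochastic gradient steps; the synthetic stream, the learning rate, and the noiseless linear target are our own assumptions for demonstration.

```python
import numpy as np

def online_linear_regression(stream, dim, lr=0.01):
    """Online least squares: predict, observe the target, take one gradient step."""
    w = np.zeros(dim)
    losses = []
    for x, y in stream:
        y_hat = np.dot(w, x)             # prediction made before the target arrives
        losses.append((y_hat - y) ** 2)  # instantaneous squared loss
        w -= lr * 2 * (y_hat - y) * x    # SGD step on (y_hat - y)^2
    return w, losses

# Synthetic sequential data from a fixed (noiseless) linear model
rng = np.random.default_rng(1)
w_true = np.array([0.5, -2.0])
stream = [(x, float(w_true @ x)) for x in rng.normal(size=(500, 2))]
w, losses = online_linear_regression(stream, dim=2, lr=0.05)
```

Each instance costs O(dim) time and the model is ready to predict at every step, which is what makes this style of learner suitable for streaming time series.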

Bandit learning tasks: Bandit online learning algorithms, also known as multi-armed bandits (MAB), have been extensively used in many online recommender systems, such as online advertising for internet monetization, product recommendation in e-commerce, movie recommendation for entertainment, and other personalized recommendation applications.
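As a minimal illustration of the exploration-exploitation tradeoff at the heart of bandit learning, the sketch below implements the simple epsilon-greedy strategy on Bernoulli-reward arms. The arm means, epsilon, and horizon are illustrative assumptions of ours, not parameters from the cited works.

```python
import random

def epsilon_greedy(true_means, rounds=5000, eps=0.1, seed=42):
    """Epsilon-greedy MAB: explore a random arm with prob. eps, else exploit."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k      # pulls per arm
    est = [0.0] * k       # running estimate of each arm's mean reward
    total = 0.0
    for _ in range(rounds):
        if rng.random() < eps:
            arm = rng.randrange(k)                      # explore
        else:
            arm = max(range(k), key=lambda a: est[a])   # exploit best estimate
        reward = 1.0 if rng.random() < true_means[arm] else 0.0  # Bernoulli payoff
        counts[arm] += 1
        est[arm] += (reward - est[arm]) / counts[arm]   # incremental mean update
        total += reward
    return est, counts, total

est, counts, total = epsilon_greedy([0.2, 0.5, 0.8])
```

After enough rounds the best arm (here the one with mean 0.8) dominates the pull counts, while the constant exploration rate keeps the estimates of the other arms from going stale.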

Unsupervised learning tasks: Online learning algorithms can be applied for unsupervised learning tasks. Examples include clustering or cluster analysis—a process of grouping objects such that objects in the same group (“cluster”) are more similar to each other than to objects in other clusters. Online clustering aims to perform incremental cluster analysis on a sequence of instances, which is common for mining data streams.
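A simple concrete instance of online clustering is sequential (MacQueen-style) k-means: each arriving point is assigned to its nearest center, which is then updated as a running mean. The sketch below is illustrative only; the seeding rule and the two synthetic Gaussian blobs standing in for a stream are our own assumptions.

```python
import numpy as np

def online_kmeans(stream, k):
    """Sequential k-means: seed centers with the first k points, then assign
    each arriving point to its nearest center and update it as a running mean."""
    centers, counts = [], []
    for x in stream:
        if len(centers) < k:                      # seed with the first k points
            centers.append(np.array(x, dtype=float))
            counts.append(1)
            continue
        d = [np.linalg.norm(c - x) for c in centers]
        j = int(np.argmin(d))                     # nearest center
        counts[j] += 1
        centers[j] += (x - centers[j]) / counts[j]  # incremental mean of cluster j
    return np.array(centers), counts

# Two well-separated 2-D blobs, interleaved to mimic a stream
rng = np.random.default_rng(3)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(200, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(200, 2))
stream = [p for pair in zip(blob_a, blob_b) for p in pair]
centers, counts = online_kmeans(stream, k=2)
```

Each point is touched exactly once and then discarded, which is the defining constraint when mining data streams.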

Other learning tasks: Online learning can also be used for other kinds of machine learning tasks, such as learning for recommender systems, learning to rank, or reinforcement learning. For example, collaborative filtering with online learning can be applied to enhance the performance of recommender systems by learning to improve collaborative filtering tasks sequentially from continuous streams of ratings/feedback information from users.

Last but not least, we note that online learning techniques are often used in two major scenarios. One is to improve the efficiency and scalability of existing machine learning methodologies for batch learning tasks, where a full collection of training data must be made available before learning. For example, Support Vector Machines (SVM) is a well-known supervised learning method for batch classification tasks, whose classical algorithms (e.g., QP or SMO solvers [296]) suffer from poor scalability for very large-scale applications. In the literature, various online learning algorithms have been explored for training SVMs in an online (or stochastic) learning manner [297], [336], making them more efficient and scalable than conventional batch SVMs. The other scenario is to apply online learning algorithms to directly tackle online streaming data analytics tasks, where data instances naturally arrive in a sequential manner and the target concepts may be drifting or evolving over time. Examples include time series regression, such as stock price prediction, where data arrives periodically and the learner has to make decisions immediately before receiving the next instance.
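One well-known way to train a linear SVM in this online/stochastic manner is a Pegasos-style stochastic sub-gradient method. The sketch below is a simplified illustration under our own assumptions (no projection step, a toy separable data set, an illustrative regularization constant): each arriving instance triggers one cheap sub-gradient step on the regularized hinge loss instead of a batch QP solve.

```python
import numpy as np

def pegasos(stream, dim, lam=0.01):
    """Pegasos-style SGD for a linear SVM: one sub-gradient step per instance."""
    w = np.zeros(dim)
    for t, (x, y) in enumerate(stream, start=1):  # y in {-1, +1}
        eta = 1.0 / (lam * t)                     # decaying step size
        if y * np.dot(w, x) < 1:                  # hinge-loss margin violation
            w = (1 - eta * lam) * w + eta * y * x
        else:
            w = (1 - eta * lam) * w               # regularization-only shrink
    return w

# Toy separable data: true separating direction is (1, -1)
rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 2))
ys = np.where(X @ np.array([1.0, -1.0]) > 0, 1, -1)
w = pegasos(zip(X, ys), dim=2)
acc = np.mean(np.where(X @ w > 0, 1, -1) == ys)
```

The per-instance cost is O(dim), so a single pass over the stream replaces the repeated full-data passes of a batch solver.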

To help readers better understand the online learning literature as a whole, we attempt to construct a taxonomy of online learning methods and techniques, as summarized in Fig. 1. In general, from a theoretical perspective, online learning methodologies are founded based on theory and principles from three major theory communities: learning theory, optimization theory, and game theory. From the perspective of specific algorithms, we can further group the existing online learning techniques into different categories according to their specific learning principles and problem settings. Specifically, according to the types of feedback information and the types of supervision in the learning tasks, online learning techniques can be classified into the following three major categories:

  • Online supervised learning: This is concerned with supervised learning tasks where full feedback information is always revealed to the learner at the end of each online learning round. It can be further divided into two groups of studies: (i) fundamental online supervised learning, which covers the core approaches and principles; and (ii) applied online learning, which covers non-traditional online supervised learning settings where the fundamental approaches cannot be applied directly and algorithms have been appropriately tailored to suit the setting.

  • Online learning with limited feedback: This is concerned with tasks where an online learner receives only partial feedback information from the environment during the online learning process. For example, consider an online multi-class classification task: at a particular round, the learner predicts a class label for an incoming instance and then receives partial feedback indicating only whether the prediction is correct, rather than the true class label itself. For such tasks, the online learner often has to make updates or decisions by trading off the exploitation of disclosed knowledge against the exploration of unknown information from the environment.

  • Online unsupervised learning: This is concerned with online learning tasks where the online learner receives only a sequence of data instances without any additional feedback (e.g., true class labels). Unsupervised online learning can be considered a natural extension of traditional unsupervised learning, which is typically studied in a batch fashion, to the setting of data streams. Examples include online clustering, online dimension reduction, and online anomaly detection. Unsupervised online learning makes less restrictive assumptions about the data, as it does not require explicit feedback or label information, which can be difficult or expensive to acquire.

This article conducts a systematic review of existing work on online learning, especially online supervised learning and online learning with partial feedback. Finally, we note that it is always very challenging to make a precise categorization of all the existing online learning work, and the proposed taxonomy may not fully cover everything in the literature, though we have tried our best to be as comprehensive as possible.

This paper attempts to make a comprehensive survey of online learning research. In the literature, there are several related books, PhD theses, and articles published over the years dedicated to online learning [73], [333], many of which also include rich discussions of related work. For example, the book titled “Prediction, Learning, and Games” [73] gave a nice introduction to some niche subjects of online learning, particularly online prediction with expert advice and online learning with partial feedback. Another work titled “Online Learning and Online Convex Optimization” [333] gave a nice tutorial on the basics of online learning and the foundations of online convex optimization. In addition, there are quite a few PhD theses dedicated to different subjects of online learning [205], [332], [427], [224]. Readers are also encouraged to read some older related books, surveys and tutorial notes about online learning and online algorithms [125], [49], [304], [45], [14]. Finally, readers who are interested in applied online learning can explore some open-source toolboxes, including LIBOL [173], [397] and Vowpal Wabbit [217].

Section snippets

Problem formulations and related theory

Without loss of generality, we first give a formal formulation of a classic online learning problem, i.e., online binary classification, and then introduce basics of statistical learning theory, online convex optimization and game theory as the theoretical foundations for online learning techniques.
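For concreteness, the standard yardstick in this theory is regret: the gap between the learner's cumulative loss and that of the best fixed hypothesis in hindsight. In the notation below (ours, chosen to be generic), $\ell_t$ denotes the loss at round $t$, $\mathbf{w}_t$ the learner's model at round $t$, and $\mathcal{W}$ the hypothesis class:

```latex
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} \ell_t(\mathbf{w}_t) \;-\; \min_{\mathbf{w} \in \mathcal{W}} \sum_{t=1}^{T} \ell_t(\mathbf{w})
```

An online algorithm is considered good when its regret grows sublinearly in $T$, so that its average per-round loss approaches that of the best fixed model.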

Overview

In this section, we survey a family of “online supervised learning” algorithms which define the fundamental approaches and principles for online learning methodologies toward supervised learning tasks [333], [305].

We first discuss linear online learning methods, where the target model is a linear function. More formally, given an input domain X and an output domain Y for a learning task, we aim to learn a hypothesis f: X → Y, where the target model f is linear. For example, consider a typical

Overview

In this section, we survey the most representative algorithms for a group of non-traditional online learning tasks in which supervised online algorithms cannot be used directly. These algorithms are motivated by new problem settings and applications that retain the traditional online assumption of sequentially arriving data, but require new algorithms tailored to their specific scenarios. Our review includes cost-sensitive online learning, online

Overview

Bandit online learning, a.k.a. the “Multi-armed Bandit (MAB)” problem [310], [197], [364], [55], [144], is an important branch of online learning where a learner makes sequential decisions by receiving only partial feedback from the environment each time.

MAB problems are online learning tasks for sequential decisions with a trade-off between exploration and exploitation. Specifically, on each round, a player chooses one out of K actions, the environment then reveals the payoff of the player’s

Overview

In a standard online learning task (e.g., online binary classification), the learner receives and makes predictions for a sequence of instances generated from some unknown distribution. At the end of every round, it is assumed that the learner always receives the true label (feedback) from the environment. For many real-world applications, obtaining the labels can be very expensive, and it is not always necessary or informative to query the true label of every instance, e.g., if an

Overview

Semi-Supervised Learning (SSL) has been an important class of machine learning tasks and techniques, which aims to make use of unlabeled data for learning tasks. It has been extensively studied mostly in the settings of batch learning and some comprehensive surveys can be found in [442], [444]. When online learning meets semi-supervised learning, there are two major branches of research. One major branch of studies is to turn traditional batch semi-supervised learning methods into online

Overview

In this section we briefly review some key work in the literature of online unsupervised learning, where models are learned from unlabeled data streams and no explicit feedback is available. Broadly, we categorize the existing work into four major groups: Online Clustering, Online Dimension Reduction, Online Anomaly Detection, and Online Density Estimation. Due to the vast number of ways in which unsupervised learning in online settings has been explored in the literature, and numerous

Overview

In this section, we discuss the relationship of online learning with other related areas and terminologies which sometimes may be confused. We note that some of the following remarks may be somewhat subjective, and their meanings may vary in diverse contexts whereas some terms and notions may be used interchangeably.

Incremental learning

Incremental learning, or decremental learning, represents a family of machine learning techniques [274], [297], [307], which are particularly suitable for learning from data

Concluding remarks

This paper gave a comprehensive survey of existing online learning work and reviewed ongoing trends in online learning research. In theory, online learning methodologies are founded primarily on learning theory, optimization theory, and game theory. According to the type of feedback available to the learner, the existing online learning methods can be roughly grouped into the following three major categories:

  • Supervised online learning is concerned with the online learning tasks where full feedback

CRediT authorship contribution statement

Steven C.H. Hoi: Conceptualization, Investigation, Writing - original draft, Writing - review & editing, Supervision, Project administration. Doyen Sahoo: Investigation, Writing - original draft, Writing - review & editing. Jing Lu: Investigation, Writing - original draft, Writing - review & editing. Peilin Zhao: Investigation, Writing - original draft, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.



References (446)

  • R. Agarwal, A.A. Sekh, K. Agarwal, D.K. Prasad, Auxiliary network: scalable and agile online learning for dynamic...
  • C.C. Aggarwal, A survey of stream clustering algorithms,...
  • C.C. Aggarwal, J. Han, J. Wang, P.S. Yu, A framework for projected clustering of high dimensional data streams, in:...
  • S. Agmon, The relaxation method for linear inequalities, Can. J. Math. (1954)
  • S. Agrawal et al., Analysis of Thompson sampling for the multi-armed bandit problem, Conference on Learning Theory (2012)
  • S. Agrawal et al., Thompson sampling for contextual bandits with linear payoffs, International Conference on Machine Learning (2013)
  • K. Akcoglu et al., Fast universalization of investment strategies, SIAM J. Comput. (2004)
  • S. Albers, Online algorithms: a survey, Math. Program. (2003)
  • M. Ali et al., Parallel collaborative filtering for streaming data (2011)
  • A. Amini et al., On density-based data streams clustering algorithms: a survey, J. Comput. Sci. Technol. (2014)
  • A. Amini et al., Dengris-stream: a density-grid based clustering algorithm for evolving data streams over sliding window
  • O. Anava et al., Online learning for time series prediction, Conference on Learning Theory (2013)
  • F. Angiulli et al., Detecting distance-based outliers in streams of data
  • K. Ariu et al., Regret in online recommendation systems
  • R. Arora, A. Cotter, K. Livescu, N. Srebro, Stochastic optimization for PCA and PLS, in: Allerton Conference, Citeseer,...
  • R. Arora et al., Stochastic optimization of PCA with capped MSG, Advances in Neural Information Processing Systems (2013)
  • S. Arora et al., The multiplicative weights update method: a meta-algorithm and applications, Theory Comput. (2012)
  • A. Ashfahani et al., Autonomous deep learning: continual learning approach for dynamic environments
  • L.E. Atlas et al., Training connectionist networks with queries and selective sampling
  • P. Auer, Using confidence bounds for exploitation-exploration trade-offs, J. Mach. Learn. Res. (2002)
  • P. Auer et al., Finite-time analysis of the multiarmed bandit problem, Mach. Learn. (2002)
  • P. Auer, N. Cesa-Bianchi, Y. Freund, R.E. Schapire, Gambling in a rigged casino: the adversarial multi-armed bandit...
  • P. Auer et al., The nonstochastic multiarmed bandit problem, SIAM J. Comput. (2002)
  • G. BakIr, Predicting structured data (2007)
  • Y. Baram et al., Online choice of active learning algorithms, J. Mach. Learn. Res. (2004)
  • B. Barbaro, Tuning hyperparameters for online learning, Ph.D. thesis, Case Western Reserve University,...
  • A.G. Barto et al., Reinforcement learning and its relationship to supervised learning, Handbook of Learning and Approximate Dynamic Programming (2004)
  • M. Belkin et al., Manifold regularization: a geometric framework for learning from labeled and unlabeled examples, J. Mach. Learn. Res. (2006)
  • S. Ben-David et al., Online learning versus offline learning, Mach. Learn. (1997)
  • P. Berkhin, A survey of clustering data mining techniques, Grouping Multidimensional Data, Springer (2006)
  • D.A. Berry et al., Bandit problems with infinitely many arms, Ann. Stat. (1997)
  • A. Beygelzimer et al., Efficient online bandit multiclass learning with Õ(√T) regret
  • V. Bhatnagar et al., Clustering data streams using grid-based synopsis, Knowl. Inf. Syst. (2014)
  • H. Bhatt, R. Singh, M. Vatsa, N. Ratha, Improving cross-resolution face matching using ensemble based co-transfer...
  • H.S. Bhatt et al., Matching cross-resolution face images using co-transfer learning
  • M. Biesialska et al., Continual lifelong learning in natural language processing: a survey
  • A. Blum, On-line algorithms in machine learning (1998)
  • A.P. Boedihardjo et al., A framework for estimating complex probability density structures in data streams
  • A. Borodin et al., Can we learn to beat the best stock, Advances in Neural Information Processing Systems (2004)
  • L. Bottou, Online algorithms and stochastic approximations, in: D. Saad (Ed.), Online Learning and Neural Networks,...

    Steven C.H. Hoi is currently the Managing Director of Salesforce Research Asia, and a Professor of Information Systems at Singapore Management University, Singapore. Prior to joining SMU, he was an Associate Professor with Nanyang Technological University, Singapore. He received his Bachelor degree from Tsinghua University, P.R. China, in 2002, and his Ph.D degree in computer science and engineering from The Chinese University of Hong Kong, in 2006. His research interests are machine learning and data mining and their applications to multimedia information retrieval, social media and web mining, and computational finance, etc. He has served as the Editor-in-Chief for Neurocomputing Journal, general co-chair for ACM SIGMM Workshops on Social Media, program co-chair for the fourth Asian Conference on Machine Learning, book editor for “Social Media Modeling and Computing”, guest editor for ACM Transactions on Intelligent Systems and Technology. He is an IEEE Fellow and ACM Distinguished Member.

    Doyen Sahoo is a Senior Research Scientist at Salesforce Research Asia. Prior to joining Salesforce, Doyen was a Research Fellow at the Living Analytics Research Center at Singapore Management University (SMU). He was also serving as Adjunct Faculty in SMU. Doyen earned his PhD in Information Systems from SMU in 2018 and B.Eng in Computer Science from Nanyang Technological University in 2012. His research interests include Online Learning, Deep Learning, Computer Vision, and he also works on applied research including AIOps, Computational Finance and Cyber Security applications. He has published over 40 articles in top tier conferences and journals including ICLR, CVPR, ACL, KDD, JMLR, etc.

    Jing Lu is a senior engineer in JD.com, focusing on online advertising systems for e-commerce platforms. Prior to joining JD, she finished her Doctor’s degree on Machine Learning, in School of Information System, Singapore Management University (2014-2018) and School of Computer Engineering in Nanyang Technological University (2012-2014). She is currently dedicated to the research area of personalized recommendation systems, CTR prediction in sponsored search, online learning and active learning and has published several research papers in top tier journals and conferences including, NIPS, KDD, ICCV, JMLR, TKDE, TIST etc.

    Peilin Zhao is currently a Principal Researcher at Tencent AI Lab, China. Previously, he has worked at Rutgers University, Institute for Infocomm Research (I2R), Ant Group. His research interests include: Online Learning, Recommendation System, Automatic Machine Learning, Deep Graph Learning, and Reinforcement Learning etc. He has published over 100 papers in top venues, including JMLR, ICML, KDD, etc. He has been invited as a PC member, reviewer or editor for many conferences and journals, such as ICML, JMLR, etc. He received his bachelor’s degree from Zhejiang University, and his Ph.D. degree from Nanyang Technological University.
