research-article

Open challenges for data stream mining research

Authors:
Georg Krempl, University Magdeburg, Germany
Indre Žliobaite, Aalto University and HIIT, Finland
Dariusz Brzeziński, Poznan U. of Technology, Poland
Eyke Hüllermeier, University of Paderborn, Germany
Mark Last, Ben-Gurion U. of the Negev, Israel
Vincent Lemaire, Orange Labs, France
Tino Noack, TU Cottbus, Germany
Ammar Shaker, University of Paderborn, Germany
Sonja Sievi, Astrium Space Transportation, Germany
Myra Spiliopoulou, University Magdeburg, Germany
Jerzy Stefanowski, Poznan U. of Technology, Poland

ACM SIGKDD Explorations Newsletter, Volume 16, Issue 1, June 2014, pp. 1–10. https://doi.org/10.1145/2674026.2674028

Published: 25 September 2014

Abstract

Every day, huge volumes of sensory, transactional, and web data are continuously generated as streams, which need to be analyzed online as they arrive. Streaming data can be considered one of the main sources of what is called big data. While predictive modeling for data streams and big data has received a lot of attention over the last decade, many research approaches are typically designed for well-behaved, controlled problem settings, overlooking important challenges imposed by real-world applications. This article discusses eight open challenges for data stream mining. Our goal is to identify gaps between current research and meaningful applications, highlight open problems, and define new application-relevant research directions for data stream mining. The identified challenges cover the full cycle of knowledge discovery and involve problems such as protecting data privacy, dealing with legacy systems, handling incomplete and delayed information, analyzing complex data, and evaluating stream mining algorithms. The resulting analysis is illustrated by practical applications and provides general suggestions concerning lines of future research in data stream mining.

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Published in
ACM SIGKDD Explorations Newsletter Volume 16, Issue 1
Special issue on big data
June 2014
63 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/2674026
Editors:
Charu C. Aggarwal,
Haixun Wang,
Hanghang Tong,
Ankur Teredesai
University of Washington, Seattle, Washington
Copyright © 2014 Authors
Publisher
Association for Computing Machinery
New York, NY, United States
Article Metrics
- Total Citations: 198
- Total Downloads: 2,628
- Downloads (last 12 months): 102
- Downloads (last 6 weeks): 8