Abstract
Emerging patterns are itemsets whose supports change significantly from one dataset to another. They are useful for discovering the distinctions inherently present in a collection of datasets and have been shown to be a powerful technique for constructing accurate classifiers. Finding such patterns is challenging, however, and efficient mining techniques are needed.
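To make the notion of a support change concrete, the following is a minimal sketch, not code from the paper. It assumes the usual definitions from the emerging-pattern literature: the support of an itemset is the fraction of transactions that contain it, and the growth rate is the ratio of its supports in the two datasets. The function names `support` and `growth_rate` and the toy datasets `D1` and `D2` are illustrative assumptions.

```python
from fractions import Fraction


def support(itemset, dataset):
    """Fraction of transactions in `dataset` that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not dataset:
        return Fraction(0)
    hits = sum(1 for transaction in dataset if items <= frozenset(transaction))
    return Fraction(hits, len(dataset))


def growth_rate(itemset, d1, d2):
    """Ratio of the support in d2 to the support in d1 (assumed convention).

    Returns 0.0 when the itemset occurs in neither dataset and infinity
    when it occurs only in d2.
    """
    s1, s2 = support(itemset, d1), support(itemset, d2)
    if s1 == 0:
        return 0.0 if s2 == 0 else float("inf")
    return float(s2 / s1)


# Hypothetical toy data: each transaction is a set of items.
D1 = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
D2 = [{"a", "b"}, {"a", "b", "d"}, {"a", "b", "c"}, {"c", "d"}]

print(growth_rate({"a", "b"}, D1, D2))  # support rises from 1/2 to 3/4
print(growth_rate({"d"}, D1, D2))       # absent from D1, present in D2
```

An itemset would then count as an emerging pattern from `D1` to `D2` when its growth rate exceeds some chosen threshold; the threshold itself is a parameter the abstract does not specify.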
In this paper, we present a new mining method for a particular type of emerging pattern known as a jumping emerging pattern. The basis of our algorithm is the construction of trees whose structure specifically targets the likely distribution of emerging patterns. Mining is typically around 5 times faster than with earlier approaches. We then examine the problem of computing a useful subset of the possible emerging patterns. We show that such patterns can be mined even more efficiently (typically around 10 times faster), with little loss of precision.
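For readers unfamiliar with the term, the sketch below illustrates what a jumping emerging pattern is. It assumes the standard definition from the literature (an itemset that occurs in one dataset and never in the other) and uses a naive level-wise enumeration. It is explicitly not the tree-based algorithm this paper presents, which the abstract does not detail; the name `minimal_jeps`, the `max_len` cap, and the toy data are hypothetical.

```python
from itertools import combinations


def occurs_in(dataset, items):
    """True if at least one transaction in `dataset` contains all of `items`."""
    return any(items <= transaction for transaction in dataset)


def minimal_jeps(background, target, max_len=3):
    """Naively enumerate minimal itemsets that occur in `target` but never in `background`.

    Brute-force sketch for illustration only; its cost grows exponentially
    with the number of distinct items.
    """
    background = [frozenset(t) for t in background]
    target = [frozenset(t) for t in target]
    universe = sorted(set().union(*target)) if target else []
    found = []
    for size in range(1, max_len + 1):
        for combo in combinations(universe, size):
            items = frozenset(combo)
            # Skip supersets of an already found pattern: they are not minimal.
            if any(smaller <= items for smaller in found):
                continue
            if occurs_in(target, items) and not occurs_in(background, items):
                found.append(items)
    return found


# Hypothetical toy data: each transaction is a set of items.
background = [{"a", "b"}, {"a", "c"}, {"b", "c"}]
target = [{"a", "b", "c"}, {"c", "d"}]

for pattern in minimal_jeps(background, target):
    print(sorted(pattern))
```

The exponential cost of this kind of enumeration is the reason efficient mining techniques are needed for datasets of realistic size.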
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bailey, J., Manoukian, T., Ramamohanarao, K. (2002). Fast Algorithms for Mining Emerging Patterns. In: Elomaa, T., Mannila, H., Toivonen, H. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2002. Lecture Notes in Computer Science, vol 2431. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45681-3_4
DOI: https://doi.org/10.1007/3-540-45681-3_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44037-6
Online ISBN: 978-3-540-45681-0
eBook Packages: Springer Book Archive