Mining Contrast Subspaces

Duan, Lei; Tang, Guanting; Pei, Jian; Bailey, James; Dong, Guozhu; Campbell, Akiko; Tang, Changjie

doi:10.1007/978-3-319-06608-0_21


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8443)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining


Abstract

In this paper, we tackle a novel problem: mining contrast subspaces. Given a set of multidimensional objects in two classes C₊ and C₋ and a query object o, we want to find the top-k subspaces S that maximize the ratio of the likelihood of o in C₊ to that in C₋. We demonstrate that this problem has important applications and, at the same time, is very challenging: it does not even allow polynomial-time approximation. We present CSMiner, a mining method with various pruning techniques. CSMiner is substantially faster than the baseline method. Our experimental results on real data sets verify the effectiveness and efficiency of our method.
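To make the problem statement concrete, the sketch below scores every subspace S with up to max_dim dimensions by score(S) = L_S(o | C₊) / L_S(o | C₋) and returns the k highest-scoring subspaces. It is a minimal illustration under stated assumptions, not CSMiner and not necessarily the paper's baseline: it does no pruning, and it assumes the likelihood of o in a class is estimated with a fixed-bandwidth product Gaussian kernel density estimate. The function names, the bandwidth h, and the eps floor on the denominator are all hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kde_likelihood(o: Point, data: List[Point], dims: Tuple[int, ...], h: float) -> float:
    """Estimate the density of o in `data`, projected onto the subspace `dims`.

    Assumption: a product Gaussian kernel with one fixed bandwidth h for every
    dimension. This is an illustrative estimator, not necessarily the paper's.
    """
    norm = (math.sqrt(2.0 * math.pi) * h) ** len(dims)
    total = 0.0
    for x in data:
        sq_dist = sum((o[i] - x[i]) ** 2 for i in dims)
        total += math.exp(-sq_dist / (2.0 * h * h))
    return total / (len(data) * norm)


def top_k_contrast_subspaces(
    o: Point,
    pos: List[Point],
    neg: List[Point],
    k: int,
    max_dim: int,
    h: float = 1.0,
    eps: float = 1e-300,
) -> List[Tuple[float, Tuple[int, ...]]]:
    """Exhaustively score every subspace with 1..max_dim dimensions.

    Score = L_S(o | C+) / L_S(o | C-). The hypothetical eps floors the
    denominator so an underflowed negative-class likelihood cannot cause a
    ZeroDivisionError. No pruning: the loop is exponential in dimensionality.
    """
    scored = []
    for size in range(1, max_dim + 1):
        for dims in itertools.combinations(range(len(o)), size):
            l_pos = kde_likelihood(o, pos, dims, h)
            l_neg = kde_likelihood(o, neg, dims, h)
            scored.append((l_pos / max(l_neg, eps), dims))
    # Highest ratio first; Python's sort stays stable even with reverse=True.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy data: dimension 0 separates the classes; dimensions 1 and 2 do not.
    pos = [(0.0, 0.0, 0.0), (0.1, 0.0, 10.0)]
    neg = [(5.0, 0.0, 0.0), (5.1, 0.0, 10.0)]
    for score, dims in top_k_contrast_subspaces((0.0, 0.0, 5.0), pos, neg, k=3, max_dim=2):
        print(dims, score)
```

On this toy input every returned subspace contains dimension 0, because only that dimension places o close to C₊ and far from C₋. Since the number of candidate subspaces grows exponentially with the number of dimensions, an exhaustive loop like this only suits tiny inputs, which is the cost that a pruning-based method such as CSMiner aims to reduce.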

This work was supported in part by an NSERC Discovery grant, a BCIC NRAS Team Project, NSFC 61103042, SRFDP 20100181120029, and SKLSE2012-09-32. Work by Lei Duan and Guozhu Dong at Simon Fraser University was supported by an Ebco/Eppich visiting professorship. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.



Author information

Authors and Affiliations

School of Computer Science, Sichuan University, China
Lei Duan & Changjie Tang
School of Computing Science, Simon Fraser University, Canada
Guanting Tang & Jian Pei
Dept. of Computing and Information Systems, University of Melbourne, Australia
James Bailey
Dept. of Computer Sci & Engr, Wright State University, USA
Guozhu Dong
Pacific Blue Cross, Canada
Akiko Campbell
State Key Laboratory of Software Engineering, Wuhan University, China
Lei Duan


Editor information

Editors and Affiliations

National Cheng Kung University, Tainan, Taiwan, R.O.C.
Vincent S. Tseng & Hung-Yu Kao
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Tu Bao Ho
Nanjing University, China
Zhi-Hua Zhou
National Chengchi University, Taipei, Taiwan, R.O.C.
Arbee L. P. Chen


About this paper

Cite this paper

Duan, L. et al. (2014). Mining Contrast Subspaces. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8443. Springer, Cham. https://doi.org/10.1007/978-3-319-06608-0_21


DOI: https://doi.org/10.1007/978-3-319-06608-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06607-3
Online ISBN: 978-3-319-06608-0
eBook Packages: Computer Science, Computer Science (R0)
