Smart Sampling: A Novel Unsupervised Boosting Approach for Outlier Detection

  • Conference paper
AI 2016: Advances in Artificial Intelligence (AI 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9992)

Abstract

Although many ensemble algorithms have been proposed for supervised learning and for clustering, there are few ensemble-based approaches for outlier detection. The main challenge in this context is the lack of knowledge about the accuracy of the individual outlier detectors; as a result, none of the previously proposed approaches has focused on sequential boosting techniques. In this paper we propose, for the first time, a boosting algorithm for outlier detection called BSS, in which we sequentially improve the accuracy of each ensemble detector in an unsupervised manner. We discuss the effectiveness of our approach in terms of the bias-variance trade-off. Furthermore, we propose an extended version of BSS, called DBSS, which introduces a novel source of diversity into outlier ensemble modeling. DBSS is used to analyze the effect of changing the input parameter of BSS on its detection accuracy. Our experimental results on both synthetic and real data sets demonstrate that our approaches outperform two state-of-the-art outlier ensemble algorithms and benefit from bias reduction. In addition, BSS is robust to changes in its input parameter. Since each detector in BSS/DBSS uses only a subset of the whole dataset, both techniques are well suited to application environments with limited-memory processors (e.g., wireless sensor networks).
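The abstract does not spell out the BSS procedure, so the following is only a rough illustration of the general idea it describes: an ensemble in which each detector is a small sample of the data and each new sample is chosen using what earlier detectors suggested. Everything specific here is an assumption for illustration and not the authors' method: the nearest-neighbour distance score, the inverse-score reweighting rule, and the names `sample_scores`, `sequential_sampling_ensemble`, `sample_size` and `n_rounds` are all hypothetical.

```python
import numpy as np


def sample_scores(data, sample_idx):
    """Score every point by its distance to the nearest sampled point.

    Assumption: a nearest-neighbour distance score, chosen for brevity.
    """
    sample = data[sample_idx]
    # Pairwise Euclidean distances, shape (n_points, n_sample).
    dists = np.linalg.norm(data[:, None, :] - sample[None, :, :], axis=2)
    # A sampled point must not match itself, so mask its own column.
    dists[sample_idx, np.arange(len(sample_idx))] = np.inf
    return dists.min(axis=1)


def sequential_sampling_ensemble(data, sample_size=20, n_rounds=10, seed=0):
    """Average scores over sequentially drawn samples (illustrative only).

    Assumption: after each round, points that currently look like inliers
    become more likely to be drawn into the next sample.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    total = np.zeros(n)
    weights = np.full(n, 1.0 / n)  # the first sample is drawn uniformly
    for t in range(n_rounds):
        idx = rng.choice(n, size=sample_size, replace=False, p=weights)
        total += sample_scores(data, idx)
        avg = total / (t + 1)
        # Hypothetical reweighting: lower average score, higher weight.
        w = 1.0 / (1.0 + avg)
        weights = w / w.sum()
    return total / n_rounds


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # 200 points from a Gaussian cluster plus one far-away point (index 200).
    data = np.vstack([rng.normal(size=(200, 2)), [[10.0, 10.0]]])
    scores = sequential_sampling_ensemble(data)
    print("highest-scoring index:", int(np.argmax(scores)))
```

Each round only needs the `sample_size` sampled points to score the data, which is the property the abstract ties to limited-memory settings. The reweighting step is the part most likely to differ from the paper.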

Author information

Corresponding author

Correspondence to Mahsa Salehi.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Salehi, M., Zhang, X., Bezdek, J.C., Leckie, C. (2016). Smart Sampling: A Novel Unsupervised Boosting Approach for Outlier Detection. In: Kang, B.H., Bai, Q. (eds) AI 2016: Advances in Artificial Intelligence. AI 2016. Lecture Notes in Computer Science (LNAI), vol 9992. Springer, Cham. https://doi.org/10.1007/978-3-319-50127-7_40

  • DOI: https://doi.org/10.1007/978-3-319-50127-7_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50126-0

  • Online ISBN: 978-3-319-50127-7

  • eBook Packages: Computer Science, Computer Science (R0)
