Abstract
For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances, or outliers, can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property. In this paper, we contend that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier. This degree is called the local outlier factor (LOF) of an object. It is local in that the degree depends on how isolated the object is with respect to the surrounding neighborhood. We give a detailed formal analysis showing that LOF enjoys many desirable properties. Using real-world datasets, we demonstrate that LOF can be used to find outliers which appear to be meaningful but cannot otherwise be identified with existing approaches. Finally, a careful performance evaluation of our algorithm confirms that our approach of finding local outliers can be practical.
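The abstract does not spell out how the degree is computed, so the following is only a minimal sketch of a LOF-style score. It assumes the commonly stated formulation: the k-distance neighborhood, the reachability distance, and the local reachability density. The function name `lof_scores`, the parameter `k` (playing the role usually called MinPts), and the brute-force O(n^2) distance computation are illustrative choices, not the authors' implementation.

```python
import math


def lof_scores(points, k):
    """Return one LOF-style score per point (brute force, illustrative only).

    Assumes Euclidean distance and that no point has k or more exact
    duplicates, since that would make its local density infinite.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist = []      # distance from each point to its k-th nearest neighbor
    neighbors = []   # indices within that k-distance (ties included)
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neighbors.append([j for d, j in others if d <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(i, j) = max(k_dist[j], dist[i][j]).
    lrd = []
    for i in range(n):
        total = sum(max(k_dist[j], dist[i][j]) for j in neighbors[i])
        lrd.append(math.inf if total == 0 else len(neighbors[i]) / total)

    # Score: average ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] / lrd[i] for j in neighbors[i]) / len(neighbors[i])
        for i in range(n)
    ]


if __name__ == "__main__":
    # Four points in a tight square plus one isolated point.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    for point, score in zip(data, lof_scores(data, k=2)):
        print(point, round(score, 3))
```

In this toy input the four square corners have the same density as their neighbors, so their scores are 1, while the isolated point scores far above 1. That illustrates the "degree" idea: the score is graded rather than a yes/no label.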
LOF: identifying density-based local outliers
SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data