DOI: 10.1145/3318464.3380582
research-article
Open Access
Results Reproduced / v1.1

Theoretically-Efficient and Practical Parallel DBSCAN

Published: 31 May 2020

ABSTRACT

The DBSCAN method for spatial clustering has received significant attention due to its applicability in a variety of data analysis tasks. There are fast sequential algorithms for DBSCAN in Euclidean space that take O(n log n) work for two dimensions, sub-quadratic work for three or more dimensions, and can be computed approximately in linear work for any constant number of dimensions. However, existing parallel DBSCAN algorithms require quadratic work in the worst case. This paper bridges the gap between theory and practice of parallel DBSCAN by presenting new parallel algorithms for Euclidean exact DBSCAN and approximate DBSCAN that match the work bounds of their sequential counterparts, and are highly parallel (polylogarithmic depth). We present implementations of our algorithms along with optimizations that improve their practical performance. We perform a comprehensive experimental evaluation of our algorithms on a variety of datasets and parameter settings. Our experiments on a 36-core machine with two-way hyper-threading show that our implementations outperform existing parallel implementations by up to several orders of magnitude, and achieve speedups of up to 33x over the best sequential algorithms.
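
For readers unfamiliar with the problem being solved, the following is a minimal sketch of the standard DBSCAN definition, not the algorithms of this paper. It assumes the usual two parameters (a radius `eps` and a density threshold `min_pts`); the function name `dbscan_bruteforce` and the convention of labeling noise as `-1` are illustrative choices, not taken from the source.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def dbscan_bruteforce(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise.

    Illustrative only: the neighbor search is brute force and does O(n^2) work.
    """
    n = len(points)
    # Neighborhood of i: every point within eps of points[i], including i itself.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A core point has at least min_pts points in its neighborhood.
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if not is_core[start] or labels[start] != -1:
            continue
        # Grow a new cluster outward from an unlabeled core point.
        labels[start] = cluster
        stack = [start]
        while stack:
            p = stack.pop()  # p is always a core point
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    # Only core points keep expanding; border points stop here.
                    if is_core[q]:
                        stack.append(q)
        cluster += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan_bruteforce(pts, eps=1.5, min_pts=3))
```

This sketch does quadratic work in the neighbor search, which is the worst-case cost the abstract attributes to existing parallel algorithms. It says nothing about how the paper's algorithms achieve their stated bounds.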


Supplemental Material

3318464.3380582.mp4 (mp4, 99.4 MB)


Published in

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN: 9781450367356
DOI: 10.1145/3318464

Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 785 of 4,003 submissions, 20%
