research-article

Open Access

Theoretically-Efficient and Practical Parallel DBSCAN

Authors:
Yiqiu Wang

Massachusetts Institute of Technology, Cambridge, MA, USA

Yan Gu

University of California, Riverside, Riverside, CA, USA

Julian Shun

Massachusetts Institute of Technology, Cambridge, MA, USA


SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, June 2020, Pages 2555–2571. https://doi.org/10.1145/3318464.3380582

Published: 31 May 2020

ABSTRACT

The DBSCAN method for spatial clustering has received significant attention due to its applicability in a variety of data analysis tasks. There are fast sequential algorithms for DBSCAN in Euclidean space that take O(n log n) work for two dimensions, sub-quadratic work for three or more dimensions, and can be computed approximately in linear work for any constant number of dimensions. However, existing parallel DBSCAN algorithms require quadratic work in the worst case. This paper bridges the gap between theory and practice of parallel DBSCAN by presenting new parallel algorithms for Euclidean exact DBSCAN and approximate DBSCAN that match the work bounds of their sequential counterparts, and are highly parallel (polylogarithmic depth). We present implementations of our algorithms along with optimizations that improve their practical performance. We perform a comprehensive experimental evaluation of our algorithms on a variety of datasets and parameter settings. Our experiments on a 36-core machine with two-way hyper-threading show that our implementations outperform existing parallel implementations by up to several orders of magnitude, and achieve speedups of up to 33x over the best sequential algorithms.
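For readers unfamiliar with the problem, the sketch below shows what DBSCAN computes, using a naive sequential baseline. It is not the algorithm presented in this paper: it compares every pair of points and therefore does quadratic work, which is the worst-case cost the abstract attributes to earlier parallel approaches. The names `naive_dbscan`, `eps`, and `min_pts` are illustrative assumptions. The last two stand for the standard DBSCAN radius and density threshold, which the abstract does not name.

```python
from collections import deque
from math import dist

NOISE = -1  # label for points that belong to no cluster


def naive_dbscan(points, eps, min_pts):
    """Quadratic-work DBSCAN baseline; returns one cluster label per point."""
    n = len(points)
    # Brute-force eps-neighborhoods: this double loop is the O(n^2) cost.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A core point has at least min_pts points (itself included) within eps.
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = [NOISE] * n
    cluster = 0
    for start in range(n):
        if not is_core[start] or labels[start] != NOISE:
            continue
        # Grow a new cluster outward from an unlabeled core point.
        labels[start] = cluster
        queue = deque([start])
        while queue:
            p = queue.popleft()
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster
                    # Only core points keep expanding; border points stop here.
                    if is_core[q]:
                        queue.append(q)
        cluster += 1
    return labels


# Two tight groups of three points and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (50.0, 50.0)]
labels = naive_dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The pairwise neighborhood computation dominates the cost here. The O(n log n) and sub-quadratic bounds quoted in the abstract come from avoiding that all-pairs comparison.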

Published in
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN: 9781450367356
DOI: 10.1145/3318464
General Chairs: David Maier (Portland State University, USA), Rachel Pottinger (University of British Columbia, Canada)
Program Chairs: AnHai Doan (University of Wisconsin, USA), Wang-Chiew Tan (Megagon Labs, USA)
Publications Chairs: Abdussalam Alawini (University of Illinois at Urbana-Champaign, USA), Hung Q. Ngo (RelationalAI, USA)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Results Reproduced / v1.1
Author Tags
DBScan
parallel algorithms
spatial clustering
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%