Research on Parallel DBSCAN Algorithm Design Based on MapReduce

Abstract:

Data clustering has received considerable attention in many applications, such as data mining, document retrieval, image segmentation, and pattern classification. The growing volume of information produced by advances in technology makes clustering very large data sets a challenging task. To deal with this problem, more researchers are trying to design efficient parallel clustering algorithms. In this paper, we propose a parallel DBSCAN clustering algorithm based on Hadoop, a simple yet powerful parallel programming platform. The experimental results demonstrate that the proposed algorithm scales well and efficiently processes large datasets on commodity hardware.
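The abstract does not describe how the parallel algorithm partitions or merges data, so the following is only a minimal sketch of standard sequential DBSCAN, the baseline that such a parallel design would distribute. It is not the authors' MapReduce implementation. The function name `dbscan`, the parameter names `eps` and `min_pts`, the `NOISE` constant, and the brute-force neighbor search are illustrative assumptions.

```python
from math import dist

NOISE = -1  # hypothetical label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Brute-force eps-neighborhood; the point itself is included.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
        cluster += 1
    return labels
```

For example, calling `dbscan` on two tight groups of three points plus one distant point, with `eps=1.5` and `min_pts=3`, assigns each group its own cluster id and labels the distant point as noise. The neighbor search here is quadratic in the number of points, which is the kind of cost that motivates distributing the work across machines.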

Info:

Periodical:

Advanced Materials Research (Volumes 301-303)

Pages:

1133-1138

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.301-303.1133

Online since:

July 2011

Authors:

Yan Xiang Fu, Wei Zhong Zhao, Hui Fang Ma

Keywords:

Data Mining (DM), DBSCAN, Hadoop, MapReduce, Parallel Clustering
