A Big Graph Clustering Algorithm Based on MapReduce

Abstract:

Graph clustering is an important technique in graph analysis, and measuring the similarity between the nodes of a graph is its premise. SimRank, proposed by Jeh and Widom, is a general-purpose model for computing structural similarity. Because SimRank computes node similarities iteratively, its time and space complexity are very high. With the rapid growth of data, a single machine can no longer meet the requirements of large-scale computation. In this paper, a distributed SimRank algorithm based on MapReduce is proposed and used to measure the similarity between graph nodes. A distributed affinity propagation (AP) clustering algorithm is then designed to cluster the graph nodes. Experiments compare clustering running time and speedup, and the results show that the method measures node similarity efficiently and clusters large graphs effectively.
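
To make the iterative computation mentioned in the abstract concrete, the following is a minimal single-machine sketch of the basic SimRank recurrence from Jeh and Widom. It is not the distributed MapReduce implementation proposed in the paper, whose details the abstract does not give. The function name `simrank`, the decay value 0.8, the iteration count, and the toy graph are all assumptions chosen for illustration.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=5):
    """Naive iterative SimRank (illustrative sketch, not the paper's method).

    in_neighbors maps each node to the list of its in-neighbors; every
    in-neighbor must itself appear as a key. decay and iterations are
    assumed values, not taken from the paper.
    """
    nodes = list(in_neighbors)
    # Iteration 0: each node is fully similar to itself and to nothing else.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors has similarity 0 to any other node.
                new_sim[(a, b)] = 0.0
                continue
            # Average the previous similarity over all in-neighbor pairs,
            # then scale by the decay factor.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical toy graph with edges u -> a and u -> b.
toy_graph = {"u": [], "a": ["u"], "b": ["u"]}
scores = simrank(toy_graph)
```

Each iteration in this sketch touches every pair of nodes and stores a score for each pair, which illustrates the time and space cost that the abstract cites as the motivation for a distributed version.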

Info:

Periodical:

Advanced Materials Research (Volumes 1049-1050)

Pages:

1467-1470

Online since:

October 2014

[1] H. C. Wang, J. Ma, Study of Efficient Clustering Algorithm on Large Graphs, Journal of Chinese Computer Systems, vol. 34, no. 6, pp. 1417-1423, (2013).

[2] F. Du, Y. G. Chen, X. Y. Du, Survey of RDF Query Processing Techniques, Journal of Software, vol. 24, no. 6, pp. 1222-1241, (2013).

DOI: 10.3724/sp.j.1001.2013.04387

[3] G. Wu, Research on Key Technologies of RDF Graph Data Management, Tsinghua University Press, (2008).

[4] P. Zhao, J. Han and Y. Sun, P-rank: A comprehensive structural similarity measure over information networks, International Conference on Information and Knowledge Management, (2009).

DOI: 10.1145/1645953.1646025

[5] G. Jeh and J. Widom, SimRank: a measure of structural-context similarity, in Proceedings of the Eighth ACM SIGKDD Conference (KDD '02), (2002).

DOI: 10.1145/775047.775126

[6] Q. L. Han, H. W. Pan, S. B. Cai, et al., Nodes similarity measure method based on structure-attribute balance graph, Computer Engineering and Applications, vol. 49, no. 1, pp. 15-18, (2013).

[7] H. Khosravi-Farsani, M. Nematbakhsh, G. Lausen, Structure/attribute computation of similarities between nodes of a RDF graph with application to linked data clustering, Intelligent Data Analysis, vol. 17, no. 2, pp. 179-194, (2013).

DOI: 10.3233/ida-130573

[8] X. F. Meng, X. Ci, Big Data Management: Concepts, Techniques and Challenges, Journal of Computer Research and Development, vol. 50, no. 1, pp. 146-169, (2013).

[9] B. J. Frey, D. Dueck, Clustering by passing messages between data points, Science, vol. 315, no. 5814, pp. 972-976, (2007).

DOI: 10.1126/science.1136800
