ABSTRACT
Eigensolvers are important tools for analyzing and mining useful information from scale-free graphs. Such graphs arise in many applications and can be extremely large. Unfortunately, existing parallel eigensolvers do not scale well on these graphs because of the high communication overhead of parallel matrix-vector multiplication (MatVec). We develop a MatVec algorithm based on 2D edge partitioning that significantly reduces communication costs, and we embed it into a popular eigensolver library. We demonstrate that the enhanced eigensolver can attain a two-orders-of-magnitude performance improvement over the original on a state-of-the-art massively parallel machine. We illustrate the performance of the embedded MatVec by computing eigenvalues of a scale-free graph with 300 million vertices and 5 billion edges, which is, to the best of our knowledge, the largest scale-free graph analyzed by any in-memory parallel eigensolver.
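To make the idea of a MatVec over a 2D edge partition concrete, the following is a minimal single-process sketch, not the algorithm or library code from the paper. It assumes a simple contiguous block mapping of vertices onto a hypothetical pr x pc process grid; the function names (block_of, partition_edges_2d, matvec_2d, matvec_reference) are illustrative only. Each nonzero (edge) is owned by the grid process given by its row block and column block, so a process needs input-vector entries only from its column block and contributes partial sums only to its row block.

```python
from collections import defaultdict


def block_of(index, n, parts):
    """Map a vertex index in [0, n) to one of `parts` contiguous blocks."""
    size = -(-n // parts)  # ceiling division
    return index // size


def partition_edges_2d(edges, n, pr, pc):
    """Assign each nonzero (i, j, w) to grid process (row block of i, column block of j)."""
    local = defaultdict(list)
    for i, j, w in edges:
        local[(block_of(i, n, pr), block_of(j, n, pc))].append((i, j, w))
    return local


def matvec_2d(edges, x, pr, pc):
    """Compute y = A x by looping over a simulated pr x pc process grid."""
    n = len(x)
    local = partition_edges_2d(edges, n, pr, pc)
    y = [0.0] * n
    for r in range(pr):
        for c in range(pc):
            # Expand step (simulated): process (r, c) reads only x[j] for j in column block c.
            # Local multiply: accumulate partial sums for rows in row block r.
            partial = defaultdict(float)
            for i, j, w in local.get((r, c), []):
                partial[i] += w * x[j]
            # Fold step (simulated): partial sums are combined across grid row r.
            for i, value in partial.items():
                y[i] += value
    return y


def matvec_reference(edges, x):
    """Plain edge-list MatVec used to check the partitioned version."""
    y = [0.0] * len(x)
    for i, j, w in edges:
        y[i] += w * x[j]
    return y


if __name__ == "__main__":
    # Small star-like graph: vertex 0 is a high-degree hub, as in scale-free graphs.
    hub = [(0, k, 1.0) for k in range(1, 6)] + [(k, 0, 1.0) for k in range(1, 6)]
    edges = hub + [(1, 2, 1.0), (2, 1, 1.0)]
    x = [1, 2, 3, 4, 5, 6]
    print(matvec_2d(edges, x, pr=2, pc=3))
    print(matvec_reference(edges, x))
```

In a real distributed setting the expand and fold steps would be collective operations within a process column and a process row. Under this block layout each process would then exchange data with at most (pr - 1) + (pc - 1) others, rather than potentially all p - 1 as in a 1D layout, which is the usual argument for why 2D distributions lower communication for graphs with high-degree hubs.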
Index Terms
- A scalable eigensolver for large scale-free graphs using 2D graph partitioning