A high-performance parallel algorithm for nonnegative matrix factorization

Authors:
Ramakrishnan Kannan, Georgia Tech
Grey Ballard, Sandia National Laboratories
Haesun Park, Georgia Tech

PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, February 2016, Article No. 9, pages 1–11. https://doi.org/10.1145/2851141.2851152

Published: 27 February 2016

ABSTRACT

Non-negative matrix factorization (NMF) is the problem of determining two non-negative low-rank factors W and H for a given input matrix A such that A ≈ WH. NMF is a useful tool for many applications in different domains, such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets.
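The statement A ≈ WH needs an error measure to be precise. The method described here is built from non-negative least squares subproblems, so the natural formulation is the squared Frobenius-norm objective below. The dimension names m and n and the target rank k are notation introduced here for illustration.

```latex
\min_{W \ge 0,\; H \ge 0} \; \lVert A - WH \rVert_F^2,
\qquad A \in \mathbb{R}^{m \times n},\quad
W \in \mathbb{R}^{m \times k},\quad
H \in \mathbb{R}^{k \times n},\quad k \ll \min(m, n).
```

The joint problem is non-convex. Fixing either factor, however, leaves a convex non-negative least squares problem in the other, which is what makes alternating schemes attractive.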

We propose a high-performance distributed-memory parallel algorithm that computes the factorization by iteratively solving alternating non-negative least squares (NLS) subproblems for W and H. It keeps the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). Unlike previous implementations, our algorithm is also flexible: (1) it performs well for both dense and sparse matrices, and (2) it lets the user choose any one of several algorithms for updating the low-rank factors W and H within the alternating iterations. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements.
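The alternating scheme with an interchangeable factor-update rule can be illustrated serially. What follows is a minimal single-process sketch that assumes small dense inputs stored as Python lists of lists. It is not the authors' distributed MPI implementation.

Each iteration first fixes W and improves H. It then fixes H and improves W by applying the same rule to the transposed problem Aᵀ ≈ HᵀWᵀ. The update rule is passed in as a function, mirroring the abstract's point that the factor-update algorithm is a user choice. The two rules shown are illustrative choices: the multiplicative update of Lee and Seung (`mu_update`) and one sweep of hierarchical alternating least squares (`hals_update`). All function names are hypothetical. Both rules take one improving step per subproblem rather than solving each NLS problem exactly.

```python
"""Serial sketch of alternating updates for NMF (A ~= W H) with a pluggable
update rule. Illustrative only: this is NOT the paper's distributed MPI code."""
import random
from typing import Callable, List

Matrix = List[List[float]]
EPS = 1e-12  # guards against division by zero


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Dense product X @ Y on lists of lists."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X: Matrix) -> Matrix:
    return [list(col) for col in zip(*X)]


def frob_error(A: Matrix, W: Matrix, H: Matrix) -> float:
    """Frobenius norm of the residual A - W H."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH)
               for a, b in zip(ra, rb)) ** 0.5


def mu_update(A: Matrix, W: Matrix, H: Matrix) -> Matrix:
    """Lee-Seung multiplicative update of H with W fixed:
    H <- H * (W^T A) / (W^T W H), elementwise; preserves nonnegativity."""
    Wt = transpose(W)
    WtA, WtWH = matmul(Wt, A), matmul(matmul(Wt, W), H)
    return [[H[i][j] * WtA[i][j] / (WtWH[i][j] + EPS)
             for j in range(len(H[0]))] for i in range(len(H))]


def hals_update(A: Matrix, W: Matrix, H: Matrix) -> Matrix:
    """One HALS sweep over the rows of H with W fixed: each entry is set to
    the exact minimizer over [0, inf) given all other entries."""
    Wt = transpose(W)
    WtA, WtW = matmul(Wt, A), matmul(Wt, W)
    H = [row[:] for row in H]
    k, n = len(H), len(H[0])
    for r in range(k):
        for j in range(n):
            # minus the gradient of 0.5*||A - W H||_F^2 at entry (r, j)
            g = WtA[r][j] - sum(WtW[r][q] * H[q][j] for q in range(k))
            H[r][j] = max(0.0, H[r][j] + g / (WtW[r][r] + EPS))
    return H


Updater = Callable[[Matrix, Matrix, Matrix], Matrix]


def nmf(A: Matrix, k: int, update: Updater, iters: int = 200,
        seed: int = 0):
    """Alternate between the H- and W-subproblems using `update`."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        H = update(A, W, H)  # fix W, improve H
        # fix H, improve W by applying the same rule to A^T ~= H^T W^T
        W = transpose(update(transpose(A), transpose(H), transpose(W)))
    return W, H


if __name__ == "__main__":
    # exact rank-2 example: A = W0 H0, W0 = [[1,0],[0,1],[1,1]], H0 = [[1,2,0],[0,1,3]]
    A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [1.0, 3.0, 3.0]]
    for rule in (mu_update, hals_update):
        W, H = nmf(A, k=2, update=rule, iters=300)
        print(rule.__name__, frob_error(A, W, H))
```

Swapping `mu_update` for `hals_update` changes only the inner solver. The outer alternation stays the same. This separation is what allows the choice of factor-update algorithm to be left to the user.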

Index Terms
1. Mathematics of computing
  1. Mathematical analysis
    1. Numerical analysis
2. Theory of computation
  1. Design and analysis of algorithms

Published in
PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2016
420 pages
ISBN: 9781450340922
DOI: 10.1145/2851141
General Chair: Rafael Asenjo, University of Málaga, Spain
Program Chair: Tim Harris, Oracle Labs, Cambridge, UK

ACM SIGPLAN Notices Volume 51, Issue 8 (PPoPP '16)
August 2016
405 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3016078
Editor: Matthew Fluet
Copyright © 2016 ACM
© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 230 of 1,014 submissions, 23%
Article Metrics
- Total Citations: 41
- Total Downloads: 1,661
- Downloads (last 12 months): 143
- Downloads (last 6 weeks): 17