ABSTRACT
We present a high-performance algorithm for multiplying sparse distributed polynomials on a multicore processor. Each core uses a heap of pointers to multiply parts of the polynomials within its local cache. Intermediate results are written to buffers in shared cache, and the cores take turns combining them to form the result. A cooperative approach balances the load and improves scalability, and the extra cache contributed by each core produces a superlinear speedup in practice. We present benchmarks comparing our parallel routine to a sequential version and to the routines of other computer algebra systems.
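To make the "heap of pointers" idea concrete, here is a minimal sequential sketch of heap-based sparse multiplication in Python. It is an illustration under simplifying assumptions, not the parallel routine described above: polynomials are univariate, stored as lists of (exponent, coefficient) pairs sorted by strictly decreasing exponent, and the function name `heap_multiply` is hypothetical. The heap holds one pointer (i, j) per term of f, identifying the next product f[i]*g[j] to be merged, so the heap never grows beyond the number of terms in f.

```python
import heapq

def heap_multiply(f, g):
    """Multiply two sparse polynomials with a heap of pointers.

    f and g are lists of (exponent, coefficient) pairs, sorted by
    strictly decreasing exponent (an assumed representation).
    Returns the product in the same format, with zero terms dropped.
    """
    if not f or not g:
        return []

    # One heap entry per term of f: (-exponent, i, j) points at the
    # product f[i] * g[j]. Negated exponents turn Python's min-heap
    # into a max-heap on the product exponent.
    heap = [(-(f[i][0] + g[0][0]), i, 0) for i in range(len(f))]
    heapq.heapify(heap)

    result = []
    while heap:
        exp = -heap[0][0]
        coeff = 0
        # Merge every pending product that has the current largest exponent.
        while heap and -heap[0][0] == exp:
            _, i, j = heapq.heappop(heap)
            coeff += f[i][1] * g[j][1]
            # Advance the pointer for f[i] to the next term of g.
            if j + 1 < len(g):
                heapq.heappush(heap, (-(f[i][0] + g[j + 1][0]), i, j + 1))
        if coeff != 0:
            result.append((exp, coeff))
    return result
```

For example, `heap_multiply([(2, 1), (0, 1)], [(2, 1), (0, -1)])` multiplies x^2 + 1 by x^2 - 1; the two x^2 products cancel during the merge, so no zero term is ever stored in the output.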