Abstract
Many fundamental multi-processor coordination problems can be expressed as counting problems: Processes must cooperate to assign successive values from a given range, such as addresses in memory or destinations on an interconnection network. Conventional solutions to these problems perform poorly because of synchronization bottlenecks and high memory contention.
Motivated by observations on the behavior of sorting networks, we offer a new approach to solving such problems, by introducing counting networks, a new class of networks that can be used to count. We give two counting network constructions, one of depth log n (1 + log n)/2 using n log n (1 + log n)/4 “gates,” and a second of depth log² n using n (log² n)/2 gates. These networks avoid the sequential bottlenecks inherent to earlier solutions and substantially lower the memory contention.
Finally, to show that counting networks are not merely mathematical creatures, we provide experimental evidence that they outperform conventional synchronization techniques under a variety of circumstances.
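The abstract does not spell out how the networks are wired, so the following is only a minimal sketch. It assumes the usual recursive bitonic construction, in which two half-width networks feed a merger built from two-input "gates" (balancers). It models each balancer in a quiescent state only: of the tokens that entered, the top output has received the ceiling of half and the bottom output the floor. All function names are illustrative, not taken from the paper. The sketch checks that the output token counts form a "step" (earlier wires hold at most one token more than later wires) and that the balancer count for the first construction agrees with n log n (1 + log n)/4.

```python
def balancer(a, b):
    """Quiescent-state model of one balancer: tokens leave alternately,
    top output first, so the top gets ceil and the bottom gets floor."""
    total = a + b
    return (total + 1) // 2, total // 2


def merger(x, xp):
    """Merge two equal-length count sequences that each have the step property."""
    if len(x) == 1:
        return list(balancer(x[0], xp[0]))
    z = merger(x[0::2], xp[1::2])    # even wires of x with odd wires of xp
    zp = merger(x[1::2], xp[0::2])   # odd wires of x with even wires of xp
    out = []
    for a, b in zip(z, zp):          # final layer: balancer i joins z[i], zp[i]
        out.extend(balancer(a, b))
    return out


def bitonic(x):
    """Output token counts of a width-len(x) bitonic counting network
    (width assumed to be a power of two) given input counts x."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    return merger(bitonic(x[:half]), bitonic(x[half:]))


def has_step_property(y):
    """True if 0 <= y[i] - y[j] <= 1 for every pair of wires i < j."""
    return all(0 <= y[i] - y[j] <= 1
               for i in range(len(y)) for j in range(i + 1, len(y)))


def merger_gates(n):
    """Number of balancers in a width-n merger (n a power of two, n >= 2)."""
    return 1 if n == 2 else 2 * merger_gates(n // 2) + n // 2


def bitonic_gates(n):
    """Number of balancers in a width-n bitonic network (n a power of two)."""
    return 0 if n == 1 else 2 * bitonic_gates(n // 2) + merger_gates(n)


if __name__ == "__main__":
    # Five tokens all entering on wire 0 of a width-4 network.
    print(bitonic([5, 0, 0, 0]))   # [2, 1, 1, 1]
    print(bitonic_gates(8))        # 24, i.e. 8 * 3 * (1 + 3) / 4
```

Because the output counts always form a step, a process leaving on wire i as the k-th token there can take the value i + k·n (counting k from zero), which is how such a network can hand out successive values without a single shared counter.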