Abstract
We present a scalable distributed data structure called LH*. LH* generalizes Linear Hashing (LH) to distributed RAM and disk files. An LH* file can be created from records with primary keys, or objects with OIDs, provided by any number of distributed and autonomous clients. It does not require a central directory, and grows gracefully, through splits of one bucket at a time, to virtually any number of servers. The number of messages per random insertion is one in general, and three in the worst case, regardless of the file size. The number of messages per key search is two in general, and four in the worst case. The file supports parallel operations, e.g., hash joins and scans. Performing a parallel operation on a file of M buckets costs at most 2M + 1 messages, and between 1 and O(log2 M) rounds of messages.
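The abstract does not spell out how a key is mapped to a bucket. As a minimal sketch, the following shows the usual Linear Hashing address computation that LH* builds on. It assumes integer keys, the hash family h_i(key) = key mod (N * 2^i), and a file state given by a level `i` and a split pointer `n`. The function name `lh_address` and the parameter `initial_buckets` are illustrative and do not come from the text.

```python
def lh_address(key: int, i: int, n: int, initial_buckets: int = 1) -> int:
    """Return the bucket address of `key` in a Linear Hashing file.

    Assumptions (not stated in the abstract):
      - keys are non-negative integers,
      - h_i(key) = key mod (initial_buckets * 2**i),
      - `i` is the file level and `n` the split pointer, i.e. buckets
        0 .. n-1 have already been split at the current level.
    """
    a = key % (initial_buckets * 2 ** i)
    if a < n:
        # Bucket `a` was already split, so the finer hash function applies.
        a = key % (initial_buckets * 2 ** (i + 1))
    return a


if __name__ == "__main__":
    # File at level 2 with split pointer 1: bucket 0 has been split into 0 and 4.
    for key in (4, 5, 7, 8):
        print(key, "->", lh_address(key, i=2, n=1))
```

In LH*, each client would apply such a rule to its own, possibly outdated, image of `i` and `n`, which is why an insertion or search may need extra forwarding messages in the worst case.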
We first describe the basic LH* scheme where a coordinator site manages bucket splits, and splits a bucket every time a collision occurs. We show that the average load factor of an LH* file is 65%–70% regardless of file size and bucket capacity. We then enhance the scheme with load control, performed at no additional message cost. The average load factor then increases to 80%–95%. These values are about the same as those of LH, but the load factor for LH* varies more.
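The splitting policy of the basic scheme can be sketched in a single process as follows. This is an illustration only, not the distributed protocol: there are no messages, clients, or coordinator site. It assumes that a "collision" means an insert into a bucket that is already at capacity, that keys are integers hashed by modulo, and that the file starts with one bucket. The class `LHFile` and its method names are hypothetical.

```python
class LHFile:
    """Single-process sketch of Linear Hashing with a split on every collision."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity  # assumed bucket capacity, in records
        self.i = 0                # file level
        self.n = 0                # split pointer: next bucket to split
        self.buckets = [[]]       # assumed: the file starts with one bucket

    def address(self, key: int) -> int:
        a = key % (2 ** self.i)
        if a < self.n:
            a = key % (2 ** (self.i + 1))
        return a

    def insert(self, key: int) -> None:
        bucket = self.buckets[self.address(key)]
        bucket.append(key)  # overflow records simply stay in the same list
        if len(bucket) > self.capacity:
            # A collision triggers a split of bucket n, not necessarily
            # of the bucket that overflowed.
            self.split()

    def split(self) -> None:
        old = self.buckets[self.n]
        modulus = 2 ** (self.i + 1)
        self.buckets.append([])  # the new bucket gets address n + 2**i
        self.buckets[self.n] = [k for k in old if k % modulus == self.n]
        self.buckets[self.n + 2 ** self.i] = [k for k in old if k % modulus != self.n]
        self.n += 1
        if self.n == 2 ** self.i:  # every bucket of this level has been split
            self.n = 0
            self.i += 1

    def load_factor(self) -> float:
        records = sum(len(b) for b in self.buckets)
        return records / (self.capacity * len(self.buckets))


if __name__ == "__main__":
    f = LHFile(capacity=4)
    for key in range(100):
        f.insert(key)
    print(len(f.buckets), "buckets, load factor", round(f.load_factor(), 2))
```

The load factor printed for such a small sequential run should not be read as a reproduction of the averages quoted above, which refer to the paper's own analysis of random insertions.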
We next define LH* schemes without a coordinator. We show that insert and search costs are the same as for the basic scheme. The splitting cost decreases on average, but becomes more variable, as cascading splits are needed to prevent file overload. Next, we briefly describe two variants of the splitting policy, using parallel splits and presplitting, that should enhance performance for high-performance applications.
Altogether, we show that LH* files can efficiently scale to files that are orders of magnitude larger than single-site files. LH* files that reside in main memory may also be much faster than single-site disk files. Finally, LH* files can be more efficient than any distributed file with a centralized directory, or a static parallel or distributed hash file.