
LH*—a scalable, distributed data structure

Published: 01 December 1996

Abstract

We present a scalable distributed data structure called LH*. LH* generalizes Linear Hashing (LH) to distributed RAM and disk files. An LH* file can be created from records with primary keys, or objects with OIDs, provided by any number of distributed and autonomous clients. It does not require a central directory, and grows gracefully, through splits of one bucket at a time, to virtually any number of servers. The number of messages per random insertion is one in general, and three in the worst case, regardless of the file size. The number of messages per key search is two in general, and four in the worst case. The file supports parallel operations, e.g., hash joins and scans. Performing a parallel operation on a file of M buckets costs at most 2M + 1 messages, and between 1 and O(log2 M) rounds of messages.
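
As a concrete illustration of the addressing behind these costs, a client computes a bucket address from its image of the file, a level i' and a split pointer n'. The following sketch is not taken from the paper; the hash family h_i(C) = C mod 2^i, the single initial bucket, and the function names are assumptions made for the example.

    # Illustrative sketch of client-side LH* addressing (Python).
    def h(i, key):
        return key % (2 ** i)            # h_i(C) = C mod 2**i, one initial bucket

    def client_address(key, i_prime, n_prime):
        # i_prime, n_prime: the client's possibly outdated image of the
        # file level and split pointer.
        a = h(i_prime, key)
        if a < n_prime:                  # in the image, bucket a was already split
            a = h(i_prime + 1, key)
        return a                         # the request is sent to server a

For example, with the image (i' = 3, n' = 2), key 42 is sent to bucket h_3(42) = 2, while key 17 falls below the split pointer and is sent to bucket h_4(17) = 1. The contacted server re-verifies the address and may forward the request, which is what bounds the message counts quoted above.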

We first describe the basic LH* scheme, where a coordinator site manages bucket splits and splits a bucket every time a collision occurs. We show that the average load factor of an LH* file is 65%–70%, regardless of file size and bucket capacity. We then enhance the scheme with load control, performed at no additional message cost. The average load factor then increases to 80%–95%. These values are about the same as for LH, but the load factor of LH* varies more.
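
For illustration, the effect of one such split on the addressing state can be sketched as follows. The sketch is a single-process simplification: buckets is a hypothetical in-memory dict standing in for the distributed servers, h is the hash family from the previous sketch, and in LH* itself the coordinator only triggers the split while the record movement happens between the two servers involved.

    # Illustrative sketch of one LH(*)-style split of bucket n at file level i.
    def h(i, key):
        return key % (2 ** i)

    def split(buckets, n, i):
        new_addr = n + 2 ** i                  # address of the newly appended bucket
        stay, move = [], []
        for key in buckets[n]:
            (move if h(i + 1, key) == new_addr else stay).append(key)
        buckets[n] = stay                      # about half the keys stay put
        buckets[new_addr] = move               # the rest go to the new bucket
        n += 1                                 # advance the split pointer
        if n == 2 ** i:                        # a full round of splits completed:
            n, i = 0, i + 1                    # the file level grows by one
        return n, i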

We next define LH* schemes without a coordinator. We show that insert and search costs are the same as for the basic scheme. The splitting cost decreases on average, but becomes more variable, as cascading splits are needed to prevent file overload. Finally, we briefly describe two variants of the splitting policy, using parallel splits and presplitting, that should enhance performance for high-performance applications.

Altogether, we show that LH* files can efficiently scale to sizes that are orders of magnitude larger than single-site files. LH* files that reside in main memory may also be much faster than single-site disk files. Finally, LH* files can be more efficient than any distributed file with a centralized directory, or than a static parallel or distributed hash file.

Reviews

Adam Drozdek

Linear hashing (LH) is a directoryless, dynamic hashing technique developed by Litwin. LH* is a generalization of LH that allows for hashing in a distributed environment. A file stored at different sites can be shared by different clients. The client's image of the file can differ from the file itself; in particular, the local pointer n', indicating the next bucket to be split, may differ from the actual pointer n. Thus, the address of a key is calculated by a client and then by a server. This may lead to forwarding the key to another server, after which the client's file image is adjusted. A search needs between two and four messages, and an insertion needs between one and three messages, not counting the messages needed to manage a split, which can be performed asynchronously. Extensive simulations indicate that, for a system using buckets of at least 250 keys, the average number of messages per search is 2.01 and the average number of messages per insert is below 1.05, that is, almost ideal. The number of addressing errors never exceeds log2(number of buckets), and less active clients, which are more prone to making addressing errors, make these errors only about 10 percent more often than others. Moreover, the average load factor is 65 to 70 percent, and when load control is used, the factor increases to between 80 and 95 percent. The only centralized component of the system is a split coordinator that manages splits and merges of buckets, but the coordinator is not necessary: the authors discuss a variant of LH* without a split coordinator, in which splits are accomplished by cascading them. Another variant concerns concurrent splits, in which a key component is a committed split pointer indicating that a split is finished and can thus be committed. The paper is well and clearly written, and it includes helpful examples and diagrams.
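
The forwarding and image-adjustment steps mentioned in the review can be sketched roughly as follows. This is a paraphrase rather than the paper's code; it assumes that every bucket a stores its own level j, that h is the hash family h_i(C) = C mod 2^i, and that the image adjustment message (IAM) carries the address and level of the forwarding server.

    # Rough sketch of the server-side address check and the client image adjustment.
    def h(i, key):
        return key % (2 ** i)

    def server_check(key, a, j):
        """Bucket a, of level j, re-verifies the address a computed by the client."""
        a1 = h(j, key)
        if a1 != a:
            a2 = h(j - 1, key)
            if a < a2 < a1:
                a1 = a2
            return ("forward", a1)       # at most two forwardings ever occur
        return ("accept", a)

    def adjust_image(a, j, i_prime, n_prime):
        """Client updates its image (i', n') from an IAM sent by bucket a of level j."""
        if j > i_prime:
            i_prime = j - 1
            n_prime = a + 1
            if n_prime >= 2 ** i_prime:
                n_prime, i_prime = 0, i_prime + 1
        return i_prime, n_prime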
