ViST: a dynamic index method for querying XML data by tree structures

Authors:
Haixun Wang, IBM Thomas J. Watson Research Center, Hawthorne, NY
Sanghyun Park, POSTECH, Pohang, Korea
Wei Fan, IBM Thomas J. Watson Research Center, Hawthorne, NY
Philip S. Yu, IBM Thomas J. Watson Research Center, Hawthorne, NY

SIGMOD '03: Proceedings of the 2003 ACM SIGMOD international conference on Management of data, June 2003, Pages 110–121
https://doi.org/10.1145/872757.872774

Published: 09 June 2003

ABSTRACT

With the growing importance of XML in data exchange, much research has been done in providing flexible query facilities to extract data from structured XML documents. In this paper, we propose ViST, a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, we show that querying XML data is equivalent to finding subsequence matches. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure. ViST supports dynamic index update, and it relies solely on B⁺ Trees without using any specialized data structures that are not well supported by DBMSs. Our experiments show that ViST is effective, scalable, and efficient in supporting structural queries.
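
The idea of reducing a tree query to a subsequence match can be illustrated with a small sketch. The abstract does not spell out the encoding, so the following assumes that a structure-encoded sequence is the preorder list of (symbol, prefix) pairs, where the prefix is the path of ancestor labels. The tuple-based tree representation, the function names encode and is_subsequence, and the sample purchase document are all hypothetical. The sketch tests a single in-memory document and does not reproduce the B⁺ Tree index described in the paper.

```python
from typing import List, Tuple

# Assumed representation for this sketch: a node is (label, [children]).
Node = Tuple[str, list]
# One element of the sequence: (symbol, prefix), prefix = path of ancestor labels.
Pair = Tuple[str, str]


def encode(node: Node, prefix: str = "") -> List[Pair]:
    """Preorder traversal that emits one (symbol, prefix) pair per node."""
    label, children = node
    seq = [(label, prefix)]
    for child in children:
        seq.extend(encode(child, prefix + "/" + label))
    return seq


def is_subsequence(query: List[Pair], data: List[Pair]) -> bool:
    """True if the query pairs occur in data in order, not necessarily adjacent."""
    it = iter(data)
    # 'pair in it' advances the iterator, so order is respected.
    return all(pair in it for pair in query)


# Hypothetical document: a purchase with a seller and a buyer, each with a name.
doc = ("purchase", [
    ("seller", [("name", [])]),
    ("buyer", [("name", [])]),
])

# Hypothetical tree-shaped queries rooted at the document root.
query = ("purchase", [("buyer", [("name", [])])])
miss = ("purchase", [("buyer", [("item", [])])])

print(encode(doc))
print(is_subsequence(encode(query), encode(doc)))  # True
print(is_subsequence(encode(miss), encode(doc)))   # False
```

In this simplified form, sibling order matters and there are no wildcards or value predicates. A plain subsequence test can also accept a sequence that is not a true tree match, for example when sibling elements share a label, so it should be read as an illustration of the reduction rather than a complete matching procedure.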


Published in
SIGMOD '03: Proceedings of the 2003 ACM SIGMOD international conference on Management of data
June 2003
702 pages
ISBN: 158113634X
DOI: 10.1145/872757
Conference Chair: Zachary Ives, University of Pennsylvania
General Chair: Yannis Papakonstantinou, University of California, San Diego
Program Chair: Alon Halevy, University of Washington
Copyright © 2003 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Article
Conference

Acceptance Rates
SIGMOD '03 Paper Acceptance Rate: 53 of 342 submissions, 15%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%