ABSTRACT
With the growing importance of XML in data exchange, much research has been devoted to providing flexible query facilities for extracting data from structured XML documents. In this paper, we propose ViST, a novel index structure for searching XML documents. By representing both XML documents and XML queries as structure-encoded sequences, we show that querying XML data is equivalent to finding subsequence matches. Unlike index methods that disassemble a query into multiple sub-queries and then join the results of these sub-queries to produce the final answers, ViST uses tree structures as the basic unit of query, thereby avoiding expensive join operations. Furthermore, ViST provides a unified index on both the content and the structure of XML documents, which gives it a performance advantage over methods that index only content or only structure. ViST supports dynamic index updates, and it relies solely on B+-trees, without any specialized data structures that are poorly supported by DBMSs. Our experiments show that ViST is effective, scalable, and efficient in supporting structural queries.
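To make the central claim concrete (a tree-pattern query reduces to a subsequence test), here is a minimal Python sketch. It assumes each node is encoded as a (symbol, prefix) pair in preorder, where the prefix is the path of ancestor labels, and that element text is treated as a leaf node so content and structure share one sequence. The function names `encode` and `is_subsequence` and the sample documents are hypothetical, and the linear scan stands in for ViST's B+-tree index, which this sketch does not model.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumed encoding: (symbol, prefix), where prefix is the tuple of ancestor
# labels from the root down to (but not including) the node itself.
Pair = Tuple[str, Tuple[str, ...]]


def encode(elem: ET.Element, prefix: Tuple[str, ...] = ()) -> List[Pair]:
    """Return the preorder structure-encoded sequence of an element tree.

    Non-empty element text becomes a leaf pair labeled with its string value;
    tail text and attributes are ignored in this sketch.
    """
    seq: List[Pair] = [(elem.tag, prefix)]
    child_prefix = prefix + (elem.tag,)
    text = (elem.text or "").strip()
    if text:
        seq.append((text, child_prefix))
    for child in elem:
        seq.extend(encode(child, child_prefix))
    return seq


def is_subsequence(query: List[Pair], doc: List[Pair]) -> bool:
    """Naive check that query occurs in doc as a (non-contiguous) subsequence."""
    it = iter(doc)  # shared iterator: each match resumes after the previous one
    return all(any(q == d for d in it) for q in query)


doc_xml = ("<Purchase><Seller><Name>dell</Name><Item>ibm</Item></Seller>"
           "<Buyer><Loc>boston</Loc></Buyer></Purchase>")
q_hit = ("<Purchase><Seller><Name>dell</Name></Seller>"
         "<Buyer><Loc>boston</Loc></Buyer></Purchase>")
q_miss = "<Purchase><Buyer><Loc>dell</Loc></Buyer></Purchase>"

doc_seq = encode(ET.fromstring(doc_xml))
print(is_subsequence(encode(ET.fromstring(q_hit)), doc_seq))   # True
print(is_subsequence(encode(ET.fromstring(q_miss)), doc_seq))  # False
```

Plain subsequence matching is a simplification: it can accept a query whose branches are satisfied by different sibling elements that share a label. For example, under this test the query `<A><B><C/><D/></B></A>` matches the document `<A><B><C/></B><B><D/></B></A>` even though no single `B` has both children, so a complete method needs extra handling for such cases.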
Index Terms
- ViST: a dynamic index method for querying XML data by tree structures