DOI: 10.1145/872757.872774
Article

ViST: a dynamic index method for querying XML data by tree structures

Published: 09 June 2003

ABSTRACT

With the growing importance of XML in data exchange, much research has been done in providing flexible query facilities to extract data from structured XML documents. In this paper, we propose ViST, a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, we show that querying XML data is equivalent to finding subsequence matches. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure. ViST supports dynamic index update, and it relies solely on B+ Trees without using any specialized data structures that are not well supported by DBMSs. Our experiments show that ViST is effective, scalable, and efficient in supporting structural queries.
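
The idea of treating a tree query as a subsequence match can be illustrated with a small sketch. It assumes a structure-encoded sequence is the preorder list of (symbol, prefix) pairs, where the prefix is the path of labels from the root down to the node's parent. The `(label, children)` tuple representation and the names `encode` and `is_subsequence` are hypothetical and chosen only for illustration. The sketch ignores attribute values, wildcards, and the B+ Tree index itself, so it is not the paper's algorithm, only a naive in-memory version of the matching idea.

```python
from typing import List, Tuple

# Hypothetical minimal tree representation: (label, [child, child, ...]).
Node = Tuple[str, list]
# One element of the encoded sequence: (symbol, path of ancestor labels).
Pair = Tuple[str, Tuple[str, ...]]


def encode(node: Node, prefix: Tuple[str, ...] = ()) -> List[Pair]:
    """Preorder traversal that emits a (symbol, prefix) pair per node."""
    label, children = node
    seq = [(label, prefix)]
    for child in children:
        seq.extend(encode(child, prefix + (label,)))
    return seq


def is_subsequence(query: List[Pair], data: List[Pair]) -> bool:
    """True if the query pairs occur in data in order, gaps allowed."""
    it = iter(data)
    # `item in it` advances the iterator until it finds a match,
    # so later items can only match later positions.
    return all(item in it for item in query)


# Assumed toy document: a purchase with a seller and a buyer.
doc = ("purchase", [
    ("seller", [("name", [])]),
    ("buyer", [("name", [])]),
])

# Tree-shaped queries, encoded the same way as the document.
q_match = ("purchase", [("buyer", [("name", [])])])
q_miss = ("purchase", [("seller", [("item", [])])])

print(is_subsequence(encode(q_match), encode(doc)))  # True
print(is_subsequence(encode(q_miss), encode(doc)))   # False
```

Because the whole query tree is encoded and matched as one unit, the sketch never splits the query into separate path lookups whose results would then have to be joined.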

Published in

SIGMOD '03: Proceedings of the 2003 ACM SIGMOD international conference on Management of data
June 2003
702 pages
ISBN: 158113634X
DOI: 10.1145/872757

Copyright © 2003 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

SIGMOD '03 paper acceptance rate: 53 of 342 submissions (15%). Overall acceptance rate: 785 of 4,003 submissions (20%).
