Retrieval of Semistructured Web Data

Chapter in: Intelligent Exploration of the Web

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 111)

Abstract

The ability to manage data whose structure is less rigid and strict than in conventional databases is important in many new application areas, such as biological databases, digital libraries, data integration and Web databases. Such data is called semistructured because it cannot be constrained by a fixed, predefined schema: the information normally associated with a schema is contained within the data itself, which is therefore sometimes called self-describing. Semistructured data has recently emerged as a research topic of particular interest, in which new data modelling and querying techniques are being investigated.
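
To make "self-describing" concrete, here is a minimal Python sketch (an illustration of our own, not taken from the chapter) that encodes two hypothetical Web-page records as nested dicts, a simplified stand-in for labeled trees. No schema is declared anywhere: the field names travel with the values, so a schema-like summary can only be recovered by walking the data. The records, their URLs and the `label_paths` helper are all hypothetical.

```python
from typing import Any

# Two hypothetical records about Web pages. They share no fixed schema:
# the second has an extra "author" field and a nested "links" structure.
records: list[dict[str, Any]] = [
    {"title": "Home", "url": "http://example.org/"},
    {"title": "Papers", "url": "http://example.org/papers",
     "author": {"name": "A. Smith"},
     "links": {"next": "http://example.org/papers/2"}},
]


def label_paths(value: Any, prefix: tuple[str, ...] = ()) -> set[tuple[str, ...]]:
    """Collect every label path in a nested-dict record.

    Because the labels are stored inside the data, walking the data is the
    only way to find out which "attributes" a record actually has.
    """
    paths: set[tuple[str, ...]] = set()
    if isinstance(value, dict):
        for label, child in value.items():
            path = prefix + (label,)
            paths.add(path)
            paths |= label_paths(child, path)
    return paths


# The union of all label paths acts as a schema inferred from the data itself.
inferred = set().union(*(label_paths(r) for r in records))
for path in sorted(inferred):
    print(".".join(path))
```

The printed paths (author, author.name, links, links.next, title, url) play the role of a schema, yet they describe only this particular data set: adding a record with new labels changes them, which a fixed, predefined schema would not allow.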

In this paper, we consider how constraint-based technology can be used to query and reason about semistructured data. The constraint system FT [37] provides information ordering constraints interpreted over feature trees. Here, we show how a generalization of FT, combined with path constraints, allows one to formally represent semistructured data, state constraints on it, and reason about it. The constraint languages we propose make it straightforward to capture, for example, what it means for one tree to be a subtree of, or subsumed by, another, or for two paths to be divergent. We give our constraints a logical semantics by means of axiom schemes that present the first-order theory of our constraint system. Finally, we propose using these constraint systems to query semistructured Web data.
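
The chapter's own constraint languages and axioms are not reproduced here. As a rough illustration only, the Python sketch below models finite feature trees and checks three of the relations named above on concrete trees, under assumed definitions: subsumption, loosely following the information ordering of FT [37], holds when the larger tree keeps every feature and every fixed node label of the smaller one (an unlabelled node carries no information); "subtree" means structural equality with the tree rooted at some node; and two paths are divergent when neither is a prefix of the other. The names `FTree`, `subsumed`, `at`, `nodes`, `is_subtree` and `divergent`, and the example trees, are hypothetical. A real constraint solver would go further and decide satisfiability and entailment for variables denoting unknown trees, which this sketch does not attempt.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional

# A path is a sequence of features (edge labels), read from the root.
Path = tuple[str, ...]


@dataclass
class FTree:
    """Hypothetical finite feature tree: an optional node label plus
    feature-labelled edges to subtrees. label=None marks a node whose label
    is unknown, so the tree carries less information at that node."""
    label: Optional[str] = None
    children: dict[str, FTree] = field(default_factory=dict)


def subsumed(s: FTree, t: FTree) -> bool:
    """Information ordering s <= t (assumed definition): t has every
    feature of s, and every label s fixes, at the same position."""
    if s.label is not None and s.label != t.label:
        return False
    return all(f in t.children and subsumed(c, t.children[f])
               for f, c in s.children.items())


def at(t: FTree, p: Path) -> Optional[FTree]:
    """Follow path p from the root of t; None if some feature is missing."""
    for f in p:
        if f not in t.children:
            return None
        t = t.children[f]
    return t


def nodes(t: FTree) -> Iterator[FTree]:
    """Yield t and every tree rooted at one of its descendants."""
    yield t
    for c in t.children.values():
        yield from nodes(c)


def is_subtree(s: FTree, t: FTree) -> bool:
    """s is a subtree of t: s equals the tree rooted at some node of t."""
    return any(n == s for n in nodes(t))


def divergent(p: Path, q: Path) -> bool:
    """p and q diverge: after a common prefix they continue with different
    features, i.e. neither path is a prefix of the other."""
    n = min(len(p), len(q))
    return p[:n] != q[:n]


# Hypothetical Web page, and a partial description of it.
page = FTree("page", {"title": FTree("Papers"),
                      "author": FTree("person", {"name": FTree("Smith")})})
partial = FTree("page", {"author": FTree(None, {"name": FTree("Smith")})})

print(subsumed(partial, page))   # True
print(subsumed(page, partial))   # False
print(is_subtree(FTree("person", {"name": FTree("Smith")}), page))  # True
print(at(page, ("author", "name")).label)          # Smith
print(divergent(("author", "name"), ("title",)))   # True
print(divergent(("author",), ("author", "name")))  # False
```

Here `partial` is subsumed by `page` because it only omits information (the title, and the label of the author node); the converse fails because `page` has a `title` feature that `partial` lacks.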

References

  1. Abiteboul S., (1997). Querying Semi-Structured Data. In Proceedings of the International Conference on Database Theory (ICDT’97), Delphi, Greece, pages 1–18.

  2. Abiteboul S., Quass D., McHugh J., Widom J., and Wiener J. L., (1997). The Lorel Query Language for Semistructured Data. International Journal on Digital Libraries, 1 (1): 68–88.

  3. Abiteboul S. and Vianu V., (1997). Queries and Computation on the Web. In Foto N. Afrati and Phokion Kolaitis, editors, Proceedings of the 6th International Conference on Database Theory (ICDT’97), Delphi, Greece, volume 1186 of Lecture Notes in Computer Science, pages 662–675. Springer-Verlag.

  4. Abiteboul S. and Vianu V., (1997). Regular Path Queries with Constraints. In Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS’97), Tucson, Arizona, pages 122–133. ACM Press.

  5. Aït-Kaci H., (1986). An Algebraic Semantics Approach to the Effective Resolution of Type Equations. Theoretical Computer Science, 45: 293–351.

  6. Aït-Kaci H. and Nasr R., (1986). LOGIN: A Logic Programming Language with Built-in Inheritance. Journal of Logic Programming, 3 (3): 185–215.

  7. Aït-Kaci H. and Podelski A., (1993). Towards a Meaning of LIFE. Journal of Logic Programming, 16 (3–4).

  8. Aït-Kaci H., Podelski A., and Smolka G., (1994). A Feature-Based Constraint System for Logic Programming with Entailment. Theoretical Computer Science, 122: 263–283.

  9. Baader F., Bürckert H.-J., Nebel B., Nutt W., and Smolka G., (1993). On the Expressivity of Feature Logics with Negation, Functional Uncertainty, and Sort Equations. Journal of Logic, Language and Information, 2: 1–18.

  10. Backofen R., (1994). Regular Path Expressions in Feature Logic. Journal of Symbolic Computation, 17: 421–455.

  11. Beeri C. and Kornatski Y., (1994). A Logic Query Language for Hypermedia Systems. Information Systems, 77 (1/2): 1–37.

  12. Blake G., Consens M., Kilpeläinen P., Larson P., Snider T., and Tompa F., (1994). Text/Relational Database Management Systems: Harmonizing SQL and SGML. In Proceedings of the First International Conference on Applications of Databases, Vadstena, Sweden, pages 267–280.

  13. Buneman P., (1997). Semistructured Data. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS’97), Tucson, Arizona, pages 117–121.

  14. Buneman P., Davidson S., Hillebrand G., and Suciu D., (1996). A Query Language and Optimization Techniques for Unstructured Data. In Proceedings of SIGMOD-96, pages 505–516.

  15. Buneman P., Fan W., and Weinstein S., (1998). Path Constraints on Semistructured and Structured Data. In Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS’98), Seattle, Washington, pages 129–138. ACM Press.

  16. Bürckert H.-J., (1994). A Resolution Principle for Constrained Logics. Artificial Intelligence, 66: 235–271.

  17. Cattell R. G. G., (1994). The Object Database Standard: ODMG-93. Morgan Kaufmann, San Francisco, California.

  18. Christophides V., Abiteboul S., Cluet S., and Scholl M., (1994). From Structured Documents to Novel Query Facilities. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’94), pages 313–324.

  19. Christophides V., Cluet S., and Moerkotte G., (1996). Evaluating Queries with Generalized Path Expressions. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’96), pages 313–322.

  20. Clark J. and DeRose S., (1999). XML Path Language (XPath). Technical report, http://www.w3.org/TR/xpath.

  21. Codd E. F., (1979). Extending the Database Relational Model to Capture More Meaning. ACM Transactions on Database Systems, 4: 397–434.

  22. Consens M. P. and Mendelzon A. O., (1989). Expressing Structural Hypertext Queries in GraphLog. In Proceedings of the Second ACM Conference on Hypertext, Pittsburgh, Pennsylvania, pages 269–292.

  23. Fernandez M., Florescu D., Levy A., and Suciu D., (1997). A Query Language for a Web Site Management System. SIGMOD Record, 26 (3): 4–11.

  24. Genesereth M. and Fikes R., (1994). Knowledge Interchange Format Reference Manual. Available at http://www.logic.Stanford.edu/sharing/papers/kif.ps.

  25. Goldfarb C. F. and Rubinsky Y., (1990). The SGML Handbook. Clarendon Press, Oxford, UK.

  26. Hacid M.-S., Decleir C., and Kouloumdjian J., (2000). A Database Approach for Modeling and Querying Video Data. IEEE Transactions on Knowledge and Data Engineering (TKDE), 12 (5): 729–750.

  27. Höhfeld M. and Smolka G., (1988). Definite Relations over Constraint Languages. LILOG Report-53, IWBS, IBM Deutschland, Postfach 80 08 80, 7000 Stuttgart 80, Germany.

  28. Kanza Y., Nutt W., and Sagiv Y., (1999). Queries with Incomplete Answers over Semistructured Data. In Proceedings of the Eighteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS’99), Philadelphia, Pennsylvania, pages 227–236. ACM Press.

  29. Kim W., (1994). On Object Oriented Database Technology. UniSQL Product Literature.

  30. Konopnicki D. and Shmueli O., (1995). W3QS: A Query System for the World-Wide Web. In Umeshwar Dayal, Peter M. D. Gray, and Shojiro Nishio, editors, Proceedings of the 21st International Conference on Very Large Databases (VLDB’95), Zurich, Switzerland, pages 54–65. Morgan Kaufmann.

  31. Lakshmanan L. V. S., Sadri F., and Subramanian I. N., (1996). A Declarative Language for Querying and Restructuring the Web. In Proceedings of the Sixth International Workshop on Research Issues in Data Engineering (RIDE’96), pages 12–21.

  32. Lloyd J. W., (1987). Foundations of Logic Programming. Springer-Verlag. Second edition.

  33. McHugh J., Abiteboul S., Goldman R., Quass D., and Widom J., (1997). LORE: A Database Management System for Semistructured Data. SIGMOD Record, 26 (3): 54–66.

  34. Mendelzon A. O. and Wood P. T., (1995). Finding Regular Simple Paths in Graph Databases. SIAM Journal on Computing, 24 (6): 1235–1258.

  35. Mendelzon A. O., Mihaila G. A., and Milo T., (1996). Querying the World Wide Web. International Journal on Digital Libraries, 1 (1): 54–67.

  36. Minohara T. and Watanabe R., (1993). Queries on Structure in Hypertext. In Foundations of Data Organization and Algorithms (FODO’93), pages 394–411.

  37. Müller M., Niehren J., and Podelski A., (1997). Ordering Constraints over Feature Trees. In Gert Smolka, editor, Principles and Practice of Constraint Programming, Third International Conference (CP’97), Linz, Austria, LNCS 1330, pages 297–311. Springer-Verlag.

  38. Rafii A., Ahmed R., Ketabchi M., DeSmedt P., and Du W., (1992). Integrating Strategies in the Pegasus Object Oriented Multidatabase System. In Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences, volume-41, pages 323–334.

  39. Rao R., Janssen B., and Rajaraman A., (1994). GAIA Technical Overview. Technical report, Xerox Palo Alto Research Center.

  40. Rounds W. C., (1997). Feature Logics. In Johan van Benthem and Alice ter Meulen, editors, Handbook of Logic and Language, pages 475–533. Elsevier Science Publishers B.V. (North Holland). Part 2: General Topics.

  41. Schwartz R. L., (1993). Learning Perl. O’Reilly & Associates, Inc. Chapter: Regular Expressions.

  42. Smolka G., (1992). Feature Constraint Logics for Unification Grammars. Journal of Logic Programming, 12 (1–2): 51–87.

  43. Sudarshan C., Garcia-Molina H., Hammer J., Ireland K., Papakonstantinou Y., Ullman J., and Widom J., (1994). The TSIMMIS Project: Integration of Heterogeneous Information Sources. In Proceedings of IPSJ, Tokyo, Japan.

  44. Smolka G. and Treinen R., (1994). Records for Logic Programming. Journal of Logic Programming, 18 (3): 229–258.

  45. Thierry-Mieg J. and Durbin R., (1992). Syntactic Definitions for the ACeDB Data Base Manager. Technical report MRC-LMB, MRC Laboratory for Molecular Biology.

  46. Widom J., Papakonstantinou Y., and Garcia-Molina H., (1995). Object Exchange Across Heterogeneous Information Sources. In Proceedings of the 11th International Conference on Data Engineering (ICDE’95), Taipei, Taiwan, pages 251–260.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bertino, E., Hacid, MS., Toumani, F. (2003). Retrieval of Semistructured Web Data. In: Szczepaniak, P.S., Segovia, J., Kacprzyk, J., Zadeh, L.A. (eds) Intelligent Exploration of the Web. Studies in Fuzziness and Soft Computing, vol 111. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1772-0_19

  • DOI: https://doi.org/10.1007/978-3-7908-1772-0_19

  • Publisher Name: Physica, Heidelberg

  • Print ISBN: 978-3-7908-2519-0

  • Online ISBN: 978-3-7908-1772-0

  • eBook Packages: Springer Book Archive
