DOI: 10.1145/1516360.1516464
research-article

Ranking objects based on relationships and fixed associations

Published: 24 March 2009

ABSTRACT

Text corpora are often enhanced by additional metadata that relates real-world entities to each document in which such entities are discussed. Such relationships are typically obtained through widely available Information Extraction tools. At the same time, interesting known associations typically hold among these entities. For instance, a corpus might contain discussions of hotels, cities and airlines; fixed associations among these entities may include: airline A operates a flight to city C, hotel H is located in city C.

A plethora of applications require the identification of associated entities, each best matching a given set of keywords. Consider the sample query: Find a holiday package in a "pet-friendly" hotel, located in a "historical" yet "lively" city, with travel operated by an "economical" and "safe" airline. These keywords are unlikely to occur in the textual descriptions of the entities themselves (e.g., the actual hotel name, city name, or airline name). Consequently, to answer such queries, one needs to exploit both the relationships between entities and documents (e.g., keyword "pet-friendly" occurs in a document that contains an entity specifying a hotel name H) and the known associations between entities (e.g., hotel H is located in city C).
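The query semantics described above can be illustrated with a brute-force sketch. This is not the algorithm proposed in the paper; it only shows what an "entity package finder" query asks for. The toy documents, the entity names, the `LOCATED_IN` and `FLIES_TO` association sets, and the scoring rule (counting keyword occurrences in documents that mention an entity, then summing over the package) are all assumptions made for illustration.

```python
# Hypothetical toy corpus: each document has its text and the set of
# entities that an extraction tool is assumed to have found in it.
DOCS = [
    {"text": "a pet-friendly place to stay", "entities": {"HotelH"}},
    {"text": "historical and lively downtown", "entities": {"CityC"}},
    {"text": "economical and safe flights", "entities": {"AirlineA"}},
    {"text": "lively nightlife", "entities": {"CityD"}},
    {"text": "pet-friendly rooms", "entities": {"HotelG"}},
]

# Hypothetical fixed associations between entities.
LOCATED_IN = {("HotelH", "CityC"), ("HotelG", "CityD")}   # (hotel, city)
FLIES_TO = {("AirlineA", "CityC"), ("AirlineA", "CityD")}  # (airline, city)


def entity_score(entity, keywords):
    """Count (document, keyword) pairs where the document mentions the
    entity and its text contains the keyword (an assumed scoring rule)."""
    return sum(
        1
        for doc in DOCS
        if entity in doc["entities"]
        for kw in keywords
        if kw in doc["text"]
    )


def rank_packages(hotel_kws, city_kws, airline_kws):
    """Enumerate every (hotel, city, airline) package allowed by the fixed
    associations and rank the packages by the sum of their entity scores."""
    packages = []
    for hotel, city in LOCATED_IN:
        for airline, destination in FLIES_TO:
            if destination != city:
                continue  # the airline must fly to the hotel's city
            score = (
                entity_score(hotel, hotel_kws)
                + entity_score(city, city_kws)
                + entity_score(airline, airline_kws)
            )
            packages.append((score, hotel, city, airline))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(packages, key=lambda p: (-p[0], p[1:]))


if __name__ == "__main__":
    for package in rank_packages(
        ["pet-friendly"], ["historical", "lively"], ["economical", "safe"]
    ):
        print(package)
```

Enumerating every joined package and scoring it in full is exactly the kind of exhaustive evaluation that becomes impractical on large corpora, which is why early pruning and termination matter for this query class.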

In this work, we focus on the class of "entity package finder" queries outlined above. We demonstrate that existing techniques cannot be efficiently adapted to solve this problem, as the resulting algorithm relies on estimations with excessive runtime and/or storage overheads. We propose an efficient algorithm to process such queries over large corpora. We devise early pruning and termination strategies, in the presence of joins and aggregations (executed on entities extracted from text), that do not depend on any estimates. Our analysis and experimental evaluation on real and synthetic data demonstrate the efficiency and scalability of our approach.

Published in

EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
March 2009
1180 pages
ISBN: 9781605584225
DOI: 10.1145/1516360

Copyright © 2009 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall Acceptance Rate: 7 of 10 submissions, 70%
