DOI: 10.1145/375551.375567

Optimal aggregation algorithms for middleware

Published: 01 May 2001

ABSTRACT

Assume that each object in a database has m grades, or scores, one for each of m attributes. For example, an object can have a color grade, that tells how red it is, and a shape grade, that tells how round it is. For each attribute, there is a sorted list, which lists each object and its grade under that attribute, sorted by grade (highest grade first). There is some monotone aggregation function, or combining rule, such as min or average, that combines the individual grades to obtain an overall grade.

To determine objects that have the best overall grades, the naive algorithm must access every object in the database, to find its grade under each attribute. Fagin has given an algorithm (“Fagin's Algorithm”, or FA) that is much more efficient. For some distributions on grades, and for some monotone aggregation functions, FA is optimal in a high-probability sense.
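As a point of comparison, the naive approach can be sketched in a few lines of Python. This is an illustrative sketch only: the dictionary representation, the name `naive_top_k`, and the toy grades are assumptions made here, not part of the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Grades = Dict[str, float]  # hypothetical representation: object id -> grade


def naive_top_k(
    lists: Sequence[Grades],
    agg: Callable[[Sequence[float]], float],
    k: int,
) -> List[Tuple[str, float]]:
    """Look up every object's grade under every attribute, then keep the k best."""
    objects = set().union(*lists)
    scored = [(obj, agg([grades[obj] for grades in lists])) for obj in objects]
    # Highest overall grade first; ties broken by object id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Toy data (made up for illustration): a color grade and a shape grade per object.
color = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
shape = {"a": 0.4, "b": 0.7, "c": 0.95, "d": 0.2}

print(naive_top_k([color, shape], min, 2))  # [('b', 0.7), ('a', 0.4)]
```

Every object is touched once per attribute, so the work grows linearly with the database size regardless of k.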

We analyze an elegant and remarkably simple algorithm (“the threshold algorithm”, or TA) that is optimal in a much stronger sense than FA. We show that TA is essentially optimal, not just for some monotone aggregation functions, but for all of them, and not just in a high-probability sense, but over every database. Unlike FA, which requires large buffers (whose size may grow unboundedly as the database size grows), TA requires only a small, constant-size buffer.
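The abstract does not list TA's steps. The sketch below follows the algorithm as it is usually described: read all lists in parallel under sorted access, fetch each newly seen object's remaining grades by random access, and stop once k objects have overall grades at least as large as the threshold, which is the aggregation function applied to the grades last seen in each list. The in-memory dictionaries, the name `threshold_top_k`, and the assumptions that every object appears in every list and that k is at least 1 are choices made for this illustration.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Grades = Dict[str, float]  # hypothetical representation: object id -> grade


def threshold_top_k(
    lists: Sequence[Grades],
    agg: Callable[[Sequence[float]], float],
    k: int,
) -> List[Tuple[str, float]]:
    # Assumption: every object appears in every list, so all lists have equal length.
    sorted_lists = [sorted(g.items(), key=lambda p: (-p[1], p[0])) for g in lists]
    best: List[Tuple[float, str]] = []  # min-heap of at most k (overall, object) pairs
    kept = set()                        # ids currently in the heap

    for depth in range(len(sorted_lists[0])):
        last_seen = []
        for ranked in sorted_lists:
            obj, grade = ranked[depth]  # sorted access
            last_seen.append(grade)
            if obj in kept:
                continue
            overall = agg([g[obj] for g in lists])  # random access to the other lists
            if len(best) < k:
                heapq.heappush(best, (overall, obj))
                kept.add(obj)
            elif overall > best[0][0]:
                _, dropped = heapq.heapreplace(best, (overall, obj))
                kept.discard(dropped)
                kept.add(obj)
        # No unseen object can beat the aggregate of the grades last seen.
        threshold = agg(last_seen)
        if len(best) == k and best[0][0] >= threshold:
            break

    ordered = sorted(best, key=lambda p: (-p[0], p[1]))
    return [(obj, overall) for overall, obj in ordered]


color = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
shape = {"a": 0.4, "b": 0.7, "c": 0.95, "d": 0.2}
print(threshold_top_k([color, shape], min, 2))  # [('b', 0.7), ('a', 0.4)]
```

On this toy input the loop halts after three rounds of sorted access and never reads object `d`. Only the k current best objects are remembered, which is the bounded-buffer property the abstract contrasts with FA. The halting test relies on the aggregation function being monotone.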

We distinguish two types of access: sorted access (where the middleware system obtains the grade of an object in some sorted list by proceeding through the list sequentially from the top), and random access (where the middleware system requests the grade of an object in a list, and obtains it in one step). We consider the scenarios where random access is either impossible, or expensive relative to sorted access, and provide algorithms that are essentially optimal for these cases as well.
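To make the two access modes concrete, here is a small Python sketch of a list that supports both and counts how often each is used. The class `GradedList`, the helper `access_cost`, and the per-access cost parameters are hypothetical names introduced for illustration. The weighted-sum cost is an assumed model for comparing scenarios where random access is more expensive than sorted access; the abstract does not define it.

```python
from typing import Dict, Optional, Sequence, Tuple


class GradedList:
    """One attribute's list, exposing both access modes and counting their use."""

    def __init__(self, grades: Dict[str, float]) -> None:
        self._grades = dict(grades)
        # Highest grade first, as in the sorted lists the abstract describes.
        self._sorted = sorted(grades.items(), key=lambda p: (-p[1], p[0]))
        self._cursor = 0
        self.sorted_accesses = 0
        self.random_accesses = 0

    def sorted_access(self) -> Optional[Tuple[str, float]]:
        """Return the next (object, grade) from the top, or None when exhausted."""
        if self._cursor >= len(self._sorted):
            return None
        self.sorted_accesses += 1
        entry = self._sorted[self._cursor]
        self._cursor += 1
        return entry

    def random_access(self, obj: str) -> float:
        """Return the grade of a named object in one step."""
        self.random_accesses += 1
        return self._grades[obj]


def access_cost(lists: Sequence[GradedList], sorted_cost: float, random_cost: float) -> float:
    """Assumed cost model: each access is charged a fixed per-mode cost."""
    return sum(
        l.sorted_accesses * sorted_cost + l.random_accesses * random_cost
        for l in lists
    )


color = GradedList({"a": 0.9, "b": 0.8, "c": 0.3})
shape = GradedList({"a": 0.4, "b": 0.7, "c": 0.95})

obj, grade = color.sorted_access()      # top of the color list
other = shape.random_access(obj)        # the same object's shape grade
print(access_cost([color, shape], sorted_cost=1.0, random_cost=5.0))  # 6.0
```

Setting `random_cost` very high approximates the scenario where random access is effectively unavailable.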

Published in

PODS '01: Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
May 2001
301 pages
ISBN: 1581133618
DOI: 10.1145/375551

Copyright © 2001 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

PODS '01 paper acceptance rate: 26 of 99 submissions, 26%. Overall acceptance rate: 642 of 2,707 submissions, 24%.
