ABSTRACT
This paper presents a principled framework for efficient processing of ad-hoc top-k (ranking) aggregate queries, which return the k groups with the highest aggregate values. Current systems lack essential support for such queries and process them with a naïve materialize-group-sort scheme that can be prohibitively inefficient. Our framework is based on three fundamental principles. The Upper-Bound Principle dictates the requirements of early pruning, and the Group-Ranking and Tuple-Ranking Principles dictate the group-ordering and tuple-ordering requirements. Together, they guide the query processor toward a provably optimal tuple schedule for aggregate query processing. We propose a new execution framework that applies these principles and requirements. We address the challenges in realizing the framework and in implementing new query operators, which enable efficient group-aware and rank-aware query plans. The experimental study validates our framework by demonstrating that the new query plans outperform traditional plans by orders of magnitude.
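To make the contrast in the abstract concrete, the following is a minimal sketch, not the paper's operators or its optimal schedule. It compares a materialize-group-sort baseline with a simple upper-bound pruning loop for a top-k SUM query. All names (`topk_naive`, `topk_pruned`, `group_sizes`, `max_score`) are hypothetical, and the sketch assumes that every score lies in `[0, max_score]` and that the number of tuples per group is known in advance. It does not reorder groups or tuples, so the Group-Ranking and Tuple-Ranking Principles are not modeled here.

```python
from collections import defaultdict
import heapq


def topk_naive(rows, k):
    """Materialize-group-sort: aggregate every group, sort them all, keep k."""
    sums = defaultdict(float)
    for group, score in rows:
        sums[group] += score
    # Sort by descending sum, breaking ties by group name.
    return sorted(sums.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def topk_pruned(rows, k, group_sizes, max_score):
    """Top-k SUM with upper-bound pruning (illustrative assumptions only).

    Assumes 0 <= score <= max_score and that group_sizes[g] is the exact
    number of tuples in group g. Returns (top-k list, skipped tuple count).
    """
    partial = defaultdict(float)      # sum seen so far; a valid lower bound
    remaining = dict(group_sizes)     # tuples not yet seen, per group
    pruned = set()
    skipped = 0
    for group, score in rows:
        if group in pruned:
            skipped += 1              # this tuple never needs aggregating
            continue
        partial[group] += score
        remaining[group] -= 1
        # k-th largest lower bound over all groups. Recomputed per tuple
        # for clarity, not efficiency.
        lower = heapq.nlargest(k, (partial.get(g, 0.0) for g in group_sizes))
        if len(lower) < k:
            continue
        threshold = lower[-1]
        for g in group_sizes:
            if g in pruned:
                continue
            upper = partial.get(g, 0.0) + remaining[g] * max_score
            if upper < threshold:     # g can no longer reach the top k
                pruned.add(g)
    survivors = [(g, partial.get(g, 0.0)) for g in group_sizes if g not in pruned]
    return sorted(survivors, key=lambda kv: (-kv[1], kv[0]))[:k], skipped


rows = [("a", 10), ("a", 9), ("a", 10), ("b", 1), ("c", 2),
        ("b", 1), ("c", 1), ("b", 2), ("c", 3)]
sizes = {"a": 3, "b": 3, "c": 3}
result, skipped = topk_pruned(rows, k=1, group_sizes=sizes, max_score=10)
print(result, "skipped tuples:", skipped)
```

A group is pruned only when its upper bound falls strictly below the k-th largest lower bound, so surviving groups are always aggregated in full and the answer matches the baseline. How much work is saved depends heavily on the order in which tuples arrive, which is the aspect the ordering principles in the abstract address.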