Distributed top-k query processing by exploiting skyline summaries

Vlachou, Akrivi; Doulkeridis, Christos; Nørvåg, Kjetil

doi:10.1007/s10619-012-7094-2

Published: 12 June 2012

Volume 30, pages 239–271 (2012)

Distributed and Parallel Databases

Akrivi Vlachou¹,
Christos Doulkeridis¹ &
Kjetil Nørvåg¹

Abstract

Recently, a trend has emerged towards supporting rank-aware query operators, such as top-k, that let users retrieve only a limited set of the most interesting data objects. Because data is now commonly distributed over multiple servers, supporting rank-aware queries in distributed environments is a challenging problem. In this paper, we propose a novel approach, called DiTo, for efficient top-k processing over multiple servers, where each server autonomously stores a fraction of the data. To this end, we exploit the inherent relationship between top-k and skyline objects, and we employ the skyline objects of each server as a data summarization mechanism for efficiently identifying the servers that store top-k results. Relying on a thresholding scheme, DiTo retrieves the top-k result set progressively, while minimizing the number of queried servers and the amount of transferred data. Furthermore, we extend DiTo to support data summarizations of bounded size, thus restricting the cost of summary distribution and maintenance. For this purpose, we study the challenging problem of finding a fixed-size abstraction of the skyline set that degrades the performance of DiTo only slightly. Our experimental evaluation shows that DiTo performs efficiently and provides a viable solution when a high degree of distribution is required.
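
The idea of using per-server skylines as summaries can be illustrated with a small sketch. It is not the DiTo algorithm itself, whose details are not given on this page, but a simplified threshold scheme under stated assumptions: scores are linear with non-negative weights, smaller values and smaller scores are better, every server holds at least one point, and k ≥ 1. Under these assumptions the best score on a server is always attained by one of its skyline points, so the skyline yields a lower bound on every score the server can contribute. All names (`distributed_top_k`, `servers`, and so on) are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse in every dimension, strictly better in one.
    Assumption: smaller values are better."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive O(n^2) skyline: the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def score(w: Sequence[float], p: Point) -> float:
    """Linear scoring function; assumed non-negative weights, lower is better."""
    return sum(wi * pi for wi, pi in zip(w, p))


def distributed_top_k(servers: Dict[str, List[Point]], w: Sequence[float], k: int):
    """Simplified threshold scheme (illustrative only, k >= 1).

    Each server's skyline acts as its summary. The best skyline score is a
    lower bound on every score that server can contribute, so servers are
    visited in order of that bound until the k-th best score found so far
    is no worse than the next bound.
    """
    summaries = {name: skyline(pts) for name, pts in servers.items()}
    bounds = sorted((min(score(w, s) for s in sky), name)
                    for name, sky in summaries.items())
    result: List[Tuple[float, Point]] = []   # kept sorted by ascending score
    contacted: List[str] = []
    for bound, name in bounds:
        if len(result) == k and result[-1][0] <= bound:
            break                            # no remaining server can improve the result
        contacted.append(name)
        local = heapq.nsmallest(k, ((score(w, p), p) for p in servers[name]))
        result = heapq.nsmallest(k, result + local)
    return result, contacted


servers = {
    "A": [(1, 9), (2, 2), (9, 1)],
    "B": [(6, 6), (7, 8)],
    "C": [(8, 8), (9, 9)],
}
top, contacted = distributed_top_k(servers, w=(0.5, 0.5), k=2)
print(top, contacted)
```

In this toy setting the top-2 result is found after contacting server A alone, because the lower bound derived from the skyline of server B is already worse than the second-best score found on A.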

Notes

As discussed in [29], the magnitude of the query vector does not influence the query result as long as its direction remains the same.
The K-skyband is the set of points that are dominated by at most K−1 other points. No point outside the K-skyband can belong to the result of any top-k query with k ≤ K for any increasingly monotone function.
The distance d(p,q) of two points p and q based on the \(L_\infty\) distance is \(d(p,q)=\max_{\forall d_{i}}(|p[i]-q[i]|)\).
Our implementation uses the XXL library available at: http://www.xxl-library.de.
Available at: http://www.cc.gatech.edu/projects/gtitm/.
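
The definitions in the first three notes can be made concrete with a short sketch. It assumes that smaller values are better, that queries use a linear scoring function with non-negative weights, and that lower scores rank first; the function names are hypothetical and the quadratic K-skyband computation is chosen for clarity, not efficiency.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse everywhere, strictly better somewhere (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def k_skyband(points: Sequence[Point], k: int) -> List[Point]:
    """Points dominated by at most k-1 other points (naive O(n^2) check)."""
    return [p for p in points if sum(dominates(q, p) for q in points) <= k - 1]


def linf_distance(p: Point, q: Point) -> float:
    """L-infinity distance: the largest per-dimension absolute difference."""
    return max(abs(a - b) for a, b in zip(p, q))


def top_k(points: Sequence[Point], w: Sequence[float], k: int) -> List[Point]:
    """The k points with the lowest weighted sum (assumed linear scoring)."""
    return sorted(points, key=lambda p: sum(wi * pi for wi, pi in zip(w, p)))[:k]


points = [(1, 9), (2, 2), (3, 3), (9, 1), (5, 5), (6, 7)]

band = k_skyband(points, 3)
answer = top_k(points, (1, 2), 3)

# Every top-3 answer lies inside the 3-skyband.
print(all(p in band for p in answer))

# Scaling the query vector changes the scores but not the ranking.
print(answer == top_k(points, (3, 6), 3))

print(linf_distance((1, 9), (2, 2)))
```

With K = 1 the K-skyband reduces to the ordinary skyline, which is why a skyline alone is enough to bound the best score on a server but a larger K is needed to cover a complete top-k result.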

References

Akbarinia, R., Pacitti, E., Valduriez, P.: Reducing network traffic in unstructured P2P systems using top-k queries. Distrib. Parallel Databases 19(2–3), 67–86 (2006)
Akbarinia, R., Pacitti, E., Valduriez, P.: Best position algorithms for top-k queries. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 495–506 (2007)
Balke, W.T., Güntzer, U.: Multi-objective query processing for database systems. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 936–947 (2004)
Balke, W.T., Nejdl, W., Siberski, W., Thaden, U.: Progressive distributed top-k retrieval in peer-to-peer networks. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 174–185 (2005)
Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 421–430 (2001)
Bruno, N., Chaudhuri, S., Gravano, L.: Top-k selection queries over relational databases: Mapping strategies and performance evaluation. ACM Trans. Database Syst. 27(2), 153–187 (2002)
Cao, P., Wang, Z.: Efficient top-k query calculation in distributed networks. In: Proceedings of Annual ACM Symposium on Principles of Distributed Computing (PODC), pp. 206–215 (2004)
Chang, Y.-C., Bergman, L.D., Castelli, V., Li, C.-S., Lo, M.-L., Smith, J.R.: The onion technique: indexing for linear optimization queries. In: Proceedings of ACM International Conference on Management of Data (SIGMOD), pp. 391–402 (2000)
Chaudhuri, S., Gravano, L.: Evaluating top-k selection queries. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 397–410 (1999)
Chaudhuri, S., Gravano, L., Marian, A.: Optimizing top-k selection queries over multimedia repositories. IEEE Trans. Knowl. Data Eng. 16(8), 992–1009 (2004)
Chaudhuri, S., Dalvi, N.N., Kaushik, R.: Robust cardinality and cost estimation for skyline operator. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), p. 64 (2006)
Chen, C.M., Ling, Y.: A sampling-based estimator for top-k selection query. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 617–627 (2002)
Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with presorting. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 717–719 (2003)
Dedzoe, W.K., Lamarre, P., Akbarinia, R., Valduriez, P.: ASAP top-k query processing in unstructured P2P systems. In: Proceedings of International Conference on Peer-to-Peer Computing (P2P), pp. 1–10 (2010)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: Proceedings of Symposium on Principles of Database Systems (PODS), pp. 102–113 (2001)
Gonzalez, T.F.: Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci. 38, 293–306 (1985)
Güntzer, U., Balke, W.T., Kießling, W.: Optimizing multi-feature queries for image databases. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 419–428 (2000)
Hose, K., Karnstedt, M., Sattler, K.U., Zinn, D.: Processing top-N queries in P2P-based web integration systems with probabilistic guarantees. In: Proceedings of International Workshop on Web and Databases (WebDB), pp. 109–114 (2005)
Hristidis, V., Koudas, N., Papakonstantinou, Y.: PREFER: A system for the efficient execution of multi-parametric ranked queries. In: Proceedings of ACM International Conference on Management of Data (SIGMOD), pp. 259–270 (2001)
Ilyas, I.F., Aref, W.G., Elmagarmid, A.K., Elmongui, H.G., Shah, R., Vitter, J.S.: Adaptive rank-aware query optimization in relational databases. ACM Trans. Database Syst. 31(4), 1257–1304 (2006)
Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 40(4) (2008)
Lu, J., Callan, J.: Merging retrieval results in hierarchical peer-to-peer networks. In: Proceedings of the ACM International Conference on Research and Development in Information Retrieval (SIGIR), pp. 472–473 (2004)
Lu, J., Callan, J.: Federated search of text-based digital libraries in hierarchical peer-to-peer networks. In: Proceedings of European Conference on IR Research (ECIR), pp. 52–66 (2005)
Marian, A., Bruno, N., Gravano, L.: Evaluating top-k queries over web-accessible databases. ACM Trans. Database Syst. 29(2), 319–362 (2004)
Michel, S., Triantafillou, P., Weikum, G.: KLEE: a framework for distributed top-k query algorithms. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 637–648 (2005)
Mouratidis, K., Bakiras, S., Papadias, D.: Continuous monitoring of top-k queries over sliding windows. In: Proceedings of ACM International Conference on Management of Data (SIGMOD), pp. 635–646 (2006)
Ryeng, N.H., Vlachou, A., Doulkeridis, C., Nørvåg, K.: Efficient distributed top-k query processing with caching. In: Proceedings of DASFAA, vol. 2, pp. 280–295 (2011)
Tao, Y., Hristidis, V., Papadias, D., Papakonstantinou, Y.: Branch-and-bound processing of ranked queries. Inf. Syst. 32(3), 424–445 (2007)
Tsaparas, P., Palpanas, T., Kotidis, Y., Koudas, N., Srivastava, D.: Ranked join indices. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 277–288 (2003)
Vlachou, A., Doulkeridis, C., Nørvåg, K., Vazirgiannis, M.: On efficient top-k query processing in highly distributed environments. In: Proceedings of ACM International Conference on Management of Data (SIGMOD), pp. 753–764 (2008)
Vlachou, A., Doulkeridis, C., Nørvåg, K., Vazirgiannis, M.: Skyline-based peer-to-peer top-k query processing. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 1421–1423 (2008)
Vlachou, A., Doulkeridis, C., Kotidis, Y., Nørvåg, K.: Reverse top-k queries. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 365–376 (2010)
Vlachou, A., Doulkeridis, C., Nørvåg, K., Kotidis, Y.: Identifying the most influential data objects with reverse top-k queries. Proc. VLDB Endow. 3(1), 364–372 (2010)
Vlachou, A., Doulkeridis, C., Kotidis, Y., Nørvåg, K.: Monochromatic and bichromatic reverse top-k queries. IEEE Trans. Knowl. Data Eng. 23(8), 1215–1229 (2011)
Vlachou, A., Doulkeridis, C., Nørvåg, K.: Monitoring reverse top-k queries over mobile devices. In: Proceedings of ACM Workshop on Data Engineering for Wireless and Mobile Access (MobiDE) (2011)
Zhao, K., Tao, Y., Zhou, S.: Efficient top-k processing in large-scaled distributed environments. Data Knowl. Eng. 63(2), 315–335 (2007)
Zou, L., Chen, L.: Pareto-based dominant graph: An efficient indexing structure to answer top-k queries. IEEE Trans. Knowl. Data Eng. 23(5), 727–741 (2011)

Author information

Authors and Affiliations

Dept. of Computer Science, NTNU, Trondheim, Norway
Akrivi Vlachou, Christos Doulkeridis & Kjetil Nørvåg

Corresponding author

Correspondence to Kjetil Nørvåg.

Additional information

Communicated by Kaushik Chakrabarti.

About this article

Cite this article

Vlachou, A., Doulkeridis, C. & Nørvåg, K. Distributed top-k query processing by exploiting skyline summaries. Distrib Parallel Databases 30, 239–271 (2012). https://doi.org/10.1007/s10619-012-7094-2

Issue Date: August 2012