Distributed top-k query processing by exploiting skyline summaries

Distributed and Parallel Databases

Abstract

Recently, a trend has been observed towards supporting rank-aware query operators, such as top-k, that enable users to retrieve only a limited set of the most interesting data objects. Because data is now commonly distributed over multiple servers, supporting rank-aware queries in distributed environments is a challenging problem. In this paper, we propose a novel approach, called DiTo, for efficient top-k processing over multiple servers, where each server autonomously stores a fraction of the data. Towards this goal, we exploit the inherent relationship between top-k and skyline objects, and we employ the skyline objects of each server as a data summarization mechanism for efficiently identifying the servers that store top-k results. Relying on a thresholding scheme, DiTo retrieves the top-k result set progressively, while minimizing the number of queried servers and the amount of transferred data. Furthermore, we extend DiTo to support data summarizations of bounded size, thus restricting the cost of summary distribution and maintenance. To this end, we study the challenging problem of finding a fixed-size abstraction of the skyline set that influences the performance of DiTo only slightly. Our experimental evaluation shows that DiTo performs efficiently and provides a viable solution when a high degree of distribution is required.
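
The relationship between skyline and top-k objects that the abstract refers to can be illustrated with a minimal sketch. It is not the DiTo algorithm itself, only an illustration under stated assumptions: smaller attribute values are preferable, the scoring function is a weighted sum with non-negative weights, and every server is non-empty. Under these assumptions the best-scoring point of a server is always a skyline point, so the skyline of a server yields the best score that server can contribute, and a coordinator can stop querying servers once that score cannot improve the current k-th result. The names `skyline`, `topk_with_pruning` and the `servers` dictionary are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: p is no worse in every dimension and better in one.
    Assumption: smaller values are preferable."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Points that are not dominated by any other point (quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def score(w: Sequence[float], p: Point) -> float:
    """Weighted sum; monotone when all weights are non-negative."""
    return sum(wi * pi for wi, pi in zip(w, p))


def topk_with_pruning(servers: Dict[str, List[Point]], w: Sequence[float], k: int):
    """Hypothetical coordinator: visit servers in order of the best score
    found in their skyline summary, stop when the threshold rules out the rest."""
    # Best score each server can offer, computed from its skyline only.
    bounds = sorted(
        (min(score(w, s) for s in skyline(pts)), name)
        for name, pts in servers.items()
    )
    result: List[Tuple[float, Point]] = []
    queried: List[str] = []
    for bound, name in bounds:
        # Threshold test: the k-th score so far is already at least as good
        # as anything this server (or any later one) can return.
        if len(result) == k and bound >= result[-1][0]:
            break
        queried.append(name)
        local = sorted((score(w, p), p) for p in servers[name])[:k]
        result = sorted(result + local)[:k]
    return [p for _, p in result], queried


if __name__ == "__main__":
    servers = {
        "A": [(1.0, 9.0), (2.0, 2.0), (8.0, 8.0)],
        "B": [(5.0, 5.0), (6.0, 7.0)],
        "C": [(9.0, 1.0), (3.0, 3.0)],
    }
    print(topk_with_pruning(servers, (0.5, 0.5), 2))
```

In this toy setting the skyline summaries are kept in full; a bounded-size summary would replace `skyline(pts)` with a smaller abstraction, at the price of a looser bound per server.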

Notes

  1. As discussed in [29], the magnitude of the query vector does not influence the query result as long as its direction remains the same.

  2. The K-skyband is the set of points that are dominated by at most K−1 other points. Consequently, no point outside the K-skyband can belong to the result of any top-k query with k ≤ K, for any increasingly monotone function.

  3. The distance d(p,q) of two points p and q based on the \(L_{\infty}\) distance is \(d(p,q)=\max_{\forall d_{i}}(|p[i]-q[i]|)\).

  4. Our implementation uses the XXL library available at: http://www.xxl-library.de.

  5. Available at: http://www.cc.gatech.edu/projects/gtitm/.
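
Notes 1 to 3 can be made concrete with a short sketch. It assumes, as an illustration only, that smaller attribute values are preferable and that queries are weighted sums with non-negative weights; the helper names `ranking`, `k_skyband` and `linf_distance` are hypothetical and do not come from the article or from the XXL library.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def score(w: Sequence[float], p: Point) -> float:
    """Weighted-sum score of point p for query vector w."""
    return sum(wi * pi for wi, pi in zip(w, p))


def ranking(points: List[Point], w: Sequence[float]) -> List[Point]:
    """Points ordered from best (lowest score) to worst."""
    return sorted(points, key=lambda p: score(w, p))


def dominates(p: Point, q: Point) -> bool:
    """p is no worse than q in every dimension and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def k_skyband(points: List[Point], K: int) -> List[Point]:
    """Note 2: points dominated by at most K-1 other points."""
    return [p for p in points if sum(dominates(q, p) for q in points) <= K - 1]


def linf_distance(p: Point, q: Point) -> float:
    """Note 3: largest absolute difference over all dimensions."""
    return max(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    pts = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (5.0, 5.0)]
    # Note 1: scaling the query vector by a positive factor keeps the ranking.
    print(ranking(pts, (1.0, 3.0)) == ranking(pts, (2.5, 7.5)))
    # Note 2: the top-2 result lies inside the 2-skyband.
    print(set(ranking(pts, (1.0, 3.0))[:2]) <= set(k_skyband(pts, 2)))
    # Note 3: distance between two of the points.
    print(linf_distance((1.0, 4.0), (4.0, 1.0)))
```

The 1-skyband returned by `k_skyband(pts, 1)` coincides with the skyline, which is why the skyline suffices for top-1 queries while larger k requires a wider band.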

References

  1. Akbarinia, R., Pacitti, E., Valduriez, P.: Reducing network traffic in unstructured P2P systems using top-k queries. Distrib. Parallel Databases 19(2–3), 67–86 (2006)

  2. Akbarinia, R., Pacitti, E., Valduriez, P.: Best position algorithms for top-k queries. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 495–506 (2007)

  3. Balke, W.T., Güntzer, U.: Multi-objective query processing for database systems. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 936–947 (2004)

  4. Balke, W.T., Nejdl, W., Siberski, W., Thaden, U.: Progressive distributed top-k retrieval in peer-to-peer networks. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 174–185 (2005)

  5. Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 421–430 (2001)

  6. Bruno, N., Chaudhuri, S., Gravano, L.: Top-k selection queries over relational databases: Mapping strategies and performance evaluation. ACM Trans. Database Syst. 27(2), 153–187 (2002)

  7. Cao, P., Wang, Z.: Efficient top-k query calculation in distributed networks. In: Proceedings of Annual ACM Symposium on Principles of Distributed Computing (PODC), pp. 206–215 (2004)

  8. Chang, Y.-C., Bergman, L.D., Castelli, V., Li, C.-S., Lo, M.-L., Smith, J.R.: The onion technique: indexing for linear optimization queries. In: Proceedings of ACM International Conference on Management of Data (SIGMOD), pp. 391–402 (2000)

  9. Chaudhuri, S., Gravano, L.: Evaluating top-k selection queries. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 397–410 (1999)

  10. Chaudhuri, S., Gravano, L., Marian, A.: Optimizing top-k selection queries over multimedia repositories. IEEE Trans. Knowl. Data Eng. 16(8), 992–1009 (2004)

  11. Chaudhuri, S., Dalvi, N.N., Kaushik, R.: Robust cardinality and cost estimation for skyline operator. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), p. 64 (2006)

  12. Chen, C.M., Ling, Y.: A sampling-based estimator for top-k selection query. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 617–627 (2002)

  13. Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with presorting. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 717–719 (2003)

  14. Dedzoe, W.K., Lamarre, P., Akbarinia, R., Valduriez, P.: ASAP top-k query processing in unstructured P2P systems. In: Proceedings of International Conference on Peer-to-Peer Computing (P2P), pp. 1–10 (2010)

  15. Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: Proceedings of Symposium on Principles of Database Systems (PODS), pp. 102–113 (2001)

  16. Gonzalez, T.F.: Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci. 38, 293–306 (1985)

  17. Güntzer, U., Balke, W.T., Kießling, W.: Optimizing multi-feature queries for image databases. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 419–428 (2000)

  18. Hose, K., Karnstedt, M., Sattler, K.U., Zinn, D.: Processing top-N queries in P2P-based web integration systems with probabilistic guarantees. In: Proceedings of International Workshop on Web and Databases (WebDB), pp. 109–114 (2005)

  19. Hristidis, V., Koudas, N., Papakonstantinou, Y.: PREFER: A system for the efficient execution of multi-parametric ranked queries. In: Proceedings of ACM International Conference on Management of Data (SIGMOD), pp. 259–270 (2001)

  20. Ilyas, I.F., Aref, W.G., Elmagarmid, A.K., Elmongui, H.G., Shah, R., Vitter, J.S.: Adaptive rank-aware query optimization in relational databases. ACM Trans. Database Syst. 31(4), 1257–1304 (2006)

  21. Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 40(4) (2008)

  22. Lu, J., Callan, J.: Merging retrieval results in hierarchical peer-to-peer networks. In: Proceedings of the ACM International Conference on Research and Development in Information Retrieval (SIGIR), pp. 472–473 (2004)

  23. Lu, J., Callan, J.: Federated search of text-based digital libraries in hierarchical peer-to-peer networks. In: Proceedings of European Conference on IR Research (ECIR), pp. 52–66 (2005)

  24. Marian, A., Bruno, N., Gravano, L.: Evaluating top-k queries over web-accessible databases. ACM Trans. Database Syst. 29(2), 319–362 (2004)

  25. Michel, S., Triantafillou, P., Weikum, G.: KLEE: a framework for distributed top-k query algorithms. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 637–648 (2005)

  26. Mouratidis, K., Bakiras, S., Papadias, D.: Continuous monitoring of top-k queries over sliding windows. In: Proceedings of ACM International Conference on Management of Data (SIGMOD), pp. 635–646 (2006)

  27. Ryeng, N.H., Vlachou, A., Doulkeridis, C., Nørvåg, K.: Efficient distributed top-k query processing with caching. In: Proceedings of DASFAA, vol. 2, pp. 280–295 (2011)

  28. Tao, Y., Hristidis, V., Papadias, D., Papakonstantinou, Y.: Branch-and-bound processing of ranked queries. Inf. Syst. 32(3), 424–445 (2007)

  29. Tsaparas, P., Palpanas, T., Kotidis, Y., Koudas, N., Srivastava, D.: Ranked join indices. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 277–288 (2003)

  30. Vlachou, A., Doulkeridis, C., Nørvåg, K., Vazirgiannis, M.: On efficient top-k query processing in highly distributed environments. In: Proceedings of ACM International Conference on Management of Data (SIGMOD), pp. 753–764 (2008)

  31. Vlachou, A., Doulkeridis, C., Nørvåg, K., Vazirgiannis, M.: Skyline-based peer-to-peer top-k query processing. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 1421–1423 (2008)

  32. Vlachou, A., Doulkeridis, C., Kotidis, Y., Nørvåg, K.: Reverse top-k queries. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 365–376 (2010)

  33. Vlachou, A., Doulkeridis, C., Nørvåg, K., Kotidis, Y.: Identifying the most influential data objects with reverse top-k queries. Proc. VLDB Endow. 3(1), 364–372 (2010)

  34. Vlachou, A., Doulkeridis, C., Kotidis, Y., Nørvåg, K.: Monochromatic and bichromatic reverse top-k queries. IEEE Trans. Knowl. Data Eng. 23(8), 1215–1229 (2011)

  35. Vlachou, A., Doulkeridis, C., Nørvåg, K.: Monitoring reverse top-k queries over mobile devices. In: Proceedings of ACM Workshop on Data Engineering for Wireless and Mobile Access (MobiDE) (2011)

  36. Zhao, K., Tao, Y., Zhou, S.: Efficient top-k processing in large-scaled distributed environments. Data Knowl. Eng. 63(2), 315–335 (2007)

  37. Zou, L., Chen, L.: Pareto-based dominant graph: An efficient indexing structure to answer top-k queries. IEEE Trans. Knowl. Data Eng. 23(5), 727–741 (2011)

Author information

Corresponding author

Correspondence to Kjetil Nørvåg.

Additional information

Communicated by Kaushik Chakrabarti.

About this article

Cite this article

Vlachou, A., Doulkeridis, C. & Nørvåg, K. Distributed top-k query processing by exploiting skyline summaries. Distrib Parallel Databases 30, 239–271 (2012). https://doi.org/10.1007/s10619-012-7094-2

  • DOI: https://doi.org/10.1007/s10619-012-7094-2
