Abstract
Rank-aware query operators such as top-k, which let users retrieve only a limited set of the most interesting data objects, are increasingly supported. Because data is now commonly distributed over multiple servers, supporting rank-aware queries in distributed environments is a challenging problem. In this paper, we propose a novel approach, called DiTo, for efficient top-k processing over multiple servers, where each server autonomously stores a fraction of the data. To this end, we exploit the inherent relationship between top-k and skyline objects, and we employ the skyline objects of each server as a data summarization mechanism for efficiently identifying the servers that store top-k results. Relying on a thresholding scheme, DiTo retrieves the top-k result set progressively, while minimizing the number of queried servers and the amount of transferred data. Furthermore, we extend DiTo to support data summarizations of bounded size, thus restricting the cost of summary distribution and maintenance. For this purpose, we study the challenging problem of finding a fixed-size abstraction of the skyline set that only slightly affects the performance of DiTo. Our experimental evaluation shows that DiTo performs efficiently and provides a viable solution when a high degree of distribution is required.
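The idea of using per-server skylines as summaries together with a threshold can be illustrated with a minimal sketch. This is not the DiTo algorithm itself; it is a simplified illustration under stated assumptions: smaller attribute values and smaller scores are better, the scoring function is a weighted sum with non-negative weights, and the names `dominates`, `skyline`, `score`, and `topk_with_summaries` are hypothetical. Under these assumptions, the best-scoring object of a server always lies on that server's skyline, so the skyline yields a lower bound on every score the server can return.

```python
import heapq


def dominates(p, q):
    """True if p is no worse than q in every dimension and strictly better in one.

    Assumption: smaller values are better.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Points that are not dominated by any other point (quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def score(w, p):
    """Weighted sum; monotone when all weights are non-negative."""
    return sum(wi * pi for wi, pi in zip(w, p))


def topk_with_summaries(servers, w, k):
    """Answer a top-k query while contacting as few servers as possible.

    servers: dict mapping a server name to its list of points.
    Returns (k best (score, point) pairs, names of contacted servers).
    """
    # The coordinator only needs each server's skyline to compute the
    # best score that server can possibly contribute.
    bounds = sorted(
        (min(score(w, p) for p in skyline(pts)), name)
        for name, pts in servers.items()
        if pts
    )
    result, contacted = [], []
    for bound, name in bounds:
        # Threshold test: once k results are known, a server whose best
        # possible score is not better than the current k-th score cannot
        # improve the result; neither can any later server in the order.
        if len(result) == k and bound >= result[-1][0]:
            break
        contacted.append(name)
        local = heapq.nsmallest(k, ((score(w, p), p) for p in servers[name]))
        result = heapq.nsmallest(k, result + local)
    return result, contacted
```

Calling `topk_with_summaries` with a dictionary of servers, a weight vector, and `k` returns the scored result together with the list of servers that actually had to be queried, which makes the pruning effect of the summaries visible.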
Notes
As discussed in [29], the magnitude of the query vector does not influence the query result as long as its direction remains the same.
The K-skyband is the set of points that are dominated by at most K−1 other points. Consequently, no point outside the K-skyband can belong to the result of any top-K query for any increasingly monotone function.
The distance \(d(p,q)\) of two points p and q based on the \(L_{\infty}\) distance is \(d(p,q)=\max_{\forall d_{i}}(|p[i]-q[i]|)\).
Our implementation uses the XXL library available at: http://www.xxl-library.de.
Available at: http://www.cc.gatech.edu/projects/gtitm/.
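The definitions given in the notes above can be made concrete with a short sketch. It is an illustration only, assuming that smaller values are better and that the scoring function is a weighted sum with non-negative weights; the function names `dominates`, `k_skyband`, `linf_distance`, and `top_k` are hypothetical and are not taken from the paper's implementation.

```python
def dominates(p, q):
    """True if p is no worse than q everywhere and strictly better somewhere.

    Assumption: smaller values are better.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def k_skyband(points, k):
    """Points dominated by at most k-1 other points (k=1 gives the skyline)."""
    return [p for p in points if sum(dominates(q, p) for q in points) <= k - 1]


def linf_distance(p, q):
    """L-infinity distance: the largest per-dimension absolute difference."""
    return max(abs(a - b) for a, b in zip(p, q))


def top_k(points, w, k):
    """The k points with the smallest weighted-sum score."""
    return sorted(points, key=lambda p: sum(wi * pi for wi, pi in zip(w, p)))[:k]


if __name__ == "__main__":
    pts = [(1, 9), (2, 2), (9, 1), (3, 3), (5, 5), (6, 7)]
    # Scaling the query vector by a positive constant keeps its direction,
    # so the ranking (and hence the top-k result) is unchanged.
    print(top_k(pts, (1, 2), 3) == top_k(pts, (10, 20), 3))
    # Every top-k result is contained in the k-skyband.
    print(set(top_k(pts, (1, 2), 3)) <= set(k_skyband(pts, 3)))
    print(linf_distance((1, 9), (2, 2)))
```

The quadratic scan in `k_skyband` is chosen for readability; it is not meant to reflect how skybands are computed at scale.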
References
Akbarinia, R., Pacitti, E., Valduriez, P.: Reducing network traffic in unstructured P2P systems using top-k queries. Distrib. Parallel Databases 19(2–3), 67–86 (2006)
Akbarinia, R., Pacitti, E., Valduriez, P.: Best position algorithms for top-k queries. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 495–506 (2007)
Balke, W.T., Güntzer, U.: Multi-objective query processing for database systems. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 936–947 (2004)
Balke, W.T., Nejdl, W., Siberski, W., Thaden, U.: Progressive distributed top-k retrieval in peer-to-peer networks. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 174–185 (2005)
Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 421–430 (2001)
Bruno, N., Chaudhuri, S., Gravano, L.: Top-k selection queries over relational databases: Mapping strategies and performance evaluation. ACM Trans. Database Syst. 27(2), 153–187 (2002)
Cao, P., Wang, Z.: Efficient top-k query calculation in distributed networks. In: Proceedings of Annual ACM Symposium on Principles of Distributed Computing (PODC), pp. 206–215 (2004)
Chang, Y.-C., Bergman, L.D., Castelli, V., Li, C.-S., Lo, M.-L., Smith, J.R.: The onion technique: indexing for linear optimization queries. In: Proceedings of ACM International Conference on Management of Data (SIGMOD), pp. 391–402 (2000)
Chaudhuri, S., Gravano, L.: Evaluating top-k selection queries. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 397–410 (1999)
Chaudhuri, S., Gravano, L., Marian, A.: Optimizing top-k selection queries over multimedia repositories. IEEE Trans. Knowl. Data Eng. 16(8), 992–1009 (2004)
Chaudhuri, S., Dalvi, N.N., Kaushik, R.: Robust cardinality and cost estimation for skyline operator. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), p. 64 (2006)
Chen, C.M., Ling, Y.: A sampling-based estimator for top-k selection query. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 617–627 (2002)
Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with presorting. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 717–719 (2003)
Dedzoe, W.K., Lamarre, P., Akbarinia, R., Valduriez, P.: ASAP top-k query processing in unstructured P2P systems. In: Proceedings of International Conference on Peer-to-Peer Computing (P2P), pp. 1–10 (2010)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: Proceedings of Symposium on Principles of Database Systems (PODS), pp. 102–113 (2001)
Gonzalez, T.F.: Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci. 38, 293–306 (1985)
Güntzer, U., Balke, W.T., Kießling, W.: Optimizing multi-feature queries for image databases. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 419–428 (2000)
Hose, K., Karnstedt, M., Sattler, K.U., Zinn, D.: Processing top-N queries in P2P-based web integration systems with probabilistic guarantees. In: Proceedings of International Workshop on Web and Databases (WebDB), pp. 109–114 (2005)
Hristidis, V., Koudas, N., Papakonstantinou, Y.: PREFER: A system for the efficient execution of multi-parametric ranked queries. In: Proceedings of ACM International Conference on Management of Data (SIGMOD), pp. 259–270 (2001)
Ilyas, I.F., Aref, W.G., Elmagarmid, A.K., Elmongui, H.G., Shah, R., Vitter, J.S.: Adaptive rank-aware query optimization in relational databases. ACM Trans. Database Syst. 31(4), 1257–1304 (2006)
Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 40, 4 (2008)
Lu, J., Callan, J.: Merging retrieval results in hierarchical peer-to-peer networks. In: Proceedings of the ACM International Conference on Research and Development in Information Retrieval (SIGIR), pp. 472–473 (2004)
Lu, J., Callan, J.: Federated search of text-based digital libraries in hierarchical peer-to-peer networks. In: Proceedings of European Conference on IR Research (ECIR), pp. 52–66 (2005)
Marian, A., Bruno, N., Gravano, L.: Evaluating top-k queries over web-accessible databases. ACM Trans. Database Syst. 29(2), 319–362 (2004)
Michel, S., Triantafillou, P., Weikum, G.: KLEE: a framework for distributed top-k query algorithms. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 637–648 (2005)
Mouratidis, K., Bakiras, S., Papadias, D.: Continuous monitoring of top-k queries over sliding windows. In: Proceedings of ACM International Conference on Management of Data (SIGMOD), pp. 635–646 (2006)
Ryeng, N.H., Vlachou, A., Doulkeridis, C., Nørvåg, K.: Efficient distributed top-k query processing with caching. In: Proceedings of DASFAA, vol. 2, pp. 280–295 (2011)
Tao, Y., Hristidis, V., Papadias, D., Papakonstantinou, Y.: Branch-and-bound processing of ranked queries. Inf. Syst. 32(3), 424–445 (2007)
Tsaparas, P., Palpanas, T., Kotidis, Y., Koudas, N., Srivastava, D.: Ranked join indices. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 277–288 (2003)
Vlachou, A., Doulkeridis, C., Nørvåg, K., Vazirgiannis, M.: On efficient top-k query processing in highly distributed environments. In: Proceedings of ACM International Conference on Management of Data (SIGMOD), pp. 753–764 (2008)
Vlachou, A., Doulkeridis, C., Nørvåg, K., Vazirgiannis, M.: Skyline-based peer-to-peer top-k query processing. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 1421–1423 (2008)
Vlachou, A., Doulkeridis, C., Kotidis, Y., Nørvåg, K.: Reverse top-k queries. In: Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 365–376 (2010)
Vlachou, A., Doulkeridis, C., Nørvåg, K., Kotidis, Y.: Identifying the most influential data objects with reverse top-k queries. Proc. VLDB Endow. 3(1), 364–372 (2010)
Vlachou, A., Doulkeridis, C., Kotidis, Y., Nørvåg, K.: Monochromatic and bichromatic reverse top-k queries. IEEE Trans. Knowl. Data Eng. 23(8), 1215–1229 (2011)
Vlachou, A., Doulkeridis, C., Nørvåg, K.: Monitoring reverse top-k queries over mobile devices. In: Proceedings of ACM Workshop on Data Engineering for Wireless and Mobile Access (MobiDE) (2011)
Zhao, K., Tao, Y., Zhou, S.: Efficient top-k processing in large-scaled distributed environments. Data Knowl. Eng. 63(2), 315–335 (2007)
Zou, L., Chen, L.: Pareto-based dominant graph: An efficient indexing structure to answer top-k queries. IEEE Trans. Knowl. Data Eng. 23(5), 727–741 (2011)
Additional information
Communicated by Kaushik Chakrabarti.
About this article
Cite this article
Vlachou, A., Doulkeridis, C. & Nørvåg, K. Distributed top-k query processing by exploiting skyline summaries. Distrib Parallel Databases 30, 239–271 (2012). https://doi.org/10.1007/s10619-012-7094-2
DOI: https://doi.org/10.1007/s10619-012-7094-2