Effective Query Grouping Strategy in Clouds

Liu, Qin; Guo, Yuhong; Wu, Jie; Wang, Guojun

doi:10.1007/s11390-017-1797-9

Regular Paper
Published: 08 December 2017

Volume 32, pages 1231–1249, (2017)

Journal of Computer Science and Technology

Qin Liu^1,2, Yuhong Guo^3, Jie Wu^4 & Guojun Wang^5

143 Accesses
58 Citations

Abstract

As demand for cloud computing grows, more and more organizations are outsourcing their data and query services to the cloud for cost savings and flexibility. Consider an organization with a large number of users who query the cloud through multiple cloud-deployed proxy servers, which provide cost efficiency and load balancing. Given n queries, each expressed as several keywords, and k proxy servers, the problem is how to classify the n queries into k groups so as to minimize both the difference between groups and the number of distinct keywords in all groups. Since this problem is NP-hard, it is solved in both mathematical and heuristic ways. Mathematical grouping uses a local optimization method, and heuristic grouping is based on k-means. In addition, two extensions are provided: the first focuses on robustness, i.e., each user obtains search results even if some proxy servers fail; the second focuses on benefit, i.e., each user can retrieve as many files of potential interest as possible without increasing the sum. Extensive evaluations have been conducted on both a synthetic dataset and real query traces to verify the effectiveness of our strategies.

Author information

Authors and Affiliations

College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
Qin Liu
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Qin Liu
School of Computer Science, Carleton University, Ottawa, ON, K1S 5B6, Canada
Yuhong Guo
Department of Computer and Information Sciences, Temple University, Philadelphia, PA, 19122, U.S.A.
Jie Wu
School of Computer Science and Educational Software, Guangzhou University, Guangzhou, 510006, China
Guojun Wang

Corresponding author

Correspondence to Guojun Wang.

Electronic supplementary material

ESM 1 (PDF 339 kb)

About this article

Cite this article

Liu, Q., Guo, Y., Wu, J. et al. Effective Query Grouping Strategy in Clouds. J. Comput. Sci. Technol. 32, 1231–1249 (2017). https://doi.org/10.1007/s11390-017-1797-9

Received: 18 May 2016
Revised: 12 January 2017
Published: 08 December 2017
Issue Date: November 2017
DOI: https://doi.org/10.1007/s11390-017-1797-9
