Abstract
Given a set of facility objects and a set of client objects, where each client is served by her nearest facility and each facility is constrained by a service capacity, we study how to find all the locations on which if a new facility with a given capacity is established, the number of served clients is maximized (in other words, the utility of the facilities is maximized). This problem is intrinsically difficult. An existing algorithm with an exponential complexity is not scalable and cannot handle this problem on large data sets. Therefore, we propose to solve the problem through parallel computing, in particular using MapReduce. We propose an arc-based method to divide the search space into disjoint partitions. For load balancing, we propose a dynamic strategy to assign partitions to reducers so that the estimated load difference is within a threshold. We conduct extensive experiments using both real and synthetic data sets of large sizes. The results demonstrate the efficiency and scalability of the algorithm.
Similar content being viewed by others
Notes
References
Al-Khateeb A, Rashid NA, Abdullah R (2012) An enhanced meta-scheduling system for grid computing that considers the job type and priority. Computing, pp 389–410
Dean J, Ghemawat S (2004) Mapreduce: Simplified data processing on large clusters. OSDI, pp 137–150
Gufler B, Augsten N, Reiser A, Kemper A (2011) Handling data skew in mapreduce. In: The first international conference on cloud computing and services, science, pp 574–583
Gufler B, Augsten N, Reiser A, Kemper A (2012) Load balancing in mapreduce based on scalable cardinality estimates. ICDE, pp 522–533
Hale TS, Moberg CR (2003) Location science research: a review. Ann Oper Res 123(1–4):21–35
Huang J, Wen Z, Pathan M, Taylor K, Xue Y, Zhang R (2011) Ranking locations for facility selection based on potential influences. In: The 37th annual conference on IEEE industrial electronics society, pp 2411–2416
Huang J, Wen Z, Qi J, Zhang R, Chen J, He Z (2011) Top-k most influential locations selection. CIKM, pp 2377–2380
Huang J, Zhang R, Buyya R, Chen J (2014) Melody-join: efficient earth mover’s distance similarity join using mapreduce. ICDE
Kahraman C, Ruan D, Doan I (2003) Fuzzy group decision-making for facility location selection. Inf Sci 157:135–153
Klose A, Drexl A (2005) Facility location models for distribution system design. Eur J Oper Res 162(1):4–29
Kolb L, Thor A, Rahm E (2012) Load balancing for mapreduce-based entity resolution. ICDE, pp 618–629
Korn F, Muthukrishnan S (2000) Influence sets based on reverse nearest neighbor queries. SIGMOD, pp 201–212
Kwon Y, Balazinska M, Howe B, Rolia J (2012) Skewtune: mitigating skew in mapreduce applications. SIGMOD, pp 25–36
Lu W, Shen Y, Chen S, Ooi BC (2012) Efficient processing of k nearest neighbor joins using mapreduce. Proc. VLDB Endow. 5(10):1016–1027
Melkote S, Daskin MS (2001) Capacitated facility location/network design problems. Eur J Oper Res 129(3):481–495
Melo M, Nickel S, Saldanha da Gama F (2006) Dynamic multi-commodity capacitated facility location: a mathematical modeling framework for strategic supply chain planning. Comput Oper Res 33(1):181–208
Melo MT, Nickel S, Saldanha-Da-Gama F (2009) Facility location and supply chain management-a review. Eur J Oper Res 196(2):401–412
Nutanong S, Tanin E, Zhang R (2010) Incremental evaluation of visible nearest neighbor queries. TKDE 22(5):665–681
Nutanong S, Zhang R, Tanin E, Kulik L (2010) Analysis and evaluation of v*-knn: an efficient algorithm for moving knn queries. VLDB J 19(3):307–332
Qi J, Zhang R, Kulik L, Lin D, Xue Y (2012) The min-dist location selection query. ICDE, pp 366–377
Qiao Y, von Bochmann G (2012) Load balancing in peer-to-peer systems using a diffusive approach. Computing, pp 649–678
Quan X, Wenyin L, Dou W, Xiong H, Ge Y (2012) Link graph analysis for business site selection. Computer 45(3):64–69
Revelle CS, Eiselt HA, Daskin MS (2008) A bibliography for some fundamental problem categories in discrete location science. Eur J Oper Res 184(3):817–848
Sun Y, Huang J, Chen Y, Du X, Zhang R (2012) Top-k most incremental location selection with capacity constraint. WAIM, pp 165–171
Sun Y, Huang J, Chen Y, Zhang R, Du X (2012) Location selection for utility maximization with capacity constraints. CIKM, pp 2154–2158
Tao Y, Lin W, Xiao X (2013) Minimal mapreduce algorithms. SIGMOD
Mouratidis LHUK, Yiu ML, Mamoulis N (2010) Optimal matching between spatial datasets under capacity constraints. TODS 35(2):9:1–9:44
Wong RC-W, Özsu MT, Fu AW-C, Yu PS, Liu L, Liu Y (2011) Maximizing bichromatic reverse nearest neighbor for l p -norm in two- and three-dimensional spaces. VLDB J 20(6):893–919
Wong RC-W, Tao Y, Fu AW-C, Xiao X (2007) On efficient spatial matching. VLDB, pp 579–590
Xia T, Zhang D, Kanoulas E, Du Y (2005) On computing top-t most influential spatial sites. VLDB, pp 946–957
Yan D, Wong RC-W, Ng W (2011) Efficient methods for finding influential locations with adaptive grids. CIKM, pp 1475–1484
Yu C, Zhang R, Huang Y, Xiong H (2010) High-dimensional knn joins with incremental updates. GeoInformatica 14(1):55–82
Yuan J, Zheng Y, Xie X (2012) Discovering regions of different functions in a city using human mobility and pois. KDD, pp 186–194
Zhan L, Zhang Y, Zhang W, Lin X (2012) Finding top k most influential spatial facilities over uncertain objects. CIKM, pp 922–931
Zhang D, Du Y, Xia T, Tao Y (2006) Progressive computation of the min-dist optimal-location query. VLDB, pp 643–654
Zhang R, Jagadish HV, Dai BT, Ramamohanarao K (2010) Optimized algorithms for predictive range and knn queries on moving objects. Inf Syst 35(8):911–932
Zheng K, Huang Z, zhou A, Zhou X (2012) Discovering the most influential sites over uncertain data: a rank-based approach. TKDE 24(12):2156–2169
Zhou Z, Wu W, Li X, Lee ML, Hsu W (2011) Maxfirst for maxbrknn. ICDE, pp 828–839
Acknowledgments
This work is supported by the Australian Research Council (ARC) Discovery Project DP130104587. Dr. Rui Zhang is supported by the ARC Future Fellowships Project FT120100832. Dr. Yueguo Chen is partially supported by the National Science Foundation of China under Grant No. 61003085. Dr. Xiaoyong Du is partially supported by the National Science Foundation of China under Grant No. 61170010.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Sun, Y., Qi, J., Zhang, R. et al. MapReduce based location selection algorithm for utility maximization with capacity constraints. Computing 97, 403–423 (2015). https://doi.org/10.1007/s00607-013-0382-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00607-013-0382-5