Abstract
This article addresses the problem of self-tuning data placement in replicated key-value stores. The goal is to automatically optimize replica placement so that it exploits locality patterns in data accesses and minimizes internode communication. Doing this efficiently is extremely challenging: one must find a lightweight and scalable way to identify the right assignment of data replicas to nodes, and one must also preserve fast data lookup. The article introduces new techniques that address both challenges. The first is addressed by optimizing, in a decentralized way, the placement of the objects that generate the largest number of remote operations at each node. The second is addressed by combining consistent hashing with a novel data structure that provides efficient probabilistic data placement. These techniques have been integrated into a popular open-source key-value store. The performance results show that the optimized system can achieve up to six times the throughput of a baseline system that uses the widely adopted static placement based on consistent hashing.
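The two ideas above (static placement by consistent hashing, and relocating the objects that cause the most remote operations at each node) can be illustrated with a small Python sketch. It is not the article's implementation. The node names, the `vnodes` parameter, and the names `TunedPlacement`, `optimize_round`, and `remote_ops` are hypothetical; replication is ignored (one copy per key); exact counters replace whatever lightweight top-k technique a real system would use; and a plain dictionary stands in for the article's probabilistic data structure.

```python
import bisect
import hashlib
from collections import Counter


def _hash(value: str) -> int:
    """Map a string to a 64-bit position on the hash ring."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Static baseline: a key belongs to the first virtual node clockwise from its hash."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    def owner(self, key: str) -> str:
        # bisect finds the first ring position after the key's hash; % wraps around.
        idx = bisect.bisect(self._positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


class TunedPlacement:
    """Consistent hashing by default, explicit overrides for relocated keys.

    Hypothetical: a plain dict stands in for the article's probabilistic structure.
    """

    def __init__(self, ring: ConsistentHashRing):
        self.ring = ring
        self.overrides = {}

    def owner(self, key: str) -> str:
        node = self.overrides.get(key)
        return node if node is not None else self.ring.owner(key)


def remote_ops(placement, access_log):
    """Count accesses served by a node other than the accessing one."""
    return sum(
        1
        for node, keys in access_log.items()
        for key in keys
        if placement.owner(key) != node
    )


def optimize_round(placement, access_log, top_k):
    """One tuning round over access_log (node -> list of accessed keys)."""
    counts = {node: Counter(keys) for node, keys in access_log.items()}
    # Step 1 (local to each node): its top_k objects by number of remote accesses.
    candidates = set()
    for node, per_obj in counts.items():
        remote = Counter(
            {obj: n for obj, n in per_obj.items() if placement.owner(obj) != node}
        )
        candidates.update(obj for obj, _ in remote.most_common(top_k))
    # Step 2 (shown centrally for brevity): move each candidate to the node
    # that accesses it most, preferring the current owner on ties.
    for obj in candidates:
        current = placement.owner(obj)
        best = max(counts, key=lambda node: (counts[node][obj], node == current))
        placement.overrides[obj] = best


ring = ConsistentHashRing(["n1", "n2", "n3"])
placement = TunedPlacement(ring)
log = {"n1": ["a"] * 50 + ["b"] * 5, "n2": ["c"] * 40, "n3": ["a"] * 10}
print("remote ops before:", remote_ops(placement, log))
optimize_round(placement, log, top_k=2)
print("remote ops after:", remote_ops(placement, log))  # 10: only n3's reads of "a"
```

With a single copy per key, placing each selected key on the node that accesses it most can only lower the remote-operation count, and keys that are never selected keep their consistent-hashing location, so lookups for them stay as cheap as in the baseline. The sketch does not model how the article handles multiple replicas or how placement metadata is disseminated to all nodes.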
Index Terms
- AutoPlacer: Scalable Self-Tuning Data Placement in Distributed Key-Value Stores