ABSTRACT
Motivated by emerging memory technologies and the increasing importance of energy and bandwidth, we study the Writeback-Aware Caching Problem. This problem modifies the caching problem by explicitly accounting for the cost of writing data to memory. In the offline setting with maximum writeback cost ω > 0, we show that the writeback-oblivious optimal policy is only (ω+1)-competitive for writeback-aware caching, and that writeback-aware caching is NP-complete and Max-SNP hard. In the online setting, we present a deterministic online replacement policy, called Writeback-Aware Landlord, and show that it obtains the optimal competitive ratio. Finally, we perform an experimental study on real-world traces which shows that Writeback-Aware Landlord outperforms state-of-the-art cache replacement policies when writebacks are costly.
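To make the cost model concrete, the following is a minimal sketch of how a writeback charge changes the cost of a replacement decision. It is not the Writeback-Aware Landlord policy, whose details are not given in this abstract. The function name `simulate`, the unit load cost, the charge of `omega` per dirty eviction, and the `prefer_clean` heuristic are all assumptions made for illustration.

```python
from collections import OrderedDict


def simulate(trace, capacity, omega, prefer_clean=False):
    """Return the total cost of serving `trace` with an LRU-ordered cache.

    Assumed cost model (illustrative only): every miss costs 1 for the
    load, and evicting a dirty item costs an extra `omega` for the
    writeback. Items still in the cache at the end are not flushed.
    `trace` is a sequence of (item, is_write) pairs.
    """
    cache = OrderedDict()  # item -> dirty flag, least recently used first
    cost = 0
    for item, is_write in trace:
        if item in cache:
            # Hit: a write marks the item dirty; refresh its recency.
            cache[item] = cache[item] or is_write
            cache.move_to_end(item)
            continue
        cost += 1  # load on miss
        if len(cache) >= capacity:
            victim = next(iter(cache))  # plain LRU victim
            if prefer_clean:
                # Hypothetical heuristic: evict the least recently used
                # clean item if one exists, else fall back to plain LRU.
                victim = next(
                    (k for k, dirty in cache.items() if not dirty), victim
                )
            if cache.pop(victim):  # pop returns the victim's dirty flag
                cost += omega
        cache[item] = is_write
    return cost


# One write to "a", then reads of "b", "c", and "a" again.
trace = [("a", True), ("b", False), ("c", False), ("a", False)]
print(simulate(trace, capacity=2, omega=4))                     # 8
print(simulate(trace, capacity=2, omega=4, prefer_clean=True))  # 3
```

On this trace, plain LRU evicts the dirty item "a" and pays the writeback plus a later reload, while the clean-first variant keeps "a" resident and pays only for the three loads. The gap grows with ω, which is the effect the writeback-aware formulation is meant to capture.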
Index Terms
- Writeback-Aware Caching (Brief Announcement)