ShiftsReduce: Minimizing Shifts in Racetrack Memory 4.0

Authors:
Asif Ali Khan (ORCID: 0000-0002-5130-9855)
Chair for Compiler Construction, Technische Universität Dresden, Dresden, Germany

Fazal Hameed
Chair for Compiler Construction, Technische Universität Dresden, Germany and Institute of Space Technology, Pakistan

Robin Bläsing
Max Planck Institute of Microstructure Physics, Halle (Saale), Germany

Stuart S. P. Parkin
Max Planck Institute of Microstructure Physics, Halle (Saale), Germany

Jeronimo Castrillon
Chair for Compiler Construction, Technische Universität Dresden, Germany

ACM Transactions on Architecture and Code Optimization, Volume 16, Issue 4, Article No. 56, pp. 1–23. https://doi.org/10.1145/3372489

Published: 26 December 2019

Abstract

Racetrack memories (RMs) have significantly evolved since their conception in 2008, making them a serious contender in the field of emerging memory technologies. Despite key technological advancements, the access latency and energy consumption of an RM-based system are still highly influenced by the number of shift operations. These operations are required to move bits to the right positions in the racetracks. This article presents data-placement techniques for RMs that maximize the likelihood that consecutive references access nearby memory locations at runtime, thereby minimizing the number of shifts. We present an integer linear programming (ILP) formulation for optimal data placement in RMs, and we revisit existing offset assignment heuristics, originally proposed for random-access memories. We introduce a novel heuristic tailored to a realistic RM and combine it with a genetic search to further improve the solution. We show a reduction in the number of shifts of up to 52.5%, outperforming the state of the art by up to 16.1%.
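
To make the dependence of shift count on data placement concrete, the following is a minimal sketch, not the article's ILP formulation or heuristic. It assumes a simplified model that the abstract does not specify: a single track with a single access port, where moving from one variable to the next costs as many shifts as the distance between their positions, and where the port starts aligned with the first accessed variable. The function name `shift_cost` and the example trace and placements are hypothetical.

```python
from typing import Dict, Sequence


def shift_cost(placement: Dict[str, int], trace: Sequence[str]) -> int:
    """Count shifts for an access trace under an assumed single-port,
    single-track model: each access costs |pos(curr) - pos(prev)| shifts."""
    cost = 0
    for prev, curr in zip(trace, trace[1:]):
        cost += abs(placement[curr] - placement[prev])
    return cost


# Hypothetical access trace in which "a" is referenced between every
# other access, so it benefits from sitting next to both "b" and "c".
trace = ["a", "b", "a", "c", "a", "b"]

# Two candidate placements (variable -> position on the track).
naive = {"a": 0, "c": 1, "b": 2}
tuned = {"b": 0, "a": 1, "c": 2}

print(shift_cost(naive, trace))
print(shift_cost(tuned, trace))
```

Under these assumptions the second placement needs fewer shifts because the most frequently revisited variable is adjacent to both of its neighbours in the trace, which is the kind of locality the placement techniques described above aim to exploit.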

Index Terms

1. Social and professional topics
  1. Professional topics
    1. History of computing
      1. History of programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages

Published in
ACM Transactions on Architecture and Code Optimization Volume 16, Issue 4
December 2019
572 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3366460
Editor:
Koen De Bosschere
Ghent University, Belgium
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 26 December 2019
- Accepted: 1 November 2019
- Revised: 1 October 2019
- Received: 1 January 2019

Author Tags
Compiler optimization
data placement
domain wall memory
heuristics
integer linear programming
racetrack memory
shifts minimization
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 27
- Total Downloads: 1,144
- Downloads (Last 12 months): 195
- Downloads (Last 6 weeks): 30
