DOI: 10.1145/2593069.2593127
research-article

dTune: Leveraging Reliable Code Generation for Adaptive Dependability Tuning under Process Variation and Aging-Induced Effects

Published: 01 June 2014

ABSTRACT

Designing dependable on-chip many-core systems requires considering multiple reliability threats, e.g., soft errors, aging, and process variation. In this paper, we introduce a novel adaptive Dependability Tuning (dTune) scheme for many-core processors. It leverages knowledge of the varying vulnerability and error-masking properties of different applications, along with multiple compiled versions (each offering distinct reliability and performance properties). Our dTune system dynamically tunes the dependability mode at the hardware level through hybrid Redundant Multithreading tuning and at the software level through selection of a reliable code version under given performance constraints. It jointly accounts for soft errors and cores' performance variations due to design-time process variation and/or run-time aging-induced performance degradation. We compare our dTune system with four state-of-the-art techniques and achieve on average 44% and up to 63% improved task reliability across different chip configurations, different variability maps, and different aging years.
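The software-level step in the abstract (picking a reliable code version under a performance constraint, on a core whose speed depends on process variation and aging) can be illustrated with a minimal sketch. The abstract does not give the selection algorithm, so everything here is an assumption for illustration: the `CodeVersion` fields, the `select_version` helper, the cycle counts, the reliability values, and the core frequencies are all hypothetical. The hardware-level Redundant Multithreading tuning is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CodeVersion:
    """One compiled version of a task (hypothetical model, not from the paper)."""
    name: str
    cycles: int          # execution length in clock cycles
    reliability: float   # assumed probability of error-free execution, 0..1


def select_version(versions: Sequence[CodeVersion],
                   core_freq_hz: float,
                   deadline_s: float) -> Optional[CodeVersion]:
    """Return the most reliable version that meets the deadline on this core.

    core_freq_hz stands for the core's current achievable frequency, which
    would be lower on a slow (process-variation) or aged core.
    Returns None when no version fits the deadline.
    """
    feasible = [v for v in versions if v.cycles / core_freq_hz <= deadline_s]
    return max(feasible, key=lambda v: v.reliability, default=None)


# Illustrative numbers only: more protection costs more cycles.
versions = [
    CodeVersion("fast", 1_000_000, 0.90),
    CodeVersion("balanced", 1_500_000, 0.95),
    CodeVersion("hardened", 2_000_000, 0.99),
]
deadline_s = 1.2e-3

fresh = select_version(versions, 2.0e9, deadline_s)   # fast core
aged = select_version(versions, 1.5e9, deadline_s)    # degraded core
slow = select_version(versions, 1.0e9, deadline_s)    # strongly degraded core
none = select_version(versions, 0.5e9, deadline_s)    # nothing meets the deadline

print(fresh.name, aged.name, slow.name, none)
```

In this sketch a slower core forces a less protected but shorter version, which is the trade-off the abstract describes between reliability and performance under variation and aging. A fuller model would also weigh the soft-error exposure of a longer run, which this example ignores.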

    • Published in

      DAC '14: Proceedings of the 51st Annual Design Automation Conference
      June 2014
      1249 pages
      ISBN: 9781450327305
      DOI: 10.1145/2593069

      Copyright © 2014 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 1,770 of 5,499 submissions, 32%
