DOI: 10.1145/3050748.3050756
research-article

HyperMAMBO-X64: Using Virtualization to Support High-Performance Transparent Binary Translation

Published: 08 April 2017

ABSTRACT

Current computer architectures (ARM, MIPS, PowerPC, SPARC and x86) have all evolved from 32-bit to 64-bit designs. Computer architects often consider whether hardware support for a subset of the instruction set could be eliminated so as to reduce hardware complexity, which could improve performance, reduce power usage and accelerate processor development. This paper considers the scenario in which 32-bit hardware support is eliminated from the ARMv8 architecture.

Dynamic binary translation can be used for this purpose. It generally comes in one of two forms: application-level translators, which translate a single user-mode process running on top of a native operating system, and system-level translators, which translate an entire operating system and all of its processes.

Application-level translators can achieve good performance but are not fully transparent; system-level translators may be 100% compatible, but their performance suffers. HyperMAMBO-X64 takes a new approach that gets the best of both worlds: the translator runs as an application under the hypervisor, yet it can still react to the behavior of guest operating systems. It is completely transparent to the virtualized system while delivering performance close to that of native hardware execution.

A key factor in the low overhead of HyperMAMBO-X64 is its deep integration with the virtualization and memory management features of ARMv8. These are exploited to support caching of translations across multiple address spaces while ensuring that translated code remains consistent with the source instructions it is based on. We show how these attributes are achieved without sacrificing either performance or accuracy.
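The idea of caching translations across address spaces while keeping them consistent with their source instructions can be illustrated with a minimal Python sketch. It assumes that translations are keyed by guest *physical* address (so a code page mapped into several processes is translated once) and that writes to pages holding translated code can be trapped; in a hypervisor this could be done, for instance, by removing write permission from those pages in the stage-2 translation tables, which is simulated here by an explicit `on_trapped_write` call. `SharedTranslationCache`, `lookup`, `on_trapped_write`, the single-level page table, the 4 KiB page size and the no-page-crossing rule are all hypothetical and chosen for illustration; this is not HyperMAMBO-X64's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set

PAGE_SIZE = 4096  # assumption: 4 KiB guest pages


@dataclass
class Translation:
    guest_pa: int   # guest physical address of the first source instruction
    host_code: str  # placeholder for the emitted host code


class SharedTranslationCache:
    """Hypothetical code cache shared by every guest address space.

    Keyed by guest physical address, so a page mapped into several processes
    (e.g. a shared library) is translated only once. Assumes that no
    translated fragment crosses a page boundary.
    """

    def __init__(self, translate: Callable[[int], str]) -> None:
        self._translate = translate               # hypothetical translator callback
        self._entries: Dict[int, Translation] = {}
        self._by_page: Dict[int, Set[int]] = {}   # physical page -> cached PAs
        self.write_protected: Set[int] = set()    # pages whose writes must trap
        self.translations_made = 0

    def lookup(self, page_table: Dict[int, int], guest_va: int) -> Translation:
        # Resolve the virtual address through this address space's simplified
        # single-level page table (virtual page -> physical page).
        vpage, offset = divmod(guest_va, PAGE_SIZE)
        guest_pa = page_table[vpage] * PAGE_SIZE + offset
        entry = self._entries.get(guest_pa)
        if entry is None:
            entry = Translation(guest_pa, self._translate(guest_pa))
            self.translations_made += 1
            self._entries[guest_pa] = entry
            page = guest_pa // PAGE_SIZE
            self._by_page.setdefault(page, set()).add(guest_pa)
            # Protect the source page so that a later write is detected
            # before a stale translation can be executed.
            self.write_protected.add(page)
        return entry

    def on_trapped_write(self, guest_pa: int) -> int:
        """Discard every translation derived from the written page; return the count."""
        page = guest_pa // PAGE_SIZE
        stale = self._by_page.pop(page, set())
        for pa in stale:
            del self._entries[pa]
        # The page holds no translated code now, so writes need not trap
        # until code on it is translated again.
        self.write_protected.discard(page)
        return len(stale)


# Two processes map the same physical code page at different virtual addresses.
cache = SharedTranslationCache(lambda pa: f"<host code for {pa:#x}>")
proc_a = {0x400: 0x10}
proc_b = {0x7F0: 0x10}
first = cache.lookup(proc_a, 0x400 * PAGE_SIZE + 0x20)
second = cache.lookup(proc_b, 0x7F0 * PAGE_SIZE + 0x20)
assert first is second and cache.translations_made == 1
cache.on_trapped_write(0x10 * PAGE_SIZE + 0x100)  # guest modifies the code page
cache.lookup(proc_a, 0x400 * PAGE_SIZE + 0x20)    # forces a fresh translation
assert cache.translations_made == 2
```

Keying by physical rather than virtual address is what lets the second process reuse the first process's translation; the price is that consistency must be tracked per physical page, which is why every page that has produced a translation is write-protected until it is invalidated.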

    Published in

      ACM Conferences
      VEE '17: Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
      April 2017
      261 pages
      ISBN:9781450349482
      DOI:10.1145/3050748

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      VEE '17 Paper Acceptance Rate: 18 of 43 submissions, 42%
      Overall Acceptance Rate: 80 of 235 submissions, 34%
