HyperMAMBO-X64: Using Virtualization to Support High-Performance Transparent Binary Translation

Authors:
Amanieu d'Antras, Cosmin Gorgovan, Jim Garside, John Goodacre, Mikel Luján
School of Computer Science, University of Manchester

VEE '17: Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, April 2017, Pages 228–241. https://doi.org/10.1145/3050748.3050756

Published: 08 April 2017

ABSTRACT

Current computer architectures (ARM, MIPS, PowerPC, SPARC, x86) have each evolved from a 32-bit architecture to a 64-bit one. Computer architects often consider whether hardware support for a subset of the instruction set could be eliminated so as to reduce hardware complexity, which could improve performance, reduce power usage and accelerate processor development. This paper considers the scenario where we want to eliminate 32-bit hardware support from the ARMv8 architecture.

Dynamic binary translation can be used for this purpose and generally comes in one of two forms: application-level translators that translate a single user mode process on top of a native operating system, and system-level translators that translate an entire operating system and all its processes.

Application-level translators can have good performance but are not totally transparent; system-level translators may be 100% compatible but their performance suffers. HyperMAMBO-X64 uses a new approach that gets the best of both worlds: it runs the translator as an application under the hypervisor while still reacting to the behavior of guest operating systems. It works with complete transparency with regard to the virtualized system whilst delivering performance close to that provided by hardware execution.

A key factor in the low overhead of HyperMAMBO-X64 is its deep integration with the virtualization and memory management features of ARMv8. These are exploited to support caching of translations across multiple address spaces while ensuring that translated code remains consistent with the source instructions it is based on. We show how these attributes are achieved without sacrificing either performance or accuracy.

Index Terms: Software and its engineering > Software organization and properties > Contextual software domains

Published in
VEE '17: Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
April 2017
261 pages
ISBN:9781450349482
DOI:10.1145/3050748
ACM SIGPLAN Notices Volume 52, Issue 7
VEE '17
July 2017
256 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3140607
Editor:
Matthew Fluet
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 8 April 2017
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
VEE '17 Paper Acceptance Rate: 18 of 43 submissions, 42%
Overall Acceptance Rate: 80 of 235 submissions, 34%

Article Metrics
- Total Citations: 9
- Total Downloads: 384
- Downloads (Last 12 months): 31
- Downloads (Last 6 weeks): 3