DOI: 10.1145/3343737.3343742
research-article

ExtOS: Data-centric Extensible OS

Published: 19 August 2019

ABSTRACT

Today's computer architectures are fundamentally different from those of a decade ago: IO devices and interfaces can sustain much higher data rates than the compute capacity of a single-threaded CPU. To meet the computational requirements of modern applications, the operating system (OS) requires lean and optimized software running on CPUs so that applications can fully exploit the IO resources. Despite the changes in hardware, today's traditional system software unfortunately still relies on the assumptions of a decade ago: IO is slow, and the CPU is fast.

This paper makes a case for the data-centric extensible OS, which enables full exploitation of emerging high-performance IO hardware. Based on the idea of minimizing data movements in software, a top-to-bottom lean and optimized architecture is proposed, which allows applications to customize the OS kernel's IO subsystems with application-provided code. This enables sharing and high-performance IO among applications. Initial microbenchmarks on a Linux prototype, in which we used eBPF to specialize the Linux kernel, show performance improvements of up to 1.8× for database primitives and 4.8× for UNIX utility tools.
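The idea of minimizing data movement by running application-provided code inside the IO path can be illustrated with a toy model. The sketch below is not the paper's prototype and does not use eBPF; it is a plain Python simulation, and the names `read_then_filter` and `filter_in_io_path` are hypothetical. It only counts how many bytes would be handed to the application when a filter runs after the copy versus before it.

```python
from typing import Callable, Iterable, List, Tuple

Predicate = Callable[[bytes], bool]


def read_then_filter(records: Iterable[bytes],
                     predicate: Predicate) -> Tuple[List[bytes], int]:
    """Baseline: every record is copied to the application, which then filters."""
    copied = 0
    kept: List[bytes] = []
    for rec in records:
        copied += len(rec)          # models the copy into the application
        if predicate(rec):
            kept.append(rec)
    return kept, copied


def filter_in_io_path(records: Iterable[bytes],
                      predicate: Predicate) -> Tuple[List[bytes], int]:
    """Extension style: the predicate runs in the IO path, so only matches are copied."""
    copied = 0
    kept: List[bytes] = []
    for rec in records:
        if predicate(rec):          # application-provided code runs before the copy
            copied += len(rec)
            kept.append(rec)
    return kept, copied


if __name__ == "__main__":
    # Synthetic fixed-width records; the data and the predicate are made up.
    records = [f"row-{i:04d}".encode() for i in range(1000)]
    wanted = lambda rec: rec.endswith(b"7")

    _, baseline_bytes = read_then_filter(records, wanted)
    _, pushed_bytes = filter_in_io_path(records, wanted)
    print("bytes copied, filter in application:", baseline_bytes)
    print("bytes copied, filter in IO path:   ", pushed_bytes)
```

Both functions return the same records; only the amount of data that crosses into the application differs. The model says nothing about the cost of running the predicate itself, which a real system would also have to account for.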

  • Published in

    APSys '19: Proceedings of the 10th ACM SIGOPS Asia-Pacific Workshop on Systems
    August 2019
    115 pages
    ISBN:9781450368933
    DOI:10.1145/3343737

    Copyright © 2019 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    APSys '19 Paper Acceptance Rate: 15 of 36 submissions, 42%
    Overall Acceptance Rate: 149 of 386 submissions, 39%
