DOI: 10.1145/3452413.3464788
short-paper

Distributed Parallel Analysis Engine for High Energy Physics Using AWS Lambda

Published: 18 June 2021

ABSTRACT

The High-Energy Physics experiments at CERN produce a high volume of data. No single machine can analyze large portions of it within a reasonable time. The ROOT framework was recently extended with distributed computing capabilities for massively parallelized RDataFrame applications. This approach, which uses the MapReduce pattern underneath, made heavy computations much more approachable, even for newcomers.

This paper explores the possibility of running such analyses on serverless services in a public cloud, using a purely stateless environment. So far, the distributed approaches used by RDataFrame have relied on stateful, fully managed computing frameworks such as Apache Spark. Here we show that our newly developed tool can use fully stateless cloud functions, and our benchmarks demonstrate excellent speedup in the parallel stage of processing.
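The MapReduce pattern with stateless workers, as described in the abstract, can be illustrated with a minimal Python sketch. It is not the paper's tool and does not use the ROOT or AWS APIs: the function names (`split_ranges`, `fill_histogram`, `merge`, `run_distributed`) are hypothetical, the dataset is a plain list, and a local thread pool stands in for cloud function invocations. Each map task receives everything it needs as arguments and returns a partial histogram; the reduce step merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_ranges(n_entries, n_chunks):
    """Split [0, n_entries) into contiguous, near-equal (start, stop) ranges."""
    base, extra = divmod(n_entries, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def fill_histogram(values, n_bins, lo, hi):
    """Map step: a pure function of its arguments, with no shared state."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            # Clamp the index to guard against floating-point rounding at the upper edge.
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


def merge(left, right):
    """Reduce step: bin-wise sum of two partial histograms."""
    return [a + b for a, b in zip(left, right)]


def run_distributed(values, n_chunks, n_bins, lo, hi):
    """Run one map task per range (threads stand in for cloud functions), then reduce."""
    tasks = [(values[a:b], n_bins, lo, hi)
             for a, b in split_ranges(len(values), n_chunks)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(lambda task: fill_histogram(*task), tasks))
    return reduce(merge, partials)


if __name__ == "__main__":
    data = list(range(10)) * 3  # synthetic stand-in for event data
    print(run_distributed(data, n_chunks=4, n_bins=5, lo=0.0, hi=10.0))
```

Because the map step keeps no state between calls, the merged histogram does not depend on how the entries are partitioned, which is what makes the pattern a fit for stateless cloud functions. In a real deployment each task would presumably read its entry range from remote storage rather than receive the values directly; that part is outside this sketch.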


Published in

HiPS '21: Proceedings of the 1st Workshop on High Performance Serverless Computing
June 2021, 46 pages
ISBN: 9781450383882
DOI: 10.1145/3452413
General Chairs: Yadu Babuji, Kyle Chard
Program Chairs: Ian Foster, Zhuozhao Li

Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States
