DOI: 10.1145/2479871.2479895
research-article

Modeling performance of a parallel streaming engine: bridging theory and costs

Published: 21 April 2013

ABSTRACT

Data volumes are growing faster than ever, and parallel computing has become essential for processing them in a timely manner. Concurrent computation has consequently received increasing attention, driven by the widespread adoption of multi-core processors and by advances in cloud computing. The ubiquity of mobile devices, location services, and sensors has created a pressing need for scalable computing platforms and parallel architectures that can process vast amounts of streaming data. In practice, operating these systems efficiently is hard because of the intrinsic complexity of the architectures and the lack of formal, in-depth knowledge of their performance models and the resulting system costs. The Actor Model is a mathematical model of concurrent computation that has had enormous success in practice and has inspired a number of contemporary works in this area. Recently, the Storm system has been presented as a realization of the principles of the Actor Model in the context of large-scale processing of streaming data. In this paper, we present, to the best of our knowledge, the first set of models that formalize the performance characteristics of a practical distributed, parallel, and fault-tolerant stream processing system that follows the Actor Model. In particular, we model the characteristics of the data flow, the data processing, and the system management costs at a fine granularity within the different steps of executing a distributed stream processing job. Finally, we present an experimental validation of the described performance models using the Storm system.

Published in

ICPE '13: Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering
April 2013
446 pages
ISBN: 9781450316361
DOI: 10.1145/2479871

Copyright © 2013 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

ICPE '13 paper acceptance rate: 28 of 64 submissions, 44%. Overall acceptance rate: 252 of 851 submissions, 30%.
