ABSTRACT
As data grow at an unprecedented rate, parallel computing is becoming increasingly essential for processing this massive volume of data in a timely manner. Concurrent computation has therefore received growing attention, driven by the widespread adoption of multi-core processors and by advances in cloud computing technology. The ubiquity of mobile devices, location services, and pervasive sensors are examples of new scenarios that have created a crucial need for scalable computing platforms and parallel architectures that can process vast amounts of generated streaming data. In practice, operating these systems efficiently is hard because of the intrinsic complexity of these architectures and the lack of formal, in-depth knowledge of their performance models and the consequent system costs. The Actor Model theory has been presented as a mathematical model of concurrent computation that has had enormous success in practice and has inspired a number of contemporary works in this area. Recently, the Storm system has been presented as a realization of the principles of the Actor Model theory in the context of large-scale processing of streaming data. In this paper, we present, to the best of our knowledge, the first set of models that formalize the performance characteristics of a practical distributed, parallel, and fault-tolerant stream processing system that follows the Actor Model theory. In particular, we model the characteristics of the data flow, the data processing, and the system management costs at a fine granularity within the different steps of executing a distributed stream processing job. Finally, we present an experimental validation of the described performance models using the Storm system.
Index Terms
- Modeling performance of a parallel streaming engine: bridging theory and costs