Abstract
Modern scientific collaborations have opened up the opportunity to solve complex problems that require both multidisciplinary expertise and large-scale computational experiments. These experiments typically consist of a sequence of processing steps that need to be executed on selected computing platforms. Execution poses a challenge, however, due to (1) the complexity and diversity of applications, (2) the diversity of analysis goals, (3) the heterogeneity of computing platforms, and (4) the volume and distribution of data.
A common strategy to make these in silico experiments more manageable is to model them as workflows and to use a workflow management system (WMS) to organize their execution. This article looks at the overall challenge posed by a new order of scientific experiments and the systems they need to be run on, and examines how workflows and WMSs can address this challenge. It proposes a taxonomy of WMS characteristics, including aspects previously overlooked. The taxonomy frames a review of prevalent WMSs used by the scientific community. The review traces how these systems have evolved to handle the challenges arising with the emergence of the “fourth paradigm,” and identifies the research needed to maintain progress in this area.
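The sketch below illustrates the core idea of modeling an experiment as a workflow. Processing steps become nodes in a directed acyclic graph, and an engine runs each step only after the steps whose outputs it consumes have finished. This is a minimal, single-machine illustration. The step names and the `run_workflow` helper are hypothetical and do not come from any particular WMS. The sketch also leaves out platform selection, data movement, fault handling, and provenance capture, all of which a real WMS must manage.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple

# A step is a callable plus the names of the upstream steps whose outputs it consumes.
Step = Tuple[Callable[..., Any], List[str]]


def run_workflow(steps: Dict[str, Step]) -> Dict[str, Any]:
    """Execute every step in an order that respects its data dependencies."""
    # Map each step to the set of steps it depends on (its predecessors).
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    results: Dict[str, Any] = {}
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        func, deps = steps[name]
        # Feed upstream results positionally, in the order the dependencies are declared.
        results[name] = func(*(results[d] for d in deps))
    return results


# Hypothetical three-step experiment: acquire raw data, clean it, summarize it.
experiment: Dict[str, Step] = {
    "acquire": (lambda: [4.0, -1.0, 2.0, 6.0], []),
    "clean": (lambda xs: [x for x in xs if x >= 0], ["acquire"]),
    "summarize": (lambda xs: sum(xs) / len(xs), ["clean"]),
}

if __name__ == "__main__":
    print(run_workflow(experiment)["summarize"])  # prints 4.0
```

Because each step declares its inputs explicitly, the engine can derive a valid execution order from the graph alone. The same declaration is what would let a real system spot steps with no mutual dependency and dispatch them to different computing platforms concurrently.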
- B. P. Abbott, R. Abbott, T. D. Abbott, M. R. Abernathy, F. Acernese, K. Ackley, C. Adams, T. Adams, P. Addesso, and others. 2016. Observation of gravitational waves from a binary Black Hole merger. Phys. Rev. Lett. 116, 6 (Feb. 2016), 061102.Google Scholar
- Mohamed Abouelhoda, Shadi Issa, and Moustafa Ghanem. 2012. Tavaxy: Integrating Taverna and Galaxy workflows with cloud computing support. BMC Bioinformatics 13, 1 (2012), 77.Google ScholarCross Ref
- David Abramson, Colin Enticott, and Ilkay Altinas. 2008. Nimrod/K: Towards massively parallel dynamic grid workflows. In Proc. ACM/IEEE Conference on Supercomputing (SC’08). IEEE Press, Piscataway, NJ, USA, Article 24, 11 pages. Google ScholarDigital Library
- Bernie Ács, Xavier Llorà, Loretta Auvil, Boris Capitanu, David Tcheng, Mike Haberman, Limin Dong, Tim Wentling, and Michael Welge. 2010. A general approach to data-intensive computing using the Meandre component-based framework. In Proc. 1st International Workshop on Workflow Approaches to New Data-centric Science (WANDS’10). ACM, Article 8, 12 pages. Google ScholarDigital Library
- Aashish N. Adhikari, Jian Peng, Michael Wilde, Jinbo Xu, Karl F. Freed, and Tobin R. Sosnick. 2012. Modeling large regions in proteins: Applications to loops, termini, and folding. Protein Science 21, 1 (Jan. 2012), 107--121.Google ScholarCross Ref
- Chris Allan, Jean-Marie Burel, Josh Moore, Colin Blackburn, Melissa Linkert, Scott Loynton, Donald MacDonald, William J Moore, Carlos Neves, and others. 2012. OMERO: Flexible, model-driven data management for experimental biology. Nature Methods 9, 3 (March 2012), 245--253.Google ScholarCross Ref
- Ilkay Altintas, Oscar Barney, and Efrat Jaeger-Frank. 2006. Provenance collection support in the Kepler scientific workflow system. In Provenance and Annotation of Data. LNCS, Vol. 4145. 118--132. Google ScholarDigital Library
- Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. 2010. A view of cloud computing. Commun. ACM 53, 4 (April 2010), 50--58. Google ScholarDigital Library
- Malcolm P. Atkinson. 2013. Data-Intensive thinking with Dispel. In The Data Bonanza -- Improving Knowledge Discovery for Science, Engineering and Business, Malcolm P. Atkinson, Rob Baxter, Paolo Besana, Michelle Galea, Mark Parsons, Peter Brezany, Oscar Corcho, Jano van Hemert, and David Snelling (Eds.). John Wiley 8 Sons, Inc., Hoboken, NJ, USA, Chapter 4, 61--122.Google Scholar
- Malcolm P. Atkinson, Michele Carpené, Emanuele Casarotti, Steffen Claus, Rosa Filgueira, Anton Frank, Michelle Galea, Tom Garth, André Gemünd, and others. 2015. VERCE delivers a productive e-Science environment for seismology research. In Proc. IEEE International Conference on e-Science (e-Science 2015). Google ScholarDigital Library
- Malcolm P. Atkinson and Mark Parsons. 2013. The Digital-Data Challenge. In The Data Bonanza-- Improving Knowledge Discovery for Science, Engineering and Business, Malcolm P. Atkinson, Rob Baxter, Paolo Besana, Michelle Galea, Mark Parsons, Peter Brezany, Oscar Corcho, Jano van Hemert, and David Snelling (Eds.). John Wiley 8 Sons, Inc., Hoboken, NJ, USA, Chapter 1, 5--13.Google Scholar
- Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer Widom. 2002. Models and issues in data stream systems. In Proc. 21st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’02). ACM, New York, NY, USA, 1--16. Google ScholarDigital Library
- Roger Barga, Jared Jackson, Nelson Araujo, Dean Guo, Nitin Gautam, and Yogesh Simmhan. 2008. The trident scientific workflow workbench. In Proc. e-Science’08. IEEE Computer Society, Los Alamitos, CA, USA, 317--318. Google ScholarDigital Library
- Adam Barker, Christopher D. Walton, and David Robertson. 2009. Choreographing web services. IEEE Trans. on Services Computing 2, 2 (April-June 2009), 152--166. Google ScholarDigital Library
- Adam Barker, Jon B. Weissman, and Jano van Hemert. 2008. Orchestrating data-centric workflows. In Proc. 8th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID 2008). IEEE Computer Society, 210--217. Google ScholarDigital Library
- Jörg Becker, Michael zur Muehlen, and Marc Gille. 2002. Workflow application architectures: Classification and characteristics of workflow-based information systems. In Workflow Handbook 2002, Layna Fischer (Ed.). Future Strategies, 39--50.Google Scholar
- Stephan Beisken, Thorsten Meinl, Bernd Wiswedel, Luis de Figueiredo, Michael Berthold, and Christoph Steinbeck. 2013. KNIME-CDK: Workflow-driven cheminformatics. BMC Bioinformatics 14, 1 (2013), 257.Google ScholarCross Ref
- Khalid Belhajjame, Jun Zhao, Daniel Garijo, Kristina Hettne, Raul Palma, Óscar Corcho, José-Manuel Gómez-Pérez, Sean Bechhofer, Graham Klyne, and Carole Goble. 2015. Using a suite of ontologies for preserving workflow-centric research objects. Web Semantics: Science, Services and Agents on the World Wide Web 32 (2015), 16--42. Google ScholarDigital Library
- G. Bruce Berriman, Ewa Deelman, Paul T. Groth, and Gideon Juve. 2010. The application of cloud computing to the creation of image mosaics and management of their provenance. In Software and Cyberinfrastructure for Astronomy, Nicole M. Radziwill and Alan Bridger (Eds.), Vol. 7740. SPIE, 77401F.Google Scholar
- Michael R. Berthold, Nicolas Cebron, Fabian Dill, Thomas R. Gabriel, Tobias Kötter, Thorsten Meinl, Peter Ohl, Kilian Thiel, and Bernd Wiswedel. 2009. KNIME - The Konstanz information miner. SIGKDD Explorations 11, 1 (Nov. 2009), 26--31. Google ScholarDigital Library
- Shishir Bharathi, Ann Chervenak, Ewa Deelman, Gaurang Mehta, Mei-Hui Su, and Karan Vahi. 2008. Characterization of scientific workflows. In Proc. Workflows for Science (WORKS’08). IEEE Computer Society, 1--10.Google ScholarCross Ref
- Daniel Blankenberg, Gregory Von Kuster, Nathaniel Coraor, Guruprasad Ananda, Ross Lazarus, Mary Mangan, Anton Nekrutenko, and James Taylor. 2010. Galaxy: A Web-Based Genome Analysis Tool for Experimentalists. John Wiley 8 Sons, Inc.Google Scholar
- Peter A. Boncz, Martin L. Kersten, and Stefan Manegold. 2008. Breaking the memory wall in MonetDB. Commun. ACM 51, 12 (Dec. 2008), 77--85. Google ScholarDigital Library
- Shawn Bowers and Bertram Ludäscher. 2005. Actor-oriented design of scientific workflows. In Conceptual Modeling -- ER 2005. LNCS, Vol. 3716. 369--384. Google ScholarDigital Library
- Shawn Bowers, Timothy McPhillips, Martin Wu, and Bertram Ludäscher. 2007. Project histories: Managing data provenance across collection-oriented scientific workflow runs. In Data Integration in the Life Sciences. LNCS, Vol. 4544. 122--138. Google ScholarDigital Library
- P. Chris Broekema, Rob V. van Nieuwpoort, and Henri E. Bal. 2012. ExaScale high performance computing in the square kilometer array. In Proc. Astro-HPC’12. ACM, New York, NY, USA, 9--16. Google ScholarDigital Library
- Christopher Brooks, Edward A. Lee, Xiaojun Liu, Stephen Neuendorffer, Yang Zhao, and Haiyang Zheng. 2007. Heterogeneous Concurrent Modeling and Design in Java (Volume 1: Introduction to Ptolemy II). Technical Report UCB/EECS-2007-7. EECS Department, University of California, Berkeley. http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-7.html.Google Scholar
- Erik Brynjolfsson, Paul Hofmann, and John Jordan. 2010. Cloud computing and electricity: Beyond the utility model. Commun. ACM 53, 5 (May 2010), 32--34. Google ScholarDigital Library
- Tamás Budavári, László Dobos, and Alexander S. Szalay. 2013. SkyQuery: Federating astronomy archives. Computing in Science 8 Engineering 15, 3 (2013), 12--20. Google ScholarDigital Library
- Carlos Buil-Aranda, Marcelo Arenas, Oscar Corcho, and Axel Polleres. 2013. Federating queries in {SPARQL} 1.1: Syntax, semantics and evaluation. Web Semantics: Science, Services and Agents on the World Wide Web 18, 1 (2013), 1--17. Special Section on the Semantic and Social Web Google ScholarDigital Library
- Jacek Cała, Eyad Marei, Yaobo Xu, Kenji Takeda, and Paolo Missier. 2016. Scalable and efficient whole-exome data processing using workflows on the cloud. Future Gener. Comput. Syst. 65 (2016), 153--168. Google ScholarDigital Library
- Scott Callaghan, Ewa Deelman, Dan Gunter, Gideon Juve, Philip Maechling, Christopher Brooks, Karan Vahi, Kevin Milner, Robert Graves, Edward Field, David Okaya, and Thomas Jordan. 2010. Scaling up workflow-based applications. J. Comput. System Sci. 76, 6 (2010), 428--446. Google ScholarDigital Library
- Steven P. Callahan, Juliana Freire, Emanuele Santos, Carlos E. Scheidegger, Cláudio T. Silva, and Huy T. Vo. 2006. Managing the evolution of dataflows with VisTrails. In Proc. 22nd International Conference on Data Engineering Workshops (ICDEW’06). IEEE Computer Society, Washington, DC, USA, 71. Google ScholarDigital Library
- Sashi Kiran Challa, Marlon Pierce, and Suresh Marru. 2010. Integrating chemistry scholarship with web architectures, grid computing and semantic web. In Proc. Gateway Computing Environments Workshop (GCE’10). 1--8.Google ScholarCross Ref
- Matthew Chalmers. 2014. Large Hadron Collider: The big reboot. Nature 514 (2014), 158--160.Google ScholarCross Ref
- Jinjun Chen and Yun Yang. 2008. A taxonomy of grid workflow verification and validation. Concurrency and Computation: Practice and Experience 20, 4 (March 2008), 347--360. Google ScholarDigital Library
- Weiwei Chen, Rafael Ferreira da Silva, Ewa Deelman, and Rizos Sakellariou. 2015. Using imbalance metrics to optimize task clustering in scientific workflow executions. Future Gener. Comput. Syst. 46 (2015), 69--84. Google ScholarDigital Library
- Weiwei Chen and Ewa Deelman. 2011. Workflow overhead analysis and optimizations. In Proc. WORKS’11. ACM, New York, NY, USA, 11--20. Google ScholarDigital Library
- Daniel Crawl and Ilkay Altintas. 2008. A provenance-based fault tolerance mechanism for scientific workflows. In Provenance and Annotation of Data and Processes. LNCS, Vol. 5272. 152--159. 10.1007/978-3-540-89965-5_17 Google ScholarDigital Library
- Víctor Cuevas-Vicenttín, Saumen Dey, Sven Köhler, Sean Riddle, and Bertram Ludäscher. 2012. Scientific workflows and provenance: Introduction and research opportunities. Datenbank-Spektrum 12, 3 (2012), 193--203.Google ScholarCross Ref
- Sérgio Manuel Serra da Cruz, Maria Luiza M. Campos, and Marta Mattoso. 2009. Towards a taxonomy of provenance in scientific workflow management systems. In Proc. 2009 IEEE Congress on Services- PartI (SERVICES’09). IEEE Computer Society, 259--266. Google ScholarDigital Library
- David De Roure, Carole Goble, Sergejs Aleksejevs, Sean Bechhofer, Jiten Bhagat, Don Cruickshank, Paul Fisher, Nandkumar Kollara, Danius Michaelides, and others. 2010. The evolution of myExperiment. In Proc. e-Science’10. IEEE, 153--160. Google ScholarDigital Library
- David De Roure, Carole Goble, and Robert Stevens. 2009. The design and realisation of the myExperiment virtual research environment for social sharing of workflows. Future Gener. Comput. Syst. 25, 5 (2009), 561--567. Google ScholarDigital Library
- David De Roure, Kevin R. Page, Benjamin Fields, Tim Crawford, J. Stephen Downie, and Ichiro Fujinaga. 2011. An e-research approach to web-scale music analysis. Phil. Trans. R. Soc. A 369, 1949 (Aug. 2011), 3300--3317.Google ScholarCross Ref
- Ewa Deelman. 2010. Grids and clouds: Making workflow applications work in heterogeneous distributed environments. International Journal of High Performance Computing Applications 24, 3 (Aug. 2010), 284--298. Google ScholarDigital Library
- Ewa Deelman, Scott Callaghan, Edward Field, Hunter Francoeur, Robert Graves, Nitin Gupta, Vipin Gupta, Thomas H. Jordan, Carl Kesselman, and others. 2006. Managing large-scale workflow execution from resource provisioning to provenance tracking: The cybershake example. In Proc. e-Science’06. 14. Google ScholarDigital Library
- Ewa Deelman, Dennis Gannon, Matthew Shields, and Ian Taylor. 2009. Workflows and e-Science: An overview of workflow system features and capabilities. Future Gener. Comput. Syst. 25, 5 (May 2009), 528--540. Google ScholarDigital Library
- Ewa Deelman, Karan Vahi, Gideon Juve, Mats Rynge, Scott Callaghan, Philip J. Maechling, Rajiv Mayani, Weiwei Chen, Rafael Ferreira da Silva, Miron Livny, and Kent Wenger. 2015. Pegasus, a workflow management system for science automation. Future Gener. Comput. Syst. 46 (2015), 17--35. Google ScholarDigital Library
- Ewa Deelman, Karan Vahi, Mats Rynge, Gideon Juve, Rajiv Mayani, and Rafael Ferreira da Silva. 2016. Pegasus in the cloud: Science automation through workflow technologies. IEEE Internet Computing 20, 1 (Jan. 2016), 70--76. Google ScholarDigital Library
- László Dobos, István Csabai, Alexander S. Szalay, Tamás Budavári, and Nolan Li. 2013. Graywulf: A platform for federated scientific databases and services. In Proc. 25th International Conference on Scientific and Statistical Database Management (SSDBM). ACM, New York, NY, USA, Article 30, 12 pages. Google ScholarDigital Library
- Rion Dooley, Kent Milfeld, Chona Guiang, Sudhakar Pamidighantam, and Gabrielle Allen. 2006. From proposal to production: Lessons learned developing the computational chemistry grid cyberinfrastructure. Journal of Grid Computing 4, 2 (2006), 195--208.Google ScholarCross Ref
- Lei Dou, Daniel Zinn, Timothy McPhillips, Sven Kohler, Sean Riddle, Shawn Bowers, and Bertram Ludäscher. 2011. Scientific workflow design 2.0: Demonstrating streaming data collections in Kepler. In Proc. IEEE ICDE’11. 1296--1299. Google ScholarDigital Library
- Johan Eker, Jörn W. Janneck, Edward A. Lee, Jie Liu, Xiaojun Liu, Jozsef Ludvig, Stephen Neuendorffer, Sonia Sachs, and Yuhong Xiong. 2003. Taming heterogeneity - the Ptolemy approach. Proc. IEEE 91, 1 (Jan. 2003), 127--144.Google ScholarCross Ref
- Erik Elmroth, Francisco Hernández, and Johan Tordsson. 2010. Three fundamental dimensions of scientific workflow interoperability: Model of computation, language, and execution environment. Future Gener. Comput. Syst. 26, 2 (Feb. 2010), 245--256. Google ScholarDigital Library
- Wolfgang Emmerich, Ben Butchart, Liang Chen, Bruno Wassermann, and Sarah Price. 2005. Grid service orchestration using the business process execution language (BPEL). Journal of Grid Computing 3, 3 (Sept. 2005), 283--304.Google ScholarCross Ref
- EU Parliament. 2007. Directive 2007/2/EC of the European parliament and of the council of 14 march 2007 establishing an infrastructure for spatial information in the european community (INSPIRE). Official Journal of the European Union 50, L108 (April 2007).Google Scholar
- Thomas Fahringer, Radu Prodan, Rubing Duan, Jüurgen Hofer, Farrukh Nadeem, Francesco Nerieri, Stefan Podlipnig, Jun Qin, Mumtaz Siddiqui, and others. 2007. ASKALON: A development and grid computing environment for scientific workflows. In Workflows for e-Science: Scientific Workflows for Grids, Ian J. Taylor, Ewa Deelman, Dennis B. Gannon, and Matthew Shields (Eds.). Springer London, 450--471.Google Scholar
- Zbyněk Falt, David Bednárek, Martin Kruliš, Jakub Yaghob, and Filip Zavoral. 2014. Bobolang: A language for parallel streaming applications. In Proc. HPDC’14. ACM, New York, NY, USA, 311--314. Google ScholarDigital Library
- Rosa Filgueira, Malcolm Atkinson, Yusuke Tanimura, and Isao Kojima. 2014. Applying selectively parallel I/O compression to parallel storage systems. In Euro-Par 2014 Parallel Processing. LNCS, Vol. 8632. 282--293.Google Scholar
- Rosa Filguiera, Amrey Krause, Malcolm Atkinson, Iraklis Klampanos, and Alexander Moreno. 2016. dispel4py: A python framework for data-intensive scientific computing. International Journal of High Performance Computing Applications (2016), 1--19.Google Scholar
- Ian Foster, Jens Vöckler, Michael Wilde, and Yong Zhao. 2002. Chimera: A virtual data system for representing, querying, and automating data derivation. In Proc. SSDBM’02. 37--46. Google ScholarDigital Library
- Scott W. French and Barbara Romanowicz. 2015. Broad plumes rooted at the base of the Earth’s mantle beneath major hotspots. Nature 525, 7567 (03 09 2015), 95--99. 10.1038/nature14876.Google Scholar
- Dennis Gannon. 2007. Component architectures and services: From application construction to scientific workflows. In Workflows for e-Science: Scientific Workflows for Grids, Ian J. Taylor, Ewa Deelman, Dennis B. Gannon, and Matthew Shields (Eds.). Springer London, 174--189.Google Scholar
- Daniel Garijo. 2015. Mining Abstractions in Scientific Workflows. Ph.D. Dissertation. Departamento de Inteligencia Artficial Escuela Técnica Superior de Ingenieros Informáticos, Madrid, Spain.Google Scholar
- Daniel Garijo, Facultad De Informática, and Yolanda Gil. 2012. Towards Open Publication of Reusable Scientific Workflows: Abstractions, Standards and Linked Data. Technical Report. (Jan. 2012).Google Scholar
- Sandra Gesing, Malcolm Atkinson, Rosa Filgueira, Ian Taylor, Andrew Jones, Vlado Stankovski, Chee Sun Liew, Alessandro Spinuso, Gabor Terstyanszky, and Peter Kacsuk. 2014. Workflows in a dashboard: A new generation of usability. In Proc. WORKS’14. IEEE Press, Piscataway, NJ, USA, 82--93. Google ScholarDigital Library
- Jayeeta Ghosh, Suresh Marru, Nikhil Singh, Kenno Vanomesslaeghe, Ye Fan, and Sudhakar Pamidighantam. 2011. Molecular parameter optimization gateway (ParamChem): Workflow management through teragrid ASTA. In Proc. TeraGrid (TG’11). ACM, 35:1--35:8. Google ScholarDigital Library
- Yolanda Gil, Jihie Kim, Varun Ratnakar, and Ewa Deelman. 2006. Wings for Pegasus: A semantic approach to creating very large scientific workflows. In Proc. Workshop on OWL: Experiences and Directions (OWLED’06), Vol. 216.Google Scholar
- Edward Givelberg, Alexander Szalay, Kalin Kanov, and Randal Burns. 2011. An architecture for a data-intensive computer. In Proc. Network Aware Data Management (NDM’11). ACM, New York, NY, USA, 57--64. Google ScholarDigital Library
- Carole Goble and David De Roure. 2009. The impact of workflow tools on data-centric research. In The Fourth Paradigm: Data-Intensive Scientific Discovery, Tony Hey, Stewart Tansley, and Kristin Tolle (Eds.). Microsoft, 137--145.Google Scholar
- Katharina Görlach, Mirko Sonntag, Dimka Karastoyanova, Frank Leymann, and Michael Reiter. 2011. Conventional workflow technology for scientific simulation. In Guide to e-Science. 323--352.Google Scholar
- Ian Gorton, Paul Greenfield, Alex Szalay, and Roy Williams. 2008. Data-intensive computing in the 21st century. Computer 41, 4 (April 2008), 30--32. Google ScholarDigital Library
- Jim Gray. 2009. Jim gray on escience: A transformed scientific method. In The Fourth Paradigm: Data-Intensive Scientific Discovery, Tony Hey, Stewart Tansley, and Kristin Tolle (Eds.). Microsoft, xix--xxxiii.Google Scholar
- Paul Grefen and Jochem Vonk. 2006. A taxonomy of transactional workflow support. International Journal of Cooperative Information Systems 15, 1 (March 2006), 87--118.Google ScholarCross Ref
- Paul Groth, Yolanda Gil, James Cheney, and Simon Miles. 2012. Requirements for provenance on the web. International Journal of Digital Curation 7, 1 (2012), 39--55.Google ScholarCross Ref
- Yunhong Gu and Robert L. Grossman. 2009. Sector and sphere: The design and implementation of a high-performance data cloud. Phil. Trans. R. Soc. A 367, 1897 (June 2009), 2429--2445.Google ScholarCross Ref
- Thilina Gunarathne, Chathura Herath, Eran Chinthaka, and Suresh Marru. 2009. Experience with adapting a WS-BPEL runtime for escience workflows. In Proc. GCE’09. ACM, 7:1--7:10. Google ScholarDigital Library
- Ákos Hajnal, Zoltán Farkas, Péter Kacsuk, and Tamás Pintér. 2014. Remote storage resource management in WS-PGRADE/gUSE. In Science Gateways for Distributed Computing Infrastructures: Development Framework and Exploitation by Scientific User Communities, Péter Kacsuk (Ed.). Springer, Chapter 5, 69--81. Google ScholarDigital Library
- Mihael Hategan, Justin Wozniak, and Ketan Maheshwari. 2011. Coasters: Uniform resource provisioning and access for clouds and grids. In Proc. 4th IEEE International Conference on Utility and Cloud Computing (UCC’11). IEEE Computer Society, 114--121. Google ScholarDigital Library
- George Heald, Michael Bell, Andreas Horneffer, André Offringa, Roberto Pizzo, Sebastiaan van der Tol, Reinout van Weeren, Joris van Zwieten, James Anderson, and others. 2011. LOFAR: Recent imaging results and future prospects. Journal of Astrophysics and Astronomy 32, 4 (Dec. 2011), 1--10.Google ScholarCross Ref
- Tom Heath and Christian Bizer. 2011. Linked Data: Evolving the Web into a Global Data Space (1st ed.). Number 1-136 in Synthesis Lectures on the Semantic Web: Theory and Technology. Morgan 8 Claypool. Google ScholarDigital Library
- Tony Hey, Stewart Tansley, and Kristin Tolle (Eds.). 2009. The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research.Google Scholar
- Interagency Working Group on Digital Data. 2009. Harnessing the Power of Digital Data for Science and Society: Report of the Interagency Working Group on Digital Data to the National Science and Technology Council. Technical Report. Executive office of the President, Office of Science and Technology, USA.Google Scholar
- Gideon Juve and Ewa Deelman. 2010. Scientific workflows and clouds. Crossroads 16, 3 (March 2010), 14--18. Google ScholarDigital Library
- Péter Kacsuk (Ed.). 2014. Science Gateways for Distributed Computing Infrastructures: Development Framework and Exploitation by Scientific User Communities. Google ScholarDigital Library
- Peter Kacsuk, Zoltan Farkas, Miklos Kozlovszky, Gabor Hermann, Akos Balasko, Krisztian Karoczkai, and Istvan Marton. 2012. WS-PGRADE/gUSE Generic DCI gateway framework for a large variety of user communities. Journal of Grid Computing 10, 4 (2012), 601--630. Google ScholarDigital Library
- Peter Kacsuk, Gabor Terstyánszky, Ákos Balaskó,, Krisztian Karóczkai, and Zoltan Farkas. 2014. Executing multi-workflow simulations on mixed cloud and grid infrastructure using the SHIWA and SCI-BUS technology. In Cloud Computing and Big Data, C. Catlett, W. Gentzsch, L. Grandinetti, and G. Joubert (Eds.). Ios Pr Inc, 141--162.Google Scholar
- Douglas B. Kell and Stephen G. Oliver. 2004. Here is the evidence, now what is the hypothesis? The complementary roles of inductive and hypothesis-driven science in the post-genomic era. BioEssays 26, 1 (Jan. 2004), 99--105.Google ScholarCross Ref
- Steve Kelling, Daniel Fink, Wesley Hochachka, Ken Rosenberg, Robert Cook, Theodoros Damoulas, Claudio Silva, and William Michener. 2013. Estimating species distributions -- across space, through time and with features of the environment. In The Data Bonanza -- Improving Knowledge Discovery for Science, Engineering and Business, Malcolm P. Atkinson, Rob Baxter, Paolo Besana, Michelle Galea, Mark Parsons, Peter Brezany, Oscar Corcho, Jano van Hemert, and David Snelling (Eds.). John Wiley 8 Sons Inc., Hoboken, NJ, USA, Chapter 22, 441--458.Google Scholar
- Jihie Kim, Ewa Deelman, Yolanda Gil, Gaurang Mehta, and Varun Ratnakar. 2008. Provenance trails in the Wings/Pegasus system. Concurrency and Computation: Practice and Experience 20, 5 (April 2008), 587--597. Google ScholarDigital Library
- Hoyt Koepke. 2014. Why Python Rocks for Research. Technical Report. University of Washington.Google Scholar
- Sven Kohler, Supriya Gulati, Gongjing Cao, Quinn Hart, and Bertram Ludascher. 2012. Sliding window calculations on streaming data using the Kepler scientific workflow system. Procedia Computer Science 9, 0 (2012), 1639--1646.Google ScholarCross Ref
- Vladimir Korkhov, Dagmar Krefting, Tamas Kukla, Gabor Z. Terstyánszky, Matthan W. A. Caan, and Silvia D. Olabarriaga. 2013. Exploring workflow interoperability for neuroimage analysis on the SHIWA platform. Journal of Grid Computing 11, 3 (2013), 505--522. Google ScholarDigital Library
- Miklos Kozlovszky, Krisztián Karóczkai, István Márton, Péter Kacsuk, and Tibor Gottdank. 2014. DCI Bridge: Executing WS-PGRADE workflows in distributed computing infrastructures. In Science Gateways for Distributed Computing Infrastructures: Development Framework and Exploitation by Scientific User Communities, Péter Kacsuk (Ed.). Springer, Chapter 4, 51--67.Google Scholar
- Michael Litzkow, Miron Livny, and Matthew Mutka. 1988. Condor - A hunter of idle workstations. In Proc. 8th International Conference of Distributed Computing Systems. IEEE Computer Society Press, 104--111.Google ScholarCross Ref
- Xavier Llorà, Bernie Ács, Loretta S. Auvil, Boris Capitanu, Michael E. Welge, and David E. Goldberg. 2008. Meandre: Semantic-driven data-intensive flows in the clouds. In Proc. e-Science’08. 238--245. Google ScholarDigital Library
- Bertram Ludäscher, Ilkay Altintas, Chad Berkley, Dan Higgins, Efrat Jaeger, Matthew Jones, Edward A. Lee, Jing Tao, and Yang Zhao. 2006. Scientific workflow management and the Kepler system. Concurrency and Computation: Practice and Experience 18, 10 (August 2006), 1039--1065. Google ScholarDigital Library
- Bertram Ludäscher, Mathias Weske, Timothy McPhillips, and Shawn Bowers. 2009. Scientific workflows: Business as usual? In Business Process Management. LNCS, Vol. 5701. 31--47.Google ScholarCross Ref
- Philip Maechling, Ewa Deelman, Li Zhao, Robert Graves, Gaurang Mehta, Nitin Gupta, John Mehringer, Carl Kesselman, Scott Callaghan, David Okaya, Hunter Francoeur, Vipin Gupta, Yifeng Cui, Karan Vahi, Thomas Jordan, and Edward Field. 2007. SCEC cybershake workflows—Automating probabilistic seismic hazard analysis calculations. In Workflows for e-Science: Scientific Workflows for Grids, Ian J. Taylor, Ewa Deelman, Dennis B. Gannon, and Matthew Shields (Eds.). Springer London, 143--163.Google Scholar
- Ketan Maheshwari, Alex Rodriguez, David Kelly, Ravi Madduri, Justin Wozniak, Michael Wilde, and Ian Foster. 2013. Enabling multi-task computation on Galaxy-based gateways using swift. In Proc. IEEE International Conference on Cluster Computing (CLUSTER 2013). 1--3.Google ScholarCross Ref
- Suresh Marru, Lahiru Gunathilake, Chathura Herath, Patanachai Tangchaisin, Marlon Pierce, Chris Mattmann, Raminder Singh, Thilina Gunarathne, Eran Chinthaka, and others. 2011. Apache airavata: A framework for distributed applications and computational workflows. In Proc. GCE’11. ACM, 21--28. Google ScholarDigital Library
- Suresh Marru, Marlon Pierce, Sudhakar Pamidighantam, and Chathuri Wimalasena. 2015. Apache airavata as a laboratory: Architecture and case study for component-based gateway middleware. In Proc. SCREAM’15. 19--26. Google ScholarDigital Library
- Paul Martin and Gagarine Yaikhom. 2013. Definition of the DISPEL language. In The Data Bonanza -- Improving Knowledge Discovery for Science, Engineering and Business, Malcolm P. Atkinson, Rob Baxter, Paolo Besana, Michelle Galea, Mark Parsons, Peter Brezany, Oscar Corcho, Jano van Hemert, and David Snelling (Eds.). John Wiley 8 Sons Inc., Hoboken, NJ, USA, Chapter 10, 203--236.Google Scholar
- Cherian Mathew, Anton Güntsch, Matthias Obst, Saverio Vicario, Robert Haines, Alan Williams, Yde de Jong, and Carole Goble. 2014. A semi-automated workflow for biodiversity data retrieval, cleaning, and quality control. Biodiversity Data Journal 2 (Dec. 2014), e4221.Google Scholar
- Michael McLennan, Steven Clark, Ewa Deelman, Mats Rynge, Karan Vahi, Frank McKenna, Derrick Kearney, and Carol Song. 2015. HUBzero and Pegasus: Integrating scientific workflows into science gateways. Concurrency and Computation: Practice and Experience 27, 2 (2015), 328--343.Google ScholarCross Ref
- Michael McLennan and Rick Kennell. 2010. HUBzero: A platform for dissemination and collaboration in computational science and engineering. Computing in Science Engineering 12, 2 (March 2010), 48--53. Google ScholarDigital Library
- Timothy M. McPhillips and Shawn Bowers. 2005. An approach for pipelining nested collections in scientific workflows. SIGMOD Record 34, 3 (Sept. 2005), 12--17. Google ScholarDigital Library
- William Michener, James Beach, Shawn Bowers, Laura Downey, Matthew Jones, Bertram Ludäscher, Deana Pennington, Arcot Rajasekar, Samantha Romanello, Mark Schildhauer, Dave Vieglais, and Jianting Zhang. 2005. Data integration and workflow solutions for ecology. In Data Integration in the Life Sciences. LNCS, Vol. 3615. 734--734. Google ScholarDigital Library
- Paolo Missier, Bertram Ludascher, Shawn Bowers, Saumen Dey, Anandarup Sarkar, Biva Shrestha, Ilkay Altintas, Manish Kumar Anand, and Carole Goble. 2010a. Linking multiple workflow provenance traces for interoperable collaborative science. In WORKS’10. 1--8.Google Scholar
- Paolo Missier, Bertram Ludäscher, Saumen C. Dey, Michael Wang, Timothy M. McPhillips, Shawn Bowers, Michael Agun, and Ilkay Altintas. 2012. Golden trail: Retrieving the data history that matters from a comprehensive provenance repository. IJDC 7, 1 (2012), 139--150.Google ScholarCross Ref
- Paolo Missier, Stian Soiland-Reyes, Stuart Owen, Wei Tan, Alexandra Nenadic, Ian Dunlop, Alan Williams, Tom Oinn, and Carole Goble. 2010b. Taverna, Reloaded. In Scientific and Statistical Database Management. LNCS, Vol. 6187. 471--481. Google ScholarDigital Library
- Fiona Murphy, Publishing Data Workflows WG, Theodora Bloom, Sunje Dallmeier-Tiessen, Claire C. Austin, Angus Whyte, Jonathan Tedds, Amy Nurnberger, Lisa Raymond, Martina Stockhause, and Mary Vardigan. 2015. WDS-RDA Publishing Data Workflows Working Group Analysis sheet. (June 2015).
- James Myers, Margaret Hedstrom, Dharma Akmon, Sandy Payette, Beth A. Plale, Inna Kouper, Scott McCaulay, Robert McDonald, Isuru Suriarachchi, and others. 2015. Towards sustainable curation and preservation. In Proc. e-Science’15. 526--535.
- Michael L. Norman and Allan Snavely. 2010. Accelerating data-intensive science with Gordon and Dash. In Proc. TG’10. ACM, New York, NY, USA, Article 14, 7 pages.
- Thomas Oinn, Matthew Addis, Justin Ferris, Darren Marvin, Martin Senger, Mark Greenwood, Tim Carver, Kevin Glover, Matthew Pocock, Anil Wipat, and Peter Li. 2004. Taverna: A tool for the composition and enactment of bioinformatics workflows. Bioinformatics 20, 17 (Nov. 2004), 3045--3054.
- Tom Oinn, Mark Greenwood, Matthew Addis, M. Nedim Alpdemir, Justin Ferris, Kevin Glover, Carole Goble, Antoon Goderis, Duncan Hull, and others. 2006. Taverna: Lessons in creating a workflow environment for the life sciences. Concurrency and Computation: Practice and Experience 18, 10 (2006), 1067--1100.
- Tom Oinn, Peter Li, Douglas B. Kell, Carole Goble, Antoon Goderis, Mark Greenwood, Duncan Hull, Robert Stevens, Daniele Turi, and Jun Zhao. 2007. Taverna/myGrid: Aligning a workflow system with the life sciences community. In Workflows for e-Science: Scientific Workflows for Grids, Ian J. Taylor, Ewa Deelman, Dennis B. Gannon, and Matthew Shields (Eds.). Springer London, 300--319.
- Ioan Raicu, Yong Zhao, Catalin Dumitrescu, Ian Foster, and Mike Wilde. 2007. Falkon: A fast and light-weight tasK executiON framework. In Proc. SC’07. ACM, New York, NY, USA, Article 43, 12 pages.
- Christopher Rawlings. 2014. Big data in the agricultural and ecological sciences — a growing challenge. Keynote EGI Community Forum 2014. (May 2014).
- A. T. Ringler, M. T. Hagerty, J. Holland, A. Gonzales, L. S. Gee, J. D. Edwards, D. Wilson, and A. M. Baker. 2015. The data quality analyzer: A quality control program for seismic data. Computers & Geosciences 76 (2015), 96--111.
- David Rogers, Ian Harvey, Tram Truong Huu, Kieran Evans, Tristan Glatard, Ibrahim Kallel, Ian Taylor, Johan Montagnat, Andrew Jones, and Andrew Harrison. 2013. Bundle and pool architecture for multi-language, robust, scalable workflow executions. Journal of Grid Computing 11, 3 (2013), 457--480.
- John W. Romein, Jan David Mol, Rob V. van Nieuwpoort, and P. Chris Broekema. 2011. Processing LOFAR telescope data in real time on a Blue Gene/P supercomputer. In General Assembly and Scientific Symposium, 2011 XXXth URSI. 1--4.
- Susanna-Assunta Sansone, Philippe Rocca-Serra, Dawn Field, Eamonn Maguire, Chris Taylor, Oliver Hofmann, Hong Fang, Steffen Neumann, Weida Tong, and others. 2012. Toward interoperable bioscience data. Nat. Genet. 44, 2 (Feb. 2012), 121--126.
- Idafen Santana-Perez, Rafael Ferreira da Silva, Mats Rynge, Ewa Deelman, María S. Pérez-Hernández, and Oscar Corcho. 2016. Reproducibility of execution environments in computational science using Semantics and Clouds. Future Gener. Comput. Syst. 67 (2016), 354--367.
- Matthew Shields. 2007. Control- versus data-driven workflows. In Workflows for e-Science: Scientific Workflows for Grids, Ian J. Taylor, Ewa Deelman, Dennis B. Gannon, and Matthew Shields (Eds.). Springer London, 167--173.
- Yogesh L. Simmhan, Roger Barga, Catharine van Ingen, Ed Lazowska, and Alex Szalay. 2009. Building the Trident scientific workflow workbench for data management in the cloud. In Proc. 3rd International Conference on Advanced Engineering Computing and Applications in Sciences (ADVCOMP’09). 41--50.
- Aleksander Slominski. 2007. Adapting BPEL to scientific workflows. In Workflows for e-Science: Scientific Workflows for Grids, Ian J. Taylor, Ewa Deelman, Dennis B. Gannon, and Matthew Shields (Eds.). Springer London, 208--226.
- Alessandro Spinuso, Rosa Filgueira, Malcolm Atkinson, and Andre Gemuend. 2016. Visualisation methods for large provenance collections in data-intensive collaborative platforms. In Geophysical Research Abstracts - EGU General Assembly 2016, Vol. 18.
- Sudarshan Srinivasan, Gideon Juve, Rafael Ferreira da Silva, Karan Vahi, and Ewa Deelman. 2014. A cleanup algorithm for implementing storage constraints in scientific workflow executions. In Proc. WORKS’14. IEEE Press, 41--49.
- Tiberiu Stef-Praun, Benjamin Clifford, Ian Foster, Uri Hasson, Mihael Hategan, Steven L. Small, Michael Wilde, and Yong Zhao. 2007. Accelerating medical research using the Swift workflow system. Studies in Health Technology and Informatics 126 (2007), 207--216.
- Michael Stonebraker, Jacek Becla, David J. DeWitt, Kian-Tat Lim, David Maier, Oliver Ratzesberger, and Stanley B. Zdonik. 2009. Requirements for science data bases and SciDB. In Proc. Biennial Conference on Innovative Data Systems Research (CIDR’09).
- Michael Stonebraker, Paul Brown, Donghui Zhang, and Jacek Becla. 2013. SciDB: A database management system for applications with complex analytics. Computing in Science & Engineering 15, 3 (2013), 54--62.
- Ian Taylor, Matthew Shields, Ian Wang, and Andrew Harrison. 2007a. The Triana workflow environment: Architecture and applications. In Workflows for e-Science: Scientific Workflows for Grids, Ian J. Taylor, Ewa Deelman, Dennis B. Gannon, and Matthew Shields (Eds.). Springer London, 320--339.
- Ian J. Taylor, Ewa Deelman, Dennis B. Gannon, and Matthew Shields. 2007b. Workflows for e-Science: Scientific workflows for grids. Springer London.
- Gabor Terstyánszky, Edward Michniak, Tamás Kiss, and Ákos Balaskó. 2014. Sharing science gateway artefacts through repositories. In Science Gateways for Distributed Computing Infrastructures: Development Framework and Exploitation by Scientific User Communities. Springer, Chapter 9, 123--135.
- Douglas Thain, Todd Tannenbaum, and Miron Livny. 2005. Distributed computing in practice: The Condor experience. Concurrency and Computation: Practice and Experience 17, 2-4 (2005), 323--356.
- Thomas D. Uram, Michael E. Papka, Mark Hereld, and Michael Wilde. 2011. A solution looking for lots of problems: generic portals for science infrastructure. In Proc. TG’11. ACM, New York, NY, USA, Article 44, 7 pages.
- Wil M. P. van der Aalst and Arthur H. M. ter Hofstede. 2014. Workflow Patterns. http://www.workflowpatterns.com. (2014).
- Wil M. P. van der Aalst, Arthur H. M. ter Hofstede, B. Kiepuszewski, and A. P. Barros. 2003. Workflow Patterns. Distributed and Parallel Databases 14, 1 (July 2003), 5--51.
- Jens Vöckler, Gaurang Mehta, Yong Zhao, Ewa Deelman, and Michael Wilde. 2006. Kickstarting Remote Applications. In Second International Workshop on Grid Computing Environments.
- Gregor von Laszewski and Mike Hategan. 2005. Workflow Concepts of the Java CoG Kit. Journal of Grid Computing 3, 3 (Sept. 2005), 239--258.
- Chip Walter. 2005. Kryder’s Law: The doubling of processor speed every 18 months is a snail’s pace compared with rising hard-disk capacity, and Mark Kryder plans to squeeze in even more bits. Scientific American (August 2005), 32--33.
- Hongbing Wang, Joshua Zhexue Huang, Yuzhong Qu, and Junyuan Xie. 2004. Web services: Problems and future directions. Web Semantics: Science, Services and Agents on the World Wide Web 1, 3 (April 2004), 309--320.
- Marek Wieczorek, Andreas Hoheisel, and Radu Prodan. 2009. Towards a general model of the multi-criteria workflow scheduling on the grid. Future Gener. Comput. Syst. 25, 3 (March 2009), 237--256.
- Michael Wilde, Ian Foster, Kamil Iskra, Pete Beckman, Zhao Zhang, Allan Espinosa, Mihael Hategan, Ben Clifford, and Ioan Raicu. 2009. Parallel scripting for applications at the petascale and beyond. Computer 42, 11 (Nov. 2009), 50--60.
- Matthew Woitaszek, John M. Dennis, and Taleena R. Sine. 2011. Parallel high-resolution climate data analysis using Swift. In Proc. ACM International Workshop on Many Task Computing on Grids and Supercomputers (MTAGS’11). ACM, New York, NY, USA, 5--14.
- Katherine Wolstencroft, Robert Haines, Donal Fellows, Alan Williams, David Withers, Stuart Owen, Stian Soiland-Reyes, Ian Dunlop, Aleksandra Nenadic, and others. 2013. The Taverna workflow suite: Designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Research 41, W1 (2013), W557--W561.
- Justin M. Wozniak, Timothy G. Armstrong, Ketan Maheshwari, Ewing L. Lusk, Daniel S. Katz, Michael Wilde, and Ian T. Foster. 2013a. Turbine: A distributed-memory dataflow engine for high performance many-task applications. Fundamenta Informaticae 128, 3 (Jan. 2013), 337--366.
- Justin M. Wozniak, Timothy G. Armstrong, Michael Wilde, Daniel S. Katz, Ewing Lusk, and Ian T. Foster. 2013b. Swift/T: Large-scale application composition via distributed-memory dataflow processing. In Proc. IEEE/ACM CCGRID’13. 95--102.
- Wenjun Wu, Thomas Uram, Michael Wilde, Mark Hereld, and Michael E. Papka. 2010. Accelerating science gateway development with Web 2.0 and Swift. In Proc. TG’10. ACM, New York, NY, USA, Article 23, 7 pages.
- Youngik Yang, Jong Youl Choi, Chathura Herath, Suresh Marru, and Sun Kim. 2010. Biovlab: Bioinformatics data analysis using cloud computing and graphical workflow composers. In Cloud Computing and Software Services: Theory and Techniques, Syed A. Ahson and Mohammad Ilyas (Eds.). CRC Press, Inc., 309--327.
- Jia Yu and Rajkumar Buyya. 2005. A taxonomy of workflow management systems for grid computing. Journal of Grid Computing 3, 3--4 (Sept. 2005), 171--200.
- Yong Zhao, Mihael Hategan, Ben Clifford, Ian Foster, Gregor von Laszewski, Veronika Nefedova, Ioan Raicu, Tiberiu Stef-Praun, and Michael Wilde. 2007. Swift: Fast, reliable, loosely coupled parallel computation. In Proc. IEEE SERVICES’07. IEEE Computer Society, 199--206.
- Yong Zhao, Youfu Li, Ioan Raicu, Shiyong Lu, Wenhong Tian, and Heng Liu. 2015. Enabling scalable scientific workflow management in the Cloud. Future Gener. Comput. Syst. 46 (2015), 3--16.
- Zhiming Zhao, Paola Grosso, Jeroen van der Ham, Ralph Koning, and Cees de Laat. 2011. An agent based network resource planner for workflow applications. Multiagent and Grid Systems 7, 6 (2011), 187--202.
- Daniel Zinn, Quinn Hart, Timothy McPhillips, Bertram Ludäscher, Yogesh Simmhan, Michail Giakkoupis, and Viktor K. Prasanna. 2011. Towards reliable, performant workflows for streaming-applications on cloud platforms. In Proc. IEEE/ACM CCGRID’11. 235--244.
Index Terms
- Scientific Workflows: Moving Across Paradigms