Abstract
Reproducibility is widely considered to be an essential requirement of the scientific process. However, a number of serious concerns have been raised recently, questioning whether today’s computational work is adequately reproducible. In principle, it should be possible to specify a computation in sufficient detail that anyone can reproduce it exactly. In practice, though, there are fundamental, technical, and social barriers to doing so. The many objectives and meanings of reproducibility are discussed within the context of scientific computing. Technical barriers to reproducibility are described, extant approaches are surveyed, and open areas of research are identified.
- Mischa Taylor and Seth Vargo. 2014. Learning Chef: A Guide to Configuration Management and Automation. O’Reilly Media, Inc. Google ScholarDigital Library
- Takayuki Teramoto, Tadashi Okada, and Shigeo Kawata. 2007. A distributed education-support PSE system. In IEEE International Conference on e-Science and Grid Computing. IEEE, 516--520. Google ScholarDigital Library
- Travis CI. 2017. Travis CI—Test and Deploy Your Code with Confidence. Retrieved August 2, 2017, https://travis-ci.org/.Google Scholar
- Chris Tucker, David Shuffelton, Ranjit Jhala, and Sorin Lerner. 2007. Opium: Optimal package install/uninstall manager. In Proceedings of the 29th International Conference on Software Engineering. IEEE Computer Society, 178--188. Google ScholarDigital Library
- Matthew J. Turk. 2013. Scaling a code in the human dimension. In Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery. ACM, 69. Google ScholarDigital Library
- Amin Vahdat and Thomas E. Anderson. 1998. Transparent result caching. In USENIX Annual Technical Conference. Google ScholarDigital Library
- Wil Van Der Aalst and Kees Max Van Hee. 2004. Workflow Management: Models, Methods, and Systems. MIT Press. Google ScholarDigital Library
- Wil M. P. Van der Aalst. 1998. The application of petri nets to workflow management. J. Circuits, Syst. Comput. 8, 01 (1998), 21--66.Google ScholarCross Ref
- Wil M. P. Van Der Aalst and Arthur H. M. Ter Hofstede. 2005. YAWL: Yet another workflow language. Inform. Syst. 30, 4 (2005), 245--275. Google ScholarDigital Library
- Sander Van Der Burg, Merijn de Jonge, Eelco Dolstra, and Eelco Visser. 2009. Software deployment in a dynamic cloud: From device to service orientation in a hospital environment. In Proceedings of the 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing. IEEE Computer Society, 61--66. Google ScholarDigital Library
- Mayank Varia, Benjamin Price, Nicholas Hwang, Ariel Hamlin, Jonathan Herzog, Jill Poland, Michael Reschly, Sophia Yakoubov, and Robert K. Cunningham. 2015. Automated assessment of secure search systems. ACM SIGOPS Oper. Syst. Rev. 49, 1 (2015), 22--30. Google ScholarDigital Library
- H. M. W. Verbeek, Alexander Hirnschall, and Wil M. P. van der Aalst. 2002. XRL/flower: Supporting inter-organizational workflows using XML/Petri-net technology. In International Workshop on Web Services, E-Business, and the Semantic Web. Springer, 93--108. Google ScholarDigital Library
- Gregor Von Laszewski, Mihael Hategan, and Deepti Kodeboyina. 2007. Java CoG kit workflow. In Workflows for e-Science. Springer, 340--356.Google Scholar
- Greg Wilson, Dhavide A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven HD Haddock, Kathryn D. Huff, Ian M. Mitchell, Mark D. Plumbley, et al. 2014. Best practices for scientific computing. PLoS Biol. 12, 1 (2014), e1001745.Google ScholarCross Ref
- Roundtable Participants Yale. 2010. Reproducible research. Compu. Sci. Eng. 12, 5 (2010), 8--13.Google Scholar
- Jia Yu and Rajkumar Buyya. 2004. A novel architecture for realizing grid workflow using tuple spaces. In Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing. IEEE, 119--128. Google ScholarDigital Library
- Jia Yu and Rajkumar Buyya. 2005. A taxonomy of workflow management systems for grid computing. J. Grid Computi. 3, 3--4 (2005), 171--200.Google ScholarCross Ref
- Xiang Zhao, Emery R. Boose, Yuriy Brun, Barbara Staudt Lerner, and Leon J. Osterweil. 2013. Supporting undo and redo in scientific data analysis. In TaPP. Google ScholarDigital Library
Index Terms
- Reproducibility in Scientific Computing