DOI: 10.1145/2479832.2479848
research-article

Detecting common scientific workflow fragments using templates and execution provenance

Published: 23 June 2013

ABSTRACT

Provenance plays a major role in understanding and reusing the methods applied in a scientific experiment, as it provides a record of the inputs, the processes carried out, and the use and generation of intermediate and final results. For in-silico scientific experiments specifically, a large variety of scientific workflow systems (e.g., Wings, Taverna, Galaxy, Vistrails) have been created to support scientists. All of these systems produce some form of provenance about the executions of the workflows that encode scientific experiments. However, provenance is normally recorded at a very low level of detail, which makes it difficult to understand what happened during an execution. In this paper we propose an approach to automatically obtain abstractions from low-level provenance data by finding common workflow fragments in workflow execution provenance and relating them to templates. We have tested our approach on a dataset of workflows published by the Wings workflow system. Our results show that these abstractions highlight the most common abstract methods used in the executions of a repository, and relate different runs and workflow templates to each other.
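As a rough illustration of what finding common workflow fragments in execution provenance can involve, the sketch below counts linear chains of steps that recur across runs. It is not the authors' method, which this abstract does not describe; it assumes each run has already been reduced to dataflow edges between abstract step types, and the step names (`Stemmer`, `TermWeighting`, etc.) and function names (`linear_fragments`, `common_fragments`) are hypothetical. A full approach would mine general subgraphs rather than only chains.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Assumption (hypothetical representation): a run is a list of dataflow edges
# between abstract step types, e.g. ("Stemmer", "TermWeighting") means the
# output of a Stemmer step was consumed by a TermWeighting step.
Run = List[Tuple[str, str]]
Fragment = Tuple[str, ...]


def linear_fragments(run: Run, max_len: int) -> Set[Fragment]:
    """Return every linear chain of 2..max_len step types found in one run."""
    succ: Dict[str, List[str]] = {}
    for src, dst in run:
        succ.setdefault(src, []).append(dst)

    found: Set[Fragment] = set()

    def extend(path: Fragment) -> None:
        if len(path) >= 2:
            found.add(path)
        if len(path) == max_len:  # bounds the search, even if types repeat
            return
        for nxt in succ.get(path[-1], []):
            extend(path + (nxt,))

    nodes = {step for edge in run for step in edge}
    for start in nodes:
        extend((start,))
    return found


def common_fragments(
    runs: List[Run], min_support: int, max_len: int = 3
) -> Dict[Fragment, int]:
    """Map each fragment to the number of runs containing it, keeping only
    fragments whose support is at least min_support."""
    support: Counter = Counter()
    for run in runs:
        # linear_fragments returns a set, so each run counts a fragment once
        support.update(linear_fragments(run, max_len))
    return {frag: n for frag, n in support.items() if n >= min_support}


if __name__ == "__main__":
    # Three hypothetical runs that share a Stemmer -> TermWeighting step.
    runs = [
        [("Stemmer", "TermWeighting"), ("TermWeighting", "Classifier")],
        [("Stemmer", "TermWeighting"), ("TermWeighting", "Clustering")],
        [("Loader", "Stemmer"), ("Stemmer", "TermWeighting")],
    ]
    print(common_fragments(runs, min_support=2))
    # {('Stemmer', 'TermWeighting'): 3}
```

Support is counted per run rather than per occurrence, so a fragment repeated inside one execution does not dominate the result. Mapping each frequent fragment back to the templates whose runs contain it would be a natural next step for relating runs and templates.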


      Published in
      K-CAP '13: Proceedings of the seventh international conference on Knowledge capture
      June 2013
      160 pages
      ISBN:9781450321020
      DOI:10.1145/2479832

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States




      Acceptance Rates

      K-CAP '13 paper acceptance rate: 13 of 60 submissions, 22%. Overall acceptance rate: 55 of 198 submissions, 28%.
