research-article

Detecting common scientific workflow fragments using templates and execution provenance

Authors:
Daniel Garijo

Universidad Politécnica de Madrid, Madrid, Spain

Oscar Corcho

Universidad Politécnica de Madrid, Madrid, Spain

Yolanda Gil

University of Southern California, Los Angeles, USA


K-CAP '13: Proceedings of the seventh international conference on Knowledge capture, June 2013, pages 33–40. https://doi.org/10.1145/2479832.2479848

Published: 23 June 2013

ABSTRACT

Provenance plays a major role in understanding and reusing the methods applied in a scientific experiment, as it provides a record of the inputs, the processes carried out, and the use and generation of intermediate and final results. In the specific case of in-silico scientific experiments, a large variety of scientific workflow systems (e.g., Wings, Taverna, Galaxy, VisTrails) have been created to support scientists. All of these systems produce some form of provenance about the executions of the workflows that encode scientific experiments. However, provenance is normally recorded at a very low level of detail, which makes it hard to understand what happened during execution. In this paper we propose an approach to automatically obtain abstractions from low-level provenance data by finding common workflow fragments in workflow execution provenance and relating them to templates. We have tested our approach with a dataset of workflows published by the Wings workflow system. Our results show that by using these kinds of abstractions we can highlight the most common abstract methods used in the executions of a repository, relating different runs and workflow templates with each other.

Index Terms

Computing methodologies → Machine learning

Published in
K-CAP '13: Proceedings of the seventh international conference on Knowledge capture
June 2013
160 pages
ISBN: 9781450321020
DOI: 10.1145/2479832
General Chair: Richard Benjamins (Telefonica Digital)
Program Chairs: Mathieu d'Aquin (KMi, The Open University), Andrew Gordon (University of Southern California)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
abstraction
provenance
scientific workflow
wings
Acceptance Rates
K-CAP '13 paper acceptance rate: 13 of 60 submissions, 22%. Overall acceptance rate: 55 of 198 submissions, 28%.