Mapping Abstract Complex Workflows onto Grid Environments

Journal of Grid Computing

Abstract

In this paper we address the problem of automatically generating job workflows for the Grid. These workflows describe the execution of a complex application built from individual application components. We have developed two workflow generators. The first, the Concrete Workflow Generator (CWG), maps an abstract workflow defined in terms of application-level components onto the set of available Grid resources. The second, the Abstract and Concrete Workflow Generator (ACWG), takes a wider perspective: it not only performs the abstract-to-concrete mapping but also constructs the abstract workflow itself from the available components. ACWG operates in the application domain and chooses application components based on application metadata attributes. We describe our current ACWG, which is based on AI planning technologies, and outline how these technologies can play a crucial role in developing complex application workflows in Grid environments. Although our work is preliminary, CWG has already been used to map high-energy physics applications onto the Grid. In one particular experiment, a set of production runs lasted 7 days and generated 167,500 events using 678 jobs. Additionally, ACWG was used to map gravitational physics workflows with hundreds of nodes onto the available resources, resulting in 975 tasks, 1365 data transfers and 975 output files.
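To make the abstract-to-concrete mapping more tangible, the following is a minimal sketch in Python. It is not the CWG or ACWG algorithm, which the abstract does not specify. It assumes a simple greedy policy: run each task at the alphabetically first site that offers its component, and add a data transfer for every input file not already at that site. All names (`AbstractTask`, `ConcreteWorkflow`, `map_workflow`, the site and file names) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AbstractTask:
    """An application-level step: a component name plus logical file names."""
    name: str
    component: str
    inputs: tuple = ()
    outputs: tuple = ()


@dataclass
class ConcreteWorkflow:
    """Executable jobs and the data transfers needed to feed them."""
    jobs: list = field(default_factory=list)       # (task name, site)
    transfers: list = field(default_factory=list)  # (file, source site, destination site)


def topological_order(tasks):
    """Order tasks so that every producer of a file precedes its consumers."""
    producer = {f: t.name for t in tasks for f in t.outputs}
    by_name = {t.name: t for t in tasks}
    ordered, state = [], {}

    def visit(task):
        if state.get(task.name) == "done":
            return
        if state.get(task.name) == "visiting":
            raise ValueError("cycle in workflow at " + task.name)
        state[task.name] = "visiting"
        for f in task.inputs:
            if f in producer:
                visit(by_name[producer[f]])
        state[task.name] = "done"
        ordered.append(task)

    for t in tasks:
        visit(t)
    return ordered


def map_workflow(tasks, component_sites, file_locations):
    """Assumed greedy policy (illustrative only): pick the alphabetically first
    site offering the component, and transfer any input that is not yet there."""
    locations = {f: set(sites) for f, sites in file_locations.items()}
    concrete = ConcreteWorkflow()
    for task in topological_order(tasks):
        sites = sorted(component_sites.get(task.component, ()))
        if not sites:
            raise LookupError("no site offers component " + task.component)
        site = sites[0]
        for f in task.inputs:
            held = locations.get(f)
            if not held:
                raise LookupError("input not available anywhere: " + f)
            if site not in held:
                concrete.transfers.append((f, sorted(held)[0], site))
                held.add(site)
        concrete.jobs.append((task.name, site))
        for f in task.outputs:
            locations.setdefault(f, set()).add(site)
    return concrete


# Hypothetical two-step workflow, deliberately listed out of dependency order.
tasks = [
    AbstractTask("analyze", "analysis", inputs=("events.dat",), outputs=("hist.dat",)),
    AbstractTask("simulate", "simulation", inputs=("config.txt",), outputs=("events.dat",)),
]
component_sites = {"simulation": {"siteA"}, "analysis": {"siteB"}}
file_locations = {"config.txt": {"siteB"}}

workflow = map_workflow(tasks, component_sites, file_locations)
print(workflow.jobs)
print(workflow.transfers)
```

In this toy case the mapper schedules `simulate` before `analyze` and inserts two transfers, one to stage the configuration file and one to move the simulated events to the analysis site. A realistic mapper would also weigh resource load, replica choice and failure handling, none of which this sketch attempts.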

Cite this article

Deelman, E., Blythe, J., Gil, Y. et al. Mapping Abstract Complex Workflows onto Grid Environments. Journal of Grid Computing 1, 25–39 (2003). https://doi.org/10.1023/A:1024000426962
