DOI: 10.1145/2286996.2287000
research-article

Job and data clustering for aggregate use of multiple production cyberinfrastructures

Published: 19 June 2012

ABSTRACT

In this paper, we address the challenges of reducing the time-to-solution of the data-intensive earthquake simulation workflow "CyberShake" by supplementing the high-performance computing (HPC) resources on which it typically runs with distributed, heterogeneous resources that can be obtained opportunistically from grids and clouds. We seek to minimize time-to-solution by maximizing the amount of work that can be done efficiently on the distributed resources. We identify data movement as the main bottleneck in effectively utilizing the combined local and distributed resources. We address this by analyzing the I/O characteristics of the application, the processor acquisition rate (from a pilot-job service), and the data movement throughput of the infrastructure. With these factors in mind, we explore a combination of strategies, including partitioning of computation (over HPC and distributed resources) and job clustering.

We validate our approach with a theoretical study and with preliminary measurements on the Ranger HPC system and distributed Open Science Grid resources. More complete performance results will be presented in the final submission of this paper.
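The partitioning strategy the abstract describes can be illustrated with a toy makespan model: choose what fraction of jobs to offload to distributed (grid/cloud) resources so that the slower of the two sides finishes as early as possible, treating the shared wide-area link as the data-movement bottleneck. This is only a sketch of the idea; all function names and parameter values below are hypothetical and are not taken from the paper's measurements.

```python
def makespan(frac_remote, n_jobs, job_cpu_s, job_data_mb,
             hpc_cores, remote_cores, wan_mb_per_s):
    """Estimated completion time (seconds) when frac_remote of the jobs
    run on distributed resources and the rest stay on the HPC system.
    All parameters are illustrative, not values from the paper."""
    n_remote = frac_remote * n_jobs
    n_local = n_jobs - n_remote
    # Local side: pure compute, input data is already on the HPC file system.
    t_local = n_local * job_cpu_s / hpc_cores
    # Remote side: compute plus moving each job's input over the WAN.
    # Transfers are serialized through the shared link, the bottleneck
    # the abstract identifies.
    t_xfer = n_remote * job_data_mb / wan_mb_per_s
    t_remote_cpu = n_remote * job_cpu_s / remote_cores
    # Transfer and compute overlap as a pipeline, so the remote side is
    # limited by its slower stage.
    t_remote = max(t_xfer, t_remote_cpu)
    return max(t_local, t_remote)

def best_fraction(**kw):
    """Scan candidate offload fractions, return the one minimizing makespan."""
    candidates = [i / 100 for i in range(101)]
    return min(candidates, key=lambda f: makespan(f, **kw))

if __name__ == "__main__":
    # Illustrative numbers: many short data-heavy jobs, a modest HPC
    # allocation, a larger opportunistic pool, and a 100 MB/s WAN link.
    params = dict(n_jobs=10_000, job_cpu_s=60, job_data_mb=20,
                  hpc_cores=512, remote_cores=2048, wan_mb_per_s=100)
    f = best_fraction(**params)
    print(f"offload fraction {f:.2f}, makespan {makespan(f, **params):.0f} s")
```

With parameters like these, the optimum offloads only part of the workload: beyond that point the WAN transfer stage, not remote compute capacity, dominates the remote side, which is why the paper couples the partitioning decision with job clustering to improve data-movement efficiency.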


Published in

DIDC '12: Proceedings of the fifth international workshop on Data-Intensive Distributed Computing
June 2012, 68 pages
ISBN: 978-1-4503-1341-4
DOI: 10.1145/2286996
Copyright © 2012 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 7 of 12 submissions, 58%
