ABSTRACT
Observational astrophysics has recently become a data-intensive science after many decades of relative data poverty. As a result, many of the algorithms developed for processing astronomical data, although well established for low-volume data capture, do not scale well to today's high-volume sky surveys and transient searches. Specifically, problems may occur with data transfer, workflow management, efficient parallelization, and integration of legacy code. Observational astrophysics workflows present computational challenges unique in high performance computing, including 24/7 operations, time-critical processing, and very large numbers of relatively small data files which must all be processed and archived. We present a case study based on Sunfall, a distributed, parallel scientific workflow system we built for the Nearby Supernova Factory, the largest data-volume supernova search currently in existence. We describe innovative techniques for data transfer and workflow management, and discuss lessons learned in building a large-scale observational astrophysics workflow management system.