DOI: 10.1145/1996130.1996166 (HPDC conference proceedings)
poster

Adapting MapReduce for HPC environments

Published: 08 June 2011

ABSTRACT

MapReduce is gaining popularity as a programming model for large-scale distributed processing. The model is most widely used in implementations built on the Hadoop Distributed File System (HDFS). The dependence on HDFS, however, precludes applying the model directly to HPC environments, which use high-performance distributed file systems. In such environments, the MapReduce model can rarely make full use of the available resources, because local disks may not be available for data placement on all nodes. This work proposes a MapReduce implementation and design choices directly suited to such HPC environments.
