poster

Adapting MapReduce for HPC environments

Authors:
Zacharia Fadika, Binghamton University, Binghamton, USA
Elif Dede, Binghamton University, Binghamton, USA
Madhusudhan Govindaraju, Binghamton University, Binghamton, USA
Lavanya Ramakrishnan, Lawrence Berkeley National Laboratory, Berkeley, USA

HPDC '11: Proceedings of the 20th international symposium on High performance distributed computing, June 2011, Pages 263–264. https://doi.org/10.1145/1996130.1996166

Published: 08 June 2011

ABSTRACT

MapReduce is gaining popularity as a programming model for large-scale distributed processing. The model is most widely used in implementations built on the Hadoop Distributed File System (HDFS). The use of HDFS, however, prevents the model from being applied directly to HPC environments, which use high-performance distributed file systems. In such environments, the MapReduce model can rarely make full use of the available resources, because local disks may not be available for data placement on all nodes. This work proposes a MapReduce implementation and design choices directly suited to such HPC environments.
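
To make the programming model concrete, the following is a minimal single-process sketch of the map, shuffle, and reduce steps applied to files read from a shared directory. It is not the implementation proposed in this work: the function names and the `shared_dir` parameter are hypothetical, and the directory is only assumed to stand in for a mount point of a shared file system that every node can see, so that no node-local disk or HDFS is required.

```python
from collections import defaultdict
from pathlib import Path


def map_words(text):
    """Map step: emit a (word, 1) pair for every whitespace-separated word."""
    return [(word.lower(), 1) for word in text.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(shared_dir):
    """Run map, shuffle, and reduce over every *.txt file in shared_dir.

    shared_dir is a hypothetical path on a shared file system; a real
    deployment would distribute the map and reduce calls across nodes.
    """
    pairs = []
    for path in sorted(Path(shared_dir).glob("*.txt")):
        pairs.extend(map_words(path.read_text()))
    return reduce_counts(shuffle(pairs))
```

In this sketch every step runs in one process; the point is only that the input is addressed by a path visible to all workers rather than by blocks placed on local disks.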

Index Terms

Adapting MapReduce for HPC environments
1. Computing methodologies
  1. Distributed computing methodologies
    1. Distributed programming languages
  2. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Distributed programming languages
        Parallel programming languages

Published in
HPDC '11: Proceedings of the 20th international symposium on High performance distributed computing
June 2011
296 pages
ISBN: 9781450305525
DOI: 10.1145/1996130
General Chair: Arthur "Barney" Maccabe, Oak Ridge National Lab, USA
Program Chair: Douglas Thain, University of Notre Dame, USA
Copyright © 2011 Authors
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
hdfs
hpc
mapreduce
Qualifiers
- poster
Acceptance Rates
Overall Acceptance Rate: 166 of 966 submissions, 17%