
Computers & Electrical Engineering

Volume 42, February 2015, Pages 90-106

Flexible architecture for cluster evolution in cloud computing

https://doi.org/10.1016/j.compeleceng.2014.08.006

Highlights

  • FACE supports system primitives that allow application developers to develop various applications in clouds.

  • FACE allows application developers to customize data partitioning, localization, and processing procedures.

  • FACE designs its system primitives in a language-independent and platform-independent way.

  • FACE makes the Master of a MapReduce system extensible by application developers.

Abstract

MapReduce is considered the key behind the success of cloud computing because it not only makes a cluster highly scalable but also lets applications use cluster resources easily. However, MapReduce achieves this simplicity at the expense of flexibility: it handles data partitioning, localization, and processing entirely on behalf of application developers, and current implementations give developers no way to customize these procedures. To address these flexibility constraints, we propose an architecture called Flexible Architecture for Cluster Evolution (FACE), which is both language-independent and platform-independent. FACE allows a MapReduce cluster to be tailored to various application requirements by customizing the data partitioning, localization, and processing procedures. We compare the performance of FACE with that of a general MapReduce system and then demonstrate the performance improvements achieved by our implemented procedures.

Introduction

MapReduce [1] is a programming model proposed by Google to process large datasets on a cluster [2]. MapReduce is the key behind the success of cloud computing [3] today because it not only makes a cluster highly scalable but also lets applications use cluster resources easily. When serving an application, MapReduce lets the computers (also known as nodes) in a cluster process well-partitioned data simultaneously without interfering with one another. MapReduce relies on its runtime system to partition input data automatically and to distribute intermediate results [1] over the nodes in a cluster, hiding the issues of cooperatively distributing data over nodes from application developers. All MapReduce requires from application developers is a Map function (also known as a Mapper) and a Reduce function (also known as a Reducer) to process the application data. Technically, MapReduce runs a Mapper to process input data and produce intermediate results as a series of key/value pairs, and runs a Reducer to merge the values in the intermediate results associated with the same key.
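
As a concrete illustration of the two-phase model just described, the canonical word-count flow can be sketched in a few lines. This is a single-process sketch for illustration only, not Google's or FACE's implementation; the function names are ours:

```python
# A minimal, single-process sketch of the MapReduce word-count flow;
# illustrative only, not any production MapReduce implementation.
from collections import defaultdict

def mapper(text):
    # Emit a (key, value) pair for every word in the input split.
    return [(word, 1) for word in text.split()]

def reducer(key, values):
    # Merge all values associated with the same key.
    return key, sum(values)

def map_reduce(splits):
    # Shuffle: group intermediate pairs by key, then reduce each group.
    groups = defaultdict(list)
    for split in splits:
        for key, value in mapper(split):
            groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())

result = map_reduce(["a b a", "b c"])
# result == {"a": 2, "b": 2, "c": 1}
```

In a real cluster the groups would be distributed across nodes by a hash of the key; here the shuffle is just an in-memory dictionary.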

MapReduce contributes to the success of cloud computing due to its simplicity, but it does so at the expense of several other potential benefits. To achieve simplicity, MapReduce handles all parallel and distributed computing issues on behalf of application developers, but as a result, it suffers from several constraints:

  • MapReduce partitions input data into a series of fixed-size blocks (e.g., 64 MB in the Google and Hadoop MapReduce implementations [1], [4]) as the working units for Mappers. However, a cloud is often composed of nodes with various hardware configurations and performance levels, so no single fixed block size can easily give all applications optimal performance. In current implementations, application developers cannot dynamically adjust the granularity of a Map task at runtime to balance workloads among nodes.

  • MapReduce makes use of a built-in hash function to distribute intermediate results automatically over the corresponding nodes. Consequently, application developers cannot choose nodes to perform certain location-aware computations (e.g., to transfer intermediate results among intra-rack nodes to avoid overloading links between racks). This is because MapReduce automatically selects the node with a free slot (usually indicating an available quota of CPU resources) [1], [4] to execute a task. Thus, application developers cannot change the node selection policy according to their specific criteria.

  • MapReduce automatically executes a Reducer to handle intermediate results produced by a Mapper, so application developers cannot process application data outside of Mappers and Reducers. Sometimes, application developers require a post-processing procedure so that they can process outputs collected from all Reducers for certain application requirements (e.g., as inputs for the next iteration in iterative applications).
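
To make the first constraint concrete, the kind of runtime granularity adjustment that current implementations lack could look like the following sketch. The function name, the relative-speed weights, and the scaling rule are illustrative assumptions, not part of any existing MapReduce system:

```python
# Hypothetical sketch of per-node block sizing; the 64 MB default follows
# the Google/Hadoop implementations [1], [4], but the scaling policy is
# an assumption for illustration.
FIXED_BLOCK = 64  # MB, the fixed working unit in current implementations

def adaptive_block_size(node_speed, reference_speed=1.0, base=FIXED_BLOCK):
    # Scale the working unit for a Mapper with the node's relative speed,
    # so slower nodes receive smaller splits and finish at a similar time.
    return max(1, round(base * node_speed / reference_speed))

# In a heterogeneous cluster, a slow node gets a smaller split:
assert adaptive_block_size(0.5) == 32
assert adaptive_block_size(2.0) == 128
```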

To achieve simplicity, MapReduce forgoes flexibility in its data partitioning, localization, and processing procedures: it handles these procedures automatically and leaves application developers no room to modify them. With flexibility in data partitioning, developers could dynamically adjust the sizes of partitioned data to balance task loads at an appropriate granularity. With flexibility in data localization, they could choose the node that runs a Mapper on a block of input data or a Reducer on a set of intermediate results. With flexibility in data processing, they could still program Mapper and Reducer behaviors as in current MapReduce systems, but also arrange post-processing operations on the outputs collected from all Reducers at the end of application execution, e.g., to implement iterative applications or to apply computing styles beyond the two-phase MapReduce computation.
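
The post-processing flexibility discussed above can be sketched as a simple driver loop. The function names and the toy computation are hypothetical; a real iterative application would run a full MapReduce job in each round:

```python
# Illustrative sketch of an iterative driver with a user-defined
# post-processing step between rounds; names are assumptions, not an API.
def iterate(initial_input, run_mapreduce, post_process, rounds):
    data = initial_input
    for _ in range(rounds):
        outputs = run_mapreduce(data)   # the two-phase computation
        data = post_process(outputs)    # user-defined step between rounds
    return data

# Toy example: each round doubles every value; post-processing is identity.
doubled = iterate([1, 2, 3],
                  run_mapreduce=lambda xs: [2 * x for x in xs],
                  post_process=lambda ys: ys,
                  rounds=2)
assert doubled == [4, 8, 12]
```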

In this paper, we propose a Flexible Architecture for Cluster Evolution (FACE). FACE provides application developers with system primitives that allow them to develop applications according to specific application requirements. Because these primitives are highly flexible, a MapReduce cluster built with FACE can be designed for various application requirements such as load balancing, location-aware computation, special node-selection policies, and customized data processing. FACE allows application developers to submit input data in files of any size to a cloud computing environment, to specify the location of intermediate results so that local Reducers can process them, to specify which node runs a Mapper on input data or a Reducer on intermediate results, and to arrange a post-processing operation on the outputs of all Reducers at the end of application execution. Beyond processing data with a Mapper or a Reducer, FACE lets developers enhance application functionality with other user-defined functions (e.g., by applying post-processing operations to outputs collected from Reducers). In addition to the system primitives that support application development, FACE provides developers with node runtime information, both to monitor progress during application execution and to help select a node for a specific function. To optimize performance, FACE implements most components in the C language; nevertheless, developers can implement their applications in other languages because FACE executes user-defined functions and exposes node runtime information through language-independent interfaces.
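
As one illustration of how node runtime information could drive node selection, consider the following sketch. The field names (`rack`, `load`) and the least-loaded policy are assumptions for illustration, not FACE's actual primitives:

```python
# Hypothetical node-selection policy built on node runtime information;
# the record fields and the policy itself are illustrative assumptions.
def pick_node(runtime_info, needs_rack=None):
    # Choose the least-loaded node, optionally restricted to one rack
    # (a location-aware computation, e.g., to keep traffic intra-rack).
    candidates = [n for n in runtime_info
                  if needs_rack is None or n["rack"] == needs_rack]
    return min(candidates, key=lambda n: n["load"])["name"]

nodes = [{"name": "n1", "rack": "r1", "load": 0.7},
         {"name": "n2", "rack": "r1", "load": 0.2},
         {"name": "n3", "rack": "r2", "load": 0.1}]
assert pick_node(nodes) == "n3"                    # cluster-wide least loaded
assert pick_node(nodes, needs_rack="r1") == "n2"   # intra-rack placement
```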

The rest of the paper is organized as follows. In Section 2, we briefly review MapReduce, discuss related works, and highlight the research contributions of this paper. We present the proposed FACE design in Section 3 and describe its implementation in Section 4. In Section 5, we present a performance evaluation of FACE, and Section 6 concludes the paper.

Section snippets

Background on MapReduce

MapReduce [1] is a programming model composed of three programs: a Master, a Mapper, and a Reducer, which can be distributed over nodes in a cluster to work cooperatively on an application. MapReduce usually has only one Master, which runs on a node to monitor and control the progress of application execution, but it may have many Mappers to process different parts of the input data and many Reducers to process different parts of the intermediate results produced by the Mappers. MapReduce

System overview

Flexible Architecture for Cluster Evolution (FACE) is an architecture that places a strong emphasis on flexibility. FACE defines a cloud with physical nodes connected through network devices or Virtual Private Networks (VPN) across the Internet. FACE allows a cloud to be partitioned into a union of different clusters constructed by nodes in order to serve different applications. For flexibility, FACE does not always assign a specific node to a particular Master, Mapper, or Reducer. To optimize

Implementation of node

We implemented the FACE prototype on Windows Server 2003 because Windows is a popular OS with robust support for application development tools and Graphical User Interfaces (GUIs). We used the C language to implement a node as a multithreaded program whose components run in separate subroutines. In the node, we created a TCP server that uses a TCP socket to accept commands from the network and passes them to the Command Dispatcher, which verifies their formats and parameters. We created a
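
The command-dispatch pattern described above can be sketched as follows. A Python sketch stands in for the C implementation, and the command names and parameter counts are assumptions, not FACE's actual command set:

```python
# Illustrative sketch of a command dispatcher that validates a command's
# format and parameter count before routing it; the command table is a
# made-up stand-in for FACE's real command set.
COMMANDS = {"RUN_MAPPER": 2, "RUN_REDUCER": 2, "REPORT_STATUS": 0}

def dispatch(line):
    # Commands arrive over the TCP socket as whitespace-separated text.
    parts = line.strip().split()
    if not parts or parts[0] not in COMMANDS:
        return ("error", "unknown command")
    name, args = parts[0], parts[1:]
    if len(args) != COMMANDS[name]:
        return ("error", "bad parameter count")
    return ("ok", name, args)

assert dispatch("RUN_MAPPER job1 block7") == ("ok", "RUN_MAPPER", ["job1", "block7"])
assert dispatch("REPORT_STATUS extra") == ("error", "bad parameter count")
```

Validating format and arity at the dispatcher keeps malformed network input from reaching the worker subroutines.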

Experimental Testbed Configuration

Fig. 8. Experimental testbed configuration and performance metrics used.

Fig. 8 shows our testbed cluster, with 8 identical PCs as nodes for running Mappers and Reducers and a notebook as the Extensible Master, which can use the computational resources of the 8 PCs and supply them with an application's input files. In the following experiments, we first measured the native performance of three canonical applications, i.e., Word Count [1], Radix Sort [21], and Pi Approximation [22], as the baseline for
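
For reference, the Pi Approximation benchmark [22] is a Monte Carlo estimate; a minimal single-node sketch (the sample count and fixed seed are arbitrary choices of ours) is:

```python
# Monte Carlo Pi approximation, the idea behind the canonical benchmark;
# sample count and seed are illustrative choices, not the paper's setup.
import random

def approximate_pi(samples, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducibility
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1  # point falls inside the quarter circle
    return 4.0 * inside / samples

estimate = approximate_pi(100_000)
assert abs(estimate - 3.14159) < 0.05
```

In the MapReduce version, each Mapper counts hits for its own batch of samples and a Reducer sums the counts, which is why the benchmark parallelizes almost perfectly.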

Conclusion

In this paper, an architecture called Flexible Architecture for Cluster Evolution (FACE) is proposed to give application developers flexibility in customizing the ways in which data is partitioned, localized, and processed based on specific application requirements. FACE not only defines various language-independent and platform-independent system primitives but also includes an Extensible Master that application developers can extend. For high performance, FACE implements its main components

Acknowledgements

We gratefully acknowledge the National Science Council of Taiwan for its support of this project under Grant number NSC 102-2221-E-262-014. We thank Lunghwa University of Science and Technology for kindly providing us with the hardware equipment used to implement the prototype described in this work. We also thank the anonymous reviewers for their useful comments and Manu Malek for his kind advice and support throughout the preparation of the final revised version of this paper.

References (25)

  • Y.-S. Jeong et al.

    High availability and efficient energy consumption for cloud computing service with grid infrastructure

    Comput Electr Eng

    (2013)
  • N. Fernando et al.

    Mobile cloud computing: a survey

    Future Gener Comput Syst

    (2013)
  • C. Rong et al.

    Beyond lightning: a survey on security challenges in cloud computing

    Comput Electr Eng

    (2013)
  • J. Dean et al.

    MapReduce: simplified data processing on large clusters

    Commun ACM

    (2008)
  • V. Cardellini et al.

    The state of the art in locally distributed web-server systems

    ACM Comput Surv

    (2001)
  • Rimal BP, Choi E, Lumb I. A taxonomy and survey of cloud computing systems. In: Proceedings of fifth international...
  • Kurazumi S, Tsumura T, Saito S, Matsuo H. Dynamic processing slots scheduling for I/O intensive jobs of Hadoop...
  • Yang H-C, Dasdan A, Hsiao R-L, Parker DS. Map-reduce-merge: simplified relational data processing on large clusters....
  • Isard M, Budiu M, Yu Y, Birrell A, Fetterly D. Dryad: distributed data-parallel programs from sequential building...
  • Battre D, Ewen S, Hueske F, Kao O, Markl V, Warneke D. Nephele/PACTs: a programming model and execution framework for...
  • Condie T, Conway N, Alvaro P, Hellerstein JM, Elmeleegy K, Sears R. MapReduce online. In: Proceedings of the 7th USENIX...
  • Ekanayake J, Li H, Zhang B, Gunarathne T, Bae S-H, Qiu J, et al. Twister: a runtime for iterative MapReduce. In:...

    Tzu-Chi Huang received his B.S., M.S., and Ph.D. degrees in Electrical Engineering from National Cheng Kung University at Taiwan in 1997, 1999, and 2008 respectively. He was a system program engineer responsible for the development of network device drivers and related protocols in Silicon Integrated Systems (SiS) Corp. Now, he is an assistant professor in the Department of Electronic Engineering at Lunghwa University of Science and Technology at Taiwan. His research interests include cloud computing, mobile computing, network protocols, and operating systems.

Sherali Zeadally is an Associate Professor in the College of Communication and Information at the University of Kentucky. He received his bachelor's degree in Computer Science from the University of Cambridge, England, and his doctoral degree in Computer Science from the University of Buckingham, England. He is a Fellow of the British Computer Society and a Fellow of the Institution of Engineering and Technology, England.

    Reviews processed and approved for publication by the Editor-in-Chief Dr. Manu Malek.
