A taxonomy of software-based and hardware-based approaches for energy efficiency management in the Hadoop

doi:10.1016/j.jnca.2018.11.007

Journal of Network and Computer Applications

Volume 126, 15 January 2019, Pages 162-177

https://doi.org/10.1016/j.jnca.2018.11.007

Abstract

The Apache Hadoop framework supports the storage and processing of big datasets using simple programming models. Energy management has been recognized as one of the major issues in Hadoop, and many studies have been conducted in this area. However, despite the importance of this issue, there is no comprehensive study of energy efficiency in Hadoop. In this paper, the energy efficiency techniques for Hadoop are classified into two main categories: software-based and hardware-based. Moreover, the benefits and drawbacks of these methods are examined, and a systematic study of the conducted research is provided. Another aim is to describe open issues and offer recommendations for future research.

Introduction

Nowadays, analyzing big data repositories remains a problem for many modern enterprises and research communities (Gonçalves et al., 2017; Khezr and Navimipour, 2017). Every day, large amounts of data are produced from numerous sources, e.g., sensors, digital pictures, videos, purchase transaction records, and cell phones, but mining suitable information from these massive data repositories to make appropriate decisions is almost impractical for traditional database management system (DBMS) technologies (Cuzzocrea et al., 2011).

Hadoop, as an applicable solution to big data (Rashmi and Basu, 2017), provides reliable, fault-tolerant, scalable, and efficient services for processing large amounts of data using MapReduce (Uzunkaya et al., 2015; Zhao, 2017). Its main features are a simple programming interface, high scalability, and the capability to process large amounts of data in distributed processing environments (Khezr and Navimipour, 2017). MapReduce plays an important role in running a very large number of data-intensive applications (Cassales et al., 2015; Chelliah, 2017).

Recently, energy efficiency has become a significant issue in data centers (Khan et al., 2016; Kurpicz et al., 2018). According to a U.S. Department of Energy report, U.S. data centers consumed about 70 billion kWh in 2014 (1.8% of total U.S. electricity consumption). It is estimated that U.S. data centers will consume about 73 billion kWh in 2020 (Shehabi et al., 2016). Given the environmental challenges, limited energy resources, and high energy costs (Akhter and Othman, 2016; Babar et al., 2017), hardware and software techniques should be used to reduce energy consumption. As a result, energy reduction is a major challenge for Hadoop, which typically runs on large clusters (Usama et al., 2017).

This paper is a systematic study of energy efficiency techniques for Hadoop. It discusses software methods, such as scheduling, and hardware methods, such as Dynamic Voltage and Frequency Scaling (DVFS), that are employed to reduce energy consumption. The main goal of this paper is to present the conceptual aspects of energy efficiency in Hadoop. The contributions of this study are listed below:

• Providing a review of the current energy-aware methods for Hadoop.
• Dividing energy efficiency techniques into two main classes: software-based and hardware-based techniques.
• Providing the benefits and drawbacks of the existing energy efficiency techniques for Hadoop.
• Discussing and comparing the main challenges for energy efficiency in Hadoop.
• Highlighting guidelines for future research and open issues regarding energy efficiency in Hadoop.

Furthermore, the techniques are compared in this paper using several performance measures and Quality of Service (QoS) parameters (Conejero et al., 2016), such as data locality, fault tolerance, heterogeneity, scalability, makespan, performance, cost, and load balancing. We briefly describe each of them below.

• Data locality: Moving computation close to the data rather than moving data toward the computation (George et al., 2016).
• Heterogeneity: In a heterogeneous data center, nodes differ in capabilities such as computing power (Rasooli and Down, 2014b).
• Fault tolerance: The continuous and correct operation of a system despite the failure of one or more of its components (Sampaio and Barbosa, 2018).
• Scalability: The capacity of a Hadoop cluster to be changed and reconfigured under varying conditions (Zhang et al., 2018).
• Makespan: The time difference between the beginning and the end of a job or task sequence (Kalra and Singh, 2015).
• Load balancing: Improving the distribution of load across multiple computing resources (Gao and Yu, 2017; Ghomi et al., 2017).
• Cost: Two types of cost can be considered, one in terms of manpower and the other in terms of money (Majeed and Shah, 2015).
• Performance: The amount of useful work accomplished, measured in terms of time needed, resources used, etc. (Cheng et al., 2017a).
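Three of the measures above (makespan, data locality, and load balancing) can be computed directly from a task trace. The following is a minimal sketch under stated assumptions: the `Task` record, the node names, and the timings are hypothetical, and using the coefficient of variation of per-node busy time as a load imbalance indicator is an illustrative choice, not a definition taken from the surveyed articles.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Task:
    node: str        # node that executed the task (hypothetical name)
    data_node: str   # node that stores the task's input block
    start: float     # start time in seconds
    end: float       # end time in seconds


def makespan(tasks):
    """Time difference between the earliest start and the latest end."""
    return max(t.end for t in tasks) - min(t.start for t in tasks)


def data_locality_ratio(tasks):
    """Fraction of tasks that ran on the node holding their input data."""
    return sum(t.node == t.data_node for t in tasks) / len(tasks)


def load_imbalance(tasks):
    """Coefficient of variation of per-node busy time (0 means perfectly even)."""
    busy = {}
    for t in tasks:
        busy[t.node] = busy.get(t.node, 0.0) + (t.end - t.start)
    loads = list(busy.values())
    return pstdev(loads) / mean(loads)


# Hypothetical trace of four tasks on three nodes.
tasks = [
    Task("n1", "n1", 0.0, 4.0),
    Task("n1", "n2", 4.0, 6.0),   # non-local task: input block lives on n2
    Task("n2", "n2", 0.0, 5.0),
    Task("n3", "n3", 1.0, 10.0),
]

print("makespan (s):", makespan(tasks))
print("data locality ratio:", data_locality_ratio(tasks))
print("load imbalance (CV):", round(load_imbalance(tasks), 3))
```

In this trace, three of the four tasks read local data, and node n3 carries the longest busy time, which is what raises the imbalance indicator above zero.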

The remainder of this article is organized as follows. Hadoop and its components are presented in Section 2. Section 3 reviews the related work. Section 4 provides the research selection process and a Systematic Literature Review (SLR). Section 5 systematically reviews and classifies the energy efficiency approaches in Hadoop, and compares the methods of the selected articles. Section 6 discusses the obtained findings. Some open issues are elaborated in Section 7. Finally, Section 8 presents the conclusion and the limitations of the paper.

Section snippets

Background

Apache Hadoop implements Google's MapReduce and Google File System (GFS) models (Cassales et al., 2016; Li et al., 2017; Park et al., 2016; Qin et al., 2017; Veiga et al., 2018) and supports the storage and processing of big datasets. It has attracted the attention of both industry and academia because it is an open-source solution (Polato et al., 2014). The Hadoop framework is classified as follows:

Motivation and related work

Some related works on Hadoop, MapReduce, and energy issues are discussed briefly in this section.

Majeed and Shah (2015) presented a survey of state-of-the-art techniques and architectures for energy efficiency in big data published during 2007–2015. First, they considered the existing surveys on energy consumption. Then, they categorized the research papers in terms of hardware-based, component-based and the best energy efficiency methods that

Research methodology

An SLR is presented in this section to improve the understanding of energy efficiency techniques in Hadoop. An SLR is a critical assessment that analyzes all research addressing a specific issue (Navimipour and Charband, 2016; Soltani and Navimipour, 2016). The search process consists of two parts, article classification and article selection, which are discussed in the next subsections.

Energy efficiency techniques in Hadoop

This section describes the differences, advantages, and disadvantages of the main state-of-the-art energy efficiency mechanisms in Hadoop. We review software-based and hardware-based articles on reducing energy consumption in Hadoop. These articles apply software techniques, hardware techniques, or both.
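As an illustration of the hardware-based class, the trade-off behind DVFS (named in the introduction) can be sketched with the widely used CMOS approximation in which dynamic power is proportional to C·V²·f. This is a minimal sketch, not a model taken from the surveyed articles: the effective capacitance, the static power values, the cycle count, and the three operating points are all hypothetical numbers chosen for illustration.

```python
def dynamic_power(c_eff, voltage, freq_hz):
    """Approximate CMOS dynamic power in watts: P = C * V^2 * f."""
    return c_eff * voltage ** 2 * freq_hz


def task_energy(cycles, c_eff, voltage, freq_hz, static_power):
    """Energy in joules to run a task of `cycles` CPU cycles at one operating point."""
    runtime = cycles / freq_hz  # a lower frequency means a longer runtime
    return (dynamic_power(c_eff, voltage, freq_hz) + static_power) * runtime


def best_operating_point(cycles, c_eff, static_power, points):
    """Return the (frequency, voltage) pair with the lowest task energy."""
    return min(
        points,
        key=lambda p: task_energy(cycles, c_eff, p[1], p[0], static_power),
    )


# Hypothetical operating points as (frequency in Hz, voltage in V).
OPERATING_POINTS = [(1.2e9, 0.9), (2.0e9, 1.1), (2.8e9, 1.3)]
C_EFF = 1.0e-9   # hypothetical effective switched capacitance in farads
CYCLES = 5.6e9   # hypothetical task length in CPU cycles

for static_power in (1.0, 10.0):  # hypothetical static power in watts
    freq, volt = best_operating_point(CYCLES, C_EFF, static_power, OPERATING_POINTS)
    print(f"static power {static_power} W -> best point {freq / 1e9:.1f} GHz at {volt} V")
```

With these numbers, the lowest frequency minimizes energy when static power is small, while the highest frequency wins when static power dominates, because the longer runtime at low frequency then costs more than the dynamic savings. This is one reason frequency scaling has to be evaluated together with the workload and platform rather than applied uniformly.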

Discussion

In the previous sections, we discussed the energy efficiency techniques for Hadoop in two main groups: software-based and hardware-based techniques. We now present a statistical analysis of these techniques. Table 4 and Table 5 show the main properties of the discussed methods, such as the kind of Hadoop environment and the implementation or simulation platform, for software-based and hardware-based techniques, respectively. Also, Fig. 8

Open challenges and future work

Future work should consider several important challenges, which are discussed in this section. The rest of this section provides some important directions for future research.

• Heterogeneity, the main cause of performance variability, is present in both hardware and workload characteristics. Performance and energy consumption vary when the same task is performed on different nodes. Some factors such as type of workload and the rate of Hadoop's tasks can

Conclusion and limitation

This paper systematically surveys previous and current mechanisms for energy efficiency in Hadoop. First, we gave an overview of Hadoop and its components. Then, we explained the research methodology and classified 22 selected articles into two groups: 11 use a software-based approach and 11 use a hardware-based approach. Also, the important methods of each category and their advantages and disadvantages were discussed. The reason behind addressing the

Fatemeh Shabestari received her B.S. in computer engineering, software, from Shabestar Branch, Islamic Azad University, Shabestar, Iran, in 2005 and her M.S. in computer engineering, software, from Shabestar Branch, Islamic Azad University, Shabestar, Iran, in 2009. She is currently a Ph.D. candidate in computer engineering at Science and Research Branch, Islamic Azad University, Tehran, Iran. Her research interests include big data and green computing.

References (112)

K.M. Attia et al. Dynamic power management techniques in multi-core architectures: a survey study. Ain Shams Eng. J. (2017)
A. Beloglazov et al. A taxonomy and survey of energy-efficient data centers and cloud computing systems. Adv. Comput. (2011)
G.W. Cassales et al. Context-aware scheduling for Apache hadoop over pervasive environments. Procedia Comput. Sci. (2015)
J. Conejero et al. Analyzing Hadoop power consumption and impact on application QoS. Future Generat. Comput. Syst. (2016)
S. Costache et al. Resource management in cloud platform as a service systems: analysis and opportunities. J. Syst. Software (2017)
H. Fu et al. FARMS: efficient mapreduce speculation for failure recovery in short jobs. Parallel Comput. (2017)
D. Glushkova et al. Mapreduce performance model for Hadoop 2.x. Info. Syst. (2019)
A. Gonçalves et al. Towards of a real-time big data architecture to intensive care. Procedia Comput. Sci. (2017)
S. Ibrahim et al. Governing energy consumption in hadoop through cpu frequency scaling: an analysis. Future Generat. Comput. Syst. (2016)
M. Kalra et al. A review of metaheuristic scheduling techniques in cloud computing. Egypt. Inf. J. (2015)
Y.-C. Kao et al. Data-locality-aware mapreduce real-time scheduling framework. J. Syst. Software (2016)
M. Kurpicz et al. Energy-proportional profiling and accounting in heterogeneous virtualized environments. Sustain. Comput. Info. Syst. (2018)
P. Leimich et al. A RAM triage methodology for Hadoop HDFS forensics. Digit. Invest. (2016)
Z. Lu et al. InSTechAH: cost-effectively autoscaling smart computing hadoop cluster in private cloud. J. Syst. Architect. (2017)
I. Mavridis et al. Performance evaluation of cloud-based log file analysis with Apache Hadoop and Apache Spark. J. Syst. Software (2017)
K. Neshatpour et al. Energy-efficient acceleration of MapReduce applications using FPGAs. J. Parallel Distr. Comput. (2018)
P.P. Nghiem et al. Towards efficient resource provisioning in MapReduce. J. Parallel Distr. Comput. (2016)
A. Oussous et al. Big Data technologies: A survey. J. King Saud Univ. Comput. Info. Sci. (2018)
I. Polato et al. A comprehensive view of Hadoop research—a systematic literature review. J. Netw. Comput. Appl. (2014)
A. Rasooli et al. COSHH: a classification and optimization based scheduler for heterogeneous Hadoop systems. Future Generat. Comput. Syst. (2014)
A. Reuther et al. Scalable system scheduling for HPC and big data. J. Parallel Distr. Comput. (2018)
N.B. Rizvandi et al. Some observations on optimal frequency selection in DVFS-based energy consumption minimization. J. Parallel Distr. Comput. (2011)
A.M. Sampaio et al. A comparative cost analysis of fault-tolerance mechanisms for availability on the cloud. Sustain. Comput. Info. Syst. (2018)
Y. Shao et al. Efficient jobs scheduling approach for big data applications. Comput. Ind. Eng. (2018)
S. Singh et al. Performance optimization of MapReduce-based Apriori algorithm on Hadoop cluster. Comput. Elect. Eng. (2018)
Z. Soltani et al. Customer relationship management mechanisms: a systematic review of the state of the art literature and recommendations for future research. Comput. Hum. Behav. (2016)
J. Song et al. Modulo based data placement algorithm for energy consumption optimization of MapReduce system. J. Grid Comput. (2016)
M. Soualhia et al. Task scheduling in big data platforms: a systematic literature review. J. Syst. Software (2017)
A. Spivak et al. Storage tier-aware replicative data reorganization with prioritization for efficient workload processing. Future Generat. Comput. Syst. (2018)
M. Usama et al. Job schedulers for Big data processing in Hadoop environment: testing real-life schedulers using benchmark programs. Digital Commun. Netw. (2017)
C. Uzunkaya et al. Hadoop ecosystem and its analysis on tweets. Procedia Soc. Behav. Sci. (2015)
M. Varga et al. Deadline scheduling algorithm for sustainable computing in Hadoop environment. Comput. Secur. (2018)
J. Veiga et al. BDEv 3.0: energy efficiency and microarchitectural characterization of Big Data processing frameworks. Future Generat. Comput. Syst. (2018)
Y.-F. Wen. Energy-aware dynamical hosts and tasks assignment for cloud computing. J. Syst. Software (2016)
N. Akhter et al. Energy aware resource allocation of cloud data center: review and open issues. Cluster Comput. (2016)
S.R. Alapati. Expert Hadoop Administration: Managing, Tuning, and Securing Spark, YARN, and HDFS (2016)
A. Alhamali et al. FPGA-accelerated hadoop cluster for deep learning computations
B. Antony et al. Professional Hadoop (2016)
Apache Hadoop (2018)
J.A. Aroca et al. A measurement-based characterization of the energy consumption in data center servers. IEEE J. Sel. Area. Commun. (2015)
H. Artail et al. Speedy Cloud: Cloud Computing with Support for Hardware Acceleration Services. IEEE Trans. Cloud Comput. (2017)
P. Azad et al. An energy-aware task scheduling in the cloud computing using a hybrid cultural and ant colony optimization algorithm. Int. J. Cloud Appl. Comput. (IJCAC) (2017)
M. Babar et al. Energy-harvesting based on internet of things and big data analytics for smart health monitoring. Sustain. Comput. Info. Syst. (2017)
M. Bakratsas et al. Hadoop MapReduce performance on SSDs: the case of complex network analysis tasks
X. Cai et al. SLA-aware energy-efficient scheduling scheme for Hadoop YARN. J. Supercomput. (2017)
G.W. Cassales et al. Improving the performance of Apache Hadoop on pervasive environments through context-aware scheduling. J. Ambient Intell. Hum. Comput. (2016)
Y. Charband et al. Online knowledge sharing mechanisms: a systematic review of the state of the art literature and recommendations for future research. Inf. Syst. Front (2016)
P.R. Chelliah. The hadoop ecosystem technologies and tools
D. Cheng et al. Improving performance of heterogeneous mapreduce clusters with adaptive task tuning. IEEE Trans. Parallel Distr. Syst. (2017)
D. Cheng et al. Energy efficiency aware task assignment with DVFS in heterogeneous hadoop clusters

Amir Masoud Rahmani received his B.S. in Computer Engineering from Amir Kabir University, Tehran, in 1996, his M.S. in Computer Engineering from Sharif University of Technology, Tehran, in 1998, and his Ph.D. degree in Computer Engineering from IAU University, Tehran, in 2005. Currently, he is a Professor in the Department of Computer Engineering at the IAU University. He is the author/co-author of more than 150 publications in technical journals and conferences. His research interests are in the areas of distributed systems, ad hoc and wireless sensor networks, and evolutionary computing.

Nima Jafari Navimipour received his B.S. in computer engineering, software engineering, from Tabriz Branch, Islamic Azad University, Tabriz, Iran, in 2007; his M.S. in computer engineering, computer architecture, from Tabriz Branch, Islamic Azad University, Tabriz, Iran, in 2009; and his Ph.D. in computer engineering, computer architecture, from Science and Research Branch, Islamic Azad University, Tehran, Iran, in 2014. He is an assistant professor in the Department of Computer Engineering at Tabriz Branch, Islamic Azad University, Tabriz, Iran. He has published more than 100 papers in various journals and conference proceedings. His research interests include Cloud Computing, Social Networks, Fault-Tolerance Software, QCA, Internet of Things, and Network on Chip.

Sam Jabbehdari has been working as an associate professor in the Department of Computer Engineering at IAU (Islamic Azad University), North Tehran Branch, Tehran, since 1993. He received his B.Sc. and M.S. degrees in Electrical Engineering, Telecommunication, from Khajeh Nasir Toosi University of Technology and IAU, South Tehran Branch, Tehran, Iran, respectively. He received his Ph.D. degree in Computer Engineering from IAU, Science and Research Branch, Tehran, Iran, in 2005. His current research interests are Scheduling, QoS, MANETs, Wireless Sensor Networks, and Cloud Computing.
