Identifying Requirements for Big Data Analytics and Mapping to Hadoop Tools
Urmil Bharti¹, Deepali Bajaj², Anita Goel³, S. C. Gupta⁴

¹ Ms. Urmil Bharti, Assistant Professor in Department of Computer Science, Shaheed Rajguru College of Applied Sciences for Women (University of Delhi).
² Ms. Deepali Bajaj, Assistant Professor in Department of Computer Science, Shaheed Rajguru College of Applied Sciences for Women (University of Delhi).
³ Dr. Anita Goel, Associate Professor in Department of Computer Science, Dyal Singh College, University of Delhi, India.
⁴ Dr. S. C. Gupta, Faculty at Dept. of Computer Science and Engineering, IIT Delhi.

Manuscript received on 21 August 2019. | Revised Manuscript received on 27 August 2019. | Manuscript published on 30 September 2019. | PP: 4384-4392 | Volume-8 Issue-3 September 2019 | Retrieval Number: C5524098319/2019©BEIESP | DOI: 10.35940/ijrte.C5524.098319
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Big data is being generated in a wide variety of formats at an exponential rate. Big data analytics deals with processing and analyzing voluminous data to provide useful insight for guided decision making. Traditional data storage and management tools are not well equipped to handle big data and its applications. Apache Hadoop is a popular open-source platform that supports storage and processing of extremely large datasets. The Hadoop ecosystem provides a variety of tools for big data analytics. However, the tool selected must be the one best suited to a specific analytics requirement. Each tool has its own advantages and drawbacks relative to the others. Some have overlapping business use cases but differ in critical functional areas, so the trade-offs between usability and suitability need to be considered when selecting a tool from the Hadoop ecosystem. This paper identifies the requirements for Big Data Analytics (BDA) and maps them to the tools of the Hadoop framework that are best suited to them. To do this, we categorize Hadoop tools according to their functionality and usage. The tools are discussed from the users’ perspective, along with their pros and cons. For each identified category, a comparison of Hadoop tools based on important parameters is also presented. The tools have been studied and analyzed for their suitability to the different requirements of big data analytics. The resulting mapping of big data analytics requirements to Hadoop tools is intended for use by data analysts and predictive modelers.
Keywords: Hadoop Ecosystem, BDA Life Cycle, Data Ingestion Tools, Data Processing Frameworks, Data Storage and Access Tools.

Scope of the Article:
Requirements Engineering