Designing Framework for Real Time Twitter Data Analytics using Apache Flume and Pig
Ashlesha S. Nagdive1, Rajkishor Tugnayat2

1Ashlesha S. Nagdive*, Assistant Professor, Information Technology, G. H. Raisoni College of Engineering, Nagpur, India.
2Dr. Rajkishor Tugnayat, Principal, Shri Shankarprasad Agnihotri College of Engineering, Wardha, India.
Manuscript received on March 15, 2020. | Revised Manuscript received on March 24, 2020. | Manuscript published on March 30, 2020. | PP: 4474-4477 | Volume-8 Issue-6, March 2020. | Retrieval Number: F7726038620/2020©BEIESP | DOI: 10.35940/ijrte.F7726.038620

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: In today's technology-driven world, people prefer social media as a way to express themselves. Twitter reportedly has more than 321 million active users, with 100 million users posting approximately 340 million tweets a day. Twitter is the largest source of breaking news on social issues, especially election-related ones, where people can express their views and offer their opinions. As a result, Twitter generates a continuous stream of unstructured text data. Hadoop is one of the best available tools for analyzing Twitter data because it supports processing of distributed big data, streaming data, time-stamped data, and text data. Apache Flume, in turn, is used to extract real-time Twitter data into HDFS. This study attempts to establish an analytical framework to derive and interpret structured as well as unstructured Twitter data. The proposed framework comprises real-time Twitter data ingestion, processing, and data visualization using Apache Flume and Pig. In this project, we fetch positive and negative tweets on election data from Twitter and analyze each party's status and its probability of winning the election.
Keywords: Unstructured Twitter Data, HDFS, Apache Flume, Pig, Textblob, Dash.
Scope of the Article: Data Analytics.
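
As an illustration of the final analysis step described in the abstract (labelling election tweets as positive or negative and summarizing each party's standing), a minimal Python sketch is given below. It is not the authors' implementation: the word lists are a hypothetical stand-in for a sentiment library such as TextBlob, the party names are placeholders, and the tweets are assumed to have already been collected (for example, by Flume into HDFS) and loaded as plain strings.

```python
from collections import Counter

# Hypothetical word lists standing in for a real sentiment model such as TextBlob.
POSITIVE = {"good", "great", "win", "support", "love"}
NEGATIVE = {"bad", "corrupt", "lose", "fail", "hate"}


def polarity(text):
    """Return +1, -1 or 0 depending on which word list matches more tokens."""
    tokens = [t.strip(".,!?#@").lower() for t in text.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return (score > 0) - (score < 0)


def party_sentiment(tweets, parties):
    """Count positive and negative tweets that mention each party name."""
    counts = {p: Counter() for p in parties}
    for text in tweets:
        label = polarity(text)
        if label == 0:
            continue  # neutral tweets are ignored in this sketch
        for p in parties:
            if p.lower() in text.lower():
                counts[p]["positive" if label > 0 else "negative"] += 1
    return counts


def positive_share(counter):
    """Fraction of a party's labelled tweets that are positive (0.0 if none)."""
    total = counter["positive"] + counter["negative"]
    return counter["positive"] / total if total else 0.0
```

The positive share is only a rough proxy for a party's standing; the abstract does not specify how the probability of winning is computed, so no such formula is assumed here.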