Implementation of a Big Data Architecture For The Realization of Predictive Models With Great Volumes of Data

International Journal of Engineering Trends and Technology (IJETT)
  
© 2021 by IJETT Journal
Volume-69 Issue-1
Year of Publication : 2021
Authors : Enrique Lee Huamaní, Avid Roman-Gonzalez
DOI :  10.14445/22315381/IJETT-V69I1P206

Citation 

MLA Style: Enrique Lee Huamaní, Avid Roman-Gonzalez. "Implementation of a Big Data Architecture For The Realization of Predictive Models With Great Volumes of Data." International Journal of Engineering Trends and Technology 69.1 (2021): 35-42.

APA Style: Enrique Lee Huamaní, Avid Roman-Gonzalez. (2021). Implementation of a Big Data Architecture For The Realization of Predictive Models With Great Volumes of Data. International Journal of Engineering Trends and Technology, 69(1), 35-42.

Abstract
The research direction of the University of Sciences and Humanities has integrated a Big Data architecture for building predictive models with large volumes of data. It was implemented so that future research can use this architecture efficiently. This study discusses the theoretical concepts of Hadoop version 2.0, its subsequent scaling on a Beowulf cluster implemented in one of the University's laboratories, and the configuration of Hadoop and Spark so that they work in conjunction. Finally, the results section reports the tests carried out to validate that the architecture works correctly.

Keywords
Big Data, Hadoop, Spark, Predictive models, HDFS.