A Cloud Pub/Sub Architecture to Integrate Google Big Query with Elasticsearch using Cloud Functions

Authors

  • Sergio Laureano Gutiérrez
  • Yasiel Pérez Vera

DOI:

https://doi.org/10.47839/ijc.21.3.2694

Keywords:

Cloud Computing Architecture, Cloud Function, Cloud Service, Serverless, Pub/Sub

Abstract

In recent years, analytics on large volumes of data has become increasingly important, since it supports strategic decisions across many kinds of applications. Appropriate mechanisms must therefore be designed to process data and integrate it across platforms so that the best features of each can be exploited. This work presents a cloud-based architecture that migrates data stored in BigQuery to an analytics engine, Elasticsearch, in order to take advantage of its capabilities for query, insert, and display operations. The migration is accomplished using Cloud Functions and Pub/Sub. The integration of these platforms through the proposed architecture showed 100% effectiveness in transferring data from one platform to the other while maintaining an insertion rate of 4,138.30 documents per second, demonstrating its robustness, efficiency, and versatility for data migration. The work aims to establish an architectural solution for handling large amounts of data, as found in real-world settings.
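The data path summarized in the abstract (rows published to Pub/Sub, a Cloud Function triggered per message, documents inserted into Elasticsearch) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each Pub/Sub message carries a JSON array of rows, and the index name and function names are hypothetical. Only the Python standard library is used: the function decodes the base64 "data" field that a Pub/Sub-triggered function receives and builds a newline-delimited body in the format expected by the Elasticsearch bulk API. Sending that body to a cluster is left out.

```python
import base64
import json

# Hypothetical index name, chosen only for illustration.
INDEX_NAME = "bigquery-rows"


def decode_pubsub_event(event):
    """Decode the base64 'data' field of a Pub/Sub event into a list of rows.

    Assumption: the publisher serialized one row or a list of rows as JSON.
    """
    payload = base64.b64decode(event["data"]).decode("utf-8")
    rows = json.loads(payload)
    return rows if isinstance(rows, list) else [rows]


def build_bulk_body(rows, index=INDEX_NAME):
    """Build a newline-delimited bulk body: one action line, then one document line."""
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(row))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


def handle_event(event, context=None):
    """Entry point shaped like a Pub/Sub-triggered background function."""
    rows = decode_pubsub_event(event)
    return build_bulk_body(rows)
```

In a deployed function, the returned body would be posted to the cluster's bulk endpoint by an Elasticsearch client or an HTTP library. Batching many rows per message is one plausible way to sustain a high insertion rate, although the abstract does not state the batch size the authors used.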

Published

2022-09-30

How to Cite

Gutiérrez, S. L., & Pérez Vera, Y. (2022). A Cloud Pub/Sub Architecture to Integrate Google Big Query with Elasticsearch using Cloud Functions. International Journal of Computing, 21(3), 369-376. https://doi.org/10.47839/ijc.21.3.2694
