Data Preparation for Advanced Data Analysis on Elastic Stack

  • Conference paper
Biologically Inspired Cognitive Architectures 2023 (BICA 2023)

Abstract

This paper presents approaches for preparing different types of data for loading into Elasticsearch, a document-oriented NoSQL database. Besides storing data, Elasticsearch makes it possible to use Kibana, a data visualization utility that is a powerful tool for data analysis. Preprocessing is essential because well-prepared data not only increases the accuracy of the analysis but also expands its capabilities. To broaden coverage, the approaches are described using real cases that analysts have solved. The paper presents methodological and practical ways to solve these problems, both by transforming the data and adding new fields and by defining correct mappings for Elasticsearch indexes. To demonstrate the approaches clearly, they are applied to two datasets: bibliographic information on papers and information on the funding of scientific and technical projects. The demonstration shows the difference between the initial and enriched data, as well as the charts built from the data, which enable advanced data analysis.
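
As a minimal sketch of the two kinds of preparation named in the abstract (adding derived fields and defining an explicit index mapping), the Python below enriches a single bibliographic record. The field names, the semicolon-separated author string, and the derived fields are illustrative assumptions, not the authors' actual schema or pipeline.

```python
from datetime import date

# Hypothetical mapping for an index of bibliographic records.
# All field names here are assumptions made for illustration.
PAPER_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},          # analyzed for full-text search
            "authors": {"type": "keyword"},     # exact values, usable in aggregations
            "published": {"type": "date", "format": "yyyy-MM-dd"},
            "author_count": {"type": "integer"},
            "publication_year": {"type": "integer"},
        }
    }
}


def enrich_paper(raw: dict) -> dict:
    """Return a new record with a split author list and two derived fields."""
    # Assumed input convention: authors arrive as one ';'-separated string.
    authors = [name.strip() for name in raw["authors"].split(";") if name.strip()]
    published = date.fromisoformat(raw["published"])
    return {
        "title": raw["title"].strip(),
        "authors": authors,
        "published": published.isoformat(),
        "author_count": len(authors),            # derived field
        "publication_year": published.year,      # derived field
    }


if __name__ == "__main__":
    sample = {
        "title": " Sample paper ",
        "authors": "A. Author; B. Author;",
        "published": "2023-05-16",
    }
    print(enrich_paper(sample))
```

A dictionary of this shape is what the Elasticsearch create-index request accepts as its body. Declaring `authors` as `keyword` rather than leaving it to dynamic mapping is the kind of choice that would let a Kibana chart aggregate on whole author names instead of individual tokens.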

Acknowledgements

The study was supported by the Russian Science Foundation grant No. 23-75-30012, https://rscf.ru/project/23-75-30012/.

Author information

Corresponding author

Correspondence to M. S. Ulizko.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ulizko, M.S., Tukumbetova, R.R., Artamonov, A.A., Antonov, E.V., Ionkina, K.V. (2024). Data Preparation for Advanced Data Analysis on Elastic Stack. In: Samsonovich, A.V., Liu, T. (eds) Biologically Inspired Cognitive Architectures 2023. BICA 2023. Studies in Computational Intelligence, vol 1130. Springer, Cham. https://doi.org/10.1007/978-3-031-50381-8_96
