Data Lakes: Trends and Perspectives

  • Conference paper
Database and Expert Systems Applications (DEXA 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11706)

Abstract

As a relatively new concept, the data lake has neither a standard definition nor an acknowledged architecture. We therefore study the existing work and propose a complete definition and a generic, extensible architecture for data lakes. In addition, we introduce three future research axes connected with our health-care Information Technology (IT) activities: (i) metadata management, which consists of intra- and inter-metadata, (ii) a unified ecosystem for companies’ data warehouses and data lakes, and (iii) data lake governance.



Author information

Corresponding author

Correspondence to Yan Zhao.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ravat, F., Zhao, Y. (2019). Data Lakes: Trends and Perspectives. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science, vol 11706. Springer, Cham. https://doi.org/10.1007/978-3-030-27615-7_23

  • DOI: https://doi.org/10.1007/978-3-030-27615-7_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27614-0

  • Online ISBN: 978-3-030-27615-7

  • eBook Packages: Computer Science, Computer Science (R0)
