Abstract
As a relatively new concept, the data lake has neither a standard definition nor an accepted architecture. We therefore review existing work and propose a complete definition together with a generic, extensible data lake architecture. In addition, we introduce three future research axes connected with our health-care Information Technology (IT) activities: (i) metadata management, covering both intra- and inter-metadata; (ii) a unified ecosystem for companies' data warehouses and data lakes; and (iii) data lake governance.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ravat, F., Zhao, Y. (2019). Data Lakes: Trends and Perspectives. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science, vol 11706. Springer, Cham. https://doi.org/10.1007/978-3-030-27615-7_23
DOI: https://doi.org/10.1007/978-3-030-27615-7_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27614-0
Online ISBN: 978-3-030-27615-7
eBook Packages: Computer Science (R0)