Multidimensional Information System Metadata Description Using the “Data Vault” Methodology

  • Conference paper
Distributed Computer and Communication Networks (DCCN 2022)

Abstract

The task of determining the metadata of a multidimensional information system amounts to describing the parameters of the cells that hold the facts included in the multidimensional data cube. Classification schemes can be used to construct the metadata. Each classification scheme corresponds to a certain structural component of the observed phenomenon. Within a classification scheme the cell parameters are presented in hierarchical form, and they are combined in the metadata when several classification schemes are connected. To construct the hierarchy of elements of a classification scheme, it is necessary to identify groups of members that are semantically connected with groups of members of other dimensions. The Cartesian product can then be applied to these groups of members, which forms clusters of member combinations in the metadata. The complete metadata structure is obtained by combining all clusters. When the number of aspects of analysis is large, the multidimensional data cube has specific properties related to sparsity. Classification schemes make it possible to identify the parts of the metadata that correspond to individual structural components of the observed phenomenon. If the multidimensional data cube is constructed during automated data collection, the “Data Vault” methodology can be used to describe the metadata. This methodology allows the relationships between business objects to be reflected in the metadata.
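
As an illustration of the cluster idea summarized in the abstract, the following minimal Python sketch builds each cluster as the Cartesian product of member groups (one group per dimension) and takes the union of all clusters as the set of admissible cells. The dimension names, members, and function names are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch is not the author's implementation.

```python
from itertools import product

# Hypothetical dimensions and member groups (illustrative assumptions only).
DIMENSIONS = ("Product", "Region", "Measure")

# Each cluster holds one group of semantically connected members per dimension.
clusters = [
    {"Product": ["laptop", "phone"], "Region": ["EU", "US"], "Measure": ["units_sold"]},
    {"Product": ["support_plan"], "Region": ["EU"], "Measure": ["contracts"]},
]

def cluster_cells(cluster):
    """Cartesian product of one cluster's member groups, one group per dimension."""
    return set(product(*(cluster[d] for d in DIMENSIONS)))

def metadata_cells(all_clusters):
    """Union of the cells of every cluster: the admissible cells of the cube."""
    cells = set()
    for c in all_clusters:
        cells |= cluster_cells(c)
    return cells

def full_cube_size(all_clusters):
    """Cell count of the dense cube over every member that appears in any cluster."""
    size = 1
    for d in DIMENSIONS:
        members = set()
        for c in all_clusters:
            members.update(c[d])
        size *= len(members)
    return size

cells = metadata_cells(clusters)
print(len(cells), "admissible cells out of", full_cube_size(clusters))
```

In this toy setup the two clusters describe 5 admissible cells, while the dense cube over the same members would have 12, which shows on a small scale why a cube with many aspects of analysis tends to be sparse.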

This paper has been supported by the RUDN University Strategic Academic Leadership Program.

Author information

Corresponding author

Correspondence to Maxim Fomin.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Fomin, M. (2023). Multidimensional Information System Metadata Description Using the “Data Vault” Methodology. In: Vishnevskiy, V.M., Samouylov, K.E., Kozyrev, D.V. (eds) Distributed Computer and Communication Networks. DCCN 2022. Communications in Computer and Information Science, vol 1748. Springer, Cham. https://doi.org/10.1007/978-3-031-30648-8_2

  • DOI: https://doi.org/10.1007/978-3-031-30648-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30647-1

  • Online ISBN: 978-3-031-30648-8

  • eBook Packages: Computer Science, Computer Science (R0)
