Published by De Gruyter (O), January 13, 2023

A concept for providing and utilizing metadata in data analytics applications

Ein Konzept zur Bereitstellung und Verwendung von Metadaten in Datenanalyse-Applikationen
Wan Li and Tobias Kleinert

Abstract

Providing data for data analysis projects is a core task of automation technology, yet it still requires considerable manual effort. One challenge is to keep the meaning of data interpretable within or across multiple software environments, so that the provider and the user of data share a common understanding of the transferred data. Machine-interpretable metadata is widely acknowledged as a crucial building block for reaching this goal. However, in today's industrial automation and information systems, exporting and utilizing data together with its metadata is still not common practice. We therefore propose a general concept for extracting metadata and utilizing it in data analytics applications, which may help with system design in the future. The concept is implemented as a prototype for the structural metadata of tabular data.
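To make the idea of structural metadata for tabular data more concrete, the following minimal Python sketch extracts column names and datatypes from a small CSV table and then uses that metadata to load typed records. It is an illustration under stated assumptions, not the authors' prototype: the function names, the sample data and the simple type inference are hypothetical, and the output only loosely follows the `tableSchema`/`columns` layout of the W3C metadata vocabulary for tabular data.

```python
import csv
import io
import json


def infer_datatype(values):
    """Guess 'integer', 'number' or 'string' for a list of cell strings.

    Hypothetical helper: a real extractor would read the types from the
    source system instead of guessing them from the values.
    """
    for name, cast in (("integer", int), ("number", float)):
        try:
            for value in values:
                cast(value)
            return name
        except ValueError:
            continue
    return "string"


def extract_structural_metadata(csv_text, url="example.csv"):
    """Build a CSVW-like description (column names and datatypes)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    columns = [
        {"name": name, "datatype": infer_datatype([row[i] for row in body])}
        for i, name in enumerate(header)
    ]
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": url,  # assumed placeholder file name
        "tableSchema": {"columns": columns},
    }


def load_with_metadata(csv_text, metadata):
    """Use the metadata on the consumer side to parse cells into typed values."""
    casts = {"integer": int, "number": float, "string": str}
    columns = metadata["tableSchema"]["columns"]
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {col["name"]: casts[col["datatype"]](row[col["name"]]) for col in columns}
        for row in reader
    ]


# Invented sample data, for illustration only.
SAMPLE = (
    "timestamp,temperature_C,batch\n"
    "2022-08-01T10:00:00,71.5,12\n"
    "2022-08-01T10:01:00,72,12\n"
)

metadata = extract_structural_metadata(SAMPLE)
records = load_with_metadata(SAMPLE, metadata)
print(json.dumps(metadata, indent=2))
```

In this sketch the provider side produces the metadata document and the consumer side relies on it instead of re-guessing column types, which mirrors the shared understanding between data provider and data user described in the abstract.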

Zusammenfassung

Die Bereitstellung von Daten für Datenanalyseprojekte ist eine Kernaufgabe der Automatisierungstechnik, die jedoch immer noch mit viel manuellem Aufwand verbunden ist. Eine Herausforderung besteht darin, die Bedeutung der Daten innerhalb oder über mehrere Softwareumgebungen hinweg interpretierbar zu halten, so dass Anbieter und Nutzer von Daten ein gemeinsames Verständnis der übertragenen Daten haben. Es ist anerkannt, dass maschineninterpretierbare Metadaten ein entscheidender Baustein zur Erreichung dieses Ziels sind. In industriellen Automatisierungs- und Informationssystemen sind der Export und die Nutzung von Daten, die mit Metadaten gekoppelt sind, heute jedoch noch nicht üblich. Daher schlagen wir ein allgemeines Konzept zur Extraktion von Metadaten und deren Nutzung in Datenanalyse-Applikationen vor, das in Zukunft bei der Systemgestaltung helfen kann. Das Konzept wird prototypisch für strukturelle Metadaten für tabellarische Daten umgesetzt.


Corresponding author: Wan Li, Chair of Information and Automation Systems for Process and Material Technology, RWTH Aachen University, Turmstr. 46, 52064 Aachen, Germany

About the authors

Wan Li

Wan Li, M. Sc. RWTH (born 1990), has been a research associate at the Chair of Information and Automation Systems for Process and Material Technology at RWTH Aachen University since 2019. His research focuses on data contextualization, semantic annotation and metadata-based data mining, with an emphasis on industrial applications.

Tobias Kleinert

Prof. Dr.-Ing. Tobias Kleinert (born 1971) graduated in Mechanical Engineering from RWTH Aachen University in 1999 and completed his PhD in 2005 at the Chair of Automation and Computer Control of Prof. Jan Lunze at Ruhr-Universität Bochum. His career led him to BASF SE, where he worked in the areas of Advanced Process Control, Production Technology Propylene Oxide, Regulated Automation Solutions, Digital Control Systems, Manufacturing Execution Solutions and Smart Manufacturing. As senior manager for automation and digitalization, he had assignments at the BASF sites in Ludwigshafen/D, Antwerp/B and Schwarzheide/D. Since 2020, he has led the Chair of Information and Automation Systems for Process and Material Technology at RWTH Aachen University, with a focus on information processing, automation and digitalization.

  1. Author contribution: All authors have accepted responsibility for the entire content of this submitted manuscript and approved its submission.

  2. Research funding: None declared.

  3. Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Received: 2022-08-31
Revised: 2022-10-21
Accepted: 2022-10-24
Published Online: 2023-01-13
Published in Print: 2023-01-27

© 2022 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 14.5.2024 from https://www.degruyter.com/document/doi/10.1515/auto-2022-0107/html