Abstract
The Portuguese General Directorate for Book, Archives and Libraries (DGLAB) has selected CIDOC CRM as the basis for its next-generation digital archive management software. Given the ontological foundations of the Conceptual Reference Model (CRM), a graph database or a triplestore was seen as the best candidate to represent a CRM-based data model for the new software. We therefore compared several of these databases based on their maturity, features, performance in standard tasks and, most importantly, the Object-Graph Mappers (OGMs) available to interact with each database in an object-oriented way. Our conclusions are drawn not only from a systematic review of related work but also from an experimental scenario. For the experiment, we designed a simple CRM-compliant graph to test the ability of each OGM/database combination to tackle the so-called “diamond problem” in Object-Oriented Programming (OOP) to ensure that property instances follow domain and range constraints.
Our results show that (1) ontological consistency enforcement in graph databases and triplestores is much harder to achieve than in a relational database, making them better suited to an analytical than a transactional role; (2) OGMs are still rather immature solutions; and (3) neomodel, an OGM for the Neo4j graph database, is the most mature solution in the study, as it satisfies all requirements, although it is also the lowest-performing.
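To make the "diamond problem" and the domain and range constraints mentioned above more concrete, the following is a minimal plain-Python sketch. It is not the test graph or the OGM code used in the study: the class hierarchy is a simplified, illustrative fragment loosely modelled on CIDOC CRM class names, and the `Property` helper and its `link` method are hypothetical names introduced only for this example.

```python
# Illustrative sketch only: a simplified class diamond and a domain/range check.
# The hierarchy is an assumption loosely based on CIDOC CRM naming, and
# `Property` is a hypothetical helper, not part of any OGM.


class E70Thing:
    """Top of the diamond."""

    def __init__(self, name: str) -> None:
        self.name = name


class E18PhysicalThing(E70Thing):
    """Left branch of the diamond."""


class E71HumanMadeThing(E70Thing):
    """Right branch of the diamond."""


class E24PhysicalHumanMadeThing(E18PhysicalThing, E71HumanMadeThing):
    """Bottom of the diamond: inherits from both branches."""


class Property:
    """A relationship type that only accepts instances of its domain and range."""

    def __init__(self, label: str, domain: type, range_: type) -> None:
        self.label = label
        self.domain = domain
        self.range_ = range_

    def link(self, subject: object, obj: object) -> tuple:
        # Reject the statement if either end violates the declared constraint.
        if not isinstance(subject, self.domain):
            raise TypeError(f"{self.label}: subject is not a {self.domain.__name__}")
        if not isinstance(obj, self.range_):
            raise TypeError(f"{self.label}: object is not a {self.range_.__name__}")
        return (subject.name, self.label, obj.name)


# Hypothetical property whose domain and range are both E18PhysicalThing.
is_composed_of = Property("P46_is_composed_of", E18PhysicalThing, E18PhysicalThing)

statue = E24PhysicalHumanMadeThing("statue")
plinth = E18PhysicalThing("plinth")

# Accepted: the statue is a physical thing through the left branch of the diamond.
triple = is_composed_of.link(statue, plinth)

# Rejected: an instance from the right branch only is outside the property's domain.
try:
    is_composed_of.link(E71HumanMadeThing("design"), plinth)
    rejected = False
except TypeError:
    rejected = True
```

Python resolves the diamond through its method resolution order, so the shared ancestor appears only once and `isinstance` checks succeed along either branch. An OGM has to preserve this behaviour when mapping such classes to node labels, which is the kind of capability the experiment described above is meant to probe.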