DOI: 10.1145/2566486.2568002
research-article

Test-driven evaluation of linked data quality

Published: 07 April 2014

ABSTRACT

Linked Open Data (LOD) comprises an unprecedented volume of structured data on the Web. However, these datasets vary in quality, ranging from extensively curated datasets to crowdsourced or extracted data of often relatively low quality. We present a methodology for test-driven quality assessment of Linked Data, inspired by test-driven software development. We argue that vocabularies, ontologies and knowledge bases should be accompanied by a number of test cases, which help to ensure a basic level of quality. The methodology is based on a formalization of bad smells and data quality problems. Our formalization employs SPARQL query templates, which are instantiated into concrete quality test case queries. Based on an extensive survey, we compile a comprehensive library of data quality test case patterns. We perform automatic test case instantiation based on schema constraints or semi-automatically enriched schemata, and allow the user to generate specific test case instantiations that are applicable to a schema or dataset. We provide an extensive evaluation of five LOD datasets, manual test case instantiation for five schemas, and automatic test case instantiations for all available schemata registered with Linked Open Vocabularies (LOV). One of the main advantages of our approach is that domain-specific semantics can be encoded in the data quality test cases, making it possible to discover data quality problems beyond conventional quality heuristics.
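
To make the idea of instantiating a SPARQL query template into a concrete test case query more tangible, the following is a minimal sketch in Python. The `%%NAME%%` placeholder syntax, the `instantiate` helper, the example pattern, and the choice of `dbo:birthDate` and `dbo:deathDate` as bindings are all assumptions made for illustration; the abstract does not specify the pattern syntax or any particular binding. The resulting query selects resources that violate the intended constraint, here a birth date later than a death date.

```python
import re

# Hypothetical test case pattern: two properties of the same subject are
# compared with an operator. Placeholder syntax %%NAME%% is an assumption.
PATTERN = (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "SELECT ?s WHERE {\n"
    "  ?s %%P1%% ?v1 .\n"
    "  ?s %%P2%% ?v2 .\n"
    "  FILTER (?v1 %%OP%% ?v2)\n"
    "}"
)


def instantiate(pattern: str, bindings: dict) -> str:
    """Replace every %%NAME%% placeholder with its bound value.

    Raises KeyError if the pattern uses a placeholder with no binding,
    so an incomplete instantiation never reaches the SPARQL endpoint.
    """
    def substitute(match):
        name = match.group(1)
        if name not in bindings:
            raise KeyError(f"no binding for placeholder {name}")
        return bindings[name]

    return re.sub(r"%%(\w+)%%", substitute, pattern)


# Illustrative binding (an assumption): a resource is flagged when its
# birth date is later than its death date.
query = instantiate(
    PATTERN,
    {"P1": "dbo:birthDate", "P2": "dbo:deathDate", "OP": ">"},
)
print(query)
```

Each row returned by such a query would count as one violation of the test case. The same pattern could be reused with other property pairs, which is the sense in which a small pattern library can yield many concrete test cases.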

        • Published in

          ACM Other conferences
          WWW '14: Proceedings of the 23rd international conference on World wide web
          April 2014
          926 pages
          ISBN:9781450327442
          DOI:10.1145/2566486

          Copyright © 2014 Copyright is held by the International World Wide Web Conference Committee (IW3C2).

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          WWW '14 Paper Acceptance Rate: 84 of 645 submissions, 13%. Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%.
