research-article

DSNotify: handling broken links in the web of data

Published: 26 April 2010

ABSTRACT

The Web of Data has emerged as a way of exposing structured linked data on the Web. It builds on the central building blocks of the Web (URIs, HTTP) and benefits from its simplicity and widespread adoption. It does, however, also inherit unresolved issues such as the broken link problem. Broken links constitute a major challenge for actors consuming Linked Data, as they require them to deal with reduced accessibility of data. We believe that the broken link problem is a major threat to the whole Web of Data idea and that both Linked Data consumers and providers will require solutions that deal with this problem. Since no general solutions for fixing such links in the Web of Data have emerged, we make three contributions in this direction. First, we provide a concise definition of the broken link problem and a comprehensive analysis of existing approaches. Second, we present DSNotify, a generic framework able to assist human and machine actors in fixing broken links. It uses heuristic feature comparison and employs a time-interval-based blocking technique for the underlying instance matching problem. Third, we derive benchmark datasets from knowledge bases such as DBpedia and evaluate the effectiveness of our approach with respect to the broken link problem. Our results show the feasibility of a time-interval-based blocking approach for systems that aim at detecting and fixing broken links in the Web of Data.
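
The abstract names two ingredients, heuristic feature comparison and time-interval-based blocking, without giving details. The following is a minimal sketch of how such a combination could look, not DSNotify's actual implementation. The `Event` type, the `label` feature, the one-hour window, and the 0.8 threshold are all assumptions made for illustration. The idea is that a removed resource is compared only against resources created within a nearby time interval (the block), which keeps the number of pairwise comparisons small.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Event:
    """A hypothetical removal or creation event observed for a resource."""
    uri: str
    time: float      # seconds since an arbitrary epoch
    features: dict   # feature name -> value; names are illustrative only


def similarity(a: dict, b: dict) -> float:
    """Average string similarity over the feature names both items share."""
    keys = a.keys() & b.keys()
    if not keys:
        return 0.0
    total = sum(SequenceMatcher(None, str(a[k]), str(b[k])).ratio() for k in keys)
    return total / len(keys)


def find_moves(removed, created, window=3600.0, threshold=0.8):
    """Pair each removed item with its most similar created item.

    Only created items whose event time lies within `window` seconds of the
    removal are compared (the time-interval block). Pairs scoring below
    `threshold` are discarded.
    """
    moves = {}
    for r in removed:
        block = [c for c in created if abs(c.time - r.time) <= window]
        best = max(block, key=lambda c: similarity(r.features, c.features),
                   default=None)
        if best is not None and similarity(r.features, best.features) >= threshold:
            moves[r.uri] = best.uri
    return moves
```

Called with one removed item and two identical-looking created items, only the one inside the time window is considered as a move target; the other is never compared. A real system would also have to decide how features are extracted and weighted, which this sketch leaves open.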


    • Published in

      WWW '10: Proceedings of the 19th international conference on World wide web
      April 2010
      1407 pages
      ISBN:9781605587998
      DOI:10.1145/1772690

      Copyright © 2010 International World Wide Web Conference Committee (IW3C2)

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
