ABSTRACT
The Web of Data has emerged as a way of exposing structured linked data on the Web. It builds on the central building blocks of the Web (URIs, HTTP) and benefits from its simplicity and widespread adoption. It does, however, also inherit unresolved issues such as the broken link problem. Broken links constitute a major challenge for actors consuming Linked Data, as they force these actors to deal with reduced accessibility of data. We believe that the broken link problem is a major threat to the whole Web of Data idea and that both Linked Data consumers and providers will require solutions that deal with this problem. Since no general solutions for fixing such links in the Web of Data have emerged, we make three contributions in this direction: first, we provide a concise definition of the broken link problem and a comprehensive analysis of existing approaches. Second, we present DSNotify, a generic framework able to assist human and machine actors in fixing broken links. It uses heuristic feature comparison and employs a time-interval-based blocking technique for the underlying instance matching problem. Third, we derived benchmark datasets from knowledge bases such as DBpedia and evaluated the effectiveness of our approach with respect to the broken link problem. Our results show the feasibility of a time-interval-based blocking approach for systems that aim at detecting and fixing broken links in the Web of Data.
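To make the idea of time-interval-based blocking more concrete, the following is a minimal sketch and not DSNotify's actual implementation. It assumes that removals and creations of resources are observed with timestamps, that each resource is described by string-valued properties, and that a removed resource is only compared against resources created within a fixed time window. The names `Event`, `similarity`, and `propose_moves`, the window length, the similarity threshold, and the use of `difflib.SequenceMatcher` as the feature comparison are all illustrative assumptions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Event:
    uri: str          # resource URI (hypothetical example data)
    timestamp: float  # seconds; when the removal or creation was observed
    features: dict    # property name -> string value


def similarity(a: dict, b: dict) -> float:
    """Average string similarity over the properties both items share."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    total = sum(SequenceMatcher(None, a[k], b[k]).ratio() for k in shared)
    return total / len(shared)


def propose_moves(removed, created, window=3600.0, threshold=0.8):
    """Map each removed URI to the most similar URI created within the window."""
    moves = {}
    for old in removed:
        # Blocking step (assumed form): only creations close in time
        # to the removal are considered as match candidates.
        candidates = [
            new for new in created
            if abs(new.timestamp - old.timestamp) <= window
        ]
        best = max(
            candidates,
            key=lambda new: similarity(old.features, new.features),
            default=None,
        )
        # Accept the best candidate only if it is similar enough.
        if best is not None and similarity(old.features, best.features) >= threshold:
            moves[old.uri] = best.uri
    return moves
```

The blocking step keeps the number of pairwise feature comparisons small, because a removed resource is never compared with resources created far away in time. A real system would need to choose the window and threshold empirically.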