ABSTRACT
Place name disambiguation is an important task for improving the accuracy of geographic information retrieval. The task becomes more challenging when the input texts are short. Wikipedia provides information about places and has often been employed for named entity recognition. However, because Wikipedia articles are written in natural language, this rich knowledge base is difficult to exploit more effectively. DBpedia is the Semantic Web version of Wikipedia and provides structured, machine-understandable knowledge mined from Wikipedia articles. This paper presents an approach that combines Wikipedia and DBpedia to disambiguate place names in short texts. We discuss the pros and cons of the two knowledge bases and argue that a combination of both performs better than either alone. We evaluate the proposed method in experiments against three established baseline methods. The results indicate that our method generally achieves higher precision and recall. While our study employs DBpedia, the proposed method is generic and can be extended to other structured Linked Data sets such as Freebase or Wikidata.
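To make the idea of combining the two knowledge bases concrete, the following is a minimal sketch and not the method of the paper, whose scoring details are not given in this abstract. It assumes each candidate place comes with some article text (standing in for Wikipedia) and a list of related entity labels (standing in for DBpedia), and it ranks candidates by a weighted sum of two cosine similarities with the short input text. The function names, the `weight` parameter, and the toy candidate data are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(short_text, candidates, weight=0.5):
    """Rank candidate places for an ambiguous name in a short text.

    `weight` (an assumed parameter) balances the article-text score
    against the structured-entity score.
    """
    context = Counter(tokenize(short_text))
    scores = {}
    for name, info in candidates.items():
        # Unstructured evidence: overlap with the article text.
        text_score = cosine(context, Counter(tokenize(info["article"])))
        # Structured evidence: overlap with labels of related entities.
        entity_tokens = Counter(tokenize(" ".join(info["related_entities"])))
        structured_score = cosine(context, entity_tokens)
        scores[name] = weight * text_score + (1 - weight) * structured_score
    best = max(scores, key=scores.get)
    return best, scores


# Toy, hand-written candidate data (illustrative only, not real
# Wikipedia or DBpedia extracts).
CANDIDATES = {
    "Washington (state)": {
        "article": "Washington is a state in the Pacific Northwest "
                   "region of the United States",
        "related_entities": ["Seattle", "Olympia", "Mount Rainier"],
    },
    "Washington, D.C.": {
        "article": "Washington, D.C. is the capital city of the "
                   "United States",
        "related_entities": ["White House", "Potomac River", "Capitol"],
    },
}

best, scores = disambiguate(
    "Washington is known for Seattle and Mount Rainier", CANDIDATES
)
print(best, scores)
```

In this toy case the article text alone barely separates the two candidates, while the related-entity labels supply the distinguishing evidence, which is the kind of complementarity the abstract argues for.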
Index Terms
- Improving wikipedia-based place name disambiguation in short texts using structured data from DBpedia