Clustering-based disambiguation of fine-grained place names from descriptions

GeoInformatica

Abstract

Everyday place descriptions often contain names of fine-grained features, such as buildings or businesses, that are harder to disambiguate than names of larger places such as cities or natural geographic features. Fine-grained places are often considerably more frequent and more similar to one another, so disambiguation heuristics developed for larger places, such as those based on population or containment relationships, often do not apply. In this research, we address the disambiguation of fine-grained place names in everyday place descriptions. To this end, we evaluate the performance of existing clustering-based approaches, because clustering requires no knowledge beyond the locations of the ambiguous place names. We consider not only approaches developed specifically for place name disambiguation but also clustering algorithms developed for general data mining that could be leveraged for the task. We compare these methods with a novel algorithm and show that it outperforms the others in disambiguation precision and distance error on several test datasets.
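To illustrate why candidate locations alone can be enough for disambiguation, the following is a minimal sketch of a proximity heuristic: pick the candidate that lies closest to the other place names in the description, then resolve each name to its candidate nearest that seed. It is written for this example only and is not the novel algorithm or any of the evaluated methods. The function names `haversine_km` and `disambiguate` and all coordinates are hypothetical, and every name is assumed to have at least one candidate.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def disambiguate(candidates):
    """Resolve each place name to one of its candidate locations.

    candidates: dict mapping a place name to a non-empty list of (lat, lon).
    Returns a dict mapping each place name to the chosen (lat, lon).
    """
    def spread(seed_name, seed):
        # Sum, over all other names, of the distance to their nearest candidate.
        return sum(
            min(haversine_km(seed, c) for c in locs)
            for name, locs in candidates.items()
            if name != seed_name
        )

    # The seed is the candidate that sits closest to everything else.
    seed_name, seed = min(
        ((name, c) for name, locs in candidates.items() for c in locs),
        key=lambda pair: spread(*pair),
    )

    # Every other name is resolved to its candidate nearest the seed.
    return {
        name: seed if name == seed_name else min(locs, key=lambda c: haversine_km(seed, c))
        for name, locs in candidates.items()
    }


# Illustrative (made-up) candidates for names from one description.
example = {
    "Federation Square": [(-37.8180, 144.9691)],
    "Flinders Street Station": [(-37.8183, 144.9671)],
    "Starbucks": [(-37.8150, 144.9660), (-33.8700, 151.2070), (51.5100, -0.1300)],
}
resolved = disambiguate(example)
```

In this example the ambiguous name has three candidates, and the heuristic keeps the one that falls near the two unambiguous names. A single seed is a deliberately crude stand-in for a cluster; density-based methods would instead group candidates without assuming that all names in a description lie in one place.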




Author information

Corresponding author

Correspondence to Hao Chen.


About this article

Cite this article

Chen, H., Vasardani, M. & Winter, S. Clustering-based disambiguation of fine-grained place names from descriptions. Geoinformatica 23, 449–472 (2019). https://doi.org/10.1007/s10707-019-00341-6
