DOI: 10.1145/1498759.1498810
research-article

Classifying tags using open content resources

Published: 9 February 2009

ABSTRACT

Tagging has emerged as a popular means of annotating online objects such as bookmarks, photos and videos. Tags vary in semantic meaning and can describe different aspects of a media object: its content as well as locations, dates, people and other associated metadata. Automatically classifying tags into semantic categories lets us better understand how users annotate media objects and helps us build tools for viewing and browsing them. In this paper we present a generic method for classifying tags using third-party open content resources, such as Wikipedia and the Open Directory. Our method uses structural patterns that can be extracted from resource metadata. We describe an implementation of the method on Wikipedia, using WordNet categories as both our classification schema and our ground truth. Two structural patterns found in Wikipedia are used for training and classification: categories and templates. We apply our system to classifying Flickr tags. Compared to a WordNet baseline, our method increases the coverage of the Flickr vocabulary by 115%. We can classify many important entities that WordNet does not cover, such as London Eye, Big Island, Ronaldinho, geocaching and wii.
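
The abstract describes the method only at a high level. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: every tag is assumed to be already matched to a Wikipedia article (the hypothetical `ARTICLE_PATTERNS` table), that article's categories and templates serve as the features, and tags that WordNet already covers supply the training labels (the hypothetical `WORDNET_LABELS` table, using WordNet noun lexicographer categories such as `noun.person`). A small multinomial Naive Bayes classifier is swapped in as a stand-in learner; it is not claimed to be the classifier used in the paper. The sketch also computes coverage as the fraction of the tag vocabulary that receives a category. If the reported 115% is read as a relative increase, it means the method classifies about 2.15 times as many vocabulary tags as the WordNet baseline.

```python
import math
from collections import Counter, defaultdict

# HYPOTHETICAL toy data. Each tag that matches a Wikipedia article is represented
# only by the article's structural patterns: its categories ("cat:") and the
# templates it uses ("tpl:"). These entries are illustrative, not real extractions.
ARTICLE_PATTERNS = {
    "london": {"cat:Capitals in Europe", "tpl:Infobox settlement"},
    "paris": {"cat:Capitals in Europe", "tpl:Infobox settlement"},
    "eiffel tower": {"cat:Buildings and structures in Paris", "tpl:Infobox building"},
    "pele": {"cat:Brazilian footballers", "tpl:Infobox football biography"},
    "london eye": {"cat:Buildings and structures in London", "tpl:Infobox building"},
    "ronaldinho": {"cat:Brazilian footballers", "tpl:Infobox football biography"},
}

# HYPOTHETICAL ground truth: WordNet noun categories for the tags WordNet covers.
WORDNET_LABELS = {
    "london": "noun.location",
    "paris": "noun.location",
    "eiffel tower": "noun.artifact",
    "pele": "noun.person",
}


def train(examples):
    """Collect multinomial Naive Bayes counts from (patterns, label) pairs."""
    class_counts = Counter()
    pattern_counts = defaultdict(Counter)
    vocab = set()
    for patterns, label in examples:
        class_counts[label] += 1
        pattern_counts[label].update(patterns)
        vocab |= patterns
    return class_counts, pattern_counts, vocab


def classify(model, patterns):
    """Return the label with the highest Laplace-smoothed log posterior."""
    class_counts, pattern_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(pattern_counts[label].values()) + len(vocab)
        for p in patterns:
            if p in vocab:  # patterns never seen in training carry no evidence
                score += math.log((pattern_counts[label][p] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def classify_vocabulary(tags):
    """Keep WordNet labels where they exist; classify other tags via Wikipedia."""
    training = [(ARTICLE_PATTERNS[t], label)
                for t, label in WORDNET_LABELS.items() if t in ARTICLE_PATTERNS]
    model = train(training)
    labels = {}
    for tag in tags:
        if tag in WORDNET_LABELS:
            labels[tag] = WORDNET_LABELS[tag]
        elif tag in ARTICLE_PATTERNS:
            labels[tag] = classify(model, ARTICLE_PATTERNS[tag])
        # Tags with neither a WordNet entry nor an article stay unclassified.
    return labels


def coverage(labels, vocabulary):
    """Fraction of the tag vocabulary that received a category."""
    return len(labels) / len(vocabulary)


if __name__ == "__main__":
    vocabulary = ["london", "paris", "eiffel tower", "pele",
                  "london eye", "ronaldinho", "img_1234"]
    baseline = {t: WORDNET_LABELS[t] for t in vocabulary if t in WORDNET_LABELS}
    combined = classify_vocabulary(vocabulary)
    print(combined["london eye"], combined["ronaldinho"])
    before, after = coverage(baseline, vocabulary), coverage(combined, vocabulary)
    print(round((after - before) / before, 2))  # relative coverage increase
```

On this toy data the script prints `noun.artifact noun.person` and then `0.5`: the two tags missing from the WordNet table are labelled through their shared categories and templates, raising coverage from 4 of 7 tags to 6 of 7. Keeping WordNet's own label whenever one exists is a design choice of this sketch, not something the abstract specifies.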


    Published in

      ACM Conferences
      WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining
      February 2009
      314 pages
      ISBN:9781605583907
      DOI:10.1145/1498759

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Acceptance Rates

      Overall Acceptance Rate: 498 of 2,863 submissions, 17%
