DOI: 10.1145/2736277.2741135
research-article

Incorporating Social Context and Domain Knowledge for Entity Recognition

Published: 18 May 2015

ABSTRACT

Recognizing entity instances in documents according to a knowledge base is a fundamental problem in many data mining applications. The problem is extremely challenging for short documents in complex domains such as social media and biomedicine. Large concept spaces and instance ambiguity are key issues that need to be addressed. Most of these documents are created in a social context by common authors via social interactions, such as replies and citations. Such social contexts are largely ignored in the instance-recognition literature. How can users' interactions help entity instance recognition? How can the social context be modeled so as to resolve the ambiguity of different instances?

In this paper, we propose the SOCINST model to formalize the problem into a probabilistic model. Given a set of short documents (e.g., tweets or paper abstracts) posted by users who may connect with each other, SOCINST can automatically construct a context of subtopics for each instance, with each subtopic representing one possible meaning of the instance. The model is also able to incorporate social relationships between users to help build social context. We further incorporate domain knowledge into the model using a Dirichlet tree distribution.
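As a rough illustration of the Dirichlet tree distribution mentioned above, the following minimal sketch draws one word distribution from a small tree: each internal node samples a Dirichlet over its children, and a leaf's probability is the product of the branch probabilities on its path. The toy vocabulary, edge weights, and function names are assumptions made for this example only; they are not taken from SOCINST, whose actual tree construction is not described in this abstract.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_dirichlet_tree(node, rng, mass=1.0, out=None):
    """Return {leaf_word: probability} for one draw from a Dirichlet tree.

    `node` is either a leaf (a str) or a list of (edge_weight, child) pairs.
    """
    if out is None:
        out = {}
    if isinstance(node, str):
        # Leaf: accumulate the probability mass that reached it.
        out[node] = out.get(node, 0.0) + mass
        return out
    weights = [w for w, _ in node]
    branch_probs = sample_dirichlet(weights, rng)
    for p, (_, child) in zip(branch_probs, node):
        sample_dirichlet_tree(child, rng, mass * p, out)
    return out


# Hypothetical domain knowledge: "iphone" and "ipad" are grouped under one
# internal node with large, equal edge weights, so they tend to receive
# similar probability within a subtopic. All weights here are made up.
tree = [
    (2.0, [(50.0, "iphone"), (50.0, "ipad")]),
    (1.0, "fruit"),
    (1.0, "juice"),
]

rng = random.Random(0)
word_dist = sample_dirichlet_tree(tree, rng)
```

Grouping related words under a shared parent is one way such a prior can encode that the words should co-occur in a subtopic, which a flat Dirichlet cannot express.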

We evaluate the proposed model on three different genres of datasets: ICDM'12 Contest, Weibo, and I2B2. On ICDM'12 Contest, the proposed model clearly outperforms all the top contestants (+21.4%; $p \ll 10^{-5}$ with t-test). On Weibo and I2B2, the recognition accuracy of SOCINST is 5.3-26.6% better than that of several alternative methods.

    • Published in

      WWW '15: Proceedings of the 24th International Conference on World Wide Web
      May 2015
      1460 pages
      ISBN:9781450334693

      Copyright © 2015 Copyright is held by the International World Wide Web Conference Committee (IW3C2)

      Publisher

      International World Wide Web Conferences Steering Committee

      Republic and Canton of Geneva, Switzerland

      Acceptance Rates

      WWW '15 Paper Acceptance Rate: 131 of 929 submissions, 14%
      Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
