Topic-Oriented Words as Features for Named Entity Recognition

Zhang, Ziqi; Cohn, Trevor; Ciravegna, Fabio

doi:10.1007/978-3-642-37247-6_25

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7816)

Abstract

Research has shown that topic-oriented words are often related to named entities and can be used for Named Entity Recognition (NER). Many studies have proposed measuring the topicality of words in terms of ‘informativeness’, based on the global distributional characteristics of words in a corpus. However, this study shows that there can be a large discrepancy between informativeness and topicality; empirically, informativeness-based features can damage the learning accuracy of NER. This paper proposes measuring words’ topicality based on local distributional features specific to individual documents, and proposes methods to transform topicality into gazetteer-like features for NER by binning. Evaluated on five datasets from three domains, the methods show consistent improvement over a baseline of between 0.9 and 4.0 in F-measure, and always outperform methods that use informativeness measures.
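
The idea of turning a document-local topicality score into gazetteer-like features by binning can be illustrated with a minimal sketch. The abstract does not specify the local measures or the binning scheme, so everything here is an assumption made for illustration: within-document relative frequency stands in for the topicality measure, equal-size rank bins stand in for the binning method, and the names `local_topicality`, `bin_features`, `token_features` and the `TOPIC_BIN_*` labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def local_topicality(tokens: List[str]) -> Dict[str, float]:
    """Score each word by its relative frequency within ONE document.

    Assumption: plain within-document frequency stands in for the
    paper's local distributional measures, which the abstract omits.
    """
    counts = Counter(t.lower() for t in tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def bin_features(scores: Dict[str, float], num_bins: int = 3) -> Dict[str, str]:
    """Map each word to a gazetteer-like label based on its rank.

    Words are ranked by descending score (ties broken alphabetically)
    and split into rank intervals; bin 0 holds the top-ranked words.
    Assumption: rank-based bins are only one possible binning scheme.
    """
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    n = len(ranked)
    return {
        word: f"TOPIC_BIN_{rank * num_bins // n}"
        for rank, word in enumerate(ranked)
    }


def token_features(tokens: List[str], num_bins: int = 3) -> List[str]:
    """Return one bin label per token, computed from this document only."""
    bins = bin_features(local_topicality(tokens), num_bins)
    return [bins[t.lower()] for t in tokens]


if __name__ == "__main__":
    doc = "the p53 gene regulates p53 targets and p53 pathways".split()
    for token, label in zip(doc, token_features(doc)):
        print(token, label)
```

In an NER pipeline, a label of this kind could be attached to each token alongside the usual orthographic and contextual features, in the same way as a gazetteer membership flag. Because the scores are recomputed per document, the same word may fall into different bins in different documents.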

Author information

Authors and Affiliations

Department of Computer Science, University of Sheffield, 211 Portobello, Regent Court, Sheffield, UK, S1 4DP
Ziqi Zhang, Trevor Cohn & Fabio Ciravegna

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico D.F., Mexico
Alexander Gelbukh

About this paper

Cite this paper

Zhang, Z., Cohn, T., Ciravegna, F. (2013). Topic-Oriented Words as Features for Named Entity Recognition. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_25

DOI: https://doi.org/10.1007/978-3-642-37247-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37246-9
Online ISBN: 978-3-642-37247-6
eBook Packages: Computer Science, Computer Science (R0)
