ABSTRACT
This paper introduces four notions of correctness for measuring the performance of protein name taggers, each of which reflects particular characteristics of the tagger under evaluation. The discussion of these notions centers on the evaluation of two protein name taggers: Yapex, developed by the authors, and KeX, developed by Fukuda et al. (1998). To illustrate how the resulting evaluations differ, both taggers are applied to a test corpus of 101 MEDLINE abstracts in which domain experts have marked up all occurrences of protein names.
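The abstract does not spell out the four notions of correctness. The sketch below is purely illustrative. It assumes four hypothetical span-matching criteria (exact match, same left boundary, same right boundary, and any overlap) and shows how the choice of criterion alone changes precision, recall and F-score for the same tagger output. The names `score` and `CRITERIA` and the example spans are invented for illustration. This is not the Yapex or KeX evaluation code, and the paper's own definitions may differ.

```python
from typing import Callable, Dict, List, Tuple

# A protein name occurrence as (start, end) character offsets, end exclusive.
Span = Tuple[int, int]


# Hypothetical matching criteria (assumptions, not the paper's definitions).
def exact(g: Span, p: Span) -> bool:
    return g == p                          # both boundaries must agree


def left_boundary(g: Span, p: Span) -> bool:
    return g[0] == p[0]                    # only the start offset must agree


def right_boundary(g: Span, p: Span) -> bool:
    return g[1] == p[1]                    # only the end offset must agree


def overlap(g: Span, p: Span) -> bool:
    return g[0] < p[1] and p[0] < g[1]     # the spans share at least one character


CRITERIA: Dict[str, Callable[[Span, Span], bool]] = {
    "exact": exact,
    "left": left_boundary,
    "right": right_boundary,
    "overlap": overlap,
}


def score(gold: List[Span], pred: List[Span],
          match: Callable[[Span, Span], bool]) -> Tuple[float, float, float]:
    """Lenient scoring: a predicted span counts as correct if it matches any
    gold span, and a gold span counts as found if any prediction matches it
    (no one-to-one alignment is enforced)."""
    correct_pred = sum(1 for p in pred if any(match(g, p) for g in gold))
    found_gold = sum(1 for g in gold if any(match(g, p) for p in pred))
    precision = correct_pred / len(pred) if pred else 0.0
    recall = found_gold / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    gold = [(4, 20), (30, 34)]              # expert-annotated protein names
    pred = [(12, 20), (30, 34), (40, 45)]   # tagger output: one truncated name, one spurious
    for name, fn in CRITERIA.items():
        p, r, f = score(gold, pred, fn)
        print(f"{name:8s} P={p:.2f} R={r:.2f} F={f:.2f}")
```

With these toy spans, the truncated prediction (12, 20) is wrong under the exact and left-boundary criteria (P=0.33, R=0.50) but correct under the right-boundary and overlap criteria (P=0.67, R=1.00). This is the kind of divergence that motivates reporting more than one notion of correctness.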
REFERENCES
- Amos Bairoch and Rolf Apweiler. 2000. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28:45--48.
- Christian Blaschke, Miguel A. Andrade, Christos Ouzounis, and Alfonso Valencia. 1999. Automatic extraction of biological information from scientific text: protein-protein interactions. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology (ISMB'99), pages 60--67, Heidelberg, Germany, August 6--10.
- Andrew Borthwick, John Sterling, Eugene Agichtein, and Ralph Grishman. 1998. NYU: Description of the MENE named entity system as used in MUC-7. In Proceedings of the Seventh Message Understanding Conference (MUC-7), Fairfax, VA, USA, April 29--May 1.
- Nigel Collier, Hyun Seok Park, Norihiro Ogata, Yuka Tateishi, Chikashi Nobata, Tomoko Ohta, Takeshi Sekimizu, Hisao Imai, Katsutoshi Ibushi, and Jun-ichi Tsujii. 1999. The GENIA project: corpus-based knowledge acquisition and information extraction from genome research papers. In Proceedings of the European Association for Computational Linguistics (EACL) conference.
- Nigel Collier, Chikashi Nobata, and Jun-ichi Tsujii. 2000. Extracting the names of genes and gene products with a hidden Markov model. In Proceedings of the 18th International Conference on Computational Linguistics (COLING-2000), pages 201--207, August.
- Berry de Bruijn and Joel Martin. 2000. Protein name tagging. Poster presented at the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB'00).
- Ken-ichiro Fukuda, Tatsuhiko Tsunoda, Ayuchi Tamura, and Toshihisa Takagi. 1998. Toward information extraction: identifying protein names from biological papers. In Proceedings of the Pacific Symposium on Biocomputing (PSB'98), pages 705--716, Maui, Hawaii, January 4--9.
- Kevin Humphreys, George Demetriou, and Robert Gaizauskas. 2000. Two applications of information extraction to biological science journal articles: enzyme interactions and protein structures. In Proceedings of the 5th Pacific Symposium on Biocomputing, pages 72--80.
- Chikashi Nobata, Nigel Collier, and Jun-ichi Tsujii. 1999. Automatic term identification and classification in biology texts. In Proceedings of the Natural Language Pacific Rim Symposium (NLPRS'2000), pages 369--374, November.
- Pasi Tapanainen and Timo Järvinen. 1997. A non-projective dependency parser. In Proceedings of the 5th Conference on Applied Natural Language Processing, pages 64--71, Washington D.C., April. Association for Computational Linguistics.
- James Thomas, David Milward, Christos Ouzounis, Stephen Pulman, and Mark Carroll. 2000. Automatic extraction of protein interactions from scientific abstracts. In Proceedings of the Pacific Symposium on Biocomputing (PSB 2000), pages 538--549, Oahu, Hawaii, January 4--9.