Notions of correctness when evaluating protein name taggers

Authors:
Fredrik Olsson, Swedish Institute of Computer Science, Kista, Sweden
Gunnar Eriksson, Swedish Institute of Computer Science, Kista, Sweden
Kristofer Franzén, Swedish Institute of Computer Science, Kista, Sweden
Lars Asker, Virtual Genetics Laboratory AB, Stockholm, Sweden
Per Lidén, Virtual Genetics Laboratory AB, Stockholm, Sweden

COLING '02: Proceedings of the 19th international conference on Computational linguistics - Volume 1, August 2002, Pages 1–7
https://doi.org/10.3115/1072228.1072338

Published: 24 August 2002

ABSTRACT

This paper introduces four notions of correctness for measuring the performance of protein name taggers, each of which reflects certain characteristics of the tagger under evaluation. The discussion of these notions centers on the evaluation of two protein name taggers: Yapex, developed by the authors, and KeX, developed by Fukuda et al. (1998). To illustrate how the ways of evaluating differ, both taggers are applied to a test corpus of 101 MEDLINE abstracts in which all occurrences of protein names have been marked up by domain experts.
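The abstract does not define the four notions, so the following is only a minimal sketch of the general idea that the choice of correctness criterion changes a tagger's measured performance. It assumes two generic criteria, exact boundary match and any overlap, which are illustrative and not necessarily among the paper's four notions. The span offsets, function names, and example data are all hypothetical.

```python
from typing import Callable, List, Tuple

# A tagged name is a (start, end) character span, end exclusive.
# This representation is an assumption made for illustration.
Span = Tuple[int, int]


def exact_match(pred: Span, gold: Span) -> bool:
    """Correct only if both boundaries agree with the gold annotation."""
    return pred == gold


def overlap_match(pred: Span, gold: Span) -> bool:
    """Correct if the predicted span shares at least one character with the gold span."""
    return pred[0] < gold[1] and gold[0] < pred[1]


def precision_recall(
    pred: List[Span],
    gold: List[Span],
    is_correct: Callable[[Span, Span], bool],
) -> Tuple[float, float]:
    """Compute precision and recall under a pluggable notion of correctness."""
    # Predicted spans that match some gold span count toward precision.
    matched_pred = sum(any(is_correct(p, g) for g in gold) for p in pred)
    # Gold spans that are matched by some predicted span count toward recall.
    matched_gold = sum(any(is_correct(p, g) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations for one abstract: expert markup vs. tagger output.
gold = [(0, 10), (20, 35), (50, 58)]
pred = [(0, 10), (24, 35), (70, 75)]

strict = precision_recall(pred, gold, exact_match)
lenient = precision_recall(pred, gold, overlap_match)
print("exact boundaries:", strict)
print("any overlap:     ", lenient)
```

In this made-up example the second predicted span misses the left boundary of a gold name, so it counts as an error under exact matching but as a hit under overlap matching. The same tagger output therefore scores differently depending on which criterion is applied.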

References

Amos Bairoch and Rolf Apweiler. 2000. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucl. Acids. Res., 28:45–48.
Christian Blaschke, Miguel A. Andrade, Christos Ouzounis, and Alfonso Valencia. 1999. Automatic extraction of biological information from scientific text: protein-protein interactions. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology (ISMB'99), pages 60–67, Heidelberg, Germany, August 6–10.
Andrew Borthwick, John Sterling, Eugene Agichtein, and Ralph Grishman. 1998. NYU: Description of the MENE named entity system as used in MUC-7. In Proceedings of the Seventh Message Understanding Conference (MUC-7), Fairfax, VA, USA, April 29 - May 1.
Nigel Collier, Hyun Seok Park, Norihiro Ogata, Yuka Tateishi, Chikashi Nobata, Tomoko Ohta, Tateshi Sekimizu, Hisao Imai, Katsutoshi Ibushi, and Jun-ichi Tsujii. 1999. The GENIA project: corpus-based knowledge acquisition and information extraction from genome research papers. In Proceedings of the European Association for Computational Linguistics (EACL) conference.
Nigel Collier, Chikashi Nobata, and Jun-ichi Tsujii. 2000. Extracting the names of genes and gene products with a hidden Markov model. In Proceedings of the 18th International Conference on Computational Linguistics (COLING-2000), pages 201–207, August.
Berry de Bruijn and Joel Martin. 2000. Protein name tagging. Presented as a poster at the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB'00).
Ken-ichiro Fukuda, Tatsuhiko Tsunoda, Ayuchi Tamura, and Toshihisa Takagi. 1998. Toward information extraction: Identifying protein names from biological papers. In Proceedings of the Pacific Symposium on Biocomputing (PSB'98), pages 705–716, Maui, Hawaii, January 4–9.
Kevin Humphreys, George Demetriou, and Robert Gaizauskas. 2000. Two applications of information extraction to biological science journal articles: Enzyme interactions and protein structures. In Proceedings of the 5th Pacific Symposium of Biocomputing, pages 72–80.
Chikashi Nobata, Nigel Collier, and Jun-ichi Tsujii. 1999. Automatic term identification and classification in biology texts. In Proceedings of the Natural Language Pacific Rim Symposium (NLPRS'2000), pages 369–374, November.
Pasi Tapanainen and Timo Järvinen. 1997. A non-projective dependency parser. In Proceedings of the 5th Conference on Applied Natural Language Processing, pages 64–71, Washington D.C., April. Association for Computational Linguistics.
James Thomas, David Milward, Christos Ouzounis, Stephen Pulman, and Mark Carroll. 2000. Automatic extraction of protein interactions from scientific abstracts. In Proceedings of the Pacific Symposium on Biocomputing (PSB 2000), pages 538–549, Oahu, Hawaii, January 4–9.


Published in

COLING '02: Proceedings of the 19th international conference on Computational linguistics - Volume 1
August 2002, 1184 pages
Publisher: Association for Computational Linguistics, United States
