Disambiguation of biomedical abbreviations

Authors:
Mark Stevenson, Yikun Guo, Abdulaziz Al Amri, Robert Gaizauskas

University of Sheffield, Sheffield, United Kingdom

BioNLP '09: Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, June 2009, Pages 71–79

Published: 04 June 2009

ABSTRACT

Abbreviations are common in biomedical documents, and many are ambiguous in the sense that they have several potential expansions. Identifying the correct expansion is necessary for language understanding and important for applications such as document retrieval, and it can be viewed as a Word Sense Disambiguation (WSD) problem. A WSD system that uses a variety of knowledge sources, including two types of information specific to the biomedical domain, is described. This system was tested on a corpus of ambiguous abbreviations, created by automatically identifying the correct expansion in Medline abstracts, and was found to identify the correct expansion with up to 99% accuracy.
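
To make the framing of abbreviation expansion as a WSD problem concrete, the following is a minimal sketch of a supervised classifier that picks an expansion from the words surrounding the abbreviation. It is not the system described in the paper: the bag-of-words Naive Bayes model is a stand-in chosen for brevity, the class name `NaiveBayesExpander` is hypothetical, and the four training contexts for "CAT" are invented for illustration rather than drawn from Medline. The biomedical knowledge sources mentioned in the abstract are not modelled here.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a context string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesExpander:
    """Hypothetical bag-of-words Naive Bayes over an abbreviation's context."""

    def __init__(self):
        self.sense_counts = Counter()            # expansion -> number of training contexts
        self.word_counts = defaultdict(Counter)  # expansion -> word -> count
        self.vocab = set()                       # all words seen in training

    def train(self, examples):
        """examples: iterable of (context_text, expansion) pairs."""
        for context, expansion in examples:
            self.sense_counts[expansion] += 1
            for word in tokenize(context):
                self.word_counts[expansion][word] += 1
                self.vocab.add(word)

    def predict(self, context):
        """Return the expansion with the highest add-one-smoothed log score."""
        total = sum(self.sense_counts.values())
        best, best_score = None, -math.inf
        for expansion, n in self.sense_counts.items():
            score = math.log(n / total)  # log prior of this expansion
            denom = sum(self.word_counts[expansion].values()) + len(self.vocab)
            for word in tokenize(context):
                score += math.log((self.word_counts[expansion][word] + 1) / denom)
            if score > best_score:
                best, best_score = expansion, score
        return best


# Invented toy contexts (assumption: not taken from any real corpus).
TRAIN = [
    ("CAT reporter gene expression was measured in transfected cells",
     "chloramphenicol acetyltransferase"),
    ("promoter activity was assayed with a CAT reporter plasmid",
     "chloramphenicol acetyltransferase"),
    ("the patient underwent a CAT scan of the abdomen",
     "computed axial tomography"),
    ("CAT imaging revealed a lesion in the left lung",
     "computed axial tomography"),
]

model = NaiveBayesExpander()
model.train(TRAIN)
imaging_prediction = model.predict("a CAT scan of the chest showed no lesion")
reporter_prediction = model.predict("CAT reporter activity in transfected cells")
print(imaging_prediction)
print(reporter_prediction)
```

In a setting closer to the one the abstract describes, the labelled contexts would come from an automatically built corpus rather than hand-written sentences, and the feature set would be richer than plain context words.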


Index Terms

Disambiguation of biomedical abbreviations
1. Applied computing
  1. Life and medical sciences
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources


Published in
BioNLP '09: Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing
June 2009
214 pages
ISBN:9781932432305
Conference Chairs:
Kevin Bretonnel Cohen, Center for Computational Pharmacology, University of Colorado School of Medicine and The MITRE Corporation
Dina Demner-Fushman, Lister Hill National Center for Biomedical Communications, US National Library of Medicine
Sophia Ananiadou, University of Manchester and UK National Centre for Text Mining
John Pestian, Computational Medicine Center, University of Cincinnati, Cincinnati Children's Hospital Medical Center
Jun'ichi Tsujii, University of Tokyo and UK National Centre for Text Mining
Bonnie Webber, University of Edinburgh
Publisher
Association for Computational Linguistics
United States
Publication History
- Published: 4 June 2009
Qualifiers
- research-article
Acceptance Rates
BioNLP '09 Paper Acceptance Rate: 12 of 29 submissions, 41%
Overall Acceptance Rate: 33 of 92 submissions, 36%