ABSTRACT
Abbreviations are common in biomedical documents, and many are ambiguous in the sense that they have several potential expansions. Identifying the correct expansion is necessary for language understanding and important for applications such as document retrieval, and it can be viewed as a Word Sense Disambiguation (WSD) problem. A WSD system that uses a variety of knowledge sources, including two types of information specific to the biomedical domain, is described. This system was tested on a corpus of ambiguous abbreviations, created by automatically identifying the correct expansion in Medline abstracts, and was found to identify the correct expansion with up to 99% accuracy.
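The abstract does not say how the correct expansion is identified automatically when the corpus is built. One plausible reading, sketched below as an assumption rather than as the authors' method, is to look for abstracts that define the abbreviation inline in the form "long form (ABBR)", record that long form as the label, and then replace the definition with the bare abbreviation so that a classifier has to rely on context. The function name `label_instance` and the BSA example data are hypothetical and chosen only for illustration.

```python
import re
from typing import List, Optional, Tuple


def label_instance(
    text: str, abbreviation: str, expansions: List[str]
) -> Optional[Tuple[str, str]]:
    """Return (masked_text, expansion) if the text defines the abbreviation.

    Assumption: a definition looks like "<expansion> (<ABBR>)". The long
    form is matched case-insensitively, the abbreviation case-sensitively.
    Returns None when no candidate expansion is defined in the text.
    """
    for expansion in expansions:
        pattern = re.compile(
            r"(?i:" + re.escape(expansion) + r")\s*\("
            + re.escape(abbreviation) + r"\)"
        )
        if pattern.search(text):
            # Hide the definition so only the ambiguous abbreviation remains.
            masked = pattern.sub(abbreviation, text)
            return masked, expansion
    return None


# Toy example (hypothetical data): BSA has more than one common expansion.
candidates = ["bovine serum albumin", "body surface area"]
sentence = "Dosing was adjusted by body surface area (BSA). BSA was recorded."
result = label_instance(sentence, "BSA", candidates)
```

Texts that never define the abbreviation return `None` and would simply be left out of such a corpus, so labels come only from explicit definitions.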
Index Terms
- Disambiguation of biomedical abbreviations