ABSTRACT
Identifying hedged information in biomedical literature is an important subtask of information extraction, because extracting speculative information as if it were factual would be misleading. In this paper we present a machine learning system that finds the scope of hedge cues in biomedical texts. The system is based on a similar system that finds the scope of negation cues, and we show that the same scope-finding approach can be applied to both negation and hedging. To investigate the robustness of the approach, we test the system on the three subcorpora of the BioScope corpus, which represent different text types.
Index Terms
- Learning the scope of hedge cues in biomedical texts