DOI: 10.1145/2064747.2064755
research-article

Generality and reuse in a common type system for clinical natural language processing

Published: 28 October 2011

ABSTRACT

The aim of Area 4 of the Strategic Healthcare IT Advanced Research Project (SHARP 4) is to facilitate secondary use of data stored in Electronic Medical Records (EMRs) through high-throughput phenotyping. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text into a standard representation that is comparable and interoperable. To meet the NLP requirements of different secondary uses of EMR data, to accommodate different NLP approaches, and to enable interoperability between structured and unstructured data generated in different clinical settings, we define for SHARP 4 a common type system for clinical NLP that integrates a comprehensive model of clinical semantics with language processing types. The type system has been implemented in UIMA (Unstructured Information Management Architecture), which allows input and output data types to be passed flexibly among NLP components, and it is available at the SHARP 4 website.
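To illustrate the general idea of a shared type system through which NLP components exchange typed inputs and outputs, here is a minimal Python sketch. It is not the SHARP 4 type system and does not use the UIMA API: the class names (`Annotation`, `Token`, `ClinicalMention`, `Document`), the two components, and the lexicon codes are hypothetical. The `Document` container plays a role only loosely analogous to UIMA's Common Analysis Structure (CAS).

```python
from dataclasses import dataclass
from typing import List, Type, TypeVar


# Hypothetical base type: every annotation is a character span over the text.
@dataclass
class Annotation:
    begin: int
    end: int


# Hypothetical language-processing type.
@dataclass
class Token(Annotation):
    pass


# Hypothetical clinical-semantic type; field names are illustrative only.
@dataclass
class ClinicalMention(Annotation):
    semantic_group: str
    code: str


T = TypeVar("T", bound=Annotation)


class Document:
    """Shared container passed between components (loosely like a UIMA CAS)."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.annotations: List[Annotation] = []

    def add(self, annotation: Annotation) -> None:
        self.annotations.append(annotation)

    def select(self, kind: Type[T]) -> List[T]:
        # Subtypes are returned too, so consumers can work at a general level.
        return [a for a in self.annotations if isinstance(a, kind)]

    def covered_text(self, annotation: Annotation) -> str:
        return self.text[annotation.begin:annotation.end]


def tokenizer(doc: Document) -> None:
    # Whitespace tokenizer: writes Token annotations with character offsets.
    pos = 0
    for word in doc.text.split():
        start = doc.text.index(word, pos)
        doc.add(Token(start, start + len(word)))
        pos = start + len(word)


# Hypothetical lexicon; the codes are placeholders, not real terminology codes.
LEXICON = {
    "aspirin": ("Medication", "MED-001"),
    "headache": ("SignSymptom", "SS-001"),
}


def dictionary_annotator(doc: Document) -> None:
    # Reads Tokens produced upstream and writes clinical-semantic annotations.
    for tok in doc.select(Token):
        entry = LEXICON.get(doc.covered_text(tok).lower())
        if entry:
            doc.add(ClinicalMention(tok.begin, tok.end, *entry))


if __name__ == "__main__":
    doc = Document("Patient took aspirin for headache")
    # Components run in sequence, each reading and writing the shared types.
    for component in (tokenizer, dictionary_annotator):
        component(doc)
    for mention in doc.select(ClinicalMention):
        print(mention.semantic_group, doc.covered_text(mention), mention.code)
```

Because `select` also returns subtypes, a downstream consumer that asks for the general `Annotation` type receives both tokens and clinical mentions. This is one way a type hierarchy can let components be reused without knowing which specific producer created their inputs.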


Published in

MIXHS '11: Proceedings of the first international workshop on Managing interoperability and complexity in health systems
October 2011
100 pages
ISBN: 9781450309547
DOI: 10.1145/2064747

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States
