research-article

Generality and reuse in a common type system for clinical natural language processing

Authors:
Stephen T. Wu (Mayo Clinic, Rochester, MN, USA)
Vinod C. Kaggal (Mayo Clinic, Rochester, MN, USA)
Guergana K. Savova (Children's Hospital Boston and Harvard Medical School, Boston, MA, USA)
Hongfang Liu (University of Colorado at Boulder, Boulder, CO, USA)
Jiaping Zheng (Children's Hospital Boston and Harvard Medical School, Boston, MA, USA)
Wendy W. Chapman (University of California, San Diego, San Diego, CA, USA)
Christopher G. Chute (Mayo Clinic, Rochester, MN, USA)
Dmitriy Dligach (University of Colorado at Boulder, Boulder, CO, USA)

MIXHS '11: Proceedings of the first international workshop on Managing interoperability and complexity in health systems, October 2011, Pages 27–34. https://doi.org/10.1145/2064747.2064755

Published: 28 October 2011

ABSTRACT

Area 4 of the Strategic Healthcare IT Advanced Research Project (SHARP 4) aims to facilitate secondary use of data stored in Electronic Medical Records (EMRs) through high-throughput phenotyping. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text into a standard representation that is comparable and interoperable. To meet the NLP requirements of different secondary use cases of EMR data, accommodate different NLP approaches, and enable interoperability between structured and unstructured data generated in different clinical settings, we define a common type system for clinical NLP that integrates a comprehensive model of clinical semantics with language processing types for SHARP 4. The type system has been implemented in UIMA (Unstructured Information Management Architecture), which allows for flexible passing of input and output data types among NLP components, and is available at the SHARP 4 website.
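
The idea of components exchanging data through a shared type system can be illustrated with a minimal sketch. This is not the SHARP 4 type system and does not use the UIMA API; it is plain Python, and every name in it (`Annotation`, `Token`, `MedicationMention`, `Document`, `tokenizer`, `medication_finder`, the `normalized_name` field, and the toy lexicon) is a hypothetical stand-in chosen for illustration. It assumes one language-processing type and one clinical-semantics type that share a common base, and two components that communicate only through those types.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Annotation:
    """Base type: a span of text identified by character offsets."""
    begin: int
    end: int


@dataclass
class Token(Annotation):
    """Language-processing type (hypothetical name)."""


@dataclass
class MedicationMention(Annotation):
    """Clinical-semantics type (hypothetical name and field)."""
    normalized_name: str = ""


@dataclass
class Document:
    """Shared container that every component reads from and writes to."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def select(self, ann_type: type) -> List[Annotation]:
        # Return all annotations of the requested type (or a subtype).
        return [a for a in self.annotations if isinstance(a, ann_type)]


def tokenizer(doc: Document) -> None:
    """Produces Token annotations by whitespace splitting."""
    pos = 0
    for word in doc.text.split():
        begin = doc.text.index(word, pos)
        end = begin + len(word)
        doc.annotations.append(Token(begin, end))
        pos = end


def medication_finder(doc: Document, lexicon: Dict[str, str]) -> None:
    """Consumes Token annotations and produces MedicationMention annotations."""
    for tok in doc.select(Token):
        surface = doc.text[tok.begin:tok.end].lower()
        if surface in lexicon:
            doc.annotations.append(
                MedicationMention(tok.begin, tok.end, lexicon[surface])
            )


# Usage: the second component depends only on the Token type,
# not on which tokenizer produced it.
doc = Document("Patient started on aspirin daily")
tokenizer(doc)
medication_finder(doc, {"aspirin": "Aspirin"})
mentions = doc.select(MedicationMention)
```

Because `medication_finder` depends only on the agreed `Token` type, a different tokenizer could be swapped in without changing it, which is the kind of reuse a common type system is meant to enable.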

Index Terms

1. Applied computing
  1. Life and medical sciences
    1. Health care information systems
2. Information systems
  1. Information systems applications

Published in
MIXHS '11: Proceedings of the first international workshop on Managing interoperability and complexity in health systems
October 2011
100 pages
ISBN:9781450309547
DOI:10.1145/2064747
Program Chairs:
Matt-Mouley Bouamrane
The University of Glasgow, Scotland, U.K.
,
Cui Tao
Mayo Clinic, U.S.A.
Copyright © 2011 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clinical element models
clinical information standards
interoperability
medical semantics
natural language processing
UIMA