Abstract
The N.Y.U. Linguistic String Project (LSP) is presently engaged in applying its programs for natural language processing to medical records. The programs transform the free narrative input into a structured data base suitable for automatic information processing, such as question answering, editing of records, or statistical summaries of the data. To determine the appropriate structures for a given type of material, we first perform a manual linguistic analysis on a sample of the texts prior to processing. From this we obtain a set of word classes and a tabular form (called an information format) for this type of material. We then apply the series of processing programs to the sentences of the texts. Each sentence is parsed with the Linguistic String Parser English grammar to obtain its grammatical structure; then certain standard English transformations are applied to regularize the grammatical form of the sentence. Finally, a set of "formatting transformations" maps the words of the sentence into the slots of the information format, or table, for this material in such a way that the sentence is reconstructible up to paraphrase from its representation in the table.
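As a rough illustration of the final formatting step only, the sketch below maps the words of an already regularized sentence into the slots of a small table row. It is not the LSP implementation: the word classes, the slot names, and the function `format_sentence` are all hypothetical, and the parsing and English-transformation stages are skipped entirely (the input is assumed to be a regularized word list).

```python
# Hypothetical word classes. In the approach described above, these come
# from a manual linguistic analysis of a sample of the texts.
WORD_CLASSES = {
    "patient": "PATIENT",
    "no": "NEGATION",
    "had": "VERB",
    "received": "VERB",
    "fever": "SIGN_SYMPTOM",
    "cough": "SIGN_SYMPTOM",
    "penicillin": "MEDICATION",
}

# Hypothetical information format: the ordered slots (columns) of the table.
FORMAT_SLOTS = ("PATIENT", "NEGATION", "VERB", "SIGN_SYMPTOM", "MEDICATION")


def format_sentence(words):
    """Place each word in the slot named by its word class.

    Returns (row, unassigned): `row` maps every slot name to the list of
    words placed in it, in sentence order; `unassigned` collects words
    that have no known word class.
    """
    row = {slot: [] for slot in FORMAT_SLOTS}
    unassigned = []
    for word in words:
        word_class = WORD_CLASSES.get(word.lower())
        if word_class in row:
            row[word_class].append(word)
        else:
            unassigned.append(word)
    return row, unassigned


if __name__ == "__main__":
    row, unassigned = format_sentence(["The", "patient", "had", "no", "fever"])
    for slot in FORMAT_SLOTS:
        print(f"{slot:13s} {' '.join(row[slot])}")
    print("unassigned:", unassigned)
```

Because every word is kept, either in a slot or in the unassigned list, the row retains enough material to paraphrase the sentence. A real system would rely on the parse to decide slot assignment rather than on a flat word lookup.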
Index Terms
- Transforming medical records into a structured data base