Abstract
The common approach to the analysis of natural-language texts assumes that semantic analysis should follow the parsing stage. However, medical texts are known to be complex and are written in a highly specialized language, and traditional parsers perform relatively poorly on them. In this article, we demonstrate the opposite approach: ontology-based linking of words combined with simple shallow parsing rules. This raises the UAS metric from 0.82 for spaCy to 0.834 for our approach.
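For reference, UAS (unlabeled attachment score) is the fraction of tokens that receive the correct syntactic head, with dependency labels ignored. The following is a minimal sketch of the metric in Python, not the authors' evaluation code; the function name and the toy head lists are hypothetical and serve only as illustration.

```python
from typing import Sequence


def unlabeled_attachment_score(
    gold_heads: Sequence[int], predicted_heads: Sequence[int]
) -> float:
    """Return the share of tokens whose predicted head equals the gold head."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted head lists must have the same length")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    # Count tokens attached to the correct head; labels play no role in UAS.
    correct = sum(1 for g, p in zip(gold_heads, predicted_heads) if g == p)
    return correct / len(gold_heads)


# Hypothetical five-token sentence: 0 marks the root,
# other values are 1-based positions of the head token.
gold = [2, 0, 2, 5, 3]
pred = [2, 0, 2, 3, 3]  # the fourth token is attached to the wrong head
score = unlabeled_attachment_score(gold, pred)
```

On a real test set the counts are usually accumulated over all tokens of all sentences before dividing, so long sentences weigh more than short ones.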
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Geltser, B. et al. (2021). Chomsky Was (Almost) Right: Ontology-Based Parsing of Texts of a Narrow Domain. In: Kravets, A.G., Shcherbakov, M., Parygin, D., Groumpos, P.P. (eds) Creativity in Intelligent Technologies and Data Science. CIT&DS 2021. Communications in Computer and Information Science, vol 1448. Springer, Cham. https://doi.org/10.1007/978-3-030-87034-8_7
DOI: https://doi.org/10.1007/978-3-030-87034-8_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87033-1
Online ISBN: 978-3-030-87034-8
eBook Packages: Computer Science, Computer Science (R0)