Chomsky Was (Almost) Right: Ontology-Based Parsing of Texts of a Narrow Domain

  • Conference paper
Creativity in Intelligent Technologies and Data Science (CIT&DS 2021)

Abstract

The common approach to the analysis of natural-language texts assumes that semantic analysis should follow the parsing stage. However, medical texts are known to be very complicated and are written in a highly specific language, and traditional parsers perform relatively poorly on them. In this article, we demonstrate the opposite approach: ontology-based entailing of words in combination with simple shallow parsing rules. It allows us to raise the UAS metric from 0.82 for SpaCy to 0.834 for our approach.
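For readers unfamiliar with the metric, UAS (unlabeled attachment score) is the fraction of tokens whose predicted syntactic head matches the gold-standard head, ignoring the dependency label. The following is a minimal sketch of that computation, not the evaluation code used in the paper: the function name, the toy head lists, and the choice to count every token (some evaluations exclude punctuation) are assumptions made for illustration.

```python
from typing import Sequence


def unlabeled_attachment_score(
    gold_heads: Sequence[Sequence[int]],
    pred_heads: Sequence[Sequence[int]],
) -> float:
    """Return the share of tokens whose predicted head equals the gold head.

    Each inner sequence holds one head index per token of a sentence,
    with 0 standing for the artificial root (an assumed convention).
    """
    correct = 0
    total = 0
    for gold, pred in zip(gold_heads, pred_heads):
        if len(gold) != len(pred):
            raise ValueError("gold and predicted sentences must share tokenization")
        # Count tokens attached to the same head in both analyses.
        correct += sum(g == p for g, p in zip(gold, pred))
        total += len(gold)
    if total == 0:
        raise ValueError("no tokens to score")
    return correct / total


# Hypothetical toy data: two sentences, five tokens in total.
gold = [[2, 0, 2], [0, 1]]
pred = [[2, 0, 1], [0, 1]]  # one wrong head in the first sentence
score = unlabeled_attachment_score(gold, pred)
print(f"UAS = {score:.3f}")
```

On this scale, the reported difference between 0.82 and 0.834 corresponds to roughly 14 additional correctly attached tokens per 1,000.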

Author information

Corresponding author

Correspondence to Eduard Klyshinsky.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Geltser, B. et al. (2021). Chomsky Was (Almost) Right: Ontology-Based Parsing of Texts of a Narrow Domain. In: Kravets, A.G., Shcherbakov, M., Parygin, D., Groumpos, P.P. (eds) Creativity in Intelligent Technologies and Data Science. CIT&DS 2021. Communications in Computer and Information Science, vol 1448. Springer, Cham. https://doi.org/10.1007/978-3-030-87034-8_7

  • DOI: https://doi.org/10.1007/978-3-030-87034-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87033-1

  • Online ISBN: 978-3-030-87034-8

  • eBook Packages: Computer Science, Computer Science (R0)
