DOI: 10.5555/1690339.1690343
research-article

Extracting lay paraphrases of specialized expressions from monolingual comparable medical corpora

Published: 6 August 2009

ABSTRACT

Whereas multilingual comparable corpora have been used to identify translations of words or terms, monolingual comparable corpora can help identify paraphrases. The present work addresses paraphrases found between two different discourse types: specialized and lay texts. We therefore built comparable corpora of specialized and lay texts in order to detect equivalent lay and specialized expressions. We identified two devices used in such paraphrases: nominalizations and neo-classical compounds. The results showed that the extracted paraphrases had good precision and that nominalizations were indeed relevant for studying the differences between specialized and lay language. Results for neo-classical compounds were less conclusive. This study also demonstrates that simple paraphrase acquisition methods can work on texts with a rather low degree of similarity, once similar text segments have been detected.
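The abstract describes a two-step idea: first detect similar segments across the specialized and lay corpora, then look inside those segments for paraphrase devices such as nominalizations. The following is a minimal Python sketch of that idea under loudly stated assumptions; it is not the authors' implementation. A plain bag-of-words cosine similarity stands in for whatever segment detection the paper uses, the tiny English `NOMINALIZATIONS` lexicon and the `STOPWORDS` set are hypothetical, and the only pattern handled is a specialized "NOUN of [the] ARG" phrase matched by a lay verb form that shares the noun's stem and is followed closely by the same argument.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption, not from the paper):
# nominalization -> verb stem. A real system would derive such pairs
# from a morphological resource instead of hard-coding them.
NOMINALIZATIONS = {"injection": "inject", "treatment": "treat", "administration": "administer"}

# Hypothetical stop-word list, used only when comparing segments.
STOPWORDS = {"a", "the", "of", "is", "was", "you", "under", "to", "and"}


def tokenize(text):
    """Return lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def content_words(text):
    """Tokens minus stop words, for the bag-of-words comparison."""
    return [t for t in tokenize(text) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[w] for w, count in a.items() if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def align_segments(specialized, lay, threshold=0.3):
    """Pair specialized and lay segments whose similarity reaches the threshold."""
    pairs = []
    for s_seg in specialized:
        s_bag = Counter(content_words(s_seg))
        for l_seg in lay:
            if cosine(s_bag, Counter(content_words(l_seg))) >= threshold:
                pairs.append((s_seg, l_seg))
    return pairs


def nominalization_paraphrases(spec_seg, lay_seg, window=3):
    """Match 'NOUN of [the] ARG' in spec_seg to a verb form of NOUN in lay_seg
    that is followed by ARG within `window` tokens."""
    spec, lay = tokenize(spec_seg), tokenize(lay_seg)
    found = []
    for i, tok in enumerate(spec):
        stem = NOMINALIZATIONS.get(tok)
        if stem is None or i + 2 >= len(spec) or spec[i + 1] != "of":
            continue
        j = i + 2
        if spec[j] == "the" and j + 1 < len(spec):
            j += 1  # skip an optional determiner
        arg = spec[j]
        for k, ltok in enumerate(lay):
            # Verb form: shares the stem but is not itself a listed nominalization.
            if ltok.startswith(stem) and ltok not in NOMINALIZATIONS and arg in lay[k + 1:k + 1 + window]:
                found.append((tok, ltok, arg))
                break
    return found


specialized = ["Subcutaneous injection of insulin is performed twice daily.",
               "Renal function was assessed."]
lay = ["You inject the insulin under the skin twice a day.",
       "Eat more vegetables."]

for spec_seg, lay_seg in align_segments(specialized, lay):
    print(nominalization_paraphrases(spec_seg, lay_seg))  # [('injection', 'inject', 'insulin')]
```

With these toy inputs only the first specialized sentence aligns with a lay sentence (cosine of about 0.37 on the shared words "insulin" and "twice"), so a single pair, "injection of insulin" and "inject ... insulin", is extracted. Neo-classical compounds are not handled by this sketch.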


Published in

BUCC '09: Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora
August 2009
77 pages
ISBN: 9781932432534

Publisher

Association for Computational Linguistics
United States
