Extracting lay paraphrases of specialized expressions from monolingual comparable medical corpora

Authors:
Louise Deléger (INSERM, Paris, France)
Pierre Zweigenbaum (CNRS, LIMSI, Orsay, France)

BUCC '09: Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora, August 2009, Pages 2–10

Published: 6 August 2009

ABSTRACT

Whereas multilingual comparable corpora have been used to identify translations of words or terms, monolingual comparable corpora can help identify paraphrases. The present work addresses paraphrases found between two different discourse types: specialized and lay texts. We therefore built comparable corpora of specialized and lay texts in order to detect equivalent lay and specialized expressions. We identified two devices used in such paraphrases: nominalizations and neo-classical compounds. The results showed that the extracted paraphrases had good precision and that nominalizations were indeed relevant to studying the differences between specialized and lay language. Results for neo-classical compounds were less conclusive. This study also demonstrates that simple paraphrase acquisition methods can work on texts with a rather small degree of similarity, once similar text segments are detected.
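
The abstract names two steps, detecting similar text segments and then matching a nominalization in the specialized text with a verbal form in the lay text, but it does not give the algorithm. The following is a minimal sketch of that general idea, not the authors' method. Everything in it is an assumption made for illustration: the bag-of-words cosine similarity, the threshold, the tiny English lexicon `NOMINALIZATIONS`, the "N of X" pattern, and all function names.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon (nominalization -> base verb); illustrative only.
NOMINALIZATIONS = {"treatment": "treat", "evaluation": "evaluate"}
DETERMINERS = {"the", "a", "an"}


def tokens(text):
    """Lowercase a segment and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token lists (bag-of-words counts)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def similar_segments(specialized, lay, threshold=0.2):
    """Pair specialized and lay segments whose similarity reaches the threshold."""
    return [
        (s, l)
        for s in specialized
        for l in lay
        if cosine(tokens(s), tokens(l)) >= threshold
    ]


def nominalization_paraphrases(spec_seg, lay_seg, window=3):
    """Match 'N of X' in the specialized segment with 'V ... X' in the lay one."""
    spec, lay = tokens(spec_seg), tokens(lay_seg)
    found = []
    for i, noun in enumerate(spec):
        verb = NOMINALIZATIONS.get(noun)
        if verb is None or spec[i + 1:i + 2] != ["of"]:
            continue
        # Take the first non-determiner word after "of" as the argument.
        args = [t for t in spec[i + 2:i + 2 + window] if t not in DETERMINERS]
        if not args:
            continue
        arg = args[0]
        for j, tok in enumerate(lay):
            # Crude inflection check: the lay token starts with the verb lemma.
            if tok.startswith(verb) and tok != noun and arg in lay[j + 1:j + 1 + window]:
                found.append((f"{noun} of {arg}", f"{tok} {arg}"))
    return found


specialized = ["Treatment of the disease requires early evaluation of symptoms."]
lay = [
    "Doctors treat the disease after they evaluate symptoms early.",
    "Eat well and sleep.",
]
pairs = similar_segments(specialized, lay)
paraphrases = [p for s, l in pairs for p in nominalization_paraphrases(s, l)]
```

On this toy input only the first lay sentence is paired with the specialized one, and the matcher proposes a verbal counterpart for each of the two nominalizations. A real system would need proper morphological resources and syntactic analysis in place of the prefix test and fixed window used here.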

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Learning paradigms
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection

Published in
BUCC '09: Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora
August 2009
77 pages
ISBN: 9781932432534
General Chairs: Pascale Fung (Hong Kong University of Science & Technology-HKUST), Pierre Zweigenbaum (LIMSI-CNRS, France), Reinhard Rapp (University of Mainz, Germany & University of Tarragona, Spain)
Publisher: Association for Computational Linguistics, United States
Qualifiers: research-article
