ABSTRACT
Whereas multilingual comparable corpora have been used to identify translations of words or terms, monolingual corpora can help identify paraphrases. The present work addresses paraphrases found between two different discourse types: specialized and lay texts. We therefore built comparable corpora of specialized and lay texts in order to detect equivalent lay and specialized expressions. We identified two devices used in such paraphrases: nominalizations and neo-classical compounds. The results showed that the extracted paraphrases had good precision and that nominalizations were indeed relevant to studying the differences between specialized and lay language. The results for neo-classical compounds were less conclusive. This study also demonstrates that simple paraphrase acquisition methods can work on texts with a rather small degree of similarity, once similar text segments are detected.