Is sentence compression an NLG task?

ABSTRACT
Data-driven approaches to sentence compression define the task as dropping any subset of words from the input sentence while retaining important information and grammaticality. We show that only 16% of the observed compressed sentences in the domain of subtitling can be accounted for in this way. We argue that part of this is due to evaluation issues and estimate that a deletion model is in fact compatible with approximately 55% of the observed data. We analyse the remaining problems and conclude that in those cases word order changes and paraphrasing are crucial, and argue for more elaborate sentence compression models which build on NLG work.
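The deletion-only view described above treats a compression as valid only if it can be obtained by dropping words from the input while preserving the remaining word order. A minimal sketch of how one might test whether an observed compressed sentence is compatible with such a model is an in-order subsequence check (assuming whitespace tokenization and case-sensitive matching, which are simplifications; the paper's own analysis works over aligned treebank tokens):

```python
def is_deletion_compression(source_tokens, compressed_tokens):
    """Return True iff compressed_tokens is an in-order subsequence
    of source_tokens, i.e. obtainable by word deletion alone."""
    it = iter(source_tokens)
    # `tok in it` advances the iterator, so matches must occur in order.
    return all(tok in it for tok in compressed_tokens)


# Deletion alone suffices here:
is_deletion_compression("the cat sat on the mat".split(),
                        "cat sat on mat".split())      # True

# Word-order change: not reachable by deletion:
is_deletion_compression("the cat sat on the mat".split(),
                        "on the mat the cat sat".split())  # False
```

Counting the proportion of gold compressions for which this check succeeds gives a rough estimate of how much of a corpus a pure deletion model can account for; compressions involving reordering or paraphrasing fail the test.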