Overview

Authors:

Philip Williams, Rico Sennrich, Matt Post, Philipp Koehn

Philip Williams
University of Edinburgh, UK

Rico Sennrich
University of Edinburgh, UK

Matt Post
Johns Hopkins University, USA

Philipp Koehn
Johns Hopkins University, USA

Part of the book series: Synthesis Lectures on Human Language Technologies (SLHLT)


Table of contents (7 chapters)

Front Matter (pages i-xvii)
Models (pages 1-30)
Learning from Parallel Text (pages 31-46)
Decoding I: Preliminaries (pages 47-67)
Decoding II: Tree Decoding (pages 69-100)
Decoding III: String Decoding (pages 101-123)
Selected Topics (pages 125-151)
Closing Remarks (pages 153-156)
Back Matter (pages 157-190)

About this book

This unique book provides a comprehensive introduction to the most popular syntax-based statistical machine translation models, filling a gap in the current literature for researchers and developers in human language technologies. While phrase-based models have previously dominated the field, syntax-based approaches have proved a popular alternative, as they elegantly solve many of the shortcomings of phrase-based models. The heart of this book is a detailed introduction to decoding for syntax-based models.

The book begins with an overview of synchronous context-free grammar (SCFG) and synchronous tree-substitution grammar (STSG) along with their associated statistical models. It also describes how three popular instantiations (Hiero, SAMT, and GHKM) are learned from parallel corpora. It introduces and details hypergraphs and associated general algorithms, as well as algorithms for decoding with both tree and string input. Special attention is given to efficiency, including search approximations such as beam search and cube pruning, data structures, and parsing algorithms. The book consistently highlights the strengths (and limitations) of syntax-based approaches, including their ability to generalize phrase-based translation units, their modeling of specific linguistic phenomena, and their function of structuring the search space.

Authors and Affiliations

University of Edinburgh, UK: Philip Williams, Rico Sennrich
Johns Hopkins University, USA: Matt Post, Philipp Koehn

About the authors

Philip Williams is a Research Associate at the University of Edinburgh, where he completed his Ph.D. in 2014. His main research interest is the integration of linguistic information into statistical machine translation. In his thesis, he applied unification-based constraints to syntax-based statistical machine translation. He is the main contributor to the syntax-based models in the Moses toolkit.
Rico Sennrich is a Research Associate at the University of Edinburgh. He received his Ph.D. in Computational Linguistics from the University of Zurich in 2013. His research focuses on data-driven natural language processing, in particular machine translation, syntax, and morphology. His contributions to syntax-based machine translation include a more efficient algorithm for SCFG decoding, and novel models for syntactic language modelling and productive generation of compounds. He developed syntax-based SMT systems for English-German that were tied for first place in the shared translation tasks of WMT 2014 and 2015.
Philipp Koehn is a Professor of Computer Science at Johns Hopkins University, where he is affiliated with the Center for Language and Speech Processing. He also is the Chair of Machine Translation at the University of Edinburgh. He received his Ph.D. in 2003 from the University of Southern California. He is the creator and maintainer of Moses, the de facto statistical machine translation system, used throughout the world in both research and industry. He is a co-founder of the WMT Conference on Statistical Machine Translation, and author of the 2009 textbook Statistical Machine Translation.

Bibliographic Information

Book Title: Syntax-based Statistical Machine Translation
Authors: Philip Williams, Rico Sennrich, Matt Post, Philipp Koehn
Series Title: Synthesis Lectures on Human Language Technologies
DOI: https://doi.org/10.1007/978-3-031-02164-0
Publisher: Springer Cham
eBook Packages: Synthesis Collection of Technology (R0), eBColl Synthesis Collection 7
Copyright Information: Springer Nature Switzerland AG 2016
Softcover ISBN: 978-3-031-01036-1 (published 11 August 2016)
eBook ISBN: 978-3-031-02164-0 (published 31 May 2022)
Series ISSN: 1947-4040
Series E-ISSN: 1947-4059
Edition Number: 1
Number of Pages: XVIII, 190
Topics: Artificial Intelligence, Natural Language Processing (NLP), Computational Linguistics
