ERIC Number: ED548843
Record Type: Non-Journal
Publication Date: 2012
Pages: 172
Abstractor: As Provided
Reference Count: N/A
ISBN: 978-1-2676-8348-9
Bean Soup Translation: Flexible, Linguistically-Motivated Syntax for Machine Translation
Mehay, Dennis Nolan
ProQuest LLC, Ph.D. Dissertation, The Ohio State University
Machine translation (MT) systems attempt to translate texts from one language into another by translating words from a "source language" and rearranging them into fluent utterances in a "target language." When the two languages organize concepts in very different ways, knowledge of their general sentence structure, or "syntax," is crucial. The syntax of the target language is particularly useful, because it provides a means of testing whether the reorderings that a system might try are grammatically licensed. This thesis presents two novel syntactic techniques that aid in producing correct and grammatical translations. The first technique controls target language reordering using syntactic categories that span multiple words. The second technique complements the first by assessing the well-formedness of sequences formed by these reorderings using the same syntactic categories. These innovations are implemented in the context of statistical phrase-based machine translation [Zens et al., 2002; Koehn et al., 2003], which is the prevailing modern translation paradigm. The main contribution of this thesis is to use the flexible syntax of Combinatory Categorial Grammar [CCG, Steedman, 2000] as the basis for deriving syntactic constituent labels for target strings in phrase-based systems, providing CCG labels for many target strings that traditional syntactic theories struggle to describe. These CCG labels are used to train novel syntax-based reordering and language models, which efficiently describe translation reordering patterns, as well as assess the grammaticality of target translations. The models are easily incorporated into phrase-based systems with minimal disruption to existing technology and achieve superior automatic metric scores and human evaluation ratings over a strong phrase-based baseline, as well as over syntax-based techniques that do not use CCG. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone: 1-800-521-0600. Web page:]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A