ERIC Number: ED548910
Record Type: Non-Journal
Publication Date: 2012
Pages: 161
Abstractor: As Provided
Reference Count: N/A
ISBN: 978-1-2677-3538-6
ISSN: N/A
Dependency Structures for Statistical Machine Translation
Bach, Nguyen
ProQuest LLC, Ph.D. Dissertation, Carnegie Mellon University
Dependency structures represent a sentence as a set of dependency relations. Normally the dependency structures form a tree that connects all the words in a sentence. One of the defining characteristics of dependency structures is their ability to bring long-distance dependencies between words into local dependency structures. Another main attraction of dependency structures is their close correspondence to meaning. This thesis focuses on integrating dependency structures into machine translation components, including the decoder algorithm, reordering models, confidence measures, and sentence simplification. First, we develop four novel "cohesive soft constraints" for a phrase-based decoder, namely exhaustive interruption check, interruption count, exhaustive interruption count, and rich interruption constraints. To ensure the robustness and effectiveness of the proposed constraints, we conduct experiments on four different language pairs: English-{Iraqi, Spanish} and {Arabic, Chinese}-English. The improvements range from 0.4 to 1.8 BLEU points. These experiments also cover a wide range of training corpus sizes, from 500K sentence pairs up to 10 million sentence pairs. Furthermore, to show the effectiveness of our proposed methods, we apply them to systems using a 5-gram LM trained on 2.7 billion words, different reordering models, and different dependency parsers. Second, to go beyond cohesive soft constraints, we investigate efficient algorithms for learning and decoding with "source-side dependency tree reordering models." We propose a novel source-tree reordering model that exploits dependency subtree "inside/outside" movements and cohesive soft constraints. These movements and constraints enable us to efficiently capture the subtree-to-subtree transitions observed both on the source side of word-aligned training data and at decoding time.
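The idea of an "interruption" behind the cohesive constraints can be illustrated with a small sketch. It assumes a source dependency tree given as a head array (heads[i] is the parent of word i, -1 for the root) and a decoder that covers source words phrase by phrase. The counting rule used here (a subtree is interrupted when its translation has started, the next phrase reaches outside it, and it is still incomplete afterwards) is a simplified illustration; the function names are hypothetical, and this is not the dissertation's exact definition of any of its four constraints.

```python
from typing import List, Sequence, Set


def subtree_spans(heads: Sequence[int]) -> List[Set[int]]:
    """For each source word i, return the set of word indices in the subtree rooted at i.

    heads[i] is the index of word i's parent, or -1 for the root (assumed input format).
    """
    subtrees = [{i} for i in range(len(heads))]
    for i in range(len(heads)):
        # Walk up the tree from word i, adding i to every ancestor's subtree.
        j = heads[i]
        while j != -1:
            subtrees[j].add(i)
            j = heads[j]
    return subtrees


def count_interruptions(heads: Sequence[int], phrases: Sequence[Set[int]]) -> int:
    """Count subtree interruptions for source phrases covered in translation order.

    Simplified rule (illustrative assumption): a subtree is interrupted at a step if
    translation of it has already started, the new phrase reaches outside it, and
    the subtree is still incomplete after the new phrase is covered.
    """
    subtrees = subtree_spans(heads)
    covered: Set[int] = set()
    interruptions = 0
    for phrase in phrases:
        after = covered | phrase
        for tree in subtrees:
            started = bool(tree & covered)
            still_incomplete = not tree <= after
            leaves_subtree = not phrase <= tree
            if started and still_incomplete and leaves_subtree:
                interruptions += 1
        covered = after
    return interruptions


# Toy sentence "the man ate bread"; each arrow points from a word to its head:
# the -> man -> ate (root) <- bread
heads = [1, 2, -1, 2]
print(count_interruptions(heads, [{0, 1}, {2}, {3}]))    # 0: cohesive order
print(count_interruptions(heads, [{0}, {3}, {1}, {2}]))  # 1: "the man" is split by "bread"
```

In a soft-constraint setting, a count like this would presumably be exposed to the decoder as a feature with a tuned weight rather than used to forbid non-cohesive hypotheses outright.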
Representing subtree movements as features allows MERT to train the corresponding weights for these features relative to the other features in the model. Experimental results show improvements of +0.8 BLEU and -1.4 TER on English-Spanish and +0.8 BLEU and -2.3 TER on English-Iraqi. Third, we develop "Goodness," a novel framework that uses dependency structures to predict "machine translation confidence" at the word and sentence levels. The framework allows MT systems to inform users which words are likely to be translated correctly and how confident the system is about the whole sentence. Experimental results show that the F-score of MT error prediction increases from 69.1 to 72.2. The Pearson correlation between the proposed confidence measure and human-targeted translation edit rate (HTER) is 0.6. Reranking n-best lists with the proposed confidence measure reduces TER by 0.4 to 0.9 points. We also present a prototype for visualizing MT errors at the word and sentence levels, with the goal of improving post-editor productivity. Finally, inspired by research in summarization, we propose "TriS," a novel framework that simplifies source sentences before translating them. We build a statistical sentence simplification system with log-linear models. In contrast to state-of-the-art methods that drive the simplification process with hand-written linguistic rules, our method uses a margin-based discriminative learning algorithm operating on a feature set. The feature set is defined on statistics of dependency structures as well as on the surface forms and syntactic structures of sentences. A stack decoding algorithm is developed to efficiently generate and search simplification hypotheses. Experimental results show that the text produced by the proposed system is 1.7 Flesch-Kincaid grade levels lower than the original text.
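The simplification result is reported as a drop in Flesch-Kincaid grade level. The standard formula combines average sentence length with average syllables per word; the sketch below computes it from raw counts and leaves syllable counting to the caller, since that step depends on a dictionary or heuristic. The function name is hypothetical, and the counts in the example are invented for illustration, not data from the dissertation.

```python
def flesch_kincaid_grade(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Flesch-Kincaid grade level computed from raw text counts.

    Grade = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    Syllable counting is assumed to be done by the caller.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Hypothetical counts: the same 100 words / 150 syllables, before and after
# splitting 5 long sentences into 10 shorter ones.
original = flesch_kincaid_grade(100, 5, 150)
simplified = flesch_kincaid_grade(100, 10, 150)
print(f"{original:.2f} -> {simplified:.2f}")  # 9.91 -> 6.01
```

Because only the sentence count changes between the two calls, the whole 3.9-level drop here comes from the words-per-sentence term; a simplifier that also replaced polysyllabic words would lower the second term as well.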
A comparison with a state-of-the-art rule-based system shows that the proposed system improves ROUGE-2, ROUGE-4, and AveF 10 by 0.2, 0.6, and 4.5 points, respectively. We present subjective evaluations of the translation quality of simplified sentences for an English-German MT system. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Identifiers - Assessments and Surveys: Flesch Kincaid Grade Level Formula