ERIC Number: ED546350
Record Type: Non-Journal
Publication Date: 2012
Pages: 168
Abstractor: As Provided
Reference Count: N/A
ISBN: 978-1-2676-3310-1
ISSN: N/A
Linguistically Motivated Features for CCG Realization Ranking
Rajkumar, Rajakrishnan
ProQuest LLC, Ph.D. Dissertation, The Ohio State University
Natural Language Generation (NLG) is the process of generating natural language text from an input consisting of a communicative goal and a database or knowledge base. Informally, the architecture of a standard NLG system consists of the following modules (Reiter and Dale, 2000): content determination, sentence planning (or microplanning), and surface realization. This thesis is about designing novel, linguistically motivated features for surface realization (the final NLG module mentioned above), the process by which text is created from an abstract representation of language according to the rules of syntax and morphology. Surface realization primarily involves three interrelated problems: constituent ordering, inflection and agreement, and function word insertion. To address these problems, most state-of-the-art realization ranking models (Velldal and Oepen, 2005; White and Rajkumar, 2009) employ features based on very basic insights from linguistic theory (for example, POS tags and rules derived from parse trees). More sophisticated insights of linguistic theory have not been widely perceived as necessary for increased system performance, with very basic insights providing the most gains (similar to the situation Johnson (2009) describes in the context of natural language parsing). In contrast, our goal is to design features motivated by insights from theoretical linguistics and also based on cognitively plausible accounts of language comprehension discussed in the linguistics literature, so that the realization ranking model can better approximate human judgements of fluency and acceptability. We show that the minimal dependency length theory (Gibson, 1998; Temperley, 2007) helps with the constituent ordering problem in surface realization.
For the problem of generating correctly inflected word forms, we demonstrate that a machine learning-based approach is well-suited to encode insights from the theoretical linguistics literature on English agreement (Kathol, 1999; Pollard and Sag, 1994). This approach leads to improvements over a competitive baseline model containing n-gram and parsing features (of the kind described in Johnson, 2009). Finally, we demonstrate empirically that the uniform information density principle discussed in Jaeger (2010) contributes towards the that-complementizer choice in the context of surface realization. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A