ERIC Number: ED563280
Record Type: Non-Journal
Publication Date: 2012
Pages: 132
Abstractor: As Provided
Reference Count: N/A
ISBN: 978-1-3035-0962-9
ISSN: N/A
English Complex Verb Constructions: Identification and Inference
Tu, Yuancheng
ProQuest LLC, Ph.D. Dissertation, University of Illinois at Urbana-Champaign
The fundamental problem faced by automatic text understanding in Natural Language Processing (NLP) is to identify semantically related pieces of text and integrate them to compute the meaning of the whole text. However, the principle of compositionality runs into trouble very quickly when real language is examined, with its frequent Multiword Expressions (MWEs) whose meaning is not based on the meaning of their parts. MWEs occur in all text genres, are far more frequent and productive than is generally recognized, and pose serious difficulties for every kind of NLP application. Among these diverse kinds of MWEs, this dissertation focuses on English verb-related MWEs, constructs stochastic models to identify these complex verb predicates within a given context, and discusses empirically the significance of this MWE recognition component in the context of Textual Entailment (TE), an intricate semantic inference task that involves various levels of linguistic knowledge and logical reasoning. This dissertation develops high-quality computational models for three of the most frequent kinds of English complex verb constructions: Light Verb Construction (LVC), Phrasal Verb Construction (PVC) and Embedded Verb Construction (EVC), and demonstrates empirically their usage in textual entailment. The discriminative model for LVC identification achieves 86.3% accuracy when trained with groups of either contextual or statistical features. For PVC identification, the learning model reaches 79.4% accuracy, a 41.1% error reduction compared to the baseline. In addition, adding the LVC classifier helps the simple but robust lexical TE system achieve a 39.5% error reduction in accuracy and a 21.6% absolute F1 value improvement. Similar improvements are achieved by adding the PVC and EVC classifiers into this entailment system, with 30.6% and 39.4% absolute accuracy improvements respectively.
In this dissertation, two types of automation are achieved with respect to English complex verb predicates: learning to recognize these MWEs within a given context, and discovering the significance of this identification within an empirical semantic NLP application, i.e., textual entailment. The lack of benchmark datasets for these special linguistic phenomena is the main bottleneck to advancing computational research on them. The study presented in this dissertation provides two benchmark datasets for the identification of LVCs and PVCs respectively, and three phenomenon-specific TE datasets to automate the investigation of the significance of these linguistic phenomena within a TE system. These datasets enable us to make a direct evaluation and comparison of lexically based models, reveal insightful differences between them, and create a simple but robust improved model combination. In the long run, we believe that the availability of these datasets will facilitate improved models that consider the various special multiword-related phenomena within complex semantic systems, as well as the application of supervised machine learning models to optimize model combination and performance. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone: 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A