ERIC Number: ED518883
Record Type: Non-Journal
Publication Date: 2010
Pages: 221
Abstractor: As Provided
Reference Count: 0
ISBN: 978-1-1242-1949-3
Building a Reference Resolution System Using Human Language Processing for Inspiration
Watters, Shana Kay
ProQuest LLC, Ph.D. Dissertation, University of Minnesota
For over 30 years, reference resolution, the process of determining what a noun phrase (including a pronoun) refers to in written and spoken language, has been an important and ongoing area of research. Most existing pronominal reference resolution algorithms and systems are designed to use syntactic information and surface features (e.g., number and gender). These lines of research have plateaued, with accuracy rates in the vicinity of 80% (plus or minus 10), depending on the domain and techniques used. This thesis explores how to incorporate multiple theories and algorithms into a single system (i.e., a pipeline of components, each specializing in a certain aspect of reference resolution). Our framework combines subsystems that each specialize in an aspect of reference resolution for the pronoun "it." The framework contains a total of five subsystems: (1) Creates a set of prospective antecedents, that is, previous forms such as noun phrases, clauses, and verb phrases that introduce possible referents. Rules established by our empirical study investigating the Givenness Hierarchy's claim that the cognitive status of being "in focus" is necessary for being a referent of "it" are used to guide antecedent selection. (2) Uses binding theory to disqualify possible antecedents using syntactic information. (3) Uses number and gender to disqualify possible antecedents. (4) Creates a framework for semantic reasoning by integrating information from VerbNet, PropBank, and WordNet. The framework allows for reasoning about what type of semantic restrictions and constraints for a given verb can be enforced on the prospective antecedent of "it." (5) When two or more forms remain in the set of prospective antecedents, a preference-based algorithm is employed to select the best guess from the set of possible antecedents.
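The five-stage pipeline described above can be illustrated with a minimal Python sketch. It is not taken from the dissertation: the Candidate fields, the filter functions, the "artifact" semantic type, and the recency-based preference are all hypothetical stand-ins chosen for illustration. Stage 1 (antecedent generation) is assumed to have already produced the candidate list, and the semantic stage is reduced to a simple type check rather than real VerbNet, PropBank, or WordNet lookups.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass(frozen=True)
class Candidate:
    """A prospective antecedent of "it" (hypothetical representation)."""
    text: str
    number: str                   # "sg" or "pl"
    gender: str                   # "neut", "masc", or "fem"
    locally_binds_pronoun: bool   # precomputed stand-in for a binding-theory violation
    semantic_type: str            # toy stand-in for lexical-resource information
    distance: int                 # sentences back from the pronoun (0 = same sentence)


Filter = Callable[[List[Candidate]], List[Candidate]]


def binding_filter(cands: List[Candidate]) -> List[Candidate]:
    # Stage 2: drop candidates ruled out on syntactic (binding) grounds.
    return [c for c in cands if not c.locally_binds_pronoun]


def agreement_filter(cands: List[Candidate]) -> List[Candidate]:
    # Stage 3: "it" is assumed here to require a singular, neuter antecedent.
    return [c for c in cands if c.number == "sg" and c.gender == "neut"]


def make_semantic_filter(allowed_types: Set[str]) -> Filter:
    # Stage 4: keep candidates whose type satisfies the verb's (assumed) restrictions.
    def semantic_filter(cands: List[Candidate]) -> List[Candidate]:
        return [c for c in cands if c.semantic_type in allowed_types]
    return semantic_filter


def best_guess(cands: List[Candidate]) -> Optional[Candidate]:
    # Stage 5: an assumed preference for the most recent surviving candidate.
    return min(cands, key=lambda c: c.distance) if cands else None


def resolve_it(cands: List[Candidate], filters: List[Filter]) -> Optional[Candidate]:
    # Run the disqualifying stages in order, then pick among what remains.
    for apply_filter in filters:
        cands = apply_filter(cands)
    return best_guess(cands)


# Stage 1 output is assumed: a hand-built candidate set for a toy example.
candidates = [
    Candidate("the report", "sg", "neut", False, "artifact", 1),
    Candidate("the editors", "pl", "neut", False, "person", 0),
    Candidate("the memo", "sg", "neut", False, "artifact", 0),
    Candidate("the idea", "sg", "neut", True, "abstraction", 0),
]
filters = [binding_filter, agreement_filter, make_semantic_filter({"artifact"})]
result = resolve_it(candidates, filters)
```

In this toy run, "the idea" is removed by the binding stage and "the editors" by the agreement stage; both remaining candidates pass the semantic check, so the preference step chooses between "the report" and "the memo". Treating the early stages as pure filters keeps the preference step as the only place where a guess is made, which mirrors the ordering given in the abstract.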
The framework created by this thesis includes a database and a computer system that implements a portion of the pipelined architecture. The database describes in tabular form all the information used to create the semantic reasoning subsystem, the parts of the Penn Treebank Wall Street Journal corpus used for testing, the information used by the number and gender subsystems, the results of each stage of the pipelined system, and the information used to create the preference-based algorithm for the best guess. The system integrates research from the fields of linguistics, cognitive science, and computer science to create the next generation of reference resolution systems capable of understanding what we mean when we write or talk. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page:]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A