ERIC Number: ED516920
Record Type: Non-Journal
Publication Date: 2010
Pages: 176
Abstractor: As Provided
ISBN: 978-1-1097-8222-6
ISSN: N/A
EISSN: N/A
Paradigms of Evaluation in Natural Language Processing: Field Linguistics for Glass Box Testing
Cohen, Kevin Bretonnel
ProQuest LLC, Ph.D. Dissertation, University of Colorado at Boulder
Although software testing has been well studied in computer science, it has received little attention in natural language processing. Nonetheless, a fully developed methodology for glass box evaluation and testing of language processing applications already exists in the field methods of descriptive linguistics. This work lays out a number of experiments that in the aggregate demonstrate the feasibility of software testing or glass box evaluation for natural language processing, and in the process validates the claim that the techniques of descriptive linguistics and field methods are a sound methodological approach to doing such testing. Various chapters consider the issue from the perspectives of the application of fieldwork techniques to software testing, applications of linguistics-informed software engineering to NLP, applications of the descriptive linguistics concept of complementary distribution to problems in NLP, and applications of descriptive linguistics concepts to the problem of quality assurance for semantic representations in proposition banks. In the experiment that most clearly shows the connection between linguistic fieldwork and software testing, a test suite that is constructed like a field linguist's elicitation schedule is used to find performance errors in five named entity recognition programs and to predict the performance of one program on several equivalence classes of named entities. In another experiment, from the software engineering perspective, a linguistically informed fault model is used to isolate the source of a performance anomaly in a language processing application. In three subsequent experiments, a discovery procedure for minimal pairs and free variation is used to approach a problem in the normalization of named entities, and a discovery procedure for complementary distribution is used to diagnose problematic semantic representations.
The latter technique is applied to two corpora and two sets of predicate-argument structures; it is shown that the technique labels true positives with an accuracy of 69%. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A