ERIC Number: ED357036
Record Type: Non-Journal
Publication Date: 1990-Sep
Pages: 76
Abstractor: N/A
Reference Count: N/A
ISBN: N/A
ISSN: N/A
Benchmarking Text Understanding Systems to Human Performance: An Exploration.
Butler, Frances A.; And Others
This study, part of a larger effort to develop a methodology for evaluating intelligent computer systems (Artificial Intelligence Systems), explores the use of benchmarking as an evaluation technique. Benchmarking means comparing the performance of intelligent computer systems with human performance on the same task. Benchmarking in evaluation has been concentrated in the areas of natural language understanding and expert systems. A criterion reading measure was used so that national grade level norms for reading could be established and would provide the anchor for benchmarking the text understanding systems. Eleven text understanding systems were considered, and 6 were finally chosen for a pilot test with 13 adults and 3 school-age students. The refined reading comprehension test was administered to 74 sixth graders, 273 eighth graders, and 58 eleventh graders. Comprehensive Tests of Basic Skills scores were available for all of the students in the sample. Because the subject sample was relatively small and clustered, it was not possible to benchmark on a continuous scale or to determine the statistical significance of many of the results. Nevertheless, general descriptive results indicate that a human benchmark methodology can distinguish certain kinds of natural language processing abilities of intelligent computer systems. Six tables present study results. Seven appendixes present supplemental information about the study and the computer systems used. (SLD)
Publication Type: Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: Office of Naval Research, Arlington, VA.
Authoring Institution: National Center for Research on Evaluation, Standards, and Student Testing, Los Angeles, CA.
Identifiers - Assessments and Surveys: Comprehensive Tests of Basic Skills