ERIC Number: ED463324
Record Type: Non-Journal
Publication Date: 2000
Reference Count: N/A
Comparing Computerized and Human Scoring of Students' Essays.
Sireci, Stephen G.; Rizavi, Saba
Although computer-based testing is becoming popular, many of these tests are limited to the use of selected-response item formats due to the difficulty in mechanically scoring constructed-response items. This limitation is unfortunate because many constructs, such as writing proficiency, can be measured more directly using items that require examinees to produce a response. Therefore, computerized scoring of essays and other constructed-response items is an important area of research. This study compared computerized scoring of essays with the scores produced by two independent human graders. Data were essay scores for 931 students from 24 postsecondary institutions in Texas. Although high levels of computer-human congruence were observed, the human graders were more consistent with one another than the computer was with them. Statistical methods for evaluating computer-human congruence are presented. The case is made that the percentage agreement statistics that appear in the literature are insufficient for comparing the computerized and human scoring of constructed-response items. In this study, scoring differences were most pronounced when researchers looked at the percentage of essays scored exactly the same, the percentage scored the same at specific score points, and the percentage of exact agreement corrected for chance. The implications for future research in this area are discussed. (Contains 11 tables, 2 figures, and 15 references.) (Author/SLD)
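The abstract contrasts raw percent exact agreement with exact agreement corrected for chance. A minimal sketch of these two statistics, using Cohen's kappa as the standard chance-corrected measure and hypothetical rater scores (not the study's data):

```python
from collections import Counter

def exact_agreement(a, b):
    """Proportion of essays both raters scored identically."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Exact agreement corrected for chance agreement.

    Chance agreement is estimated from each rater's marginal
    score distribution, as in Cohen's kappa.
    """
    n = len(a)
    p_obs = exact_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    p_chance = sum(counts_a[k] * counts_b[k]
                   for k in set(a) | set(b)) / (n * n)
    return (p_obs - p_chance) / (1 - p_chance)

# Hypothetical scores on a 1-5 rubric for eight essays.
human = [4, 3, 5, 2, 4, 3, 4, 5]
computer = [4, 3, 4, 2, 4, 2, 4, 5]
print(round(exact_agreement(human, computer), 3))  # 0.75
print(round(cohens_kappa(human, computer), 3))     # 0.652
```

The gap between the two numbers illustrates the abstract's point: raw agreement can look high while chance-corrected agreement is notably lower, which is why the authors argue percentage agreement alone is insufficient.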
Publication Type: Numerical/Quantitative Data; Reports - Research
Education Level: N/A
Sponsor: College Board, New York, NY.
Authoring Institution: Massachusetts Univ., Amherst. Laboratory of Psychometric and Evaluative Research.