ERIC Number: EJ1056804
Record Type: Journal
Publication Date: 2015
Abstractor: As Provided
Validating Automated Essay Scoring: A (Modest) Refinement of the "Gold Standard"
Powers, Donald E.; Escoffery, David S.; Duchnowski, Matthew P.
Applied Measurement in Education, v28 n2 p130-142 2015
By far, the most frequently used method of validating (the interpretation and use of) automated essay scores has been to compare them with scores awarded by human raters. Although this practice is questionable, human-machine agreement is still often regarded as the "gold standard." Our objective was to refine this model and apply it to data from a major testing program and one system of automated essay scoring. The refinement capitalizes on the fact that essay raters differ in numerous ways (e.g., training and experience), any of which may affect the quality of ratings. We found that automated scores exhibited different correlations with scores awarded by experienced raters (a more compelling criterion) than with those awarded by untrained raters (a less compelling criterion). The results suggest potential for a refined machine-human agreement model that differentiates raters with respect to experience, expertise, and possibly even more salient characteristics.
Descriptors: Essays, Test Scoring Machines, Program Validation, Criterion Referenced Tests, Interrater Reliability, Expertise, Novices, Scoring, Scoring Rubrics, Correlation, Weighted Scores, Regression (Statistics), Achievement Rating, Evaluators, Academic Standards, Undergraduate Students
Routledge. Available from: Taylor & Francis, Ltd., 325 Chestnut Street, Suite 800, Philadelphia, PA 19106. Tel: 800-354-1420; Fax: 215-625-2940; Web site: http://www.tandf.co.uk/journals
Publication Type: Journal Articles; Reports - Research
Education Level: Higher Education; Postsecondary Education
Authoring Institution: N/A