Peer reviewed
ERIC Number: EJ1042796
Record Type: Journal
Publication Date: 2014-Oct
Pages: 27
Abstractor: As Provided
Reference Count: 50
ISSN: ISSN-0265-5322
An Examination of Rater Performance on a Local Oral English Proficiency Test: A Mixed-Methods Approach
Yan, Xun
Language Testing, v31 n4 p501-527 Oct 2014
This paper reports on a mixed-methods approach to evaluating rater performance on a local oral English proficiency test. Three types of reliability estimates were reported to examine rater performance from different perspectives. Quantitative results were also triangulated with qualitative rater comments to arrive at a more representative picture of rater performance and to inform rater training. Specifically, both quantitative (6338 valid rating scores) and qualitative data (506 sets of rater comments) were analyzed with respect to rater consistency, rater consensus, rater severity, rater interaction, and raters' use of the rating scale. While raters achieved overall satisfactory inter-rater reliability (r = 0.73), they differed in severity and achieved relatively low exact score agreement. Disagreement in rating scores was largely explained by two significant main effects: (1) examinees' oral English proficiency level, that is, raters tended to agree more on higher score levels than on lower score levels; and (2) raters' differential severity, arising from raters' varied perceptions of the speech intelligibility of Indian and low-proficiency Chinese examinees. However, the effect sizes of raters' differential severity on overall rater agreement were rather small, suggesting that varied perceptions among trained raters of second language (L2) intelligibility, though possible, are not likely to have a large impact on the overall evaluation of oral English proficiency. In contrast, at the lower score levels, examinees' varied language proficiency profiles made rater alignment difficult. Rater disagreement at these levels accounted for most of the overall rater disagreement and thus should be a focus of rater training.
An implication of this study is that the interpretation of rater performance should not focus solely on identifying interactions between raters' and examinees' linguistic backgrounds but should also examine the impact of rater interactions across examinees' language proficiency levels. The findings also indicate the effectiveness of triangulating different sources of data on rater performance using a mixed-methods approach, especially in local testing contexts.
SAGE Publications. 2455 Teller Road, Thousand Oaks, CA 91320. Tel: 800-818-7243; Tel: 805-499-9774; Fax: 800-583-2665.
Publication Type: Journal Articles; Reports - Research
Education Level: Higher Education; Postsecondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A