ERIC Number: ED526437
Record Type: Non-Journal
Publication Date: 2011
Pages: 291
Abstractor: As Provided
Reference Count: 0
ISBN: 978-1-1245-3517-3
Investigating Raters' Development of Rating Ability on a Second Language Speaking Assessment
Kim, Hyun Jung
ProQuest LLC, Ed.D. Dissertation, Teachers College, Columbia University
The purpose of the study was to investigate the extent to which raters from diverse backgrounds exhibited different levels of rating ability while scoring speaking performances. The study also aimed to examine how raters with different backgrounds developed their rating ability over time. For this purpose, raters' background characteristics were first explored with regard to (1) experience in rating L2 speaking assessments, (2) TESOL experience, (3) rater training accompanied by rating experience, and (4) relevant coursework completed. Raters were accordingly classified into novice, developing, and expert groups. The study then examined the extent to which the three rater groups exhibited different scoring behaviors in each of three rating sessions, separated by one-month intervals, and how each group's rating patterns changed across the sessions. In each rating session, the three groups of raters scored a set of pre-recorded speaking responses to five semi-direct placement speaking tasks using an analytic scoring rubric. The raters also recorded how they arrived at their scoring decisions while rating examinee responses to the first two tasks. Before each rating session, the raters were trained, and before the second and third sessions they were given individual feedback on their previous rating performance. In the first phase of the study, the analytic ratings of the three rater groups were statistically analyzed, focusing on severity, internal consistency, and interaction effects. Statistically, the novice and developing rater groups did not show distinct rating patterns, especially with regard to interaction effects, while the expert raters displayed the highest rating ability across all three rating sessions.
However, in the second phase of the study, in which the raters' verbal reports were qualitatively analyzed with a focus on their use of the given scoring criteria, the three groups of raters displayed different rating patterns and developmental paths across the three rating sessions. The findings suggest that the different weaknesses exhibited by the three rater groups need to be addressed through individual or group rater training to help raters improve their rating ability and, ultimately, to minimize rater effects. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page:]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A