ERIC Number: ED554751
Record Type: Non-Journal
Publication Date: 2012
Pages: 279
Abstractor: As Provided
Reference Count: N/A
ISBN: 978-1-3030-3520-3
Rater Expertise in a Second Language Speaking Assessment: The Influence of Training and Experience
Davis, Lawrence Edward
ProQuest LLC, Ph.D. Dissertation, University of Hawai'i at Manoa
Speaking performance tests typically employ raters to produce scores; accordingly, variability in raters' scoring decisions has important consequences for test reliability and validity. One such source of variability is the rater's level of expertise in scoring. Therefore, it is important to understand how raters' performance is influenced by training and experience, as well as the features that distinguish more proficient raters from their less proficient counterparts. This dissertation examined the nature of rater expertise within a speaking test, and how training and increasing experience influenced raters' scoring patterns, cognition, and behavior. Experienced teachers of English (N = 20) scored recorded examinee responses from the TOEFL iBT speaking test prior to training and in three sessions following training (100 responses for each session). For an additional 20 responses, raters verbally reported (via stimulated recall) what they were thinking as they listened to the examinee response and made a scoring decision, with the resulting data coded for language features mentioned. Scores were analyzed using many-facet Rasch analysis, with scoring phenomena including consistency, severity, and use of the rating scale compared across dates. Various aspects of raters' interaction with the scoring instrument were also recorded to determine if certain behaviors, such as the time taken to reach a scoring decision, were associated with the reliability and accuracy of scores. Prior to training, rater severity and internal consistency (measured via Rasch analysis) were already of a standard typical for operational language performance tests, but training resulted in increased inter-rater correlation and agreement and improved correlation and agreement with established reference scores, although little change was seen in rater severity. 
Additional experience gained after training appeared to have little effect on rater scoring patterns, although agreement with reference scores continued to increase. More proficient raters reviewed benchmark responses more often and took longer to make scoring decisions, suggesting that rater behavior while scoring may influence the accuracy and reliability of scores. On the other hand, no obvious relationship was seen between raters' comments and their scoring patterns, with considerable individual variation seen in the frequency with which raters mentioned various language features. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone: 1-800-521-0600. Web page:]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Identifiers - Assessments and Surveys: Test of English as a Foreign Language