ERIC Number: ED513983
Record Type: Non-Journal
Publication Date: 2009
Pages: 232
Abstractor: As Provided
ISBN: 978-1-1095-9246-7
ISSN: N/A
EISSN: N/A
Prompt and Rater Effects in Second Language Writing Performance Assessment
Lim, Gad S.
ProQuest LLC, Ph.D. Dissertation, University of Michigan
Performance assessments have become the norm for evaluating language learners' writing abilities in international examinations of English proficiency. Two aspects of these assessments are usually systematically varied: test takers respond to different prompts, and their responses are read by different raters. This raises the possibility of undue prompt and rater effects on test-takers' scores, which can affect the validity, reliability, and fairness of these tests. This study uses data from the Michigan English Language Assessment Battery (MELAB), including all official ratings given over a period of over four years (n=29,831), to examine these issues related to scoring validity. It uses the multi-facet extension of Rasch methodology to model this data, producing measures on a common, interval scale. First, the study investigates the comparability of prompts that differ on topic domain, rhetorical task, prompt length, task constraint, expected grammatical person of response, and number of tasks. It also considers whether prompts are differentially difficult for test takers of different genders, language backgrounds, and proficiency levels. Second, the study investigates the quality of raters' ratings, whether these are affected by time and by raters' experience and language background. It also considers whether raters alter their rating behavior depending on their perceptions of prompt difficulty and of test-takers' prompt selection behavior. The results show that test-takers' scores reflect actual ability in the construct being measured as operationalized in the rating scale, and are generally not affected by a range of prompt dimensions, rater variables, or test taker characteristics. It can be concluded that scores on this test and others whose particulars are like it have score validity, and assuming that other inferences in the validity argument are similarly warranted, can be used as a basis for making appropriate decisions. 
Further studies to develop a framework of task difficulty and a model of rater development are proposed. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A