ERIC Number: ED503777
Record Type: Non-Journal
Publication Date: 2005
Pages: 35
Abstractor: As Provided
Adjusting for Year to Year Rater Variation in IRT Linking--An Empirical Evaluation
Yen, Shu Jing; Ochieng, Charles; Michaels, Hillary; Friedman, Greg
Online Submission, Paper presented at the Annual Meeting of the National Council on Measurement in Education (Montreal, Canada, 2005)
The main purpose of this study was to illustrate a polytomous IRT-based linking procedure that adjusts for rater variation. Test scores from two administrations of a statewide reading assessment were used. An anchor set of Year 1 students' constructed responses was rescored by Year 2 raters. To adjust for year-to-year rater variation in IRT linking, a two-step approach was used. First, the Year 1 item parameters for the constructed-response items were re-estimated using the ratings assigned by the Year 2 raters; through this recalibration, the Year 1 parameters for the constructed-response items were adjusted for rater variation. Second, the Stocking-Lord equating method was used to place the Year 2 form on the Year 1 scale using all the common items between forms. This method was compared with two alternatives: traditional IRT linking that links the test forms using (a) all the common items without the rater adjustment, and (b) only the common selected-response items. The differences in the test characteristic curves (TCCs) and in the students' scale score distributions produced by the three methods were compared. Significant shifts in the parameters after rater adjustment were found for one (grade 8) of the three grades examined. The p-values and TCCs shifted across years when adjusted for rater effects, and these parameter and TCC shifts manifested in changes to proficiency classifications before and after adjustment. However, a systematic bias might have been introduced while attempting to adjust for rater variation through the equating process; further studies are needed to address this problem in greater detail. Using only the common SR items appears to produce satisfactory results.
Thus, when it is not feasible to integrate rater adjustment into the equating process, using SR-item anchors is a better approach than using mixed-item anchors without adjusting for rater effects. The results suggest that raters were not consistently more severe or more lenient across grades, but the resulting rater error (severity or leniency) affected scores and would produce misleading results if not taken into account. (Contains 10 tables and 9 figures.)
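The Stocking-Lord step described in the abstract can be sketched as follows. This is a minimal illustration only: it assumes dichotomous 2PL anchor items (the study's forms actually mix selected- and constructed-response items, which would require a polytomous model), and the function names, quadrature grid, and optimizer choice are assumptions, not the authors' code. The idea is to find the slope A and intercept B that minimize the squared distance between the target form's TCC and the transformed new form's TCC over the anchor items.

```python
import numpy as np
from scipy.optimize import minimize

def p_2pl(theta, a, b):
    # 2PL item response function: probability of a correct response
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def tcc(theta, a, b):
    # Test characteristic curve: expected score summed over items
    return sum(p_2pl(theta, ai, bi) for ai, bi in zip(a, b))

def stocking_lord(a_new, b_new, a_old, b_old, theta=None):
    """Find the slope A and intercept B that place the new form on the
    old (base) scale by minimizing the squared TCC difference over the
    common (anchor) items. Returns (A, B). Hypothetical sketch."""
    if theta is None:
        # Quadrature grid over the ability scale (an assumed default)
        theta = np.linspace(-4.0, 4.0, 41)

    def loss(params):
        A, B = params
        # Transform new-form item parameters onto the old scale
        a_t = np.asarray(a_new) / A       # discriminations
        b_t = A * np.asarray(b_new) + B   # difficulties
        return np.sum((tcc(theta, a_old, b_old) - tcc(theta, a_t, b_t)) ** 2)

    res = minimize(loss, x0=[1.0, 0.0], method="Nelder-Mead")
    return res.x
```

In the study's rater-adjusted variant, `a_old` and `b_old` would be the Year 1 anchor parameters re-estimated from the Year 2 raters' rescoring, so that the linking constants reflect rater-adjusted item behavior rather than raw Year 1 calibrations.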
Publication Type: Reports - Evaluative; Speeches/Meeting Papers
Education Level: Grade 4; Grade 6; Grade 8
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A