ERIC Number: ED533062
Record Type: Non-Journal
Publication Date: 2011
Pages: 183
Abstractor: As Provided
Reference Count: 0
ISBN: 978-1-1248-6409-9
Longitudinal Rater Modeling with Splines
Dobria, Lidia
ProQuest LLC, Ph.D. Dissertation, University of Illinois at Chicago
Performance assessments rely on the expert judgment of raters to measure the quality of responses, and raters unavoidably introduce error into the scoring process. Rater severity, defined as the tendency of a rater to assign higher or lower ratings, on average, than those assigned by other raters even after accounting for differences in examinee proficiency, is a rater error that fluctuates over time, giving rise to rater severity drift. Current approaches to detecting rater severity drift rely mostly on auxiliary statistics of the Many-Faceted Rasch Model. These statistics can detect drift in rater severity between two time points, but they do not adjust examinee proficiency estimates for its impact. This dissertation introduces a model-based approach to detecting and adjusting for rater severity drift: the Longitudinal Rater Model (LRM), a member of the class of generalized additive mixed models. The LRM is a flexible model with random intercept parameters representing examinee proficiencies and fixed slope parameters representing item difficulties, and it models rater severity as an arbitrary function of time via natural cubic splines. The model's parameters are estimated from the data via penalized maximum likelihood, with the penalty for the smoothing splines selected via generalized cross-validation. A simulation study assessed the LRM's ability to recover true data-generating parameters via sample-based estimation. The utility of the LRM was showcased using data from a high-stakes, large-scale performance assessment. The penalized spline functions simultaneously captured the linear and nonlinear drift patterns of 101 raters across eight time points. Over half the raters drifted significantly in the levels of severity they displayed over time. The LRM produced examinee proficiency estimates that were adjusted for the presence of rater severity drift.
For 17% of examinees, the initial scores were adjusted by one or more points. The LRM's capability to accurately detect the drift of rater severity over multiple time points and to adjust examinee proficiency estimates for its impact can help testing organizations better monitor rater performance and ensure greater fairness of examinee scores. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone (1-800-521-0600). Web page:]
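To illustrate the core idea of the spline-based drift modeling described above, the sketch below fits a smooth severity trajectory to noisy per-occasion severity values for a single rater, using a cubic truncated-power spline basis with a ridge penalty. This is a hypothetical, simplified stand-in for the LRM's penalized maximum likelihood estimation: the simulated data, knot locations, and the fixed smoothing penalty `lam` are all assumptions for demonstration (the dissertation selects the penalty via generalized cross-validation, which is not implemented here).

```python
import numpy as np

def truncated_power_basis(t, knots):
    """Cubic truncated power basis: 1, t, t^2, t^3, (t - k)^3_+ for each knot."""
    cols = [np.ones_like(t), t, t**2, t**3]
    for k in knots:
        cols.append(np.clip(t - k, 0.0, None) ** 3)
    return np.column_stack(cols)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 7.0, 8)                 # eight scoring occasions
true_severity = 0.3 * np.sin(t / 2)          # hypothetical nonlinear drift
y = true_severity + rng.normal(0.0, 0.05, size=t.size)  # noisy severity estimates

X = truncated_power_basis(t, knots=[2.0, 4.0])
lam = 1.0  # smoothing penalty; the LRM would choose this via GCV
# Penalized least squares: (X'X + lam*I) beta = X'y
beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
fitted = X @ beta                            # smoothed severity trajectory
print(np.round(fitted, 3))
```

In the full model, a trajectory like `fitted` would be estimated jointly for each of the 101 raters alongside examinee and item parameters, and examinee proficiency estimates would then be adjusted for the recovered drift.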
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A