Peer reviewed
ERIC Number: ED562804
Record Type: Non-Journal
Publication Date: 2014
Pages: 8
Abstractor: ERIC
Reference Count: 10
Noninvariant Measurement in Rater-Mediated Assessments of Teaching Quality
Kelcey, Ben
Society for Research on Educational Effectiveness
Valid and reliable measurement of teaching is essential to evaluating and improving teacher effectiveness and advancing large-scale policy-relevant research in education (Raudenbush & Sadoff, 2008). One increasingly common component of teaching evaluations is the direct observation of teachers in their classrooms. Classroom observations have long been viewed as a promising way to evaluate and develop teachers because they anchor assessments in specific and observable criteria (Gitomer, 2009). Despite the potential of classroom observations to identify strengths and address specific weaknesses in teachers' practices, a significant problem with observed teaching scores is that they confound construct-irrelevant variation with persistent teaching quality (i.e., observed scores are not independent of the characteristics of a specific observation). In observational assessments of teaching, one key source of construct-irrelevant variation is differences among raters. Research has shown that even after extensive training there are important differences in how raters interpret evidence and that these differences potentially introduce variability in the structure of the scale established by the guiding rubric/instrument (e.g., Eckes, 2009; Hill, Charalambous, & Kraft, 2012; Engelhard, 2002). The focus of this study was to develop and investigate a set of psychometric methods that accommodate, as far as possible, measurement noninvariance among raters. To estimate persistent teaching quality using rater-mediated classroom observations, a cross-classified multilevel random item effects graded response model was developed. Because instruments have different visions of teaching quality and use different systems and sets of competencies to operationalize these theories, instruments may vary in their sensitivity to rater effects. Instruments that are less sensitive to rater differences may reduce the value of the proposed method.
Evidence from this study suggests the promise of random item effects models to address measurement noninvariance in rater-mediated assessments. However, it remains an open question whether random item effects and the associated approximate measurement invariance can adequately compensate for differences in the scales raters use. Tables are appended.
Society for Research on Educational Effectiveness. 2040 Sheridan Road, Evanston, IL 60208. Tel: 202-495-0920; Fax: 202-640-4401.
Publication Type: Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Society for Research on Educational Effectiveness (SREE)