Peer reviewed
ERIC Number: EJ1020230
Record Type: Journal
Publication Date: 2014
Pages: 28
Abstractor: As Provided
ISSN: ISSN-1467-9620
State and Local Efforts to Investigate the Validity and Reliability of Scores from Teacher Evaluation Systems
Herlihy, Corinne; Karger, Ezra; Pollard, Cynthia; Hill, Heather C.; Kraft, Matthew A.; Williams, Megan; Howard, Sarah
Teachers College Record, v116 n1 2014
Context: In the past two years, states have implemented sweeping reforms to their teacher evaluation systems in response to Race to the Top legislation and, more recently, NCLB waivers. With these new systems, policymakers hope to make teacher evaluation both more rigorous and more grounded in specific job performance domains such as teaching quality and contributions to student outcomes. Attaching high stakes to teacher scores has prompted an increased focus on the reliability and validity of these scores. Teachers unions have expressed strong concerns about the reliability and validity of using student achievement data to evaluate teachers and the potential for subjective ratings by classroom observers to be biased. The legislation enacted by many states also requires scores derived from teacher observations and the overall systems of teacher evaluation to be valid and reliable.
Focus of the study: In this paper, we explore how state education officials and their district and local partners plan to implement and evaluate their teacher evaluation systems, focusing in particular on states' efforts to investigate the reliability and validity of scores emerging from the observational component of these systems.
Research design: Through document analysis and interviews with state education officials, we explore several issues that arise in observational systems, including the overall generalizability of teacher scores; the training, certification, and reliability of observers; and specifications regarding the sampling and number of lessons observed per teacher.
Findings: Respondents' reports suggest that states are attending to the reliability and validity of scores, but inconsistently; in only a few states does there appear to be a coherent strategy regarding reliability and validity in place.
Conclusions: There remain a variety of system design and implementation decisions that states can optimize to increase the reliability and validity of their teacher evaluation scores. While a state may engage in auditing scores, for instance, it may miss the gains to reliability and validity that would accrue from periodic rater retraining and recertification, a stiff program of rater monitoring, and the use of multiple raters per teacher. Most troublesome are decisions about which and how many lessons to sample, which either are mandated legislatively, result from practical concerns or negotiations between stakeholders, or, at best, rest on broad research not directly related to the state context. This suggests that states should more actively investigate the number of lessons and lesson sampling designs required to yield high-quality scores.
Teachers College, Columbia University. P.O. Box 103, 525 West 120th Street, New York, NY 10027. Tel: 212-678-3774; Fax: 212-678-6619.
Publication Type: Journal Articles; Reports - Research
Education Level: Elementary Secondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Identifiers - Laws, Policies, & Programs: No Child Left Behind Act 2001; Race to the Top
IES Funded: Yes
Grant or Contract Numbers: R305C090023
IES Cited: ED563445