Teacher Effectiveness on High- and Low-Stakes Tests
Corcoran, Sean P.; Jennings, Jennifer L.; Beveridge, Andrew A.
Society for Research on Educational Effectiveness
The authors use data from the Houston Independent School District (HISD) to estimate teacher effects on two different academic tests of the same subject areas, administered in the same school year to the same students at approximately the same time of year. The first is the statewide "high-stakes" test administered as part of the Texas accountability system; the second is a nationally normed "low-stakes" test, the Stanford, intended both as an audit test and as a grade-promotion tool. The authors focus on achievement in reading and math in the 4th and 5th grades. Given these two effectiveness measures, the authors address the following questions: (1) Do these estimates of teacher effectiveness suggest a similar level of variation in quality across teachers? (2) How strongly are the two measures correlated; that is, are teachers who appear effective on a "high-stakes" state test similarly effective on a "low-stakes" test of the same subject? (3) Is one measure of teacher effectiveness more stable from year to year than the other? (4) Do teacher effects decay at different rates on high- and low-stakes tests? (5) To what extent does the high- or low-stakes nature of the test contribute to these differences? For this paper the authors drew on a longitudinal dataset of all students tested in HISD between 1998 and 2006, approximately 165,000 students per year. Their results indicate that teacher effects on the high-stakes test vary substantially more than those in the same subject on the low-stakes test. For the Texas state assessments (TAAS/TAKS), they find a standard deviation in teacher quality of 0.23 in reading and 0.28 in math. In contrast, the standard deviation on the Stanford is roughly half this size: 0.13 and 0.15, respectively. Teacher effects on different tests of the same subject in the same year are only modestly correlated: 0.61 in math and 0.52 in reading.
Figure 2 expresses the cross-test correlation another way, showing the proportion of teachers in each quintile of effectiveness on one test who ranked in quintiles 1-5 on the second test. For example, only 48 percent of teachers in the top quintile on the TAKS math test were also in the top quintile on the Stanford, and a non-trivial share (13%) ranked in the bottom two quintiles of the Stanford. The authors find very little difference between the two tests in inter-temporal stability. Perhaps more importantly, teacher effects on the high-stakes TAKS test decay at a much faster rate than those on the low-stakes Stanford test. Using the method proposed by Jacob, Lefgren, and Sims (forthcoming), they estimated the persistence of teacher-induced gains in later grades and found that 34% of a teacher's effect on grade 4 mathematics carried through to grade 5 as measured by the Stanford, while only 16% persisted as measured by the high-stakes TAKS test. The corresponding figures in reading were 31% and 20%. Finally, they find important differences across the two tests in the impact of teacher observables on student performance. The returns to teacher experience are compressed on the high-stakes test, such that most of the returns occur in the first 2 to 3 years. In contrast, they find positive returns to experience on the low-stakes reading test throughout the first 15 years of teachers' careers. (Contains 2 figures.)
