Big Study Links Good Teachers to Lasting Gain

By ANNIE LOWREY

WASHINGTON — Elementary- and middle-school teachers who help raise their students’ standardized-test scores seem to have a wide-ranging, lasting positive effect on those students’ lives beyond academics, including lower teenage-pregnancy rates and greater college matriculation and adult earnings, according to a new study that tracked 2.5 million students over 20 years.

The paper, by Raj Chetty and John N. Friedman of Harvard and Jonah E. Rockoff of Columbia, all economists, examines a larger number of students over a longer period of time with more in-depth data than many earlier studies, allowing for a deeper look at how much the quality of individual teachers matters over the long term.

“That test scores help you get more education, and that more education has an earnings effect — that makes sense to a lot of people,” said Robert H. Meyer, director of the Value-Added Research Center at the University of Wisconsin-Madison, which studies teacher measurement but was not involved in this study. “This study skips the stages, and shows differences in teachers mean differences in earnings.”

The study, which the economics professors have presented to colleagues in more than a dozen seminars over the past year and plan to submit to a journal, is the largest look yet at the controversial “value-added ratings,” which measure the impact individual teachers have on student test scores. It is likely to influence the roiling national debates about the importance of quality teachers and how best to measure that quality.

Many school districts, including those in Washington and Houston, have begun to use value-added metrics to influence decisions on hiring, pay and even firing.

Supporters argue that such metrics hold teachers accountable and can help improve the educational outcomes of millions of children. Detractors, most notably a number of teachers unions, say that isolating the effect of a given teacher is harder than it seems, and might unfairly penalize some instructors.

Critics particularly point to the high margin of error with many value-added ratings, noting that they tend to bounce around for a given teacher from year to year and class to class. But looking at individual teachers’ value-added scores across three or four classes, the researchers found that some teachers consistently outperformed their peers.

“Everybody believes that teacher quality is very, very important,” says Eric A. Hanushek, a senior fellow at the Hoover Institution at Stanford and longtime researcher of education policy. “What this paper and other work has shown is that it’s probably more important than people think. That the variations or differences between really good and really bad teachers have lifelong impacts on children.”

The average effect of one teacher on a single student is modest. All else equal, a student with one excellent teacher for one year between fourth and eighth grade would gain $4,600 in lifetime income, compared to a student of similar demographics who has an average teacher. The student with the excellent teacher would also be 0.5 percent more likely to attend college.

Perhaps just as important, given the difficulty of finding, training and retaining outstanding teachers, is that the difference in long-term outcome between students who have average teachers and those with poor-performing ones is as significant as the difference between those who have excellent teachers and those with average ones, the study found.

In the aggregate, these differences are potentially enormous.

Replacing a poor teacher with an average one would raise a single classroom’s lifetime earnings by about $266,000, the economists estimate. Multiply that by a career’s worth of classrooms.

“If you leave a low value-added teacher in your school for 10 years, rather than replacing him with an average teacher, you are hypothetically talking about $2.5 million in lost income,” said Professor Friedman, one of the coauthors.
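
As a rough check on that arithmetic, here is a minimal sketch in Python. It assumes one classroom per teacher per year and simple multiplication of the $266,000 estimate; both assumptions, and the function and variable names, are illustrative and do not come from the study.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption (not from the study): one classroom per teacher per year.
PER_CLASSROOM_GAIN = 266_000  # dollars, the economists' per-classroom estimate


def lost_income(years: int, classrooms_per_year: int = 1) -> int:
    """Total lifetime earnings forgone across all classrooms taught."""
    return PER_CLASSROOM_GAIN * classrooms_per_year * years


total = lost_income(10)
print(f"${total:,}")  # prints "$2,660,000"
```

Under those assumptions, 10 years comes to about $2.66 million, in the neighborhood of the roughly $2.5 million Professor Friedman cites.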

To do the study, the researchers first tackled the question that has stirred controversy in so many school districts, including New York City’s: whether value-added scores are in fact a good measure of teacher quality. Mr. Jones might regularly help raise test scores more than Ms. Smith, but maybe that is because his students are from wealthier families, or because he has a harder-working class. Such factors can be difficult for researchers to discern.

While Professor Rockoff, at Columbia, has previously written favorably about value-added ratings, the Harvard pair were skeptics of the metrics. “We said, ‘We’re going to show that these measures don’t work, that this has to do with student motivation or principal selection or something else,’ ” Professor Chetty recalled.

But controlling for numerous factors, including students’ backgrounds, the researchers found that the value-added scores consistently identified some teachers as better than others, even if individual teachers’ value-added scores varied from year to year.

After identifying excellent, average and poor teachers, the economists set out to look at their students over the long term, analyzing information on earnings, college matriculation rates, the age at which they had children, and where they ended up living.

The results were striking. Previous studies, looking only at test scores, had shown that the effect of a good teacher mostly fades after three or four years. But the broader view showed that the students still benefit for years to come.

Students with top teachers are less likely to become pregnant as teenagers, more likely to enroll in college, and more likely to earn more money as adults, the study found.

The authors argue that school districts should use value-added measures in evaluations and remove the lowest performers, despite the disruption and uncertainty involved.

“The message is to fire people sooner rather than later,” Professor Friedman said.

Professor Chetty acknowledged, “Of course there are going to be mistakes — teachers who get fired who do not deserve to get fired.” But he said that using value-added scores would lead to fewer mistakes, not more.

Still, translating value-added scores into policy is fraught with problems. Judging teachers by their students’ test scores might encourage cheating, teaching to the test or lobbying to have certain students in class, for instance.

“We are performing these studies in settings where nobody cares about their ranking: it does not change their pay or job security,” said Jesse Rothstein, an economist at the University of California, Berkeley, whose work criticizing other value-added assessments is frequently cited by unions. “But if you start to change that, there is going to be a range of responses.”

Many other researchers and school administrators say that even if imperfect, well-calculated value-added scores are an important part of evaluating teachers.

“Very few people suggest that you should use value-added scores alone to make personnel decisions,” Dr. Hanushek, of Stanford, said. “What the whole value-added debate has done is push forward the issue of how to evaluate teachers, and how to use that information.”

The new study found no evidence for one piece of conventional wisdom: that having a good teacher in an early grade has a bigger effect than having a good teacher in later grades.