Today’s New York Times carries a story about an important study that links teacher impact not just to students’ academic performance but to lifelong benefits, including higher earnings, higher college enrollment and lower teen pregnancy rates.
A key finding, as reported by the paper: Replacing a poor teacher with an average one would raise the combined lifetime earnings of a single classroom’s students by about $266,000.
Other researchers have made the same controversial link in the past, but the new study, which tracked 2.5 million students over 20 years, is the weightiest yet.
And it has immediate ramifications for Minnesota policymakers and educators, who are in the final month of grappling with the thorny issue of how to use the same type of “value-added” data to evaluate the state’s teacher and principal corps.
A state task force is working to devise an evaluation model for school districts that do not have their own systems for measuring teachers’ impact on individual student achievement. The group is supposed to present its recommendations during the next legislative session, which begins at the end of the month.
The upshot, according to several Twin Cities educators familiar with the research: Proceed, but carefully.
Because they measure individual students’ learning over the course of an academic year, not grade-level proficiency, so-called growth model or value-added tests are a fairer way to assess teachers’ impact, they said.
But they are still imperfect, and using them to make high-stakes decisions about hiring and firing is fraught.
Minnesota needs to look carefully at the data-scrubbing and other methodologies employed by the study’s authors, economists Raj Chetty and John Friedman of Harvard and Jonah Rockoff of Columbia, local educators said.
All kinds of variables
The economists controlled for all kinds of variables, including poverty, and used outcomes from multiple classes taught by each teacher to compensate for the inability to randomly assign students, some of whom are much harder to teach than others, to classrooms.
“I think it’s yet another example of very careful work that shows that, at least in reading and math, and at least in grades 3-8, this can be done in a way that controls for biasing characteristics that students bring into the classroom,” said Kent Pekel, executive director of the College Readiness Consortium at the University of Minnesota.
“It’s a kind of numeric validation that the Jaime Escalantes of the world do exist and that they matter,” he said, referring to the heroic teacher portrayed in the movie “Stand and Deliver.”
Test data should be only a portion of any teacher’s evaluation, he and other local education policymakers agreed. Observation by principals and peers and other measures are crucial both to ensuring decisions about hiring, firing and compensation are fair and to gleaning insight into great teachers’ practices.
Right now, even the best value-added tests have too much margin for error, said Jim Angermeyer, director of Research and Evaluation for Bloomington Public Schools and a designer of one of the earliest and most reliable growth-model tests.
“It’s not the sort of thing I would base personnel decisions on,” he said. “The variability is just huge.”
Economists love the kind of research featured in the Times today because it typically provides great insights. But breaking down statistics on, say, crime or recidivism rates and then probing for causes and correlations is easier to do objectively than the same kind of analysis in education.
Kids will not perform the same way on the same test from one day to the next, and a classroom contains so few students that a poor performance by just a couple can have an outsize impact on an individual teacher’s performance profile, Angermeyer said.
Plus, you have no guarantee the classroom teacher was the only, or most impactful, instructor. “If you have a classroom full of kids with serious problems, they may be receiving multiple levels of service from a variety of providers,” he said.
“You can be a great teacher one year and underperform in another,” he said.
Ahead of the curve
Even with all of these caveats, Minnesota is ahead of the curve, Angermeyer and Pekel agreed.
As a part of the Bush Foundation’s ongoing effort to increase the quality of teacher-preparation programs in the Upper Midwest, the nation’s foremost value-added data experts are working with a number of districts throughout the state to tie student performance to teachers and to the institutions of higher education where the teachers received their training.
The foundation has contracted with the Value-Added Research Center (VARC), located at the University of Wisconsin-Madison. Center Director Robert Meyer was quoted approvingly in today’s Times story.
Locally, VARC is working with Dave Heistad, Minneapolis Public Schools’ director of Research and Evaluation, who is widely acknowledged to be the Minnesotan with the most experience parsing growth-model data.
Last year, he and Meyer provided some two dozen Minnesota districts with data showing the effectiveness of different grade-level teaching teams, Heistad said. Minneapolis is currently working on tying data to individual teachers.
Eventually, VARC’s Minnesota research is expected to identify great teachers, any unique or effective preparation that went into making them great and whether the highest performers stay in the profession, Heistad said.
The model has tremendous potential, all three agreed, but the state task force at work right now needs to understand that the new study, and for that matter VARC’s work, is very tricky stuff.
For instance, the study was the product of three highly skeptical world-class experts who imagined they would end up debunking earlier efforts, Pekel noted. They used historical data and controlled for myriad variables, an approach that differs in important ways from using the data on the fly to evaluate teachers.
“What we should do is use this data as soon as possible to identify the most effective teachers and learn from them,” said Pekel.