Friday, July 18, 2008

The Math of Merit Pay

Buried in yesterday's Ed Week write-up ($) of McCain's speech was a quote that nicely sums up the common concerns over merit pay:

Bettye Oldham, a retired teacher from Cincinnati, said she had mixed feelings about the merit-pay proposal.

“I think it’s a good idea if it’s run correctly,” she said. “But you can’t judge teachers the way you can judge manufacturing companies.” Some students come to school with “issues that prevent them from learning,” she said, which can make it harder for teachers to reach them.

I picture what Ms. Oldham might look like, and I can see her mulling the pros and cons of merit pay. She's probably worried that it would reduce teachers to nothing more than cogs on an assembly line, but she also understands the implicit logic that good doctors and lawyers make more money than bad ones. Teachers, on the other hand, are paid only by credentials--all teachers in the same district with a bachelor's degree and 12 years' experience, for instance, are paid the exact same amount no matter their talent or specialty area.

Merit pay seeks to measure some of the work teachers do. While the precise math is a little too complicated for this blog, in essence it looks like a fairly straightforward algebra problem:

Student achievement = age + prior achievement + student demographic information + teacher quality + measurement error

A common argument against merit pay, and the one repeated by Ms. Oldham, is that there are too many human elements for the measurement of teaching. But that argument overlooks even the simple version of the merit pay equation. Race, income and prior achievement are all factored in, and the error term captures all those things that are not predictable--a student's situation at home, whether they had a good breakfast, whether it's too hot or cold, etc.--all things that surely affect the student's score but are not measurable.
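To make the equation a little more concrete, here is a minimal sketch in Python on simulated data. Everything in it is an assumption made for illustration: the function names, the coefficients (5.0 and 0.9), the noise level, and the shortcut of fitting a single line on prior achievement and then averaging each teacher's residuals. It leaves out age and demographics entirely, and it is not the model any district or researcher actually uses.

```python
import random
from statistics import mean


def simulate_class(teacher_effect, n_students, rng, noise_sd=10.0):
    """Return (prior, score) pairs for one hypothetical classroom."""
    rows = []
    for _ in range(n_students):
        prior = rng.gauss(50.0, 10.0)      # prior achievement (made-up scale)
        noise = rng.gauss(0.0, noise_sd)   # the "measurement error" term
        score = 5.0 + 0.9 * prior + teacher_effect + noise
        rows.append((prior, score))
    return rows


def fit_line(xs, ys):
    """Ordinary least squares for score = a + b * prior."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def estimate_teacher_effects(classes):
    """Map each teacher to the mean residual of their students.

    classes: dict of teacher name -> list of (prior, score) pairs.
    The estimates are relative to the average teacher in the data.
    """
    xs = [p for rows in classes.values() for p, _ in rows]
    ys = [s for rows in classes.values() for _, s in rows]
    a, b = fit_line(xs, ys)
    return {
        teacher: mean(s - (a + b * p) for p, s in rows)
        for teacher, rows in classes.items()
    }


if __name__ == "__main__":
    rng = random.Random(2008)
    true_effects = {"A": -3.0, "B": 0.0, "C": 3.0}  # hypothetical teachers
    classes = {t: simulate_class(e, 25, rng) for t, e in true_effects.items()}
    for teacher, est in estimate_teacher_effects(classes).items():
        print(teacher, round(est, 2))
```

With 25 students per class the estimates should move around noticeably from one random seed to the next, while with thousands of students per teacher they settle close to the true gaps. That is the trade-off between the teacher term and the error term in a nutshell.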

It might seem a little too whiz-bang to trust the capabilities of modern statistical computer programs, but mathematicians far greater than I are working to refine the exact statistical equations. We've made giant strides in our ability to compute the value an individual teacher adds to student learning. Let's not let fear of complex, modern math get in the way of promising reform.

Update: Fred Klonsky responds with the "human element" argument. He's making my point for me though, because it really goes back to a fear of assessment of teachers' work. We are comfortable measuring a baseball player's batting average, even if they had a bad breakfast, a rough flight the night before, or an argument with their spouse. In fact, their results are published the next morning in your daily paper. I can't imagine A-Rod explaining poor results on the field with his off-field Madonna shenanigans. There are "human elements" to everything in life.

7 comments:

Anonymous said...

While we certainly have made progress in measuring an individual teacher's contributions to student achievement, scepticism about our ability to accurately measure teacher quality cannot be dismissed simply as "fear of complex, modern math." First, the fact that the error term IN THEORY captures all of the "random noise" of individual students' situations does not actually help in the reality of imputing teacher quality. The fact is, in the equation above, we are left with two unknowns - the teacher quality factor and the error term - and we can't allocate between them without being arbitrary.

This is true of all empirical statistics, where you assume the error term is random and washes out, but the issue here becomes the small sample sizes of one teacher's classroom. It is simply a statistical fact that some segment of teachers are going to appear unduly strong or weak for reasons that are not related to their own performance. This is only exacerbated when looking at student GAIN measures (which require larger sample sizes for statistical validity) and by the fact that certain unmeasured factors may influence multiple students' performance in a single classroom (e.g., a peer's death the week before the exam, a particularly violent neighborhood, an exceptionally high-quality after-school program), which means that the error term is NOT random.

None of this is meant to say that we shouldn't be attempting to measure individual teacher performance - and using test scores to do so. But I think dismissing legitimate concerns (even if intuitive rather than sophisticated theory) is not only likely to turn off needed allies but also to lead to overly simplistic policies that result in one more failed education reform fad.

AldeBeer said...

Your points about caution are well taken. I get a sense, though, that many teachers and educators subscribe to the "black box" theory that says no statistical computation could ever encapsulate the effect of teachers on students, regardless of sample size or model sophistication.

Many people debating merit pay never embrace the idea that we CAN control for prior achievement and student demographic characteristics.

Anonymous said...

No statistical computation in the foreseeable future can encapsulate the teacher effects IN THE REAL WORLD.

You simply cannot control for the people side. Take two teachers, side by side: one class is under the authority of an excellent assistant principal, while the other AP refuses to do discipline. That type of situation occurs in ?% of schools. Over a typical career, I suspect it happens to almost all of us.

Or you challenge a principal's policies and thus are assigned to a class where you will never again get a pay raise.

As long as you have a system that overemphasizes quick fixes, you will bring out the worst in everyone. Merit pay will always encourage many schools towards a race to the bottom of human nature. Would you choose teaching if you had an x% chance of ending up in such a situation? Or if you had an X% chance that a new principal or central office administrator would demand that your school - a school where you have given total commitment - be transformed into a jungle-like non-stop test prep culture? If you don't teach, you have no idea of how common those patterns are. (If you have four principals in a school, and 3/4ths approach test-driven accountability with integrity, what is your margin of error PER YEAR? You'd gamble your career, your mortgage, and your children's college tuition on those odds?)

Performance pay, however, can be designed in a way that rewards teamwork. I hope it works in Denver. I hope we try it in some schools in OKC. Try it in schools that have good leadership, and it may produce results. But, as is the norm, it's the people issues, not the math, that are the toughest.

But as the previous post explained, the math ain't that easy either.

If you really believe in accountability, why not devise a decent system of accountability before imposing it on people? Let's stop treating educators and children as guinea pigs.

Anonymous said...

If teacher performance rankings are public knowledge (as city-employee salaries are), then doesn't a student have the right to demand a place in a top-ranked teacher's classroom?

Anonymous said...

John Thompson and anonymous (7/18) make a good point: It's very difficult to control for the unknown factors in any school or classroom in an attempt to evaluate teacher performance.

For instance, a teacher I know is regarded within her school as being very good at handling "difficult" students. She was recently voted teacher of the year for her elementary school.

As a result, the principal specifically puts the most difficult students in her class each year. These include students from broken homes, students with emotional disturbances, and students who have a history of behavior problems.

The teacher does remarkably well with the students she is assigned. But how could one adequately compare the performance of her students with the performance of the other teachers' students?

This is why I believe that evaluating teacher performance is a much trickier endeavor than evaluating the batting skills of a major league pitcher...

Anonymous said...

My teachers warned me about reasoning from analogy. The A-Rod example is problematic. We're comfortable with comparing him to other players because (A) we know that he'll face a roughly equal set of pitchers and fielders as other players across a season; and (B) we're judging him using a single metric (hits). But imagine that he is playing every game against the same team (let's say the Nationals) while some other excellent hitter is playing every game against the Cubs. Would we be comfortable that we could correct for the difference in teams (classes) well enough to compare the two players? And suppose we add to the mix more than hits. Suppose we include hits, RBIs, homers, fielding errors, etc. as well as how well this year's stats predict his future stats (something we expect tests in school to do). Add to that the mobility that occurs in many high-poverty schools. What if the players A-Rod and the other player are competing against change frequently and often don't show up?
Still easy to compare?

AldeBeer said...

I can't believe I'm getting suckered into the baseball argument here on The Quick and the Ed. Kevin already addressed it at length:

http://tinyurl.com/5qqacr

August, baseball statisticians already factor these things in on a measure called Value Over Replacement Player. It controls for walks, hits, home runs, RBIs, stolen bases, the player's position, ballpark, and season year to calculate a player's value compared to any ordinary substitute. A similar measure is calculated for pitchers.

Merit pay does the same thing with teachers, only it's more complicated, careful, and performed by PhDs (follow the link in my post to find a conference with 14 different papers on the topic, all working to fine-tune the merit pay mechanism).