Wednesday, February 14, 2007

Aspen and Value-Added

Day two of the Aspen NCLB report saw good notices in the press from the Bush Administration and from the Democratic committee chairmen on the Hill, Ted Kennedy and George Miller, along with more strong objections from various teachers union representatives. While Joel Packer et al. hit the broad strokes in the Post and the Times, Leo Casey got a lot more specific over at Edwize. It's worth taking a look, because there are a lot of distortions and flatly incorrect statements to parse. Leo said:


NCLB was exactly on point in insisting upon a highly qualified teacher in every classroom. Educational research affirms the absolute centrality of experienced, accomplished teachers to the education of young people, especially for students living in poverty and at academic risk.

The recommendations of the report would abandon the work that states have already done to improve the quality of teaching, set aside the work that remains to be done, and adopt a proposal which has never been successfully implemented – the use of growth in standardized test scores as a measure of teaching quality. State of the art research on ‘value added’ and ‘growth’ models of standardized testing has concluded that they do not have the capacity to provide accurate micro-level data for schools or individual teachers. In the absence of such refined tools and with the real world of standardized tests that are themselves substantially flawed more often than not, the use of such tests as measures of quality teaching will simply create massive disincentives for schools and teachers to take on the most needy and academically at risk youngsters who perform poorly on such exams – the exact opposite of what we should be doing in American education.

Let's take these in order:

Leo says "Educational research affirms the absolute centrality of experienced, accomplished teachers to the education of young people, especially for students living in poverty and at academic risk." On the first point, "experienced," this is a gross distortion of what the research says. There's a lot of research that says teacher experience matters. Absolutely none of it says that teacher experience is central to the education of young people. The research says (see here for a good summary, and here) that experience matters in the beginning of a teacher's career, up to at most 10 years, and then not afterwards. And even when experience does matter, it accounts for only a small fraction of all the variance among teachers.

In terms of the centrality of accomplished teachers, it depends on what you think that word means. If you mean the accumulation of credentials, like state certification or master's degrees, absolutely not. Most studies say master's degrees don't matter at all, and the evidence on certification is mixed, with small effects. If, on the other hand, you mean "accomplished" in the sense of "consistently helping students increase learning, as measured by standardized tests," the evidence is a lot more persuasive--but that's the very conception of teacher quality that Leo is arguing against.

Leo criticizes the commission for wanting to "adopt a proposal which has never been successfully implemented – the use of growth in standardized test scores as a measure of teaching quality."

A) That's not true; Tennessee has been doing it for over a decade.
B) To the extent that it hasn't been implemented more broadly, that's because people like Leo object every time someone tries. It's the chicken-and-egg strategy of obstructionism--stop people from doing something new, and then when other people try to do it, say "you shouldn't try to do it, because nobody's ever done it."

Leo then cites a 2003 book from RAND, titled "Evaluating Value-Added Models for Teacher Accountability," as evidence against the approach the Aspen commission recommends--evaluating teacher effectiveness by using year-to-year growth in student test scores. This book stands as the most authoritative treatment of the subject that currently exists, and it's going to get referred to a lot in the coming debate. So let's be clear about what it does and does not say.

The RAND book provides a very thorough and critical look at the extant research and available methods for estimating teacher effectiveness using test score growth. It reaches the absolutely reasonable conclusion that none of the measures are perfect, and that anyone relying on value-added data should be aware of the many potential sources of bias and error that could skew the results, particularly when making high-stakes decisions. At the very end--literally, the second-to-last sentence--the authors say:

"In the end...it is the job of policymakers and educators to define their inferential goals and to decide what kinds of uncertainty are acceptable and what kinds are not."

In other words, understand the imperfections of the data you're using, and make smart judgments accordingly. However, earlier in the same chapter, they say:

"The research base is currently insufficient for us to recommend the use of [value-added methods] for high-stakes decisions."

It is hugely important to understand that they are not answering a research question here. They're answering a policy question, which, as they rightly note in the previous quote, is fundamentally different. No researcher, from RAND or elsewhere, can apply an established methodology to definitively identify the threshold level of error, bias, or uncertainty beyond which data shouldn't be used for policymaking (other than to make the fairly obvious point that a finding of no teacher effects, or of a completely flawed methodology, would indicate this; that's not what they found).

As they rightly state in the previous quote, the degree of acceptable uncertainty--some of which is unavoidable--is a question for policymakers to answer. Moreover, their opinion on this issue is based in part on the lack of research around value-added methods that provides evidence one way or the other, not, as Leo implies, the presence of research that discredits value-added methods.

So I read this sentence to say, "If we were policymakers making high-stakes decisions, we wouldn't use this data." They're not policymakers, but okay.

However, in the summary of the book, at the front (that is, the only part non-statisticians are likely to read), this sentence has been edited down to the following:

"The research base is currently insufficient to support the use of [value-added methods] for high-stake decisions."

Note the omission of the words "for us to recommend." Now it appears that RAND has made an absolute, empirical determination of whether value-added data is good enough for policy. This is the quote that will be frequently brandished and misused if this debate gets off the ground. And it simply doesn't mean what people like Leo will say it means.
