Wednesday, November 12, 2008

Magnitude

Expanding on something I alluded to in the post below about training and certifying teachers vs. attracting and selecting them: the way these issues are often discussed reveals one of the weak spots in the way research is applied to policy. Essentially, people don't properly account for differences in magnitude.

All rational people want education policy to be informed by the results of objective research and inquiry, conducted using transparent, widely-accepted methods. (In a lot of ways, this is just a restatement of the definition of "rational person.") The kind of analysis that informs policy tends to involve testing hypotheses and identifying differences. Consider randomized controlled trials, the "gold standard" of research, which are increasingly being used in education. You randomly assign subjects to two groups, apply a treatment to one of them, and see whether some pre-defined outcome differs between the groups. If the difference is large enough to meet agreed-upon levels of statistical significance, you conclude that the treatment had an effect. The same basic approach, even without the randomized assignment, is used to tackle other questions--like, for example, the effect of certification on teacher effectiveness.
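The catch in that logic is easy to see with a quick simulation. The sketch below uses made-up numbers (a hypothetical 0.05 standard deviation "edge" for one group, which is not drawn from any actual study): with samples this large, even a difference that small sails past any conventional significance threshold.

```python
import random
import statistics
from math import sqrt, erf

# Hypothetical illustration -- the group labels and the 0.05 SD "edge"
# are invented for demonstration, not taken from any real study.
random.seed(42)

n = 100_000  # very large samples, as if we measured every teacher
certified = [random.gauss(0.05, 1.0) for _ in range(n)]    # tiny true edge
uncertified = [random.gauss(0.00, 1.0) for _ in range(n)]  # baseline

diff = statistics.mean(certified) - statistics.mean(uncertified)
se = sqrt(statistics.variance(certified) / n
          + statistics.variance(uncertified) / n)
z = diff / se

# two-sided p-value from the standard normal CDF
p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

print(f"mean difference: {diff:.3f} standard deviations")
print(f"z = {z:.1f}, p = {p:.3g}")
```

The mean difference stays tiny, but the p-value is microscopic--"significant" in the statistical sense while telling us almost nothing about practical importance.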

So let's take another look at that chart with the three overlapping bell curves, one showing the distribution of teacher effectiveness among certified teachers, one showing alt-cert teachers, and one showing uncertified teachers.



If you squint your eyes and look carefully, you'll note that the solid line--the certified teachers--sits slightly to the right of the other two (teachers increase in effectiveness as they move from left to right along the scale). In other words, the population of certified teachers was slightly more effective. If the populations were large enough--if, say, they represented every teacher in America--that difference might be statistically significant. It might even be significant beyond a shadow of a statistical doubt, at the .0001 level or what have you. And if that were the case, the results of the research would very likely be translated for lay readers and policymakers into something like this: "A study from [fill in the blank] found that certified teachers are more effective than non-certified teachers." And the natural policy response would be that we should make sure that all students are taught by certified teachers--they're better, after all--and we should invest in and strengthen certification programs, since they work.

The problem, of course, is that if you put your common sense hat on and think for 30 seconds about what that graph actually shows, you naturally conclude that such a policy response would be insane. The difference between certified and uncertified may indeed be statistically significant, but it is a tiny statistically significant difference, one that pales in comparison to the extent to which teachers vary within those populations, and the extent to which the populations overlap. To state the obvious: the most important differences tend to be large, not small.

The vernacular of research and policymaking, in other words, is insufficiently precise about and sensitive to differences in magnitude. Not all statistically significant differences are equally important just because they cleared the same test of significance, but they are often treated as if they were.
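The distinction between significance and magnitude is usually made concrete with a standardized effect size such as Cohen's d (the mean difference divided by the pooled standard deviation). A back-of-the-envelope sketch, assuming two normal distributions with equal variance, shows how much two populations still overlap even at an "educationally significant" effect; the 0.05 and 0.25 figures below are illustrative, not from the study discussed above.

```python
from math import sqrt, erf

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def cohens_d(mean1, mean2, pooled_sd):
    """Standardized mean difference between two groups."""
    return (mean1 - mean2) / pooled_sd

def overlap(d):
    """Overlapping coefficient of two equal-variance normal
    distributions whose means are d pooled SDs apart."""
    return 2 * phi(-abs(d) / 2)

# Illustrative numbers only:
d_tiny = cohens_d(0.05, 0.0, 1.0)  # a "significant" but tiny difference
d_floor = 0.25                     # a commonly cited practical-importance floor

print(f"d = {d_tiny:.2f}: distributions overlap {overlap(d_tiny):.0%}")
print(f"d = {d_floor:.2f}: distributions overlap {overlap(d_floor):.0%}")
```

Even at d = 0.25, the two bell curves share roughly nine-tenths of their area--which is exactly why a bare "statistically significant" headline says so little on its own.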

3 comments:

KDeRosa said...

This is why we have the concept of "educationally important" or "educationally significant" which requires an effect size of at least 0.25 standard deviations.

We don't get excited by education research with an effect size below 0.25 (i.e., the vast majority of it), because in the real world such interventions have little practical effect.

Parry Graham said...

This is my new favorite graph. It visually summarizes and presents the information in a way that allows for so little debate. Why can't more education data be presented this clearly?

Parry

Anonymous said...

When the bar is low – everyone can jump over. And if everyone can jump over, it is just the same as having no bar at all. Therefore, these results are not surprising to anyone who works in certification.
http://www.abcte.org/blog