Monday, August 25, 2008

The Schools We Have, or the Schools We Need

Last week I wrote about the practice of applying statistical margins of error to the percent of students in a school who pass a test. It's a goofy idea, I said, because: "Unlike opinion polls, NCLB doesn't test a sample of students. It tests all students. The only way states can even justify using MOEs in the first place is with the strange assertion that the entire population of a school is a sample of some larger universe of imaginary children who could, theoretically, have taken the test. In other words, the message to parents is: 'Yes, it is true that your children didn't learn very much this year, but we're pretty sure, statistically speaking, that had we instead been teaching another group of children who do not actually exist, they'd have done fine. So there's nothing to worry about.'"

Eduwonkette responded with her own take on the issue by cutting and pasting from Harvard professor Dan Koretz's recent book on educational assessment, in which he says:
This question was a matter of debate among members of the profession only a few years ago, but it is now generally agreed that sampling error is indeed a problem even if every student is tested. The reason is the nature of the inference based on scores. If the inference pertaining to each school...were about the particular students in that school at that time, sampling error would not be an issue, because almost all of them were tested. That is, sampling would not be a concern if people were using scores to reach conclusions such as "the fourth-graders who happened to be in this school in 2000 scored higher than the particular group of students who happened to be enrolled in 1999." In practice, however, users of scores rarely care about this. Rather, they are interested in conclusions about the performance of schools. For the inferences, each successive cohort of students enrolling in the school is just another small sample of the students who might possibly enroll, just as the people interviewed for one poll are a small sample of those who might have been.

In one sense Koretz frames the issue correctly; this is indeed all about inference. But he's wrong to imply that this is a methodological question best resolved by the consensus opinion of statisticians and assessment professionals. It's not. Rather, this really goes to the heart of how we conceive of public schools, what we expect of them, and how we hold them accountable for student learning.

By one way of thinking, school outcomes are the result of a production function in which schools are constant and inputs, including students, vary. In other words, the school is what it is, with a certain amount of resources, a certain number of teachers with certain qualifications, along with curricula, facilities, etc. Every year, the inputs (students) change. If, in one year, a brilliant group of students comes through the door, outputs (test scores) go up. If a very tough group arrives the following year, outputs go down. If this is how you see things, then Koretz's analysis makes sense and sampling-based margins of error are fair--indeed, absolutely necessary.
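To make the mechanics concrete: under the "students as a sample" view, a school's pass rate gets a margin of error just like a poll's, typically something like a normal-approximation (Wald) interval on a proportion. The sketch below is purely illustrative--states' actual MOE formulas vary--but it shows why the assumption matters most for small schools:

```python
import math

def pass_rate_moe(passed, tested, z=1.96):
    """Approximate 95% margin of error on a school's pass rate,
    treating this year's tested students as a random sample of a
    hypothetical larger population of students who 'might enroll'.
    (Normal-approximation interval on a proportion; illustrative only.)"""
    p = passed / tested
    se = math.sqrt(p * (1 - p) / tested)
    return z * se

# A small school where 30 of 50 fourth-graders pass (60%):
moe_small = pass_rate_moe(30, 50)      # roughly +/- 14 percentage points

# A large school where 600 of 1,000 pass (also 60%):
moe_large = pass_rate_moe(600, 1000)   # roughly +/- 3 percentage points
```

The same 60% pass rate is judged very differently depending on school size: under this view, the small school's "true" rate could plausibly be anywhere from the mid-40s to the mid-70s, which is exactly the leverage states use to avoid labeling schools as failing.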

The problem with this idea is that it assumes that schools are inflexible and un-improvable, inert black boxes that serve as little more than conduits for inputs and outputs. Accountability policies assume something else: that both students and schools can vary. That schools can, and must, change the way they teach to fit the particular needs of their students in a given year. If a cohort of particularly difficult students enters the system, and this becomes apparent as they move through first and second grade, then by the time they hit the tested grades the school or district needs to have reallocated resources and planned accordingly.

Statisticians have a weakness for black box thinking, because the box contains a lot of things that are essentially unmeasurable. You can't put a number on the relationship between a principal and her teachers, the quality of teamwork, the level of commitment and hard work among the staff, the sophistication and flexibility of the instructional plan. You can, however, put a number on student learning--an imperfect one, to be sure, but close enough to render reasonable judgments about school success.

Schools need to be organized, staffed, led, funded--and held accountable--for the performance of the students they have, not those they might have had, or wish they had, or had once or may have again. These students, this year, are the ones who matter. That conviction, and those that follow, are much more than a matter of statistics.

3 comments:

Corey Bunje Bower said...

If the states are arguing what you say they are, then they're wrong. But your argument is just as flawed.

The reason to build error into evaluating test results isn't what you say states say it is -- it's because the test results are merely a sample. But in this case the population is all student knowledge, not all students. If we were measuring the height of each student then we could do it in one shot and be confident in the results. But each test is merely a snapshot of student knowledge and the measurement is imperfect.

Anonymous said...

This is straight up foolish.

Your point that schools should be thought of (and expected to be) flexible and responsive is a good one. But the margin of error is an issue of *measurement*, not an assumption that schools are a fixed production process with a random error term attached.

The whole point of sampling error is that the measurements taken are *not*, as you say, "close enough to render reasonable judgments about school success." *Even if* schools are quite responsive to a change in student populations, there will still be sampling error in measurement. This will be particularly true when group sizes are small.

Thank heavens we have Kevin Carey to prevent those know-nothing Harvard professors and "statisticians" from getting in the way of our kids' education!

Please, please take a statistics class before someone gets hurt.

Anonymous said...

"If a cohort of particularly difficult students enters the system, and this becomes apparent as they move through first and second grade, by the time they hit the tested grades the school or district needs to reallocate resources and plan accordingly."

Are you kidding? Do you have any idea what you are asking for? You theorists have got to have a reality check occasionally.