Eduwonkette responded by quoting Koretz:
This question was a matter of debate among members of the profession only a few years ago, but it is now generally agreed that sampling error is indeed a problem even if every student is tested. The reason is the nature of the inference based on scores. If the inference pertaining to each school...were about the particular students in that school at that time, sampling error would not be an issue, because almost all of them were tested. That is, sampling would not be a concern if people were using scores to reach conclusions such as "the fourth-graders who happened to be in this school in 2000 scored higher than the particular group of students who happened to be enrolled in 1999." In practice, however, users of scores rarely care about this. Rather, they are interested in conclusions about the performance of schools. For these inferences, each successive cohort of students enrolling in the school is just another small sample of the students who might possibly enroll, just as the people interviewed for one poll are a small sample of those who might have been.
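To make the polling analogy concrete, here is a minimal, purely illustrative simulation of what Koretz has in mind. The school mean, the spread of individual student scores, and the cohort size below are invented numbers; the point is only that a school's average can bounce around from year to year even when every enrolled student is tested.

```python
# Illustrative only: cohort-to-cohort fluctuation in a school's mean score
# even when 100% of enrolled students are tested. All numbers are invented.
import random

random.seed(1)

TRUE_SCHOOL_MEAN = 650   # hypothetical "true" average for students who might enroll
STUDENT_SD = 40          # hypothetical spread of individual student scores
COHORT_SIZE = 60         # one grade's worth of students, all of whom are tested

def cohort_mean():
    """Mean score of one year's cohort: a draw of 60 students from the pool
    of students who *might* enroll, even though every enrollee is tested."""
    scores = [random.gauss(TRUE_SCHOOL_MEAN, STUDENT_SD) for _ in range(COHORT_SIZE)]
    return sum(scores) / len(scores)

for year in range(1999, 2004):
    print(year, round(cohort_mean(), 1))
# Typical output: year-to-year swings on the order of the standard error,
# STUDENT_SD / sqrt(COHORT_SIZE), about 5 points here, with no change at all
# in the school itself.
```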
In one sense Koretz frames the issue correctly; this is indeed all about inference. But he's wrong to imply that this is a methodological question best resolved by the consensus opinion of statisticians and assessment professionals. It's not. Rather, this really goes to the heart of how we conceive of public schools, what we expect of them, and how we hold them accountable for student learning.
By one way of thinking, school outcomes are the result of a production function in which schools are constant and inputs, including students, vary. In other words, the school is what it is, with a certain amount of resources, a certain number of teachers with certain qualifications, along with curricula, facilities, etc. Every year, the inputs (students) change. If, in one year, a brilliant group of students comes through the door, outputs (test scores) go up. If a very tough group arrives in the following year, outputs go down. If this is how you see things, then Koretz's analysis makes sense and sampling-based margins of error are fair--indeed, absolutely necessary.
The problem with this idea is that it assumes that schools are inflexible and un-improvable, inert black boxes that serve as little more than conduits for inputs and outputs. Accountability policies assume something else: that both students and schools can vary. That schools can, and must, change the way they teach to fit the particular needs of their students in a given year. If a cohort of particularly difficult students enters the system, and this becomes apparent as they move through first and second grade, by the time they hit the tested grades the school or district needs to reallocate resources and plan accordingly.
Statisticians have a weakness for black box thinking, because the box contains a lot of things that are essentially unmeasurable. You can't put a number on the relationship between a principal and her teachers, the quality of teamwork, the level of commitment and hard work among the staff, the sophistication and flexibility of the instructional plan. You can, however, put a number on student learning--an imperfect one, to be sure, but close enough to render reasonable judgments about school success.
Schools need to be organized, staffed, led, funded--and held accountable--for the performance of the students they have, not those they might have had, or wish they had, or had once or may have again. These students, this year, are the ones who matter. That conviction, and those that follow, are much more than a matter of statistics.
3 comments:
If the states are arguing what you say they are, then they're wrong. But your argument is just as flawed.
The reason to build error into evaluating test results isn't what you say states say it is -- it's because the test results are merely a sample. But in this case the population is all student knowledge, not all students. If we were measuring the height of each student then we could do it in one shot and be confident in the results. But each test is merely a snapshot of student knowledge and the measurement is imperfect.
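A quick sketch of what I mean, with invented numbers: one student with a fixed "true" proficiency, scored on several parallel forms of the same test. Nothing about the student changes, yet the observed score does.

```python
# Illustrative only: a test score as a noisy sample of "all student knowledge."
# One student, fixed true proficiency, several hypothetical 40-item test forms.
import random

random.seed(2)

TRUE_PROFICIENCY = 0.70   # invented: chance the student answers a typical item correctly
ITEMS_PER_FORM = 40       # invented: items on each parallel form

def observed_percent_correct():
    """Score on one form: 40 items drawn from the domain of things the
    student might be asked, so the result varies from form to form."""
    correct = sum(random.random() < TRUE_PROFICIENCY for _ in range(ITEMS_PER_FORM))
    return 100 * correct / ITEMS_PER_FORM

print([round(observed_percent_correct(), 1) for _ in range(5)])
# Five administrations of "the same" test land at different percentages
# (roughly 70 +/- 7), even though the student's knowledge never changed.
```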
This is straight up foolish.
Your point that schools should be thought of (and expected to be) flexible and responsive is a good one. But the margin of error is an issue of *measurement*, not an assumption that schools are a fixed production process with a random error term attached.
The whole point of sampling error is that the measurements taken are *not*, as you say, "close enough to render reasonable judgments about school success." *Even if* schools are quite responsive to a change in student populations, there will still be sampling error in measurement. This will be particularly true when group sizes are small.
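To put a rough number on that last point, here's a back-of-the-envelope sketch. The spread of individual student scores is invented; the pattern is what matters: uncertainty in a group's mean grows as the group shrinks.

```python
# Illustrative only: how noise in a group's mean score grows as the tested
# group shrinks. The spread is invented; the 1/sqrt(n) pattern is the point.
import math

STUDENT_SD = 40  # hypothetical spread of individual student scores

for n in (200, 100, 50, 25, 10):
    se = STUDENT_SD / math.sqrt(n)  # standard error of the group mean
    print(f"group of {n:3d}: mean score uncertain by roughly +/- {2 * se:.0f} points")
# group of 200: about +/- 6 points; group of 10: about +/- 25 points.
# The smaller the group (a single grade, a small subgroup), the bigger the
# swing that can occur with no change in the school at all.
```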
Thank heavens we have Kevin Carey to prevent those know-nothing Harvard professors and "statisticians" from getting in the way of our kids' education!
Please, please take a statistics class before someone gets hurt.
"If a cohort of particularly difficult students enters the system, and this becomes apparent as they move through first and second grade, by the time they hit the tested grades the school or district needs to reallocate resources and plan accordingly."
Are you kidding? Do you have any idea what you are asking for? You theorists have got to have a reality check occasionally.