Wednesday, December 19, 2007

It's (Not) So Easy

The Post ran a useful article a few days ago ("Calls Grow for a Broader Yardstick For Schools") about one of the more interesting challenges of NCLB reauthorization and education policy generally: expanding the scope of educational accountability beyond standardized test scores in reading and math to include many other important things--mastery of other subjects, more generalized abilities like critical thinking and analytic reasoning, "soft" skills like leadership and teamwork, graduation and success in college and the workforce, and so on. But it leaves the central question unanswered: if nearly everyone thinks this is a good idea, why aren't we doing it already?

Mostly, I think, because this imperative bumps up against other imperatives, and nobody has figured out how to adequately reconcile their inherent conflicts.

NCLB was designed to make the assessment of schools objective, universal, and unavoidable. In other words, all schools within a state are assessed according to the same standards and in the same way. There are very good reasons to put a premium on this. Without objectivity and universality, it's a short road back to the default judgment that most schools render upon themselves: "We're doing the best we can, given the students we have." Given how badly some schools and students are failing, that's just not good enough.

The way that NCLB achieves objectivity and universality, however, has significant shortcomings. It's a mechanistic process, based on rules instead of human judgment. The problem is that it's really hard to develop rules that (A) accurately assess something as complicated as a school and (B) people can understand.

For example, here's a by-no-means-exhaustive list of some of the important categories of information we might want to gather about a school and its students, along with the number of possible values for each:

Student Race/Ethnicity: 5 (White, Black, Hispanic, Asian, Indian)
Student Gender: 2 (Male, Female)
Student LEP status: 2 (Yes, No)
Student Disability status: 2 (Yes, No)
Student Economic status: 2 (Low-income, Not Low-Income)
Student Gifted status: 2 (Yes, No)
Subjects: 5 (Math, Reading/LA, Social Studies, Science, Art/Music)
Proficiency Status: 4 (Below Basic, Basic, Proficient, Advanced)
Value-Added Growth: 3 (Below Expected, Expected, Above Expected)
Attainment (i.e. graduation, progression to next year in school): 2 (Yes, No)
Grades: 4 (Typical grade configuration)
Timeframe: 4 (Now, Short-Term, Medium-Term, Long-Term)

Keep in mind that this is, in many ways, a very conservative estimate of the number of possible variables. There are more than five significant racial/ethnic groups, more than five important academic subjects, more than four potential levels of academic proficiency or value-added growth to consider, far more than one category of disability, gifted status, etc. Heck, you could even argue about gender.

But even this highly simplified model produces 307,200 possible outcomes. Each of them tells us something different, and as such could theoretically merit a different response. This throws the decisions of NCLB's authors into a fairly sympathetic light. They knew that a 307,200-element accountability system wouldn't fly, so they started narrowing things down: Two subjects and five racial/ethnic categories. One category each for LEP, special ed, and economic status, but no combinations—in other words, we measure the performance of white children and low-income children, but not white low-income children. Gender and gifted status are out. One proficiency level, no value-added. Only seven of 12 grades, and multiple grades can be combined. Include one growth measure (safe harbor), but make it either/or so you don't create extra variables. Each of the 16 distinct outcomes (two subjects x eight student categories, although very few schools will have all eight) has equal and overriding status as an indicator of school success. Miss one, miss all, it doesn't matter—your identification as not making AYP is the same. Then, having rolled the entirety of a school's success into the single binary AYP variable, put it on a four-level time scale: 0-1 years in a row, 2-3, 4-5, 6 or more. Each level corresponds to a collection of mandated and optional responses—None, In Need of Improvement, Corrective Action, Alternative Governance.
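For anyone who wants to check the arithmetic, the 307,200 figure is just the product of the category counts in the list above, and the narrowed-down system works out to 16 outcomes. A quick sketch (the category labels are paraphrased from the list; nothing here is official NCLB terminology):

```python
import math

# Number of possible values for each category in the list above.
categories = {
    "race/ethnicity": 5,
    "gender": 2,
    "LEP status": 2,
    "disability status": 2,
    "economic status": 2,
    "gifted status": 2,
    "subject": 5,
    "proficiency status": 4,
    "value-added growth": 3,
    "attainment": 2,
    "grade span": 4,
    "timeframe": 4,
}

# Multiplying the counts together gives every distinct combination.
total = math.prod(categories.values())
print(total)  # 307200

# After NCLB's simplifications: two subjects times eight student categories.
narrowed = 2 * 8
print(narrowed)  # 16
```

The gap between 307,200 and 16 is the whole story in miniature: every step of that narrowing threw away information in exchange for a system people could (in principle) understand.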

Even this relatively small level of complexity seems barely manageable. The testing industry is popping rivets trying to handle two academic subjects in seven grades. State departments of education struggle, some mightily, to gather all the required data and turn school ratings around on time. There are constant complaints about the expense and bureaucracy of compliance and time lost to preparing for and taking one test in two subjects per year.

And even with all the compromises and simplifications, most people still don't know how the NCLB system actually works. That diminishes the capacity of the law to act as a catalyst for change, since educators and policymakers can't constructively respond to signals they don't understand.

The article quotes Ed Trust's Amy Wilkins saying maybe this is okay: "Proponents of multiple measures say it will give a richer, fuller view of a school, but this isn't about a rich view of a school. It's about failures in fundamental gate-keeping subject areas." That view reflects Title I's origins in and continued focus on compensatory education for low-income children, and I agree this needs to remain the first priority. But like it or not, NCLB has come to be about all students and all schools, and at some inescapable level that demands a richer view.

Adding more information to the existing rules-based system will consume even more scarce resources and create even more hard-to-manage complexity. Not adding more information will leave us with an accountability system that reflects only a fraction of what we want for schools. That argues for a non-rules-based approach, one that relies more heavily on human judgment, since people are much better than rules at making sense of vast amounts of information from disparate sources. But that, in turn, threatens universality and objectivity. Perhaps one could mitigate this problem by aggregating many judgments through more robust market-focused systems, but then we're opening up a whole new can of worms...

Anyway, it's tricky. Anyone who thinks the shortcomings of the existing system are a result of obvious choices not made should think again.
