Friday, April 03, 2009

The Other Lake Wobegon

There are a lot of cute references to No Child Left Behind as some sort of Lake Wobegon law, because of its provision that all children must be "proficient" by 2014. The reference is to Garrison Keillor's fictional town of Lake Wobegon, where all the children are above average. Of course, "average" does not mean the same thing as "proficient," so it's not really a fair comparison, unless the speaker intends to express that all kids being "proficient" is just as impossible as all kids being "above average." Regardless, this is a pretty common misconception and one of the biggest (false) critiques of NCLB.

We're about to get some good evidence of a real Lake Wobegon effect in education. Secretary of Education Arne Duncan has asked states to provide, as part of their teacher quality assurances due for stimulus funds, the number and percentage of teachers scoring at each performance level on local evaluations. (Read a write-up of the story at Education Week here.)

These numbers will surprise the general public and embarrass teachers and districts. The new Secretary of Ed. knows a thing or two about this. His administration in Chicago commissioned The New Teacher Project to analyze the district's personnel policies, and one of the results was the chart at left. It shows the results of 36,000 teacher evaluations in Chicago from 2003 to 2006. Over that four-year period, 93 percent of all evaluations resulted in a rating of either "superior" or "excellent," while only 0.3 percent were deemed "unsatisfactory." In a district with its share of failing schools, fewer than one in twenty schools gave an unsatisfactory rating to any teacher in four years. Catherine Cullen thinks the data will be taken in stride, but I doubt such Wobegonian evaluation systems will resonate with the average taxpayer, especially as unemployment hits 25-year highs.

Evaluation systems, ideally, would not be just some abstract measure on which everyone scores well. They should be used for real assessment and improvement, but they're often just drive-by formalities. Evaluations done well can help document a case for dismissing an ineffective or negligent teacher, while evaluations done poorly serve as a major impediment: if a principal makes a mistake on a teacher's evaluation, that too can hamper a district's ability to rid itself of poor teachers.

And, to be honest, there are poor teachers. Evaluation systems that fail to recognize that fact deserve sunlight and scorn.

Sons and Daughters

Secretary Duncan talked with the New York Post editorial board yesterday and had some interesting things to say:
Duncan was surprised that Albany had added $405 million in state aid to public-school districts while hitting charter schools with what amounted to a $50 million cut.

"That doesn't make sense," Duncan said, after shaking his head for a minute. "These are our kids, these are our schools. If we're serious about it, then let's treat them all the same."

Lawmakers are freezing charter-school funding for the coming fiscal year, which critics say guts nearly $1,000 per city charter student.

Duncan suggested the funding inequity was creating unnecessary divisions between traditional public schools and privately managed charters -- even though they serve the same public-school kids.

"I have two children," he said. "I'm not going to treat my son differently than I'm going to treat my daughter."

Ah, but New York is treating its "children" differently. The state gives its daughter (traditional public schools) a bigger allowance but imposes a strict curfew on her. She's definitely not allowed to date. The forgotten son (charters) doesn't get a very big allowance because he's expected to work for it. Since he's making his own money, he has the freedom to carouse around and generally stir up trouble. A stretch? Maybe, maybe not.

Thursday, April 02, 2009

Update on Florida Legislation to Curtail Virtual Schooling

On Monday, I wrote about pending legislation in Florida that would severely curtail educational choices available through the public, state-run Florida Virtual School. The bill would eliminate enrollment in any elective courses and funding for any courses beyond a standard six periods. Students would no longer have an option to take electives, including some AP courses, beyond those offered at their traditional schools (especially painful for small or rural schools), nor would they have the opportunity to take extra courses to catch up on graduation requirements or accelerate. The legislation was approved in committee and now goes to the full State Senate. An AP article reporting on the legislation quotes the committee chair:

Sen. Stephen Wise [R-Jacksonville], the committee's chairman, said the measure would encourage public schools to enroll more students in virtual courses and that the Senate plans to increase Florida Virtual School's funding by 29 percent.
I don't understand how the Senator's first statement is possible, but I can check the budget figures on the second. In the committee's proposal (see page 27), overall statewide funding remains flat at $6,860 per student. Here's the proposed budget for Florida Virtual School:

Yes, funding is increased by 29%. But enrollment, which is entirely based on student demand, is projected to go up 40.5%. Per student funding for the virtual school declines by 8.1%.
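The arithmetic is simple enough to check. Here's a quick back-of-the-envelope calculation using the rounded percentages above (the budget's unrounded figures presumably account for the small difference from 8.1 percent):

    # Per-student funding change when total funding and enrollment
    # grow at different rates (percentages from the committee proposal).
    funding_growth = 0.29      # total funding up 29%
    enrollment_growth = 0.405  # projected enrollment up 40.5%

    per_student_change = (1 + funding_growth) / (1 + enrollment_growth) - 1
    print(f"{per_student_change:.1%}")  # about -8.2%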

I live in Washington, DC, where one of the biggest issues is the rapidly declining enrollment in our city's public schools. Our schools are generally considered a mess and ground zero for reform battles.

In Florida, there is a public school program that is seen as a national model, rapidly increasing enrollments, and proving that public schools can compete for students and educate in new and different ways. This public school is being rewarded with significant budget cuts (double the per student cut of any other district) and significant limitations on its programs.

PS - A commenter on Monday asked for more information about Florida Virtual School (FLVS). I just finished looking into the school's data for an article that will be published next month. My interest in the school stems primarily from the fact that it breaks free from many stereotypes common in education policy debates. The school is extremely innovative and has built a distinct educational philosophy, approach, and culture. At the same time, it is state-run, has maintained its identity as a public school, and remains part of the system. For people who want to see innovation within public schools, this is an extremely important model. A few quick facts from that article:
  • The school is a supplemental virtual school—students attend bricks-and-mortar schools and take FLVS courses in addition to their traditional classes. While the vast majority of FLVS students come from public high schools, the school is open to charter, private, and even home-schooled students.

  • The school has been extremely popular with students and their families. In the 2008–09 school year, approximately 84,000 students will complete 168,000 half-credit courses, more than a tenfold increase since 2002–03. Much of the school’s recent growth has been driven by minority enrollments. Between June 2007 and July 2008, African-American enrollments grew by 49 percent, Hispanic enrollments by 42 percent, and Native American enrollments by 41 percent.

  • The school employs more than 715 full-time and 29 adjunct teachers—all Florida-certified and “highly qualified” under the federal No Child Left Behind law.
You can read more about the school in a December 2007 Department of Education profile on "Innovations in Education" and more about virtual schooling in Education Week's 2009 Technology Counts publication.

Wednesday, April 01, 2009

Uncomfortable Truths

The University of Wisconsin-Madison has recently launched an initiative to move toward a tuition model that involves greater price discrimination. It plans to increase tuition significantly, capitalize on those students who can pay the higher costs, and redistribute the excess money to low-income students. It's a bad idea for a variety of reasons (read the Edu-Optimists here and here for more), not least of which is making the true cost of higher education ever more opaque, but it's also forcing Wisconsin to admit some hard truths:

To sell the Madison Initiative, [Wisconsin's Chancellor] has had to be candid about some shortcomings at the institution. Offering need-based aid simply hasn't been part of the "tradition" at Wisconsin, and that's left a gap of about $20 million in annual unmet need at the university, according to [the chancellor]. The Madison Initiative is expected to provide $10 million to close the gap, and a simultaneous fund-raising effort is designed to raise the remaining $10 million.

Historically, Wisconsin has put most of its money toward funding students based on merit -- not need. In 2006-7, the university awarded $23 million in merit-based aid to undergraduates, compared with $6.5 million in need-based aid, according to university officials.

Unfortunately, offering need-based aid isn't part of the "tradition" at most elite colleges and universities. And not only do they not give much aid, they don't enroll very many low-income students, either. Below are the percentages of students receiving Pell Grants, a good proxy for low-income college students, at selected prestigious institutions across the country:

University of Illinois at Urbana-Champaign: 15.9%
University of Michigan-Ann Arbor: 12.9%
University of Virginia: 7.9%
University of North Carolina at Chapel Hill: 14.0%
University of Wisconsin-Madison: 12.5%
Stanford University: 13.5%
Yale University: 10.4%
Harvard University: 8.7%
Princeton University: 9.5%
University of Pennsylvania: 9.3%
Duke University: 10.1%
Northwestern University: 9.2%

Some of these institutions received a lot of good press in the last couple of years for announcing generous financial aid programs for low-income students. The dirty secret is that they tend not to enroll very many low-income students in the first place. And, while the rest of the country experienced a 36.7 percent rise in Pell Grant participation between 2000 and 2006, elite colleges and universities enrolled 1.9 percent fewer Pell recipients. Thanks to Wisconsin, if for nothing else, for forcing these realities into public view.

Late Choices

Secretary of Education Arne Duncan today released a letter to chief state school officers regarding regulations passed back in October. In what is no April Fool's joke, his letter rolls back a regulation that could have helped provide parents of children enrolled in unsuccessful schools the option of choosing a better one.

No Child Left Behind mandates that, for every school labeled in need of improvement, districts must notify parents of the option to transfer to another, more successful school within the district. Unfortunately, only a meager fraction of eligible students take up this provision, and its implementation has a lot to do with that.

The law requires districts to give parents notice no later than the first day of school of the following year. So, if you're a parent, imagine going through the school registration process in the fall, buying supplies for your child, and believing that your child will begin attending school X. Then, on the first day of school, your child brings home a letter that says her school failed to make adequate yearly progress last spring, and she now has the option to transfer to another school. She has only wasted one day in this school, and maybe you weren't particularly keen on it in the first place, maybe the supplies will be the same everywhere, and maybe the other options would be on your route to work or your child could easily find some other way to attend the new school...in other words, a lot of maybes.

Instead, imagine you, as a parent, were notified at least two weeks in advance of the new school year, as the October regulations dictated. You would have time to consider your options, visit new schools (maybe even meet new teachers), and plan transportation. You might be altogether more interested in exercising your right to choose.

So who cares whether it's one day or fourteen? The truth is, for a number of reasons (including an over-burdened testing industry), states have proven to be quite poor at turning test results into accountability data. Paul Manna analyzed 2005-06 school year data to determine the dates at which states released determinations of whether schools had met or failed to meet proficiency targets. Of the states that tested students in the spring of 2006, only five were able to return school and district data by the end of July. Sixteen managed to do so in the first half of August, and sixteen more in the second half. After school started in most states, twenty had still not reported the spring test results: ten finished in September, four in October, three in November, and three had yet to release the data by the end of November.

Secretary Duncan's waiver allows this problem to fester. (Although you could argue the fourteen-day regulation was illegal, since it expands the original law, you won't find any support for that in Duncan's letter: he makes it clear this is just a one-year waiver. The fourteen-day provision would still take effect for the 2010-2011 school year.) His waiver makes it less urgent for states to turn around test results promptly, which has implications beyond just an under-used school choice provision. Late results also penalize schools labeled in need of improvement, because they leave little time to implement a real school improvement plan. Instead of having a summer to figure out how best to reconfigure the school, which sections of the student body need the most help, which subjects students are most behind in, or which turnaround specialist to hire, the school and district often have to make these adjustments on the fly. That makes the process even more difficult, and it's too bad federal policy is moving in the opposite direction.

Tuesday, March 31, 2009

A Move to Limit Educational Choice in Florida

While reformers hope that the country's fiscal crisis will lead to much needed educational changes, there's at least one move underway to do the exact opposite. Under the guise of budget cuts, the Florida legislature is attempting to severely curtail educational choices available through the state-run Florida Virtual School (see page 12 of bill text).

If the bill, which is making its way through the Florida Senate, passes, students would no longer have the option to take additional credits through the virtual school. Want to extend the school day virtually? Nope. Fail algebra I your freshman year and want to take an extra course online to catch up to the college track? No dice. Want to graduate early (and save the state money)? Not gonna happen. Excited about school and want to take a high school course while in middle school? Way too ambitious. Stuck because you need to pass a course and there is no summer school option? Sit in the same class again next year. [Even though we know what will actually happen: it will become much more likely that teachers just pass kids along.]

The bill would also limit the virtual school's offerings to "core curricula courses" only. Forget choice and a market-driven mechanism that allows students to take AP Art History, Computer Science, or any number of other courses. Elective options would be limited to whatever your bricks-and-mortar school can offer.

The irony here is too much. Just last week, an entire special edition of Education Week detailed the rise to prominence and potential of virtual schooling. And by all accounts, Florida Virtual School is a national model that every other state is trying to match. Despite that success, the Florida legislature keeps trying to sabotage the program.

Florida Virtual School is also the most prominent case study for the "disruptive innovation" theory that posits virtual education as a transformational opportunity for greater personalization in education. That a few seemingly arbitrary words inserted into a long amendment can do this kind of damage shows just how fragile this type of innovation is in the public sector.

PS - Before you start pointing your fingers at the nefarious teachers unions, note that this time it's the Republican-dominated Florida State Senate that's proposing these changes.

Monday, March 30, 2009

Not Exactly

Fred Hiatt sat down with Bill Gates to talk education reform, producing a pretty straightforward reformist summary in the Post yesterday. One point, however, deserves clarification. Hiatt said:

In fact, Gates said, evidence shows no connection between teaching quality and most of the measures used in contracts to determine pay. Seniority, holding a master's degree or teacher's certification, and even, below 10th grade, having deep knowledge of a subject -- these all are mostly irrelevant.

A lot depends on what you mean by "mostly," I suppose, but you really can't characterize each of those factors in exactly the same way. Holding a master's degree has been proven to be pretty definitively irrelevant; this is one of the most consistent findings in the research and really quite shocking when you think about it. The effect of certification is a little tricky to measure because the vast majority of all teachers are certified, and those who aren't tend to differ from the general population in specific ways, positive (TFA) and negative (the district needed a body in the classroom and literally hired someone off the street). Master's degrees, by contrast, split about 50/50 among teachers, which makes the null finding hard to dismiss. They are an advanced postsecondary credential in education granted by an accredited institution of higher learning, and yet, when you control for other factors, they have no impact at all on classroom effectiveness as measured by student learning gains. This is one of those public policy scandals that's so big and ubiquitous and long-standing that it's hard to see, because it's everywhere.

Experience, by contrast--that is, "seniority"--really does make a big difference in the early years of teachers' careers. Nearly everyone is a lot better after five years in the classroom than they were right out of the box. Then effectiveness flattens out and actually declines near the end of teachers' careers. Of course it's true that some first-year teachers, while much worse than their future fifth-year selves, are still better than the current fifth-year selves of other teachers. But we shouldn't lose sight of seniority--low-income, low-performing, and minority students tend to be disproportionately assigned to rookie teachers, with terrible consequences. I know a former teacher who in her first year of teaching was assigned all the low-performing students in her grade, and when she asked why, her principal said, "We figured you'd be gone by November anyway."

The Unmatched

New York City released the results of its mandatory high school admission process last week. It's receiving a lot of negative attention on online parent forums (like this one) for the fact that about 7,500 students (nine percent) received no placement at all. These students will have to submit preferences in a supplementary round for placement at new or under-subscribed schools. Understandably, parents and the unmatched students are upset, but, while the 7,500 number seems large, for some missing perspective here are the numbers of unmatched students over the last several years:

2009: 7,455
2008: 7,722
2007: 8,340
2006: 8,097
2005: 10,217
2004: 16,609
2003: 34,837

These numbers are incredibly important: they count the students who did not get one of their top twelve choices in a system that prides itself on choice. These are the failures of the choice system. But, as the numbers above show, New York has gotten a lot better. In a city where roughly 90,000 eighth graders apply to high school each year, the city can now give more than 90 percent of students one of their choices, when just six years ago more than one out of three students was administratively assigned a school in a choice system.
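A rough sketch of the match-rate arithmetic, using the approximate 90,000-applicants-per-year figure cited above (round numbers, not official DOE calculations):

    # Share of applicants matched to one of their choices, assuming
    # roughly 90,000 eighth-grade applicants per year (the figure above).
    applicants = 90_000
    for year, unmatched in [(2003, 34_837), (2009, 7_455)]:
        matched = 1 - unmatched / applicants
        print(year, f"{matched:.0%} matched")
    # 2003: ~61% matched (more than one in three unmatched)
    # 2009: ~92% matched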

The fact that the number of unmatched students continues to fall is a good thing, but it also raises the question of how many is acceptable. How do we know what's good enough? Do the positives of school choice--harnessing the power of parental preferences, opening more schools to more students--outweigh the non-trivial pain experienced by the unmatched families? Not to mention the costs of implementing a choice system: the school fairs, the public relations efforts to explain choice options, the investments in technology to match students, and so on.

In almost every other public school system in the country, seats are filled based on who lives nearby. If a student happens to live in a neighborhood zoned to a crappy school, they don't have many options. New York City has been one of the leaders in changing that paradigm; unmatched students are a side effect of that process, but fortunately, a diminishing one.

Friday, March 27, 2009

Maybe Alaska Should Use the Stimulus Funds to Settle Adequacy Lawsuit?

Last week Governor Palin declared that Alaska would apply for only around half of the stimulus funding for which it is eligible--largely the funding for infrastructure projects, but not that for operational programs like education (here). This was widely seen as more of a political move than a policy one, and, as you would expect, the education community in Alaska is not very happy about it (here). The courts, as it happens, agree that the Alaska education system has enough money: in a unique school finance adequacy lawsuit, the court ruled that the state had provided sufficient funding to meet the constitutional requirement of an adequate system. But the court also ruled, in 2007, that the state has not provided sufficient oversight and assistance for underperforming schools (Gleason 2-4-09 Decision.PDF) (commentary here). That lack of oversight appears to have continued, and the court is starting to push the state to improve its oversight role, giving it 60 days to make improvements. Maybe Governor Palin should consider accepting some of the federal funding to start addressing the state's unconstitutional oversight system instead of trying to position herself for a 2012 presidential run.

Thursday, March 26, 2009

Illegals Get All the Breaks

The DREAM Act was just reintroduced in the House and Senate…again. It was first introduced in 2001, and the last attempt came in 2007, when its passage fell short by just a handful of votes. And again this bill, which is essentially about a "path to citizenship" for youth who were brought to the U.S. as children, has broad bipartisan support--introduced by Senators Richard Durbin (D-IL) and Richard Lugar (R-IN) and Representatives Howard Berman (D-CA), Lincoln Diaz-Balart (R-FL), and Lucille Roybal-Allard (D-CA), along with many others who have been pushing for the legislation for years. The 2007 round led to quite a debate, much of it about the in-state tuition portion of the act, the implications of which are captured well here at MPI. Debate will be heated this year too, again with many "no" votes driven by fear and anger that the bill gives illegals a better deal than real Americans. But as suggested here, this legislation isn’t only the right thing to do for children who are undocumented through no fault of their own; it is frankly the smarter approach for a nation that invests (by law) in these kids’ entire elementary and secondary education. Why give up on them, and on this incredibly expensive investment, once they’ve successfully completed high school? To do so is inconsistent, illogical, and economically irresponsible. And to think they might "go home" is just silly.

Some things you can't explain, like why we're all embracing conventional wisdom in a world that's so unconventional

The longer I work in public policy, the more I think about conventional wisdom. 

These little nuggets of thought--some essentially correct, others partially so, others not at all--are the building blocks of a shared narrative that profoundly shapes how we see the world, and thus how we act within it. While some methods of changing public policy involve directly influencing key decision-makers through persuasion, bribery, etc., most amount to engaging in a chaotic struggle to force ideas through the hourglass-center aperture that controls access to conventional wisdom and as such the public mind. It's a low-success rate / high-payoff business. The vast majority of ideas and findings die lonely, anonymous deaths. But if, by dint of force or accuracy or plain dumb luck, you can maneuver an idea past the point of increasing returns, the result is close to miraculous. The web of human communication begins to exponentially multiply its force and breadth. Suddenly the idea is everywhere, and having gotten there, it's very hard to extract. Depending on your point of view, it becomes a constant asset or a persistent obstacle. Either way, it's difficult to ignore. 

Really getting a grip on the present state of conventional wisdom can be tricky. Newspapers and general interest magazines are reliable sources, but they're still influenced by personal idiosyncrasies--a particular reporter or editor may have certain opinions and experiences that shape the tenor of coverage and commentary. Political platforms and the views of politicians are another good measure, but they're also individual-dependent. If a Senator's husband happens to be a special ed teacher, her view of IDEA will no doubt be affected.

No, to really get a piece of unadulterated education zeitgeist circa the present day, you have to find a popular and basically unserious media outlet that only bothers to think about education in any way, shape, or form while assembling some sort of essentially ridiculous "100 Agents of Change" list that includes among its Top Twenty such world-historical figures as Judd Apatow, Tina Fey, and the guy who invented Twitter. I refer, of course, to Rolling Stone magazine. Coming in at number 98 we find U.S. Secretary of Education Arne Duncan. Here's what RS has to say:

WHAT HE'S CHANGING: The expectations for public education in America. The ex-CEO of Chicago's public schools has the resources — $100 billion in stimulus funds — to turn the crisis in our schools into opportunity. Duncan is committed to removing obstacles to innovation — including bad teachers — and intercepting at-risk kids before kindergarten.

FRIENDS SAY: "He just wants to find and scale the ideas that work, period," says Wendy Kopp, CEO of Teach for America.

NEXT FIGHT: Working with the politically powerful teachers' unions to match pay to classroom performance.

That last sentence would be made a lot more accurate by changing the word "Working" to "Fighting" (or the word "with" to "against"), but otherwise this is probably a pretty good snapshot of Secretary Duncan CW as of right this minute. And he should be worried. $100 billion is a lot of money in nearly every context except public education, where it represents only about 1/6th of what we spend on K-12 schools every year. And it's really only 1/12th, because it's $100 billion over two years. And it's really much less than 1/12th, because a big chunk of that money is for universities and Pell grants. And it's really much less than that, because most of what's left isn't for education reform but basic macroeconomic stabilization, keeping teachers and professors from being laid off. 
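To put rough numbers on that chain of fractions (a back-of-the-envelope sketch; the ~$600 billion annual K-12 spending figure is my assumption, roughly in the right ballpark for this period):

    # Back-of-the-envelope: how big is $100 billion in stimulus, really?
    stimulus_total = 100e9   # education stimulus over two years
    annual_k12 = 600e9       # assumed annual K-12 spending (approximate)

    print(stimulus_total / annual_k12)      # ~0.17, about 1/6 of one year
    print(stimulus_total / 2 / annual_k12)  # ~0.083, about 1/12 per year
    # ...and less still once higher ed, Pell Grants, and budget
    # stabilization are subtracted out.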

In reality Duncan has a little more than $5 billion to influence states that have a long and sordid history of taking federal money and then actively working to subvert the goals for which the money was allocated. It's hard, slogging, bureaucratic work and it's not going to catalyze a sea change in the way our massive, decentralized education system operates. Education isn't energy or health care, issues where federal initiatives can have an immediate and transformative effect on national policy. This is a case where sky-high expectations on the front end almost guarantee some level of disappointment down the road. 

Backfilling Cuts? Not at the State Level

In California, the state took action last month to address a budget gap of over $40 billion through a combination of program cuts, new taxes, and a whole lot of other manipulations. The voters will decide what they think about the package in a special election on May 19th. While the actions taken were historic, they addressed the budget hole identified at the end of 2008. Of course the economy has gotten worse since then, and recent reports have suggested that there is still an additional $8 billion hole that needs to be filled in 2009-10, one that grows larger in the out years. Education’s share of that $8 billion hole, based on a state constitutional funding formula, is around $3.6 billion in further cuts.

So, it came as no surprise when the legislature’s fiscal advisors proposed several steps the state could take to use the various streams of education stimulus funds to backfill its state general fund budget problems (here). Clearly the purpose of the budget stabilization funds was to do just that. But the stabilization funds are not enough to fill the holes in California, so the legislature’s advisors suggested going after as much of the rest of the education funding as possible, including Title I, special education, and state-mandated activities, and using those funds to plug future budget holes at the state level.

The California Congressional Delegation was not pleased (here). They make it very clear that these funds are not for the state to use to solve its own problems; they are to be passed through as quickly as possible to keep California teachers from being laid off now (the current count of layoff notices given to teachers in the state is over 27,000). It appears that the congressional intent has been heard, and the Governor intends to get these funds out as quickly as possible (here). What about next year’s continued budget hole? I guess that’s a problem for next year. The message is clear: use the funds now, and worry about the fiscal cliff later. Unfortunately, it looks like California schools are facing one fiscal cliff after another until the state starts to balance revenues and expenditures. (Here is a prior discussion of fiscal cliffs.)

Wednesday, March 25, 2009

Points for Style?

Nicholas Kristof's recent column about Michelle Rhee brings up a common trope in school reform controversies: "leadership style," with Kristof averring that "Ms. Rhee’s weakness is her bedside manner." Per Eduwonk--really? Is that all? Read Dana Goldstein's informative new TAP article about UFT President Randi Weingarten, Rhee's chief antagonist, who "speaks in the commanding, practiced tones of a unionist," who is "known as a guns-blazing New York power broker," who "came up as a New York City labor lawyer" and "ended the political career of City Councilmember Eva Moskowitz." Does she sound like the kind of person who's going to be swayed one way or another by bedside manner? Of course not--frankly, it's an insult to Weingarten more than anyone to suggest otherwise.

Rhee has been described at various times as "abrasive," "mean-spirited," "confrontational," and many other words that can't be reprinted on a PG-13 blog. But these are just distractions--deliberate ones--from the real issues at play. Rhee clearly believes that a not-insignificant number of teachers currently employed by DCPS aren't doing a good enough job and should be replaced by better teachers. Most of those current teachers, I'm guessing, see things differently. They're well-organized and represented (at the national level, at least), and so these divergences of opinion and interest are going to get fought out in the realms of politics and public opinion. That's as it should be. But let's not fool ourselves by thinking that this is an argument about manner and style.  

The Talented Tenth

Since 1996 Texas has had a law on the books guaranteeing admission to any of the state's public higher education institutions for any student graduating in the top ten percent of his or her high school class. The law is credited with remarkable results in enhancing diversity at the state's most prestigious public institutions, and students admitted under the rule consistently outperform those who are not, yet state legislators have repeatedly tried to kill it. The latest attempt is expected to pass the state Senate today.

Legislators want to kill the ten percent rule mainly because of its impact on the state's flagship, the University of Texas at Austin. There, the percentage of students admitted under the rule has climbed rapidly, from 43.2 in 2000 to 69.9 in 2008. This prompted UT-Austin's president to write in an op-ed last fall, without any sense of irony, that "if this trend continues unchecked...we will be required to admit more than 100 percent of our class under this rule." He also warned, in a move reminiscent of the Washington Monument Gambit, that the rule may force the school to cut its football program.

The original bill was intended as a way to expand diversity without imposing quotas, and it's worked. As more students have been admitted under the rule, UT-Austin's racial/ethnic diversity has improved. For some context, consider that black and Hispanic graduates make up 48.9 percent of all Texas high school graduates, but only 20.5 percent of the enrollment at UT-Austin. In the last eight years, thanks mostly to the ten percent rule, black and Hispanic enrollment has begun to close that gap.

This would all be some feel-good diversity policy if the ten percent students failed to produce results. In fact, they earn higher freshman grades and stay in school and graduate at higher rates than students accepted by all other methods, even ones with higher SAT scores. In other words, the ten percent admissions policy does a better job of screening applicants than the university's own admissions office.

What this really is, like similar plans in other states, is a ploy to win back spots for students from certain in-state locales. The ten percent rule has opened UT-Austin to students from all over the state and from high schools that never used to send students there. At the same time, students from suburban and wealthier areas have lost coveted spots. The legislators who want to kill or reduce the ten percent rule come primarily from those districts.

Ending the rule would be short-sighted. There would be no stopping the institution from deciding that it needed more out-of-state students, who pay more tuition, to cover expenses. There are already headlines like "Texas May Allow More Marylanders Into UT." Moreover, the policy creates a sense that Texas higher education institutions are for Texans. It creates buy-in from state taxpayers and legislators that the higher education institutions they finance are opening their doors to students from across the state. Hopefully the Texas Legislature continues to see the policy's merits.

Tuesday, March 24, 2009

Murray Vs. Murray

Earlier this month Charles Murray, of The Bell Curve fame, gave this year's Irving Kristol Lecture at the American Enterprise Institute. Most of it reads like an Ayn Rand objectivist diatribe against social democratic states, using "Europe" as the code word for all that is wrong with the world. Apparently that sort of thing is popular again. While Murray, to a certain extent, is playing on the political mood, his logic is extraordinarily flawed. Here he is railing against what he calls the equality premise:

The equality premise says that, in a fair society, different groups of people--men and women, blacks and whites, straights and gays, the children of poor people and the children of rich people--will naturally have the same distributions of outcomes in life--the same mean income, the same mean educational attainment, the same proportions who become janitors and CEOs. When that doesn't happen, it is because of bad human behavior and an unfair society. For the last forty years, this premise has justified thousands of pages of government regulations and legislation that has reached into everything from the paperwork required to fire someone to the funding of high school wrestling teams. Everything that we associate with the phrase "politically correct" eventually comes back to the equality premise. Every form of affirmative action derives from it.

While Murray is clearly conflating equality of opportunity with equality of outcomes, what's most interesting, and entirely hypocritical, is that he later goes on to mourn how equality of opportunity is diminishing:

Perhaps the most important difference is that, not so long ago, the overwhelming majority of the elites in each generation were drawn from the children of farmers, shopkeepers, and factory workers--and could still remember those worlds after they left them. Over the last half century, it can be demonstrated empirically that the new generation of elites have increasingly spent their entire lives in the upper-middle-class bubble, never even having seen a factory floor, let alone worked on one, never having gone to a grocery store and bought the cheap ketchup instead of the expensive ketchup to meet a budget, never having had a boring job where their feet hurt at the end of the day, and never having had a close friend who hadn't gotten at least 600 on her SAT verbal. There's nobody to blame for any of this. These are the natural consequences of successful people looking for pleasant places to live and trying to do the best thing for their children.

In other words, the focus on equality is a bad thing, Murray says, and it's wrong because we've gotten more unequal over the last half century. Huh?

The College Admissions Lottery

It's a cruel irony that the more people buy into the notion that there's a "right" college or university out there for them (a myth that's perpetuated by the schools themselves), the harder it is for students to get in. Kids and their parents see how hard it is to get into "good" schools so they apply to more colleges, which in turn lowers the chances of acceptance for everybody.

This makes the admissions process far more random than colleges would like us to believe. And it makes the myth of a meritocracy, on which the selective admissions system is built, substantially a lie.

Selective colleges did not mean for this to happen; rather, they are victims of their own success, along with the emergence of a truly national higher education market and the rise of a rankings-driven consumer culture. But, there is no going back now, so colleges should embrace the unavoidable randomness and go from a lottery-like system to a true lottery. For more on why and how this might work, read my piece in today's InsideHigherEd.

Sunday, March 22, 2009

The Rich Get Richer

Per Sam Dillon's New York Times article about how education stimulus funds are being distributed through funding formulas that advantage rich states over poor states: it's all true, and there's no excuse for it (note the lack of anyone offering a policy justification). For a more detailed (but not boring!) explanation, see this from Marguerite Roza and yrs. truly.

Friday, March 20, 2009

The Difference Between Knowing and Caring

Frank Heppner, honors professor of biological sciences at the University of Rhode Island, wrote a good column in the Chronicle a couple of weeks ago that nicely illustrates the importance of understanding the nature of problems. Heppner's essential point is that because universities value research more than teaching, teaching suffers, hurting students and the university bottom line. It's worth reading in full but here are some highlights:

In research universities, those faculty members who write and obtain grant proposals enjoy certain perks, including summer salaries, more travel, more space, and an extensive list of other benefits, great and small...Large introductory courses therefore become orphans cast out into the snow, sustained only by the good will of the transients who are their temporary custodians. To the successful researcher (in the financial sense) come fame, money, promotion, and prestige. To the good teacher comes the gratitude of his students...all the time I spend with these students I could be working on grant proposals. However, out of my 600 students, 114 are statistically at risk of not returning. If, through this personal attention, I "salvage" only five of those students, I will have recovered $250,000 in lost tuition. And I can do that every year. In my discipline, that is far more than I would ever be able to generate in grant overhead...Can faculty members be trained to be more effective teachers and so have an impact on retention? Absolutely. Instructional-development programs traditionally do just that. These offices are typically marginalized and token at research universities, without appropriate money, prestige, or appreciation. Faculty members typically have no official incentive to seek advanced training in teaching; in fact, they are often discouraged because of the disproportionate emphasis placed on research "productivity."

Student retention and poor teaching in introductory courses are chronic problems in higher education. But not all long-standing problems are the same. Some (we'll call them Type A) are essentially a matter of not knowing how to solve the problem. Others (Type B) persist because people don't want to solve them.

Most big issues combine both elements, in unequal amounts. Breast cancer is a Type A problem; pretty much everyone wishes a cure could be discovered, and if it were, that would save millions of lives. Inequitable school funding is purely a Type B problem. Some states provide adequate funding to high-poverty school districts while others don't. Those that don't do so because selfish people, who prefer to hoard their dollars at the expense of providing equal educational opportunities for all children, have enough political power to maintain the status quo. It's no secret how to distribute funds equitably; they just don't want to. Other issues--substantially reducing the absolute level of carbon emissions from the nation's passenger and commercial vehicle fleet, for example--lie somewhere in between, requiring a combination of scientific breakthroughs and political will.

Frank Heppner is describing a Type B problem. It's not that universities don't know how to change their incentive structures to give teaching more value, or how to help people become better teachers. They just don't want to. Which is not to say such a change would be easy; I'm sure it would be quite difficult. But the reason it would be difficult is that the people who control the levers of power at universities want to keep things pretty much the way they are.

This matters for how we think about solutions. Type A problems are generally solved by resources and incentives focused on producing new knowledge. Type B problems, by contrast, are essentially political and values-based, and thus require politically-grounded solutions: public awareness, organizing constituencies, framing problems in terms of larger ideological agendas, changing the incentives that influence decision-making. And a favorite tactic, if your self-interest makes you the source of a Type B problem, is to pretend it's Type A--to say, "Of course, something must be done, and so we should invest in more research to identify new methods and best practices, and perhaps if more resources were available etc. etc." This is a deflecting maneuver and should be understood as such.

Thursday, March 19, 2009

College Rankings Will Never Die

Earlier this week I spent a couple of hours talking to education officials from North Africa and the Near East who are in Washington, DC as guests of the State Department, learning about our education system. Near the end of the discussion, I had the following exchange with an education official from a large but sparsely populated North African country, the gist of which goes a long way toward explaining why college rankings are an unavoidable reality of higher education in the 21st century and as such need to be embraced, not rejected.

Official: Yesterday I was told that there are over 4,200 universities in the United States, is this true?

Kevin: Colleges and universities, yes, although that's a pretty broad number that includes a lot of small religious and occupationally-oriented institutions; if you narrow the field to "traditional" four-year private non-profit and public institutions, it's more like 2,000. But still, there are a lot.

O: You see, this is actually a problem for my country, because we are thinking of creating a program where we pay students to attend an American university, but we don't know if it is okay to allow students to attend any institution or if we should have a list and say "you can only go to an institution that is on this list." Can we assume that any accredited institution is a good institution?

K: Well, no, I wouldn't say that, accreditation only guarantees a minimum level of quality and there are big differences among accredited institutions; some are much better than others.

O: I see, well, what about the "state universities"? I was under the impression that these are the official universities identified by the government as "the best" but now I am learning that may not be true.

K: No, some state universities are among the best and are very selective and receive a great deal of support from the government, but we also have many state universities that are not as selective and receive less funding, and while some of these are also very good, some are not.

O: But then there are the private universities that we all know of such as Harvard and Princeton and so on, these would definitely be okay, yes?

K: Well, again, some private colleges and universities are very good, but this is also a large and diverse sector of our system, so there is a great deal of variety, and for every Harvard there are others that are not so good.

And with this the official sighed, because I was being of little help. His ministry of education doesn't have vast resources at its disposal to independently audit and evaluate the huge number of colleges and universities in America. Students from his country obviously can't hop in the car with Mom and spend the weekend going on campus tours. He needs to make a rational choice with limited information, and so he'll probably end up using some set of independent rankings as a guide--U.S. News, Times Higher, Shanghai Jiao Tong University, etc. By doing this, he'll be subjecting his policies and students to the considerable methodological limitations of those rankings. But given the choice between using an imperfect measure of quality and no measure of quality, he'll go with Option A.

The point being, this is an entirely rational approach. It's what I would do if I were him. And in this sense the Official is in more or less the same position as individual students all over America (and, increasingly, the world) when it comes to choosing which college to attend. The choices are so many and the institutions themselves are so complex that there is simply no practical way for time- and resource-limited individuals (or foreign ministries of education) to gather complete information about every possible choice. It can't be done. So they'll rely on some other, larger, self-proclaimed expert institution with greater resources to do it for them. And that gives the self-proclaimed expert, the evaluator, the ranker, enormous leverage in defining the terms of quality in higher education and as such the incentives under which decisions are made.

Things are only going to keep moving in this direction--more mobility, more information, more choices, more institutions and higher education providers, more people all over the world having to make choices about postsecondary education and seeking guidance and interpretation to do so. Colleges can cede that responsibility--and thus, control over their destiny--to for-profit newsmagazines. Or they can come together and seize that power back by defining and standing behind rankings of their own.

Moreover, I'm not convinced that the traditional hands-on approach to college choice works so well. The minority of college students who actually choose among a significant number of institutions generally seem to identify a band of colleges that they're likely to be able to attend, and then choose among them in significant part based on the campus visit and the "feel" of the institution. This is apparently so important that some colleges are hiring consultants whose whole job is to "audit" the experience:

In his evaluations, [the consultant] rates the experiential qualities of each visit: Do visitors get a warm welcome from security guards and secretaries? Do tour guides ask open-ended questions? Does something fun happen?

I'm sure these things matter, but what do they have to do with whether students will get a good education and earn a degree? If students are making college choices based on whether they got a good vibe from walking around the campus for a couple of hours, or whether they happened to be assigned to a charismatic tour guide with a knack for storytelling, they're probably going to end up making a lot of sub-optimal choices, which might go a little way toward explaining why transfer and dropout rates are as high as they are. They might be better off sticking with rankings.

Wednesday, March 18, 2009

Tennessee Growth Models: A Response from Dr. William Sanders

Ed. Note: Last week, Education Sector published a report titled "Are We There Yet? What Policymakers Can Learn About Tennessee's Growth Model." The report examines Tennessee's model for using measures and projections of annual student learning growth as a means of determining whether schools are making "Adequate Yearly Progress" under the No Child Left Behind Act. William L. Sanders, the architect of Tennessee's widely-known, growth-based "value-added" model of measuring school and teacher effectiveness, has written a response, which is published below, followed by a response from the report's author, Charles Barone.


Response from William L. Sanders, SAS Institute

Several years ago, I was encouraged by various educational policy leaders to initiate studies on how the AYP part of NCLB could be augmented so that educators working in schools with entering populations of very low-achieving students could be given credit for excellent progress with their students and avoid being branded as a failing school. From the Tennessee longitudinal database, it did not take long to find “poster child” schools. One Memphis City school at that time had 3rd graders with a mean achievement level that would map to the 26th percentile of all Tennessee 3rd graders. Yet the same cohort of students left 5th grade at approximately the 50th percentile relative to the state distribution. Still, this school was failing AYP because the 3rd, 4th, and 5th grade scores had to be composited. Relative to NCLB, this was a failing school. In our view, it was not a failing school; rather, it was an outstanding school and should not suffer the indignity of the failing-school label.

Additionally, in our early investigations, we found schools that had passed AYP yet had many students with the proficiency designation whose trajectories would lead to non-proficient status in the future. It was our intent to develop a process that would give positive recognition to those schools that were truly ramping up their students’ achievement levels, while not giving credit to those schools whose already-proficient students were being allowed to slide. In other words, we endeavored to create a methodology to give current schools credit for changing academic trajectories so that their students would have the opportunity to meet various academic standards in the future if effective schooling were sustained. We sought methods consistent with the intent of NCLB to rectify the mislabeling of very effective schools as failing. We were encouraged to develop a process to be an augmentation of the AYP process, not a replacement for the USDE-approved AYP process.

At the time we initiated these studies, we had many years of experience working with longitudinally merged student test data and knew of many difficult problems that would have to be addressed in order to achieve a process that would have fairness and reliability. We considered and rejected several approaches. One of the first to be rejected is one of the simplest. For example, one such approach would be to take a student’s current test score, subtract that score from the proficiency cut score three years in the future, divide the difference by three to yield how much progress that student must make per year. If, in an intervening year, the student’s score exceeds the target score, then that student would be deemed to have made appropriate growth. Then the percentage of students making growth could be calculated for each schooling entity (i.e. district, school or subgroup) and the approved AYP rules could be applied as for the AYP status procedure.
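To make the simple approach concrete, here is a minimal sketch in code (the scores are hypothetical; the three-year horizon comes from the description above):

    # The simple (rejected) approach: split the gap between a student's
    # current score and the proficiency cut score three years out into
    # equal annual growth targets.
    def annual_growth_needed(current_score, future_cut_score, years=3):
        return (future_cut_score - current_score) / years

    def made_growth(score, baseline_score, years_elapsed, annual_growth):
        # A student "makes growth" in an intervening year if the score
        # meets the interpolated target for that year.
        return score >= baseline_score + years_elapsed * annual_growth

    # Hypothetical example: a student at 420 with a cut score of 480
    # three years out must gain 20 points per year.
    growth = annual_growth_needed(420, 480)
    print(made_growth(445, 420, 1, growth))  # True: 445 >= 440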

We rejected this approach because (1) the error of measurement in any one score for any one student is so large that setting an improvement target by this method will inevitably send the wrong signals for many students, and (2) by definition, vertically scaled tests provide scores that are intrinsically nonlinear over grades, resulting in uneven expectations of growth at the student level. We believed there to be a process that avoided these two major problems and would result in greater reliability in the final determination of whether or not a school had earned the right to be considered a non-failing school even though it had not met the regular AYP requirements.

One of our first objectives was to avoid making judgments about the progress of an individual student based upon one test score—as is done with simple approaches like the one outlined above. To minimize the error-of-measurement problem associated with a single test score, we elected to use the entire observational vector of all prior scores for each student. In some of our earlier work, we had found that if at least three prior scores are used, then the error-of-measurement problem is dampened to the point that it is no longer of concern. (1) This number was also found, independently of us, by researchers at RAND.

The goal is to give the current school credit for changing the trajectory of its students so that they can meet various academic attainment levels in the future. How is this to be measured? Consider an evaluation to see if a school’s fifth graders are on pace to meet or exceed the proficiency designation in 8th grade. Using the data from the most recent completing 8th grade cohort, models can be developed that project the likely 8th grade scores of current 5th graders, assuming the same future schooling experience that the current 8th grade cohort received. The projected score thus enables an evaluation of whether each student is likely to exceed the 8th grade proficiency standard. If so, the current school receives positive credit. The percent of projected-proficient students is calculated, and all of the regular AYP rules are applied to see if the school has made AYP.
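A bare-bones sketch of this projection idea follows: fit an ordinary least-squares model on the most recent completed cohort and apply it to current students. The data and variable names are hypothetical, and the actual model handles missing scores, the full multivariate longitudinal structure, and other complications this sketch ignores:

    import numpy as np

    def fit_projection(prior_scores, grade8_scores):
        # Fit grade-8 score as a linear function of the vector of prior
        # scores (e.g., grades 3-5), using the completed cohort's data.
        X = np.column_stack([np.ones(len(prior_scores)), prior_scores])
        coefs, *_ = np.linalg.lstsq(X, grade8_scores, rcond=None)
        return coefs

    def percent_projected_proficient(coefs, prior_scores, cut_score):
        # Project current students' grade-8 scores and count the share
        # at or above the future proficiency cut.
        X = np.column_stack([np.ones(len(prior_scores)), prior_scores])
        return float((X @ coefs >= cut_score).mean())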

What this approach accomplishes is to use all of each student’s prior scores—instead of just one score, as in the simple case—to give a more reliable measure of the impact of the current school on the rate of progress of its student population. In other words, this approach uses multivariate, longitudinal data from the current school’s students to provide estimates that map into a future scale without the test-error problem. It also avoids the inherent non-linearity problem of vertically scaled test data, in that it only requires the assumption of linearity between the prior scores and the future score—an assumption that is easy to verify empirically. The author raised questions about the reliability of the projected values:

“But the use of ‘projected scores’ by Tennessee introduces an additional error factor. The Tennessee growth model substitutes a predicted or expected score for the AMO. Tennessee shows that the correlation of the predicted scores with actual scores is about .80 (R=.80). This means the percentage of variance accounted for is only about two-thirds (.8 squared = 64 percent); thus, one-third of the variance in actual scores is not accounted for by the predictive model. While an R of .80 is quite respectable in the research world, it may not be adequate for making real-world decisions. Many students will be counted as being on track when they are not, and vice versa.”
The projected scores have much smaller levels of uncertainty than progress measures based upon one prior score. It is true that the projected values in the Tennessee application do not consider future schooling effectiveness, which will somewhat reduce the relationship between the projected scores and the scores later observed. However, the objective is to give the current school credit, not to hold the current educators’ evaluation hostage to what future educators may or may not do! Additionally, and to Tennessee’s credit, all students’ projections are used in determining the percent projected proficient, not merely those of students who lack the proficiency designation. In other words, students who are currently designated proficient but whose projected values fall below the future proficiency standard count as a negative, providing an incentive to focus on the progress rates of all students and to minimize the focus on just the “bubble kids.”

Response to Zeno’s paradox assertion

The author spent considerable energy and space in the paper asserting that the Tennessee projection model is unwittingly trapped in Zeno’s paradox. He asserts that students can make small amounts of progress yet have their projected scores exceed the future proficiency level. Because the projection targets are reset to another grade each year, he argues, schools can “get by” with suboptimal growth, with the modeling itself leaving students short of proficiency. We dispute the author’s assertion! The author states:

“The disadvantage is that Mary’s target will always be “on the way” to proficiency because, under Tennessee’s model, Mary’s goals are recalculated each year. Her target fifth-grade score is one that is estimated to put her on the path to proficiency by eighth grade. Her sixth- through eighth-grade scores are recalculated each year based on a score projected to be on the path to a goal of proficiency by high school.”
As was previously stated, the goal is to evaluate the progress made in the current school. The totality of Mary’s data provides the estimate of her future attainment. If Mary has a ‘good’ year, then her projected achievement level goes up. The future distribution that Mary’s projected score maps into has the same proficiency standard as has been approved for regular AYP determination. Additionally, and most importantly for the Zeno’s paradox argument, if the cut scores for 7th and 8th grade sit at essentially the same place in the statewide distribution, then it does not matter to which distribution her projected scores are mapped; de facto, there is no remapping. This is essentially the case for Tennessee’s 7th and 8th grade Math and Reading/Language Arts proficiency cut scores. The author’s resetting argument has no relevance and Zeno’s paradox does not apply.
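A toy illustration of the no-remapping point, with made-up statewide distributions and cut scores of my own choosing: when the 7th and 8th grade cuts sit at the same percentile, a projection lands on the same side of the cut no matter which grade’s distribution it is mapped into.

import numpy as np

rng = np.random.default_rng(2)
g7 = rng.normal(520, 40, 100_000)   # hypothetical statewide 7th grade scores
g8 = rng.normal(545, 45, 100_000)   # hypothetical statewide 8th grade scores

cut_pctl = 35                       # suppose both cuts sit at the 35th percentile
cut7, cut8 = np.percentile(g7, cut_pctl), np.percentile(g8, cut_pctl)

# A student projected to the 40th percentile clears both cuts; one projected
# to the 30th percentile clears neither. The grade chosen does not matter.
for student_pctl in (40, 30):
    s7, s8 = np.percentile(g7, student_pctl), np.percentile(g8, student_pctl)
    print(f"{student_pctl}th percentile: grade 7 call {s7 >= cut7}, grade 8 call {s8 >= cut8}")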

Other responses

He further asserts:

“As long as each student makes a small amount of progress toward proficiency, the school could hypothetically make AYP through the growth model even though not a single student had actually gained proficiency. This is not only true in any given year, but also could be true for a period of more than the three years implied.”
The idea that a school could have all students making a small amount of progress toward proficiency and yet make AYP with no proficient students is unreasonable. Just because a student makes a little progress does not mean that his or her projection will exceed the target value. A school in which no students are proficient would have a large number who are very low in academic attainment. These students would need to have made substantial academic progress, not a small amount, to be projected to proficiency within three or fewer years. If very low achieving students are projected to reach proficiency in three years, then their growth trajectories must have changed substantially. The author’s conjecture that only small amounts of growth are necessary to meet projected proficiency is simply not accurate.

Another comment:

But relative to the Tennessee growth model, the safe harbor provision has three advantages. First, it is based on real data, rather than a projection. Second, it is based on students achieving a set, policy-driven target—proficient—rather than a moving, amorphous, and norm-referenced target (i.e., a projected score), which has many more variables.
This whole passage is misleading. As previously mentioned, the percent proficiency calculations used in safe harbor are estimates based on test scores that contain error. The projections are estimates based upon much more data and contain more information than the single test score underlying the safe harbor estimates. The statement, “rather than a moving, amorphous, and norm-referenced target,” is simply wrong. There is no norm-referenced target: the projections are to established proficiency cut scores for future grades.

The author further states:

The Tennessee growth model also means that schools in Tennessee will be able to make AYP in 2014 without all students being proficient.
This statement could be applied equally to all growth models, to safe harbor, and to the present AYP status model once test errors are taken into account. To single out the Tennessee model with such a declarative statement, without careful consideration of all the uncertainties around the estimates from the other models, is inappropriate. As stated earlier, many of the opinions expressed in this paper ignore the innate test errors in one set of AYP approaches yet attempt to magnify the uncertainties in the Tennessee projection model. In fact, this is an area in which the projection model has an advantage over the others.

The following is a statement that is clearly and provably wrong:

Schools will be more likely to make AYP under the projection model than the other two models.
In Tennessee, many more schools make AYP through the status approach than through either safe harbor or projections, and many more schools make AYP through safe harbor than through the projection model.

Not in the current paper, but in a recent blog post, the author stated:

Some of the methodology and most of the data are proprietary, meaning they are privately owned, i.e., no public access. This all makes it very difficult for even Ph.D. and J.D.-level policy experts to get a handle on what is going on (which I found as a peer-reviewer last year), let alone teachers, parents, and the general public.
Exact descriptions of the methodology deployed in the Tennessee projection calculations have been published in the open literature since 2005.(2) Additionally, the proposals to utilize this methodology have been reviewed and approved by four different peer review teams assigned by the USDE. Also, at the request of Congress, the early approved proposals were reviewed by a team from the GAO; in that review, the software for the Tennessee calculations was independently evaluated for computational accuracy.

Agreements with the author

We agree with some of the author’s comments.

The use of growth models represents an opportunity to improve upon the state accountability systems currently in use under NCLB. NCLB’s focus on a single criterion, proficiency, and its lack of focus on continuous academic progress short of proficiency, fails to recognize schools that may be making significant gains in student achievement and may encourage so-called “educational triage.” The model does offer some advantages. By setting goals short of, but on a statistically projected path to, proficiency, the model may provide an incentive to focus efforts—at least in the short-term—on a wider range of students, including both those close to and those farther away from the proficiency benchmark. It also may more fairly credit, again in the short-term, those schools and districts that are making significant progress that would not be reflected in the percentage of students who have met or exceeded the proficiency benchmark.
We also agree with the author that Congress and the Secretary should review and learn from the growth models that have been approved. After working with longitudinal student achievement data for nearly three decades, and with various models intended specifically as augmentations to AYP, I have formed some opinions that I hope are worthy of serious consideration:

• Simplicity of calculation, under the banner of transparency, is a poor trade-off for reliability of information. Some of the more simplistic growth models sweep under the rug some serious non-trivial scaling, reliability and bias issues. The approved models for Tennessee, Ohio and Pennsylvania represent a major step in eliminating some of these problems.

• Reauthorization of NCLB should put more focus on the academic progress rates of all students, not merely the lowest achieving students. Our research has shown for years that among the students with the most inequitable academic opportunities are those who start out average or above average in schools with high concentrations of poor and minority students. Too many of these students are meeting the proficiency standards, yet their academic attainment is sliding.

• Serious consideration should be given to setting future academic standards at various attainment levels. For instance, for Tennessee we provide projections to proficiency levels (regular and advanced), to minimal high school graduation requirements, to levels at which a student is unlikely to need a remedial course in college, and to levels required to be competitive in various college majors. Some or all of these could be included in an AYP reauthorization with some careful thought. States that presently have these capabilities should be encouraged to move forward. Moving to these concepts will tend to avoid the conflict over which cut score the word ‘proficiency’ should be attached to.

****

(1) This is true because the covariance structure among the prior scores is not related to test error. For the Tennessee application, if a student does not have at least three prior scores, no projection is made, and the student’s current determination of proficient or not is included in the percent projected proficient calculation.
(2) Wright, Sanders, and Rivers (2005). “Measurement of Academic Growth of Individual Students toward Variable and Meaningful Academic Standards.” In R. W. Lissitz (Ed.), Longitudinal and Value Added Modeling of Student Performance.


Response from Charles Barone

First, I appreciate Dr. William Sanders taking the time to converse about the “Are We There Yet?” paper.

The Tennessee growth model, like those of the other 14 states in which growth models are in use, is a pilot program being conducted through time-limited waivers of federal statutory requirements. The purpose is to try something out, learn from it, and use the results to inform future policy efforts. This was the very reason I wrote “AWTY?” and why Education Sector published it.

I actually think that the paper addresses all the points raised in Sanders’ response, and here, for the sake of brevity, I will focus only on the key points. In most, though not all cases, it is, in my opinion, a matter of emphasis rather than real difference.

The Fallacy of “Failing Schools.” There are a couple of points raised in the opening paragraph that I will address later, but there is an overarching point that I think is implicit in this debate about NCLB in general and AYP in particular that I want to bring into the open.

In talking about a school that was doing well “normatively,” i.e., relative to other schools in terms of percentile ranks at some grade levels, Sanders states:

Relative to NCLB, this was a failing school. In our view, this was not a failing school; rather it was an outstanding school and should not suffer the indignity of the failing school label.
Nothing in NCLB labels a school as “failing.” Why this is a common misperception (and why Sanders has bought into it) is a topic for another discussion, but it’s indisputable that many perceive the law as ascribing this label to schools “in need of improvement.” It seems to me that the school Sanders cites was likely neither failing nor “outstanding” but somewhere within the wide gulf between those two poles. The whole point of growth models, I thought, was to calibrate the differences between extremes, not to throw schools into one of two “either-or” (or dichotomous) categories.

The real purpose of federal law is to identify areas where students are falling short—by grade and subject area—and to direct resources to them early and as intensively as is necessary and appropriate. Doing otherwise is a state and local choice and, I would add, a misguided one.

Those involved in creating the NCLB law felt that, prior to its enactment in 2002, schools were able to hide behind averages across groups and, it appears in Tennessee, across grade levels; that is, averages were used in a way that obscured areas in need of improvement rather than illuminating them. Average elementary school scores can hide deficiencies in third grade that would be better addressed early. Average scores of all students can hide gaps between black and Latino students and their non-minority peers. Composites across subjects can hide subject-specific shortcomings.

By bringing those problems to light, and funneling resources to those areas as early and as surgically (or radically) as needed and as is possible, it is hoped that students will get a better education and that potential long-term problems will be addressed sooner rather than later. Hence, in the paper we make this point:

The Tennessee growth model will also reduce the number of schools identified by NCLB as falling short academically. This could be a positive change if it allows the state to focus more intensely on the lowest-performing schools. However, it will also mean that some schools may not be able to avail themselves of resources that could help address student learning problems early enough to prevent future academic failure.
It sounds like what we had in Tennessee was a labeling problem (calling all schools that did not make AYP “failing”) rather than an AYP problem per se. I think most educators seeing a third grade with scores in the 26th percentile statewide (under one of the lowest sets of standards in the nation) would want to address that problem promptly in the antecedent years (i.e., by improving what happens in pre-K, kindergarten, first, and second grade) rather than waiting two years to see what happens in fifth grade. Other states have gradations of not making AYP and target their interventions accordingly (such as at one grade level or in one subject), including interventions in grades prior to those in which testing begins. The law offers wide leeway to do so.

The third grade case cited by Sanders is particularly in need of attention, as stated in the “AWTY?” paper:

Slowing down the pace at which students are expected to learn academic skills in elementary school may create long-term problems for students and create larger and more difficult burdens for public education in junior high, high school, and beyond. A wide body of research suggests, for example, that children who do not acquire language skills in the early grades have an increasingly difficult time catching up to their peers as they progress. This parallels neuropsychological research that shows critical periods for brain development in language and other areas of cognitive functioning.
• Statistical Error. Sanders states that Tennessee rejected looking at non-statistically-derived scores (i.e., hard targets, rather than estimates) in part:

Because (1) the error of measurement in any one score for any one student is so relatively large that the setting of an improvement target by this method will inevitably send the wrong signals for many students.
Here, as at other points in the paper, Sanders seems to assert that the projected score model gets rid of measurement error. It doesn’t. Measurement error is inherent in any test score (as in any grading system). Sanders’ method uses the same tests as every other AYP model in use in Tennessee and the other 49 states.

What the projected score model does is introduce an additional source of error, “prediction” error (the difference between a projected score that a multiple regression analysis estimates will put a student on the path to proficiency and the actual score that would do so). This is pointed out in the paper, but unaddressed in Sanders’ comments:

…the use of “projected scores” by Tennessee introduces an additional error factor. The Tennessee growth model substitutes a predicted or expected score for the AMO. Tennessee shows that the correlation of the predicted scores with actual scores is about .80 (R=.80). This means the percentage of variance accounted for is only about two-thirds (.8 squared = 64 percent); thus, one-third of the variance in actual scores is not accounted for by the predictive model. While an R of .80 is quite respectable in the research world, it may not be adequate for making real-world decisions. Many students will be counted as being on track when they are not, and vice versa.
Sanders goes on to state that:

To minimize the error of measurement problem associated with one test score, we elected to use the entire observational vector of all prior scores for each student. In some of our earlier work, we had found that if at least three prior scores are used, then the error of measurement problem is dampened to be no longer of concern.
But what he does not mention is that current law allows this option (using “rolling three year” averages of scores) whether or not a projection model is used.

• Zeno’s Paradox Issue. The "AWTY?" paper concludes that many students under the Tennessee model will take longer than three years to reach proficiency even if they meet their minimum “projected” score three years in a row. Sanders states, through reasoning I could not quite follow:

The author’s resetting argument has no relevance and Zeno’s paradox does not apply.
I stand by the conclusions of the paper. I challenge Sanders, or anyone else for that matter, to show me an instance where:

1) there is a long-term goal (e.g., X distance in Y years)

2) there is an interim goal that is some fraction of the X distance for some fraction of the Y years; and

3) the interim goals are recalculated each year as a fraction of the remaining distance to the goal;

in which it doesn’t take longer than Y years to get there.
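The arithmetic behind that challenge is easy to sketch. The following toy loop (generic fraction-of-the-remaining-gap dynamics with made-up numbers, not a simulation of Tennessee’s actual projections) shows a student who exactly meets a recalculated one-third-of-the-gap target every year and still falls short after three years:

cut, score = 100.0, 40.0    # hypothetical proficiency cut and starting score
fraction, years = 1 / 3, 3  # each year's target covers a third of the remaining gap

for year in range(1, years + 1):
    score += fraction * (cut - score)   # exactly meet the recalculated interim goal
    print(f"Year {year}: score = {score:.1f}, gap remaining = {cut - score:.1f}")
# Output: 60.0, then 73.3, then 82.2. The gap shrinks geometrically but never
# closes on schedule, which is the Zeno-style concern.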

Sanders could of course clear all of this up by taking, say, 100 cases where we can see the projected scores for each student, each year, and where the student exactly meets each interim goal, and showing us what happens in Tennessee over three successive years. As the paper notes, however, since the data and exact methods are proprietary, none of us can do this on our own, or we would have simulated such an instance in the paper. On this point, Sanders states:

Exact descriptions of the methodology deployed in the Tennessee projection calculations have been published in the open literature since 2005. Additionally, the proposals to utilize this methodology have been reviewed and approved by four different peer review teams assigned by the USDE.
It is true that the multiple regression formula that Sanders uses can be found in Tennessee’s application for the federal waiver, as well as in most elementary statistics books. Tennessee’s materials also include descriptions of some of the methods and adjustments that are specific to the Tennessee growth model.

But the details, including standard deviations and standard errors of measurement for students within a school, and the histories of individual students over multiple years, are not available. Thus no one, at least no one I have talked to, can do a real replication.

In addition, I sat on a growth model peer review panel in 2008 in which other states submitted models based on Tennessee’s. Not a single person at the Department whom I asked understood that in 2013 the goal for Tennessee would still be proficiency in 2016, not 2014, and I think any casual observer of former Secretary Spellings’ comments over the last few years can attest to that.

• Size of Adequate Yearly Progress. Sanders disputes the paper’s contention about the interaction between the growth model’s incremental progress (extending the proficiency target over a number of years rather than requiring proficiency each year) and Tennessee’s low standards. But he merely skirts the latter point.

First, I don’t understand the artificial focus on those grades in which testing is done. If proficiency within three years is a doable goal, why not start with proficiency in fourth grade as a goal beginning in first grade (or kindergarten or pre-K) where research shows schools (and programs like Head Start or high-quality child care) can have an incredible impact? The state and its localities have all these resources at their disposal to impact the years within and outside the 3-8 testing box imposed by NCLB. Why not do so? Is more federally imposed standardized testing, in earlier grades, what is required to bring this about? (I, for one, hope not.)

Second, no matter what grade you begin in, the Tennessee standard for proficiency is low compared to the NAEP standard—lower than virtually any other state.

Again, let me re-state a section of the paper which Sanders does not address:

In fourth-grade reading, for example, the NAEP benchmark for “basic” in reading is a score of 243; for “proficient” it is 281. The NAEP-equivalent score of the Tennessee standard for fourth-grade proficiency in reading is 222, which is about as far below the NAEP standard of basic as the NAEP standard for basic is below the NAEP standard of proficient. The NAEP benchmark for basic in fourth-grade math is 214; for proficient, it is 249. The NAEP equivalent of Tennessee’s standard for proficient in math is 200.
So if reaching proficiency in Tennessee is a low goal relative to NAEP and to virtually every other state (which the state of Tennessee acknowledges and is trying to change), then fractional progress toward that goal is, by definition, even lower.

How could it possibly be otherwise?

• Linearity. Sanders asserts, regarding the Tennessee model, that:

This approach avoids the inherent non-linearity problem of vertically scaled test data in that this approach only requires the assumption of linearity between the prior scores and the future score; an assumption that is easy to verify empirically.
I chose not to go into this in the paper (for obvious reasons), but since the issue is being opened here, I think it should be addressed.

Linearity is a double-edged sword (stay with me until at least the chart below). With vertical scaling, different tests can be equated across grades by re-scaling scores to make them comparable. We can’t go into all the relative advantages and disadvantages of vertical scaling here. (Sanders is right that there are disadvantages.)

But I must point out that Sanders’ assertion of the linearity of non-vertically scaled scores in Tennessee, which he says is easy to verify empirically, may not always hold. (Note that Sanders does not supply empirical verification; he only asserts that it can be verified.) In turn, applying a linear regression, as Tennessee does, to estimate future scores may distort the relationship between real growth in student scores and the scores projected through the statistical model.

Let’s say that over time, non-vertically scaled scores for some students are not linear but parabolic (curvilinear), with accelerated growth in early years, a leveling off, and then a decrease in later years (a phenomenon not unknown in education research). Then let’s say we fit a linear regression to these data (with an R squared of .67, similar to the Tennessee model’s R squared of .64).

The chart below illustrates this scenario (from SPSS Textbook Examples, Applied Regression Analysis, by John Fox, Chapter 3: Examining Data; UCLA: Academic Technology Services).

[Chart: a linear regression line fit to curvilinear (parabolic) data.]
Here, the projected scores in the early years would be lower than the actual scores seen over time. In this scenario, the linear model would set AYP goals below what we should expect for students between ages 6 and 11. Conversely, the model would overestimate what we should expect for students over age 11.

This is just one of the many (virtually infinite) scenarios possible depending on student characteristics, age, and patterns of variance of scores for students in a particular school. The point is that a linear regression only approximates, and in some cases can distort, educational reality.
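To make that concrete, here is a hypothetical demonstration (my own made-up numbers, not the Fox data and not Tennessee’s): fit a straight line to parabolic “growth” and the residuals fall into a systematic, age-dependent pattern rather than random noise.

import numpy as np

ages = np.linspace(6, 14, 9)                        # hypothetical ages 6 through 14
actual = -2.0 * (ages - 12) ** 2 + 300              # parabolic: rises, peaks, declines

slope, intercept = np.polyfit(ages, actual, 1)      # ordinary linear fit
fitted = slope * ages + intercept

for a, y, f in zip(ages, actual, fitted):
    print(f"age {a:4.1f}: actual {y:6.1f}, linear fit {f:6.1f}, residual {y - f:+6.1f}")
# The residuals run negative at the youngest and oldest ages and positive in
# between: the straight line systematically under- and over-shoots by age.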

• Transparency. In closing, I would like to address the issue of transparency. In his remarks, Sanders says:

Simplicity of calculation, under the banner of transparency, is a poor trade-off for reliability of information. Some of the more simplistic growth models sweep under the rug some serious non-trivial scaling, reliability and bias issues. The approved models for Tennessee, Ohio and Pennsylvania represent a major step in eliminating some of these problems.
This paper only speaks to Tennessee, and so we will leave the issue of other states aside.

But, as the paper shows, and as demonstrated here, the Tennessee growth model is not necessarily more reliable, accurate, or valid than those of other states using other growth models or the statutory “status” or “safe harbor” models. All represent tradeoffs.

While eliminating some problems, the Tennessee model creates others. For now, each state can reach its own conclusions about the relative strengths and weaknesses; my hope is that the “AWTY?” paper, and this discussion, will help better inform those decisions.

I do not, however, think transparency is an issue to be taken lightly. Real accountability only takes place when all participants in the education system—including parents, advocates, and teachers—can make informed choices.

This week I talked to a reporter from Texas (a state implementing an adapted form of the Sanders model, with at least a couple of key improvements along the lines of points raised here) who recalled her school days of independent reading assignments through the “SRA” method.

For those of you who do not remember, SRA was a box of large (roughly 8 x 11 inch) cards with readings and structured questions. The box progressed in difficulty from front to back (easiest to most difficult), with color-coding for the varying levels of difficulty.

The color-coding made it clear, to you and to your teacher, where you were in progressing through a set of skills. The reporter pointed out that with the traditional method you knew, for example, that if you were at red (the lowest) rather than violet (the highest), you were farther back than you wanted to be by a certain time. Whatever color you were assigned (say, red or orange), you also knew where you stood relative to the end goal.

She then pointed out that with the Tennessee growth model method, we never know what the target color (or level of difficulty) is, i.e., the interim “projected” score for a student by the end of the school year. It could be any color of the rainbow from red (below basic) to violet (proficient), and all we would know is that it was somewhere in between.

I think that all players in the educational policy and practice arena (educators, consumers, parents, advocates, and taxpayers) want, just as this reporter does, something a little more like the color-coded SRA system.(1) That is, they would like quite a bit more clarity than “trust us, your child is on a projected path to proficiency within Y years” (which, as we see here, is really Y plus an unknown number of years) according to the following formula:

Projected Score = M_Y + b_1(X_1 − M_1) + b_2(X_2 − M_2) + … = M_Y + x_i^T b  (2)
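For readers who want to see the notation in action, here is a toy evaluation of formula (2) with entirely made-up means, coefficients, and scores (none of these numbers are Tennessee’s):

import numpy as np

M_Y = 540.0                           # hypothetical mean of the future (Y) scores
M = np.array([480.0, 495.0, 510.0])   # hypothetical means of the prior scores
b = np.array([0.25, 0.30, 0.40])      # hypothetical regression coefficients
X = np.array([500.0, 505.0, 530.0])   # one student's prior scores

projected = M_Y + b @ (X - M)         # M_Y + sum of b_i * (X_i - M_i)
print(f"Projected score: {projected:.1f}")   # 540 + 5 + 3 + 8 = 556.0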
And, as much as I love statistics, I would assert that given that these individuals—educators, consumers, parents, advocates, and taxpayers—are the primary sponsors and intended beneficiaries of the educational system, we owe it to them to strive, as much as humanly possible, to meet their needs and expectations toward the goal of a better education for each and every child.


***

(1) SRA is now the “Open Court” reading system. Its citation here does not represent or imply any appraisal or endorsement.

(2) Where M_Y, M_1, etc. are estimated mean scores for the response variable (Y) and the predictor variables (the Xs); in the vector form, b collects the coefficients b_1, b_2, … and x_i the student’s mean-centered prior scores.