High stakes tests aren’t better—and they never will be

About the only thing I still remember from fourth grade is my animal research project. I read all about beavers in the encyclopedia, and checked out a library book about beavers, and then I spent one entire Saturday writing a first-person, you-heard-it-here-first beaver diary, with a year’s worth of dated entries—complaining about my beaver brothers and sisters, boasting about my growing dam-building skills, and agonizing over finding a mate. I remember standing over the bathroom sink at home, feeling very studious indeed, calculating how much water to drip onto the cover of my notebook to achieve an authentically beaver-worn look without compromising readability.

At wealthy schools across the country, this is pretty much what fourth graders are still doing: writing animal diaries, building dioramas and 3D maps, making posters and pamphlets, staging skits and plays and debates. Because it’s 2020, they are probably also making videos or podcasts or building little robot beavers, tinkering with how much virtual water to drip on their presentation splash pages. If their teachers stay up to date, they likely spend more time working independently or in small groups in class than listening to whole-class lectures and completing homework outside of school. But their central work is very much what mine was: learn about something, and then make something new from what you learned.

These skills are the same ones that middle schoolers use to write reports, that high school and college students use to write papers, and that professionals use to conduct original research. They are, in fact, remarkably similar to the research and writing skills that the Common Core literacy standards articulate. In theory, at least, they are the skills that next-generation standardized tests offered by consortia such as the Partnership for Assessment of Readiness for College and Careers (PARCC) and Smarter Balanced are meant to assess.

But in public schools, including the one where I now teach, there is less time than ever for the beaver diaries. The pressure to “produce results”—to cause our students to get better scores on the annual state tests—gets more intense and more immediate every year. If there isn’t a beaver diary on this year’s test, every day we spend on the beaver diary is a day of work that the test may not directly measure. From the teacher’s perspective, then, the beaver diary becomes a risk and a sacrifice.

In the first flush of enthusiasm for the Common Core standards developed in 2009, forty-five states joined either the PARCC or Smarter Balanced consortia, hoping that more rigorous tests would measure richer, more authentic teaching. That goal is certainly visible in their design. In every grade from third through tenth, for example, the PARCC assessments ask students to complete a “research simulation task” in which they study a set of texts about a social studies or science topic and then write a report about it. Clearly the hope was that kids would spend the year happily learning and creating, and then transfer those skills to the arbitrarily chosen topic of sharks or hurricanes or the Great Barrier Reef when test day came around.

In fact, as Washington, D.C., prepared to implement the PARCC, I wrote optimistically in these pages that perhaps the difficulty of preparing for such a complex assessment would force us off the test preparation hamster wheel entirely. We’d have to simply teach students to read and think and write, and hope for the best.

Instead, the pressure ratcheted ever higher; to prepare for harder tests, we prepped even harder. The new tests proved so difficult and expensive that many states quickly dropped them; in 2020, only fifteen members remain in the Smarter Balanced consortium, and only a few will take the PARCC. The rest continue to tinker with their item banks and testing procedures, hoping that some test, somewhere, will measure what students have really learned, and thus reward and encourage better teaching.

No matter the particulars of the test, though, we will still face the same two problems: the kids and the teachers.

If you really want a kid to write a report about sharks, you’re going to want to start by telling her how excited you are that it’s finally the special day when we start the animal project. You’re going to need to show her some example projects from last year and give her a list of project choices. You’ll answer at least twenty questions about what kinds of animals are allowed, what kinds of reports are allowed, what colors she’s allowed to write in, and whether glitter counts as a color. You should probably set aside plenty of time for whispered conferences with friends about which animals they’re choosing. That’s all as it should be. It’s developmentally appropriate; it’s how fourth graders need to work.

But a child who can excel in that connected, collaborative classroom project may be entirely unable to reproduce those results on a decontextualized research task to be completed silently and independently, without help or discussion. From the kid’s perspective, the test experience is completely different. Aside from research and writing skills, the test requires a host of ancillary activities that are far beyond the scope of the fourth-grade curriculum: scrolling, highlighting, and manipulating text on the computer; maintaining silent focus for hours; analyzing a written assignment prompt. In fact, they’re developmentally inappropriate for the vast majority of fourth graders.

That brings us to the teacher’s side of the problem. Say the test asks students to do things that they can’t comfortably do and shouldn’t really need to master. I know it; you know it; my school knows it. Nevertheless, I absolutely must do everything in my power to ensure that by the end of this year, my little tribe of fourth graders can and does excel, in the sterility of the test environment. Otherwise my school and I will be perceived as having failed.

That means we’re going to have to spend our precious class time not just learning to research and write but specifically practicing how to apply these skills to unfamiliar, disconnected content in order to produce a formulaic, rubric-scorable five-paragraph essay. Here’s the absurdity in a nutshell: in order to prepare for the research simulation task, we’ve got to cut some of the real research work to make way for practicing the research simulation. I’ve got to plan experiences that are less developmentally appropriate, less connected to the bigger picture, less interactive, less collaborative, less creative, and less fun. Just to make sure we’re ready, when test day comes.

The test itself takes about two hours on each of about eight school days: sixteen school hours in total, so in that sense no worse than a couple of snow days. But testing leaves kids exhausted both mentally and emotionally; on the afternoon of a testing day, it’s foolish to plan anything more taxing than outside play. Figure the same routine for another two or three days of benchmark testing every quarter; then add in the professional days set aside for teachers to analyze the test data, and a couple of days of review and reassurance before test day. Altogether, we’ve lost more than twenty instructional days to the test: an entire month of school days, for every public school child, in every grade from third through twelfth.

Add to that the curricular pressure, beginning on the first day of school but building steadily into the spring semester, to skip the beaver diary or the class play in favor of yet another test-aligned practice activity. Each year’s teacher has to worry about this year’s score; each year’s teacher makes her own compromises. By the time my students reach eleventh grade—by the time they’ve been through the seasons of the test-preparation cycle every year for eight years of their educational lives—how much total learning time will they have lost?

The vision driving the No Child Left Behind Act was to quantify and analyze the achievement gap in order to work toward closing it. But its most obvious outcome, from my perspective, has been to open a horrifying new chasm.

Students who are extremely educationally privileged, and who read and write well ahead of grade level, are more likely to find that their skills transfer straightforwardly to the test. This means that schools with a relatively homogeneous and wealthy population can sometimes afford to skimp on test preparation. And despite the many purported benefits of “data-driven instruction,” I’ve yet to hear of a single private school that’s chosen to adopt a regimen of frequent testing and data analysis for its students. The children of the wealthy are largely excused from this madness.

Meanwhile, the schools that rely on public funding have no choice but to pretend that what these tests measure is what we’re here to teach. When test scores falter, some schools toss out arts, science, sports, social studies, recess, and actual reading and writing in favor of increasingly frantic and impoverished test-prep programming. My own school fights hard to stay committed to pleasure, choice, authenticity, and creativity in the classroom, but when springtime comes to the “tested grades,” we can’t afford to be unprepared.

The same thing would happen if legislators decided to tie school ratings and funding to student performance on the Presidential Physical Fitness Test, that annual rota of jumps, sprints, pull-ups, and the “shuttle run.” Privileged kids would keep right on playing freeze tag and basketball; less sanguine schools would pare down their PE curriculum to focus on daily shuttle-run drill. Their shuttle-run scores would improve, no doubt; but at what cost?

Or imagine telling high schools they’d be measured by their students’ performance on tests like the ACT or the SAT: instead of studying U.S. history, juniors would practice skimming history-themed passages fast enough to answer standardized test questions. Depressingly, some cities and states have already made this move; schools whose students tend to perform poorly on these tests have responded by eviscerating their curricula in favor of even more test prep.

In the most “data-driven” schools, teachers face increasing pressure to break down big learning goals into tiny, discrete steps whose progress can be measured weekly or daily. That approach works fine for teaching step-by-step material, such as long division or comma usage, but research and common sense agree that literacy simply can’t be learned that way. Checking daily or weekly whether students have made progress on artificial sub-steps like “making inferences” or “selecting evidence” does nothing for their learning but layer it with jargon, judgment, and stress.

It hurts to admit: fourth grade as I teach it is less fun than the fourth grade class I was in. We have access to extraordinary technology and travel, rich resources and diverse stories, but we’re pressed and we’re anxious, and that anxiety shapes every minute of the student experience. Administrators who are worried about test scores produce teachers who are worried about their quarterly benchmark scores, who in turn produce children who are worried about their daily “exit ticket” scores.

I still believe, as I argued in 2014, that parents and teachers deserve to know how their students are performing. But the cost of our current testing regime is too high. It is cumulative, and it is regressive. The students who most need a rich and connected school experience are most likely to see it sacrificed to test preparation.

There are better ways to evaluate student learning. Literacy progress, in particular, is best measured through nuanced, one-on-one assessments such as the Fountas & Pinnell reading level system or the writing rubrics from Columbia University’s Teachers College. These measures take time and experience to administer, in part because they yield detailed information about each child’s strengths and learning needs, rather than a single-digit score. Yet they are still standardized enough to allow comparison among students and against grade-level norms.

Even sensitive tools such as these can become bludgeons, however, if we use them to rank teachers or rate schools, rather than to guide our instruction. Proponents of testing argue that it allows communities to hold schools accountable for their work, but we must not reduce the rich community-building work of our schools to the production of narrowly defined results. Instead, we can examine multiple measures—including classroom observation, school climate, attendance, student and family satisfaction, student health and wellbeing, persistence in subsequent schooling, and even the efficient use of data—to identify and support the students who need more help.

We must disconnect the hard tests from the high stakes, and we must do so immediately. Otherwise, in the guise of measuring the efficacy of our educational system, we’ll be steadily dismantling it.