We Taught Students to Read. We Still Can’t Tell if They Understand.

 

The Science of Reading movement is winning the decoding war and exposing something far more uncomfortable underneath: our comprehension assessments are decades behind, and the gap is only widening.

Across the country, states are passing reading legislation, districts are replacing curricula, and teachers are doing the hard work of restructuring foundational literacy instruction around the science of reading. It is working, at least for decoding.

But as we get better at measuring whether students can unlock words, a quieter crisis is emerging. We still cannot tell teachers why a student is struggling to comprehend. We can only tell them whether or not they are.

The distance between decoding and skilled reading is the comprehension chasm. It persists because our assessments still struggle to show us why comprehension breaks down or what students need to become skilled readers. And it shows up in the classroom every day.

Decoding Was Never the Finish Line

The progress in foundational skills is real and worth naming. More than 40 states have passed science of reading legislation. Assessment vendors have built sharper, faster screeners for phonological awareness, decoding, and fluency. Our collective ability to know whether a student can read words is at an all-time high.

But Scarborough's Reading Rope describes two strands, not one. Word recognition is one half of skilled reading. Language comprehension, which includes vocabulary, background knowledge, syntax structures, inference, and literacy knowledge, is the other. And the field has directed much of its energy and its measurement infrastructure toward the first strand, while the second has remained largely unmeasured and underexplored.

Our ability to measure whether students are learning to read is drastically outpacing our ability to see whether they are reading to understand.

The consequences are already visible in the 2024 NAEP results, though not in the way most commentary suggested. The coverage has focused almost entirely on phonics and decoding. That framing is understandable but incomplete. The NAEP is not a decoding assessment. By 4th grade, it assumes students can decode. What it actually measures is whether students are making meaning from text. When only one in three 4th graders scored at or above proficiency, and when students in the middle and lower ranges declined the most across 46 states, that data was telling us something about language comprehension, but we lacked the tools to know it. 

The Students Nobody Is Talking About

Picture a typical 8th grade classroom. A handful of students are still working to decode, building the foundational skills that the science of reading movement has rightly made a priority. A few others have arrived at genuine reading proficiency, moving through complex texts with fluency and apparent ease. These are the students our current literacy conversation is designed to see. They show up in our assessments, our data meetings, and our intervention plans.

But they are not the whole room. 

Most of the students sitting there are fluent readers.  They find the answer, complete the task, turn in the work. Nothing about their reading skills triggers concern. And yet many of them cannot follow a complex argument to its conclusion, weigh competing interpretations, synthesize ideas across multiple sources, or articulate in writing what a text actually means. 

They have learned, because school has reliably taught them, that getting through a text is the same as understanding it.

These students are not struggling readers by any measure we currently use. They are something hard to name and even harder to catch: readers whose fluency got them this far but will not carry them further.

In most classrooms, they are the majority. They move through school largely undetected, not as struggling readers, but as invisible ones.  We are at risk of producing a generation of students who can “read” but still struggle to make sense of what they read. And we will not see it coming, because our assessments are not built to show us the difference.

We Have the Curriculum, but Teaching Is Still Skimming the Surface

In November 2025, SRI Education studied 111 comprehension lessons across 24 schools in four vanguard districts, each using top-rated, knowledge-rich curricula for at least five years, with engaged teachers and engaged students. And yet, 67% of lessons reached only surface-level understanding. Just 24% produced a robust, deep, inferential, meaning-building understanding of text.

Here is what that finding means: the curriculum wasn't the problem. The missing variable was the ability to see inside the learning, to distinguish a student completing a task from a student building genuine understanding. That is precisely what a high-quality instructional assessment (HQIA) is designed to do. And for comprehension, it simply doesn’t exist.

Standard observation tools didn't catch the gap. Walkthroughs and fidelity checklists captured participation, posted standards, and visible curriculum use, but they missed whether students were actually building meaning. If two-thirds of our best classrooms are operating at the surface and measurement tools can't tell us that, we are flying blind. The reason why gets to the heart of what makes comprehension so uniquely difficult to measure.

Why Comprehension Is So Difficult to Assess Well

Language comprehension is not a single skill. It is the product of vocabulary, background knowledge, syntax processing, working memory, inferencing, and more, all operating simultaneously on a given text. In the classroom, this shows up in subtle ways. A student may read accurately and even answer some questions correctly while relying on partial understanding: guessing, pulling isolated details, and missing the connections that hold an argument together. Another may struggle to make sense of a sentence because of unfamiliar vocabulary, even though they can decode every word correctly.

Yet comprehension assessments do not disentangle these threads. They return a score such as "below grade level" or "approaching proficiency," leaving the teacher to speculate about the root cause. Districts using data to improve comprehension instruction often inadvertently reinforce the problem: teachers unpack standards, target narrow skills, and miss the larger meaning-building work. The SRI research reinforces this directly: even when teachers were using the same HQIM lesson in the same building, some facilitated compliance and others facilitated meaning-making. The curriculum itself was not the determining variable. The depth of the teaching was.

Worse, when an assessment can't distinguish between a student who recognizes a text feature and one who understands how it builds toward larger meaning, it doesn't just fail to inform instruction. It actively misdirects it, quietly reinforcing the surface-level pattern the SRI data exposed.

An assessment that doesn't reveal a student's thinking isn't functioning as an instructional tool. It's functioning as a compliance report.

HQIA for Language Comprehension

Last year, ANet published The Missing Link, a white paper introducing a new standard for what assessment should actually do for teachers. We called it High-Quality Instructional Assessment (HQIA), a framework designed not to report on learning, but to fuel it. HQIA asks three things of any assessment: Does it identify why a student answered incorrectly, not just that they did? Does it connect directly to the curriculum the teacher is already using? And does it drive instructional decisions, or just satisfy a compliance requirement?

In math, the impact of this shift was clear. Assessments designed with HQIA principles produced more actionable data and stronger instructional responses. In ELA, results were stronger when HQIA and high-quality instructional materials were used together. But we also surfaced a critical limitation: when it came to reading comprehension, the field lacked assessments with real instructional utility. We named the gap, but at the time, we were still examining its depth.

We did expect to find gaps when we began applying HQIA criteria to the comprehension assessment landscape, but we did not expect to find a void. Foundational skills assessments, by and large, meet the bar: they diagnose errors, connect to instruction, and inform teaching. For comprehension, however, the picture is stark. Measured against HQIA, most comprehension assessments offer little insight into why students struggle, connect only loosely to curriculum, and do little beyond reporting scores.

In our work with districts, the consequences are consistent. Schools running data cycles on interim results, doing everything right by conventional standards, often discover that students are completing assigned curriculum tasks, but they are not consistently building the underlying mental models of text that comprehension instruction is meant to develop.

What changes outcomes is not simply better materials or tighter data cycles. It is a shift in the questions we ask. Instead of asking, “Did the student master the standard?” we need to ask, “How is this student making meaning from text, and where is that process breaking down?”

That is the question a high-quality comprehension assessment must be built to answer. 

The Question the Field Needs to Ask

This is not a problem on the horizon. Foundational literacy programs are scaling, decoding scores are starting to move, and leaders across the country are staring at flat comprehension data wondering what isn't working.

The next frontier for literacy assessment is not another screener that tells us a student is reading below grade level. It is the diagnostic architecture that tells a teacher what is actually breaking down: whether the struggle is rooted in vocabulary, background knowledge, weak inferencing, or syntax processing. And what to do about it within the curriculum they already use.

The field set a standard for foundational skills assessment and held vendors to it. We can do the same for comprehension. The bar is not complicated: does this assessment tell a teacher something they couldn't have guessed on their own? Can it tell us why a student missed a question, not just that they did? If not, it is not an instructionally useful assessment. It's a compliance tool.

We have built sophisticated systems to determine whether students can read. We have not built systems that reliably show us whether students are becoming skilled readers. That is the next decade’s work, and it starts with acknowledging how much we still cannot see.

