We Taught Students to Read. We Still Can’t Tell if They Understand.

Written by Mary Schreuder, PhD | May 1, 2026

The Science of Reading movement is winning the decoding war and exposing something far more uncomfortable underneath: our comprehension assessments are decades behind, and the gap is only widening.

Across the country, states are passing reading legislation, districts are replacing curricula, and teachers are doing the hard work of restructuring foundational literacy instruction around the science of reading. It is working, at least for decoding.

But as we get better at measuring whether students can unlock words, a quieter crisis is emerging. We still cannot tell teachers why a student is struggling to comprehend. We can only tell them whether or not they are.

That distinction between why and whether is the comprehension chasm. And it shows up in the classroom every day.

Decoding was never the finish line

The progress in foundational skills is real and worth naming. More than 40 states have passed science of reading legislation. Assessment vendors have built sharper, faster screeners for phonological awareness, decoding, and fluency. Our collective urgency to know whether a student can read words, and our ability to measure it, have never been higher.

But Scarborough's Reading Rope describes two strands, not one. Word recognition is one half of skilled reading. Language comprehension, which includes vocabulary, background knowledge, syntax, inference, and literacy knowledge, is the other. And the field has directed much of its energy and its assessment infrastructure toward the first strand, while the second has remained largely unmeasured and underexplored.

Our ability to measure whether students are learning to read is far outpacing our ability to see whether they are learning to understand.

The consequences are already visible in the 2024 NAEP results, though not in the way most commentary suggested. The coverage focused almost entirely on phonics and decoding. That framing is understandable but incomplete. The NAEP is not a decoding assessment; by 4th grade, it assumes students can decode. What it actually measures is whether students are making meaning from text. When only about one in three 4th graders scored at or above proficient, and when students in the middle and lower ranges declined the most across 46 states, that data was telling us something about language comprehension. We lacked the tools to hear it.

WHAT CURRENT ASSESSMENTS TELL TEACHERS vs. WHAT TEACHERS ACTUALLY NEED TO KNOW

Current assessment: This student scored "Below Basic" in reading.
What the teacher needs to know: Is the breakdown in vocabulary, background knowledge, inference, or syntax processing?

Current assessment: This student scored at the 35th percentile.
What the teacher needs to know: Is this student showing surface-level comprehension but struggling to build a complete mental model of the text? (SRI, 2025)

Current assessment: This student is "approaching proficiency" on RI.4.7.
What the teacher needs to know: Is this student recognizing text features (surface) or understanding how those features build toward larger meaning? (SRI, 2025)

Current assessment: This student shows weakness in "main idea."
What the teacher needs to know: State ELA assessments cannot reliably measure separate comprehension standards in isolation; the data may be driving the wrong instructional response. (ANet, SRI)

The Toll on Students

Picture a typical 8th grade classroom. A few students are still struggling with decoding, working hard just to unlock each word. A few are genuinely proficient, moving through words and text with ease. In the current conversation about literacy, and in the assessments available, those two groups are all you can see. But that is not everything happening in this classroom. The comprehension chasm sits with the majority of students in the middle.

These students are fluent readers who move through text at pace and with expression, but who cannot analyze a complex argument, synthesize ideas across sources, or write meaningfully about what they read. While their word recognition is strong, their comprehension sits on the surface. Fluency got them this far. It will not carry them further.

These students are not who the current literacy conversation is built around. Yet in most schools, they are the majority. This is the toll of the comprehension chasm.

We are at risk of producing a generation of students who can read but still struggle to make sense of what they read. And we will not see it coming, because our assessments are not built to show us the difference, making the chasm wider.

We Have the Curriculum, but Teaching Is Still Skimming the Surface

In November 2025, SRI Education studied 111 comprehension lessons across 24 schools in four vanguard districts, each using top-rated, knowledge-rich curricula for at least five years, with engaged teachers and engaged students. And yet, 67% of lessons reached only surface-level understanding. Just 24% produced a robust, deep, inferential, meaning-building understanding of text.

Here is what that finding means: the curriculum wasn't the problem. The missing variable was the ability to see inside the learning, to distinguish a student completing a task from a student building genuine understanding. That is precisely what a high-quality instructional assessment is designed to do. And for comprehension, we don't have it.

Standard observation tools didn't catch the gap. Walkthroughs and fidelity checklists captured participation, posted standards, and visible curriculum use. They missed whether students were actually building meaning. If two-thirds of our best classrooms are operating at the surface and our assessments can't tell us that, we are flying blind.

Why comprehension is so difficult to assess well

Language comprehension is not a single skill. It is the product of vocabulary, background knowledge, syntax processing, working memory, inferencing, and more, all operating simultaneously on a given text. In the classroom, this shows up in subtle ways. A student may read accurately and even answer some questions correctly while relying on partial understanding — guessing, pulling isolated details, missing the connections that hold an argument together. Another may struggle to make sense of a sentence because of unfamiliar vocabulary, even though they can decode every word correctly.

Yet comprehension assessments do not disentangle these threads. They return a score such as "below grade level" or "approaching proficiency," leaving the teacher to speculate on the root cause. Districts using data to improve comprehension instruction often inadvertently reinforce the problem: teachers unpack standards, target narrow skills, and miss the larger meaning-building work. The SRI research reinforces this directly: even when teachers were using the same lesson from the same high-quality instructional materials (HQIM) in the same building, some facilitated compliance and others facilitated meaning-making. The curriculum itself was not the determining variable. The depth of the teaching was.

Worse, when an assessment can't distinguish between a student who recognizes a text feature and one who understands how it builds toward larger meaning, it doesn't just fail to inform instruction. It actively misdirects it, quietly reinforcing the surface-level pattern the SRI data exposed.

An assessment that doesn't reveal a student's thinking isn't functioning as an instructional tool. It's functioning as a compliance report.

HQIA for Language Comprehension

When we began applying high-quality instructional assessment (HQIA) criteria to the comprehension assessment landscape, we expected to find gaps. We did not expect to find a chasm. For foundational skills, the field has largely cleared all three bars of the HQIA checklist: instructional utility, curricular integrity, and data-driven continuity. For comprehension, the honest answers are no, largely absent, and score only.

In our work with districts, the consequence is consistent. Schools running data cycles on interim results, doing everything right by conventional standards, often discover that students have been performing the tasks the curriculum asked of them without building the mental models it was designed to produce. The shift that changes outcomes isn't new materials. It's a new diagnostic lens: moving from "did the student complete the task?" to "how is this student processing meaning, and where is it breaking down?"

That is the question a high-quality comprehension assessment must be built to answer.

The question the field needs to ask

This is not a problem on the horizon. Foundational literacy programs are scaling, decoding scores are starting to move, and leaders across the country are staring at flat comprehension data wondering what isn't working. The field missed something fundamental: we spent a decade failing to improve comprehension measurement, and districts are now absorbing the cost of that gap.

The next frontier for literacy assessment is not another screener that tells us a student is reading below grade level. It is the diagnostic architecture that tells a teacher what is actually breaking down — whether the struggle is rooted in vocabulary, background knowledge, weak inferencing, or syntax processing — and what to do about it within the curriculum already in use. The kind of specificity that tells a teacher: this student has strong decoding and adequate fluency but fragile domain vocabulary and surface use of text evidence, and here is what to do about it in tomorrow's lesson.

The field set a standard for foundational skills assessment, and we held vendors to it. We can do the same for comprehension. The bar is not complicated: does this assessment tell a teacher something they couldn't have guessed on their own? Can it tell them why a student missed a question, not just that they missed it? We have spent a decade building sophisticated systems to determine whether students can read; it is time to build the systems that tell us whether that reading is actually working.

Being honest that we haven't started this work isn't a failure; it's an opportunity. To close the comprehension chasm, we must work together to raise the bar for the quality of our assessments. We need tools that give teachers instructionally useful insights that can change student trajectories today. We have the curricula and we have the teachers. Let's build the tools to help both.

REFERENCED RESEARCH & FRAMEWORKS

NAEP

2024 NAEP Reading Assessment Results

National Center for Education Statistics (NCES). (2025). 2024 NAEP Reading Assessment: Results at Grades 4 and 8 for the Nation, States, and Districts. Available at The Nation's Report Card.

ANET

The Missing Link & HQIA Evaluation Checklist

A practical tool for assessing whether district assessments truly drive instructional improvement — across Instructional Utility, Curricular Integrity, and Data-Driven Continuity.

SRI EDUCATION

Beyond the Surface: Leveraging HQIM for Robust Reading Comprehension

Reynolds et al. (November 2025). A study of 111 K–5 comprehension lessons across 4 districts found that 67% resulted in only surface-level understanding, even with mature HQIM implementation.