Scaffold mismatch is a quota bug, not a syntax bug
Every existing playbook on this topic treats scaffold mismatch as a prompting problem: add a skill, install a scaffolding MCP, write a rules file. Those all help at the top of the funnel. None of them tell you what the mismatch already cost you this week. That number exists on exactly one field, seven_day_opus.utilization, and if you are not reading it, you are flying blind into Friday's 429.
What every other guide on this gets right, and what they miss
The existing playbooks on Claude Code scaffold mismatch are almost all about prevention. Use a scaffolding skill. Install an MCP server that hands Claude a tree of valid file paths. Write a CLAUDE.md that enumerates your conventions. These work. I use them, this website uses them, and the AGENTS.md sitting at the root of the repo that serves this page is exactly that kind of fix.
What none of them do is tell you what last week's mismatches already cost. When Claude Code generates a Next.js 15 era unstable_cache import against a Next.js 16.2.4 codebase, fails, gets the error back, retries twice more before shipping, the only place the full cost exists is as a delta on seven_day_opus.utilization on Anthropic's server. Local-log tools see the tokens their client wrote; they cannot see the 4.7 retokenization expansion, they cannot see hidden adaptive-thinking tokens, and they have no way to group a 3-turn retry loop as one unit of cost.
The anchor: the five-line guard in this repo
This is the actual AGENTS.md sitting at /Users/matthewdi/claude-meter-website/AGENTS.md. It was added after a Claude Code session regenerated Next.js 15 era routing against a 16.2.4 repo three times in a row.
Five lines, one job: redirect Claude from "generate confidently" to "read the docs first." It cuts the first-pass mismatch rate sharply. What it does not do, and cannot do, is recover the Opus quota the pre-guard sessions spent. That quota is already gone, and it was gone without anyone in the log pipeline noticing.
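For reference, the guard's content as quoted in the FAQ further down this page (the line breaks shown here are an assumption; the file in the repo is the ground truth):

```text
This is NOT the Next.js you know.
This version has breaking changes: APIs, conventions, and file structure
may all differ from your training data.
Read the relevant guide in node_modules/next/dist/docs/ before writing any code.
Heed deprecation notices.
```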
The version stack where mismatch actually fires
Verbatim from claude-meter-website/package.json. Every one of these shipped breaking changes after Claude's training snapshot.
Where a retry loop hides its cost
Four stages. Each one compounds. The sum lands on one server-side float, and local-log tools see none of it.
Retry 1: wrong import path
Claude generates code importing from a package surface that moved. Build fails with a clear 'not exported from' error. You paste it back. Claude retokenizes the error text plus the original prompt plus the full file it was editing, under the 4.7 tokenizer.
Retry 2: API signature drift
Claude picks the new import correctly but calls it with the old signature. The runtime error is subtler; you spend a few turns narrowing it. Each turn is context you have already been billed for, re-billed.
Retry 3: hidden reasoning spike
By the third attempt, adaptive thinking is spending more output tokens reasoning about why 'the obvious thing' does not work. These tokens are omitted from the display, so ccusage cannot see them. seven_day_opus sees them.
Retry 4: fixed, but at what cost
The feature ships. Your PR looks normal in git. Your local JSONL shows a reasonable token count. Your seven_day_opus.utilization float jumped several percentage points more than the feature justified. Only ClaudeMeter preserves that number.
The float where the cost lives
ClaudeMeter reads one struct off /api/organizations/{org}/usage. The field that scaffold-mismatch retries drain faster than anything else is right here:
seven_day_opus is an Option<Window> where Window is just { utilization: f64, resets_at: Option<DateTime> }. A 0.0 means your Opus weekly bucket is untouched; a 1.0 means the next Opus call 429s. Every scaffold-mismatch retry moves this number up by more than the equivalent first-shot success would have.
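A minimal read of that field, assuming the snapshot shape the article describes (an array whose first element carries a usage object); the sample JSON below is illustrative, not real account data:

```shell
# Illustrative snapshot mirroring the Window struct described above.
snapshot='[{"usage":{"seven_day_opus":{"utilization":0.42,"resets_at":"2026-01-05T00:00:00Z"}}}]'

# Pull the one float the 429 gate reads against.
echo "$snapshot" | jq '.[0].usage.seven_day_opus.utilization'

# Against a live ClaudeMeter bridge, the same path works off the endpoint:
#   curl -s http://127.0.0.1:63762/snapshots | jq '.[0].usage.seven_day_opus.utilization'
```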
Five stages a mismatched prompt passes through
Stage 1 is the last thing ccusage can see. Stages 2 and 3 are the cost of scaffold mismatch; stage 4 is where the bill lands; stage 5 is why you cannot debug the problem from a local log.
From 'that will not compile' to seven_day_opus
1. Prompt goes out with pre-4.7 tokenizer estimate
Claude Code writes its own token estimate into ~/.claude/projects/<repo>/<session>.jsonl before the request is even sent. That number is computed locally, using the client's tokenizer. It does not reflect what Anthropic's backend actually counted.
2. Server retokenizes under 4.7 rules
Anthropic's backend re-tokenizes the payload (prompt + file contents + previous assistant output + error text) under the 4.7 tokenizer. The documented expansion is up to 1.35x. On a scaffold-mismatch retry, the 'file contents + error text' portion can be most of the payload.
3. Adaptive thinking fires on 'why did that fail'
4.7 defaults to adaptive thinking. Build errors are exactly the kind of input that makes it think harder. Those thinking tokens count as output, are billed against seven_day_opus, and are hidden from the terminal. Your JSONL does not capture them. ccusage does not see them.
4. Server updates utilization
The seven_day_opus.utilization float is incremented with the full post-retokenization, post-thinking count. This is the number the 429 gate reads against. It is also the only number ClaudeMeter's extension fetches on its next poll.
5. Client-side log keeps showing the pre-expansion count
~/.claude/projects/<repo>/*.jsonl is frozen at step 1. Any tool that reads it (ccusage, Claude-Code-Usage-Monitor, custom parsers) is reporting a number that was true one millisecond before the 4.7 tokenizer and adaptive thinking touched it. The scaffold-mismatch cost is the gap between that number and the seven_day_opus delta.
What ccusage sees vs what the server billed
The single clearest demonstration of the gap: the local JSONL summary and the seven_day_opus delta disagree, and the delta is always higher on a retry loop.
Same 3-retry loop, two different numbers
ccusage sums the prompt_tokens + completion_tokens fields Claude Code wrote into ~/.claude/projects/<repo>/<session>.jsonl. For a 3-retry scaffold-mismatch loop it reports roughly what you'd expect: the original prompt plus the three error messages plus three fix attempts. Call it 42,000 tokens. This is the pre-expansion, pre-thinking, pre-retokenization count. It is correct for the question 'how many tokens did my client send?' It is wrong for the question 'how much of my Opus weekly did this cost?'
- Client-side token estimate
- No tokenizer expansion applied
- No adaptive-thinking output counted
- Identical whether you are on 4.6 or 4.7
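To make the gap concrete, a back-of-envelope sketch using the article's 42,000-token example; the expansion factor is the documented 1.35x ceiling, and the thinking-token figure is a placeholder I picked for illustration, not a measurement:

```shell
local_count=42000   # what ccusage sums from the JSONL (the article's example)
expansion=1.35      # documented worst-case 4.7 retokenization factor
thinking=9000       # hidden adaptive-thinking output: PLACEHOLDER, not measured

# Server-side count = retokenized input + hidden reasoning output.
server_count=$(awk -v c="$local_count" -v e="$expansion" -v t="$thinking" \
  'BEGIN { printf "%d\n", c * e + t }')

echo "client logged: $local_count   server billed (sketch): $server_count"
# The scaffold-mismatch cost is the gap between those two numbers.
```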
Three inputs, one bucket, one float
The three costs that stack on a mismatch retry all empty into the same Opus-only weekly float.
What flows into seven_day_opus during a scaffold-mismatch retry
Measure your own mismatch cost in one loop
The experiment is a three-curl affair: one before, one during (optional), one after. Do not trust a handwave number anyone on the internet gives you, including this page. Read your own account.
On a session where Claude Code produced compiling code on the first pass, the same experiment yields a much smaller delta. The gap between those two deltas is the marginal cost of the scaffold mismatch, isolated from the feature itself. That is the number worth budgeting against.
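The experiment can be sketched as below; the endpoint and jq path are the ones this page uses elsewhere, and the helper names (read_opus, delta) are mine, not part of any tool:

```shell
# Read the server-side Opus weekly float off the local ClaudeMeter bridge.
read_opus() {
  curl -s http://127.0.0.1:63762/snapshots \
    | jq -r '.[0].usage.seven_day_opus.utilization'
}

# Difference between two utilization floats, as a fraction of the weekly bucket.
delta() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.4f\n", b - a }'
}

before=$(read_opus)   # curl 1: before the session
# ... run the generate-build-fix-build loop in Claude Code ...
# (the optional middle curl is a mid-session read of the same path)
after=$(read_opus)    # curl 2 (or 3): after the session
echo "mismatch loop cost: $(delta "$before" "$after") of the Opus week"
```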
The honest limits of prevention
An AGENTS.md reduces the first-pass mismatch rate. It does not eliminate it. The subtler failures (a function renamed silently between minor versions, a Tailwind 4 class that stopped working at runtime instead of build time, a Server Action whose signature changed) still sneak through. When they do, the cost of the retry is the same as if there were no guard at all. ClaudeMeter is the backstop: it measures what the guard did not catch.
Think of it as two layers. Prompting fixes (AGENTS.md, scaffolding MCPs, rules files) reduce the rate. Quota reading (ClaudeMeter) measures the residual. Neither replaces the other. Shipping just one means you are either blind to the cost or paying it with no upper bound.
Numbers on this repo specifically
Counts read from the claude-meter-website repo and the claude-meter client source. No invented benchmarks.
A playbook that actually covers both ends
Prevention plus measurement, in five moves
- Ship an AGENTS.md (or CLAUDE.md) at the repo root that states the framework version and any breaking changes it introduced since the training cutoff. Keep it to five lines or fewer; Claude Code reads it on every invocation, and long guard files dilute the signal.
- Point the agent at the actual docs shipped with the package, not the public docs site. node_modules/next/dist/docs/ or node_modules/react/docs/ is the ground truth for your installed version.
- Pin versions in package.json. A caret range leaves enough slack for Claude to scaffold against a version that no longer matches what npm actually installed.
- Before a big agentic session, check ClaudeMeter's seven_day_opus row. If it is already above 70 percent, a mismatch loop will 429 you mid-session. Better to switch to Sonnet for the edit pass and save Opus for planning.
- After the session, poll the quota delta. If the delta looks out of proportion to the feature, the extra cost was scaffold-mismatch retries, and that is the signal to widen your AGENTS.md or add a package-specific rules file.
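The pre-session check in the fourth move can be scripted; the 0.70 threshold is the playbook's suggestion, and the fallback to 0 when the bridge is unreachable is my assumption:

```shell
# Read the Opus weekly float; treat an unreachable bridge as an empty bucket.
opus=$(curl -s http://127.0.0.1:63762/snapshots \
  | jq -r '.[0].usage.seven_day_opus.utilization // 0')

# Above 70 percent, a mismatch loop risks a mid-session 429 on Opus:
# do the edit pass on Sonnet and keep Opus for planning.
suggestion=$(awk -v u="${opus:-0}" \
  'BEGIN { print ((u > 0.70) ? "sonnet for edits" : "opus is safe") }')
echo "suggested: $suggestion"
```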
What each tool can and cannot see
Local-log tools read ~/.claude/projects/*.jsonl. ClaudeMeter reads what Anthropic actually enforces against.
| Feature | ccusage / local-log tools | ClaudeMeter |
|---|---|---|
| Sees the full cost of a retry loop | No, counts visible tokens only | Yes, reads the server's utilization delta |
| Includes adaptive-thinking tokens on retries | No, thinking is omitted from the JSONL | Yes, already counted in seven_day_opus |
| Groups multi-turn mismatch retries as one cost | No, splits them per-message | Yes, diff any two snapshots around the session |
| Distinguishes Opus retries from Sonnet retries | No, single token count per session | Yes, seven_day_opus vs seven_day_sonnet are separate floats |
| Flags when you are about to 429 mid-retry | No, no awareness of the server ceiling | Yes, the menu-bar bar turns red past 0.9 |
| Machine-readable from a shell script | Yes, but from a stale source | Yes, via 127.0.0.1:63762/snapshots |
| Runs without a claude.ai cookie-paste step | N/A, different data path | Yes, the browser extension reuses your logged-in session |
Watch the bucket as you code
ClaudeMeter runs in your macOS menu bar, polls /api/organizations/{org}/usage every 60 seconds, and serves the combined snapshot at 127.0.0.1:63762/snapshots. Free, MIT, no keychain prompt with the browser extension.
Frequently asked questions
What exactly is a Claude Code scaffold mismatch?
It is when Claude Code generates code based on the framework conventions in its training data, and the actual project uses different conventions for the same framework. The most common flavor right now is Next.js 16 + React 19, which ship breaking changes to routing, server components, caching semantics, and the public import surface compared to what the training snapshot knew. Claude writes something that would have compiled a year ago, the build fails, you paste the error back, Claude tries again. Every step of that loop is billable against your plan. The ClaudeMeter marketing site (the one you are reading) literally has a five-line AGENTS.md at /Users/matthewdi/claude-meter-website/AGENTS.md that says 'This is NOT the Next.js you know' as the first-line defense against this exact behavior.
Why can't ccusage or Claude-Code-Usage-Monitor measure mismatch cost?
Both tools read ~/.claude/projects/<repo>/*.jsonl and sum what the client wrote. That counts tokens, but it misses three things that make scaffold-mismatch retries expensive: the 4.7 tokenizer expansion (applied server-side, after the JSONL is frozen), the hidden adaptive-thinking tokens Claude Code omits from the display, and the inability to group multi-turn retries under a single 'feature attempt.' The cost of a scaffold mismatch is the sum of every retry's hidden reasoning plus the retokenized input, and that sum only exists as one float on Anthropic's server: usage.seven_day_opus.utilization. ClaudeMeter reads it.
Where does the scaffold-mismatch cost actually land in the ClaudeMeter source?
It lands on seven_day_opus, the Opus-only weekly window. That field lives at /Users/matthewdi/claude-meter/src/models.rs line 23, declared as Option<Window>. The Window struct above it (line 4 to 7) has a utilization: f64 and a resets_at: Option<DateTime<Utc>>. The extension POSTs that struct to http://127.0.0.1:63762/snapshots on every 60-second tick (extension/background.js line 2 sets BRIDGE, line 3 sets POLL_MINUTES = 1). Every time a scaffold mismatch triggers a retry, the float on line 23 goes up.
What is the five-line AGENTS.md in the claude-meter-website repo and why does it matter?
It is a guard file the author added after watching Claude Code spend Opus quota regenerating Next.js 15 era code against a Next.js 16.2.4 codebase. It sits at the repo root and contains nothing but: 'This is NOT the Next.js you know. This version has breaking changes: APIs, conventions, and file structure may all differ from your training data. Read the relevant guide in node_modules/next/dist/docs/ before writing any code. Heed deprecation notices.' Claude Code picks it up automatically and changes its behavior from 'generate confidently' to 'read the docs first.' It is a free, local fix. What it does NOT fix is the quota damage from the sessions before you wrote it; that is on seven_day_opus forever until the weekly window resets.
How much Opus quota does a single scaffold-mismatch retry actually cost?
There is no universal number because the cost is content-dependent, but you can measure your own. With ClaudeMeter installed, run 'curl -s http://127.0.0.1:63762/snapshots | jq .[0].usage.seven_day_opus.utilization' before and after a failing generate-build-fix-build loop, and the delta between those two floats is the weekly percentage you spent on that one mismatch. In practice, the retries are disproportionately expensive compared to the successful first-shot version of the same task, because: (1) the error feedback you paste back is input tokens, retokenized; (2) 4.7 tends to think harder when it sees build errors, generating more hidden reasoning; (3) each retry repeats context that had already been billed once.
What frameworks does Claude Code's scaffold most often mismatch on right now?
Anything that shipped breaking changes after the training cutoff. In this repo the two that bite are Next.js 16 (the async API migration, turbopack build semantics, and the new file-based routing for parallel routes) and React 19 (the new use() hook, the removed forwardRef requirement, the deprecated useContext signature). The same pattern applies to Tailwind 4 (dropping tailwind.config.js for a CSS-first @theme directive), Remix 3, and anything in the TanStack ecosystem that rewrote its public API between versions. The package.json of this website shows the exact versions at risk: next 16.2.4, react 19.2.4, tailwindcss ^4.
Does the AGENTS.md in this repo actually prevent every scaffold mismatch?
No, and it is not supposed to. It bends Claude's behavior toward reading docs before writing code, which catches the obvious cases (wrong import path, wrong file location, wrong API signature). It does not catch subtle semantic mismatches: a function that exists with the same name but different behavior between Next.js 15 and 16, or a Tailwind utility class that was renamed silently. Those you only discover at build or runtime, after the tokens are spent. The honest stance is: AGENTS.md reduces the retry count, ClaudeMeter tells you what the remaining retries cost.
Can I see the quota hit in real time while Claude Code is retrying?
Yes. The extension fetches /api/organizations/{org}/usage every minute. Open the ClaudeMeter menu-bar popup and keep it visible while you're running a retry-heavy session; the '7d Opus' row updates live as the server-side float climbs. For scripted tracking, run a bash loop: 'while true; do curl -s http://127.0.0.1:63762/snapshots | jq -c "[now, .[0].usage.seven_day_opus.utilization]"; sleep 60; done'. That gives you a time-series of the quota drain; every inflection point lines up with a retry.
Is the scaffold mismatch problem specific to Claude Code, or does it hit Cursor and others too?
It hits every LLM coding agent whose training data is frozen before the frameworks it is writing for. The reason Claude Code feels especially painful on this is the combination of (1) plan-based quota rather than per-request billing, so cost is invisible until you hit 429, and (2) adaptive thinking on 4.7, which generates substantial hidden output on exactly the kind of 'why did this fail' reasoning a scaffold mismatch triggers. Cursor running on Claude hits the same quota. The quota read in ClaudeMeter is account-wide; it does not care which client spent the tokens.
Does moving to Sonnet instead of Opus fix scaffold-mismatch cost?
It moves the cost to a different float. Sonnet retries bill against seven_day_sonnet and the shared five_hour window, both visible in the same /usage payload (seven_day_sonnet lives at models.rs line 22, five_hour at line 20). seven_day_opus is left untouched. So if your week's Opus bucket is already 80 percent spent and you hit a scaffold mismatch, switching the retry loop to Sonnet is the right move. You still pay quota, just in a bucket that is less likely to 429 you on your next planning step. ClaudeMeter surfaces all four floats side by side so you can pick the cheaper bucket to burn.
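Picking the bucket can be scripted off the same payload; the field paths follow the models.rs layout described above, and the decision rule (burn whichever weekly float is lower) is a simplification that ignores the shared five_hour window:

```shell
snap=$(curl -s http://127.0.0.1:63762/snapshots)
opus=$(echo "$snap"   | jq -r '.[0].usage.seven_day_opus.utilization // 0')
sonnet=$(echo "$snap" | jq -r '.[0].usage.seven_day_sonnet.utilization // 0')

# Burn whichever weekly bucket has more headroom left.
awk -v o="${opus:-0}" -v s="${sonnet:-0}" \
  'BEGIN { print ((o > s) ? "retry on sonnet" : "retry on opus") }'
```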
Keep reading
The 4.7 regression that shows up in your quota
Most writeups focus on long-context recall and BrowseComp. The regression that ends Claude Code sessions early is the quota one, and it hides in the same seven_day_opus float.
Claude Code cost per PR
On a subscription, a PR does not cost dollars; it costs a fraction of a weekly bucket. Same endpoint, same method, different framing.
ClaudeMeter vs ccusage
Side by side on what each tool can and cannot see. The short version: JSONL is frozen pre-server; /usage is the only post-server truth.
Got a Claude Code session where the quota math does not match the work?
If seven_day_opus jumped a lot more than the feature justified, it was probably scaffold mismatch. Send a snapshot and a diff, happy to look. 15 minutes.