The Claude Code 4.7 regression that shows up in your quota, not your benchmarks
The writeups that exist for this topic cover the regressions you can read off a leaderboard: long-context recall above 100K tokens, a 4.4 point drop on BrowseComp, higher latency than 4.6, flatter creative output. All real. None of them cover the regression that makes a Claude Code session die on Tuesday afternoon instead of Friday morning: the same workload empties your seven_day_opus bucket faster on 4.7, and your local token log cannot see it.
Why the benchmark posts miss this one
Every guide currently circulating on 4.7 regressions is a benchmark roundup. They cite Anthropic's own admission that long-context recall regressed above roughly 100K tokens, a 4.4 point drop on BrowseComp (83.7 to 79.3), slower completions, and reduced stylistic range on creative tasks. Those are real, reproducible losses that show up in eval suites.
The regression this page is about does not show up in any eval suite. It shows up on claude.ai/settings/usage. The 4.7 tokenizer expands input text by up to roughly 35 percent. Adaptive thinking generates more hidden output tokens than 4.6 did. Both of those costs are counted against your Opus weekly quota, at Anthropic's backend, after your Claude Code client has already written its local token log and moved on. The client has no idea.
Result: a session that finished Friday on 4.6 can finish Tuesday on 4.7 for the same set of prompts. The only number that tells you this is happening is the one Anthropic writes to usage.seven_day_opus.utilization.
Where 4.7 adds tokens your JSONL does not record
Four places. Every one of them lands on seven_day_opus.utilization, and none of them lands in ~/.claude/projects/*.jsonl.
Tokenizer expansion (1.0x to 1.35x)
Anthropic's 4.7 documentation describes a new tokenizer that uses up to roughly 35 percent more tokens for the same input text. The expansion is applied server-side, so your local token count underreports the real cost by that factor before anything else is counted.
Adaptive thinking, omitted from display
4.7 defaults to adaptive thinking. Claude Code hides those tokens from the user (display: omitted), which also means they never land in ~/.claude/projects/*.jsonl, but they absolutely land in seven_day_opus.utilization.
Longer average outputs on refactor-style prompts
Even without thinking, 4.7's completion length on the same prompt tends to exceed 4.6's. The extra output is metered against seven_day_opus the moment the response streams.
No per-request breakdown in the payload
/api/organizations/{org}/usage returns aggregates, not per-request rows. You can only see the regression by diffing two polls that bracket the prompt. ClaudeMeter caches every tick so this diff is a jq expression.
The anchor: the Opus weekly field, verbatim
This is every field ClaudeMeter deserializes from /api/organizations/{org}/usage. The 4.7 regression lands entirely on one of them:
The seven_day_opus window holds a single utilization: f64 that Anthropic writes after the 4.7 tokenizer has been applied and after adaptive-thinking output has been billed. The companion resets_at timestamp tells you when the window resets to zero. If you want to see the 4.7 regression in real time, watch this float.
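As a sketch of what that read looks like from a script: the payload below is a sample reconstructed from the field names on this page (window objects, each with a utilization fraction and a resets_at timestamp), not a verbatim copy of Anthropic's response.

```shell
#!/bin/sh
# Hypothetical /usage payload, shape inferred from the fields described
# above; the real response may carry more windows and more fields.
payload='{
  "five_hour":      { "utilization": 0.31, "resets_at": "2026-02-10T18:00:00Z" },
  "seven_day":      { "utilization": 0.52, "resets_at": "2026-02-13T09:00:00Z" },
  "seven_day_opus": { "utilization": 0.78, "resets_at": "2026-02-13T09:00:00Z" }
}'

# The one float the 4.7 regression lands on:
echo "$payload" | jq -r '.seven_day_opus.utilization'
# prints 0.78
```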
The path a 4.7 prompt takes through your quota
Five steps. Step 1 is the last thing any local-log tool can see. Steps 2 through 4 are the entire regression.
From your prompt to seven_day_opus.utilization
1. Client writes JSONL from pre-tokenizer text
Claude Code records the text it sent and the text it received into ~/.claude/projects/<project>/*.jsonl. The token counts in that file are client-estimated, based on the 4.6 tokenizer assumptions baked into the CLI. ccusage and Claude-Code-Usage-Monitor read this file; that is the full scope of what they see.
2. Server retokenizes under 4.7 rules
Anthropic's backend retokenizes the same payload under 4.7's new vocabulary before billing. This is where 'up to 35 percent more tokens' kicks in. Your local JSONL has no way to know about this step; the retokenization result never travels back down to the client.
3. Adaptive thinking generates hidden output
4.7 runs an internal reasoning pass on most prompts. Those thinking tokens are counted as output tokens against your account. Claude Code's default is to hide them from the user, so they're absent from the JSONL but present in the utilization.
4. Server writes seven_day_opus.utilization
The utilization float is updated with the post-retokenization, post-thinking count. This is the number the 429-enforcement gate checks against. It's also the number ClaudeMeter reads every 60 seconds and exposes at 127.0.0.1:63762/snapshots.
5. Client-side tools stay silent
Because the JSONL is frozen after step 1, the local-log summary keeps showing pre-expansion counts. You'll see the regression only if you compare the client summary against the server utilization. That comparison is the entire thesis of this page.
Same prompt, two very different numbers
Toggle between what your local JSONL thinks happened and what Anthropic actually billed against seven_day_opus. This is the core of the regression, in one diff:
18,400 local tokens → 24,800-ish against the weekly denominator
ccusage reads ~/.claude/projects/<repo>/session.jsonl and sums the prompt_tokens + completion_tokens fields that Claude Code wrote at the moment of the call. It does not know about the 4.7 tokenizer expansion, and it does not see any thinking tokens the CLI hid from the display. Its answer for this prompt is 18,400 tokens, the same as it would have been on 4.6.
- Reads pre-expansion text
- Counts only visible output
- No awareness of seven_day_opus
- Same answer on 4.6 and 4.7
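A back-of-envelope check on the gap above, assuming only the documented worst-case 1.35x tokenizer multiplier (hidden thinking tokens would come on top of this):

```shell
# 18,400 client-side tokens under the worst-case 1.35x expansion.
awk 'BEGIN { printf "%.0f\n", 18400 * 1.35 }'
# prints 24840 -- the "24,800-ish" server-side figure, before any
# adaptive-thinking output is added on top
```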
Where the server-truth number comes from
Three inputs, one float. The extension fetches all three every 60 seconds and ships the combined snapshot to the menu-bar app.
Server inputs → seven_day_opus utilization
Reading the regression yourself
The cleanest way to see this is a diff of two polls that bracket a single 4.7 prompt. ClaudeMeter caches every tick and serves the history over the loopback bridge, so a shell script suffices:
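A minimal version of that script, assuming the /snapshots bridge described on this page returns an array of cached ticks, newest first; the jq paths are reconstructed from the prose and may need adjusting against the real payload:

```shell
#!/bin/sh
# Poll the bridge, run one prompt, poll again, print the delta.
read_opus() {
  curl -s http://127.0.0.1:63762/snapshots \
    | jq '.[0].usage.seven_day_opus.utilization'
}

before=$(read_opus)
printf 'run your single 4.7 prompt in Claude Code, then press enter: '
read -r _
after=$(read_opus)

# utilization is a 0..1 fraction of the weekly Opus bucket; the difference
# is the server-side cost of the bracketed prompt, post-expansion.
awk -v b="$before" -v a="$after" 'BEGIN { printf "delta: %.4f\n", a - b }'
```

Compare that delta against what the local JSONL claims for the same prompt; the mismatch is the regression.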
The difference between the local token count and the server utilization delta is the regression. On 4.6 the same method would close the gap (modulo the 4.6 tokenizer's own overhead). On 4.7 the gap widens, and it keeps widening every time you enable adaptive thinking or feed a longer prompt.
The row in the popup that carries the regression
ClaudeMeter's extension popup renders every usage window on its own row, with a color-coded bar and a resets_at countdown. The 4.7 regression is entirely isolated to the 7d Opus row. 4.7 does not change seven_day_sonnet or the shared five_hour denominators; the bucket it fills faster is Opus-only.
If your menu bar shows the 7d Opus bar crossing 80 percent halfway through your week for a workload that used to finish at 60 percent on 4.6, that is the regression. There is no other explanation.
The three endpoints that back the number
The extension fetches all three on every tick. /usage is where seven_day_opus lives, but /overage_spend_limit decides whether a 4.7-inflated hit becomes a billed 200 or a hard 429, and /subscription_details controls the denominator:
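For orientation, the tuple as URLs. The claude.ai host here is an assumption; in practice the browser extension fetches these with your live session cookie, and a bare curl without that cookie will 401:

```shell
#!/bin/sh
ORG="00000000-0000-0000-0000-000000000000"   # placeholder org UUID
BASE="https://claude.ai/api/organizations/$ORG"

for ep in usage overage_spend_limit subscription_details; do
  echo "$BASE/$ep"
  # live variant, claude.ai session cookie required:
  # curl -s -H "Cookie: $CLAUDE_SESSION" "$BASE/$ep"
done
```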
Numbers you can reproduce
From Anthropic's own 4.7 documentation and the open-source ClaudeMeter client. No invented benchmarks.
What has to be true to measure the regression on your account
Preconditions for a clean before/after read
- You are signed into claude.ai in the same browser the ClaudeMeter extension is loaded into. No cookie, no /usage read.
- Your plan has an Opus weekly quota (Pro, Max 5x, or Max 20x). Free plans do not populate seven_day_opus, so there is nothing for 4.7 to inflate.
- The menu-bar app is running. The extension can still surface the number on its own, but the localhost bridge is what lets shell scripts read it without hitting claude.ai directly.
- You are issuing 4.7 calls through Claude Code (not the API). Only claude.ai subscription traffic is metered against seven_day_opus; API traffic bills separately.
- Your claude.ai session is fresh. The /api/organizations/{org}/usage endpoint 401s on an expired cookie; ClaudeMeter surfaces this as an error in the menu bar, not silent zeros.
What each class of tool can and cannot see
Local-log summaries read the frozen JSONL. ClaudeMeter reads the float Anthropic actually enforces against.
| Feature | Local-log tools | ClaudeMeter |
|---|---|---|
| Sees 4.7 tokenizer expansion | No, reads pre-tokenizer text from local JSONL | Yes, /usage returns the post-expansion float |
| Sees omitted adaptive-thinking tokens | No, thinking is hidden from the JSONL | Yes, already counted in seven_day_opus.utilization |
| Isolates Opus-only weekly spend | No, summarizes all models together | Yes, seven_day_opus is its own float on the struct |
| Watches the enforcement gate, not just the logs | No, reads what the client thinks it sent | Yes, reads what Anthropic will actually 429 on |
| Diffable per-prompt cost | No, per-request token count is 4.6-era estimate | Yes, diff two snapshots bracketing the call |
| Catches the Sonnet-to-Opus surprise on the 5-hour float | No, cannot cross-reference five_hour and seven_day_opus | Yes, both floats live on the same Window struct |
| Exposes all of this over a scriptable localhost HTTP bridge | No | Yes, 127.0.0.1:63762/snapshots |
The honest caveat
The quota regression is the one that actually stops your session, but the subjective regressions are the ones most Claude Code users will notice first: 4.7 is slower than 4.6, its long-context recall degrades above roughly 100K tokens, and its BrowseComp score dropped 4.4 points. If your workload lives in any of those regions, the quota-side cost is not your biggest problem. If it doesn't, the quota cost is the one you can actually fix, because it responds to prompt length, adaptive-thinking toggles, and the choice between 4.7 and 4.6 per call.
Everything on this page is reproducible on your own account. The endpoints are undocumented and Anthropic can change them, so ClaudeMeter parses into strict Rust structs; a schema break shows up as a parse error in the menu bar, not as silent wrong numbers. The repo is MIT and under 900 lines between src/ and extension/. You can audit the request path in about ten minutes.
Watch your seven_day_opus bucket in real time
ClaudeMeter runs in your macOS menu bar, polls /api/organizations/{org}/usage every 60 seconds, and serves the combined snapshot at 127.0.0.1:63762/snapshots. Free, MIT, no keychain prompt with the browser extension.
Frequently asked questions
What is the Claude Code 4.7 regression most people are missing?
The quota regression. 4.7 uses a new tokenizer that expands text roughly 1.0x to 1.35x compared to 4.6, and its adaptive thinking mode generates more hidden output tokens than 4.6 did. Both expansions are applied server-side before Anthropic writes the usage float to your seven_day_opus bucket. The result is that the same Claude Code session, run against the same repo, can fill your Opus weekly utilization 20 to 35 percent faster on 4.7 than it did on 4.6. This is not covered by any of the popular benchmark writeups on this release because it does not show up in accuracy scores or eval suites; it only shows up on claude.ai/settings/usage.
Why can't ccusage or Claude-Code-Usage-Monitor see this regression?
Those tools read the JSONL that Claude Code writes under ~/.claude/projects/. That file records a token count produced by the client before Anthropic applies the 4.7 tokenizer, and it excludes adaptive-thinking tokens when the client hides them (display: omitted). Anthropic's server applies the tokenizer expansion and counts the hidden thinking tokens after the JSONL is written, then writes the result to your seven_day_opus.utilization float. The local log and the server quota disagree by exactly the amount of the regression. The only way to close the gap is to read the server's /api/organizations/{org}/usage endpoint, which is what ClaudeMeter's src/api.rs does on every 60-second tick.
Where does the Opus weekly bucket live in the ClaudeMeter source?
It's the seven_day_opus field on UsageResponse at src/models.rs line 23. The type is Option<Window>, where Window has a utilization float and a resets_at timestamp. The extension posts the parsed struct to 127.0.0.1:63762/snapshots as JSON on every tick, so any process on your machine can curl that URL and read the authoritative 4.7-inflated number. The menu-bar app renders the same float as '7d Opus' in extension/popup.js line 63.
Are there other confirmed Claude Code 4.7 regressions?
Yes, and they're well-covered elsewhere. Anthropic's own release notes for 4.7 call out a regression in long-context recall above roughly 100K tokens, specifically in needle-in-a-haystack style tasks. Independent reports cite a 4.4 point drop on BrowseComp (83.7 to 79.3), higher latency than 4.6, and reduced stylistic flexibility on creative tasks. Those are real and worth knowing. The one I focus on here is the one that empties your weekly Opus bucket before Friday, because that's the one that stops your Claude Code session mid-refactor.
How much faster does the seven_day_opus bucket actually fill on 4.7?
There's no published ratio because the expansion is per-prompt and content-dependent. Anthropic's own 4.7 documentation describes the tokenizer as using 1.0x to 1.35x more tokens for the same input text. Adaptive thinking on top of that adds a variable output-side cost that 4.6 did not have. In practice you can read the ratio yourself: run the same prompt on 4.6 and 4.7 back-to-back, then diff the seven_day_opus.utilization float returned by /api/organizations/{org}/usage before and after each call. ClaudeMeter's localhost bridge makes this a two-curl experiment.
Is the Opus 4.7 quota drain the same as hitting the 5-hour limit?
No. The 5-hour window (usage.five_hour.utilization) is shared across every model. The Opus weekly window (usage.seven_day_opus.utilization) is Opus-only and independent. A 4.7 session can pin the Opus weekly float while leaving the 5-hour float at 30 percent, because Sonnet traffic doesn't contribute to seven_day_opus. The popular confusion is to watch only the 5-hour bar on claude.ai and assume the Opus weekly is in proportion. It isn't, and 4.7's tokenizer expansion widens the gap further.
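One way to see the divergence directly is to print both windows from the same snapshot. The payload here is a sample in the shape this page describes, not a captured response:

```shell
# Two independent windows read from one (sample) usage payload.
echo '{
  "five_hour":      { "utilization": 0.30 },
  "seven_day_opus": { "utilization": 0.92 }
}' | jq -r '"5h: \(.five_hour.utilization)  7d-opus: \(.seven_day_opus.utilization)"'
# prints  5h: 0.3  7d-opus: 0.92
```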
Does this regression affect Claude Code API users or only the subscription?
This page is about the subscription quotas (Pro, Max 5x, Max 20x) that power Claude Code when it's signed into claude.ai. API billing uses different accounting: it bills per-token by the 4.7 tokenizer too, so the same text costs more on the API than it did on 4.6, but it doesn't land in a seven_day_opus bucket. If you run Claude Code on an API key, the regression shows up as a higher invoice rather than an earlier 429. The /usage endpoint ClaudeMeter reads is specific to the subscription plan.
Can I tell from the JSON payload whether a request was charged at the 4.7 rate?
Not directly. The /usage endpoint returns only the current utilization fractions and their reset timestamps, not per-request metadata. What you can do is compare deltas: poll /api/organizations/{org}/usage before a prompt, run the prompt on 4.7, poll again. The delta in seven_day_opus.utilization represents the server-side token count for that single call, already expanded. ClaudeMeter caches every snapshot the extension fetches, so you can diff two ticks that bracket your prompt and see the real cost.
What endpoint does ClaudeMeter use to pick up the 4.7 quota state?
Three of them. extension/background.js line 24 fetches /api/organizations/{org}/usage for utilization (seven_day_opus lives in that payload). Line 26 fetches /api/organizations/{org}/overage_spend_limit to see whether 4.7 requests will bill through to extra usage or hard-stop at 429. Line 28 fetches /api/organizations/{org}/subscription_details for the plan tier that sets the denominator. All three are undocumented and authed with your existing claude.ai cookies. No API key, no login flow.
Is this regression documented anywhere by Anthropic?
Partially. Anthropic's 'What's new in Claude 4.7' page acknowledges the new tokenizer and the 1.0x to 1.35x range. Their Claude Code best-practices page warns that adaptive thinking can generate substantial hidden output. What Anthropic does not publish is a specific multiplier on seven_day_opus, so the quota regression is a behavioral fact you have to measure on your own account. ClaudeMeter's client is MIT and under 900 lines; src/api.rs and src/models.rs together are the entire trust surface.
How do I watch the seven_day_opus float in real time?
Install ClaudeMeter (brew install --cask m13v/tap/claude-meter) and load the extension into any Chromium browser. The extension polls every 60 seconds and posts the snapshot to the menu-bar app's localhost bridge. Open the popup and the '7d Opus' row shows the current utilization with its resets_at delta. For a scripted check, curl http://127.0.0.1:63762/snapshots and read .[0].usage.seven_day_opus.utilization directly.
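If you want the float on a loop rather than a one-off curl, a sketch (the 60-second sleep matches the extension's poll cadence, so polling faster only re-reads the cached tick):

```shell
#!/bin/sh
# Watch the authoritative Opus weekly float from the loopback bridge.
while true; do
  curl -s http://127.0.0.1:63762/snapshots \
    | jq -r '.[0].usage.seven_day_opus | "\(.utilization)  resets \(.resets_at)"'
  sleep 60
done
```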
Keep reading
Claude Opus 4.7 rate limit: three endpoints, not one number
The rate limit is not a ceiling; it is a tuple. /usage, /overage_spend_limit, /subscription_details together decide whether your next 4.7 call 200s, bills, or 429s.
Claude Code Opus 4.7 usage limits
Where the seven_day_opus float lives in the schema and why 4.7 fills it faster than 4.6 on the same workload.
The Claude rolling window cap is seven windows
Anthropic publishes two rolling windows; the endpoint returns seven. Every field, what it gates, and which ones 4.7 actually trips.
Got a 4.7 workload where the quota math doesn't line up?
If you see a seven_day_opus delta that doesn't match either the 4.6 baseline or the 4.7 tokenizer expansion, send a snapshot over. It's usually diagnosable from a single JSON payload.
Book a 15-minute call →