ccusage says 5 percent, claude.ai says rate limited: they measure different things

Did Claude usage hit 100 percent after one prompt while your local counter still showed plenty of headroom? You are not imagining it. Local token counters and the claude.ai rate limiter answer two different questions and operate on two different denominators. Most people only learn this when they are stuck mid-refactor with a 429 and a token meter that says everything is fine.

Matthew Diakonov
9 min read
  • Sourced from the live claude.ai usage endpoint
  • Field names from src/models.rs, lines 3-7
  • Verifiable in 30 seconds with one curl
  • Same JSON the Settings page itself fetches

The thread that surfaced this for everyone

A pattern that shows up almost weekly in the Claude Pro user base: someone is deep in a refactor, ccusage in a tmux pane says 5 percent of the plan is used, then the very next prompt comes back with a generic rate-limit error. They restart ccusage, double-check the count, swap models, ask in Discord. The local counter still says 5 percent. claude.ai/settings/usage shows the 5-hour bar pinned at 100 percent.

Both numbers are correct. They just answer different questions, and the rate limiter only listens to one of them.

Mental model swap

"ccusage prints '5 percent used', so I have hours of headroom. I can keep refactoring and check again in a while when it climbs."

  • Implies local token counts equal server quota
  • Implies attachments and tool calls are token-equivalent
  • Implies peak-hour traffic does not exist
  • Ignores anything sent in the browser on claude.ai

What ccusage actually measures

ccusage and friends (ccburn, Claude-Code-Usage-Monitor) read one set of files: the JSONL transcripts Claude Code writes under ~/.claude/projects/<project>/<session>.jsonl. Every assistant turn appends one row that looks roughly like this:

~/.claude/projects/<project>/<session>.jsonl
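Roughly, and with most fields omitted, a row carries something like the sketch below. The real transcripts have more structure and the exact field names vary by Claude Code version; the two numbers local counters sum are input_tokens and output_tokens, plus a timestamp.

```json
{ "ts": "2025-01-15T18:42:07Z", "input_tokens": 1843, "output_tokens": 512 }
```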

The tool sums input_tokens + output_tokens across rows in a recent window, picks a denominator (a plan guess, often hard-coded), and prints a percent. That is a perfectly fine measurement of how fast Claude Code is moving tokens through your machine. It is not a measurement of your remaining 5-hour quota on the server.

What the server actually measures

The Anthropic rate limiter looks at exactly one number per window: a single utilization float in the response from GET /api/organizations/{org_uuid}/usage. Same endpoint claude.ai/settings/usage hits to draw your bar. Same field rendered in the in-app indicator:

claude.ai/api/organizations/{org_uuid}/usage
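A sketch of the slice of the payload this article cares about. The live response carries sibling buckets and extra fields that are omitted here; only five_hour.utilization and five_hour.resets_at are assumed.

```json
{
  "five_hour": {
    "utilization": 0.97,
    "resets_at": "2025-01-15T21:00:00Z"
  }
}
```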

One utilization float. One resets_at ISO timestamp. The 429 fires when utilization >= 1.0 (or 100.0; the same payload sometimes returns 0.97 and sometimes 97.0 across sibling buckets, which is its own source of bugs).

One prompt, two ledgers

The same prompt lands in two places at once: a local JSONL file and a server-side weighted bucket. Local counters watch the file. The rate limiter watches the bucket.

local counter vs server quota

  • You send a prompt; Claude Code issues POST /completions to the claude.ai server.
  • The server increments five_hour.utilization by weight(prompt, model, attachments, tools, peak).
  • Claude Code appends { input_tokens, output_tokens, ts } to the local JSONL transcript.
  • ccusage reads the file, divides, and says 5 percent; claude.ai/settings/usage shows 97 percent.
  • Next prompt: 429 if utilization >= 1.

Why the two numbers diverge so dramatically

The gap is not noise. It is structural. Anthropic applies at least five independent weighting factors to every prompt before it lands on five_hour.utilization, and your local JSONL file never sees any of them.

Weights the server adds, every prompt

  • Peak-hour multiplier (Anthropic late-2025 note): weekday US Pacific midday raises the quota cost of every prompt. Local logs see the same tokens; server utilization sees more.
  • Per-attachment cost: PDFs, images, and files tacked to a prompt land on utilization. Token logs record only the text tokens Claude Code sent.
  • Per-tool-call cost: code execution, web browsing, and MCP tool calls add weight that local logs do not account for in the same units.
  • Per-model weight: Opus costs more per token than Sonnet. Two 1000-token prompts on different models produce identical local counts and different server deltas.
  • Browser-chat usage: prompts sent on claude.ai (not via Claude Code) never land in ~/.claude/projects, but they absolutely land on five_hour.utilization.
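To make the stacking concrete, here is a toy model in the spirit of the diagram's weight(prompt, model, attachments, tools, peak). Every constant in it is made up; Anthropic does not publish the real weights. The only point it illustrates is that two prompts with identical local token counts can move the server number by very different amounts.

```ts
// Toy model only. None of these constants are real; Anthropic's actual
// weighting is not public. This just shows how the five factors stack.
type Prompt = {
  tokens: number;
  model: "opus" | "sonnet";
  attachments: number;
  toolCalls: number;
  atPeakHours: boolean;
};

function illustrativeQuotaCost(p: Prompt): number {
  const modelWeight = p.model === "opus" ? 5 : 1;   // made-up ratio
  const peakMultiplier = p.atPeakHours ? 1.5 : 1.0; // made-up multiplier
  const attachmentCost = p.attachments * 800;       // made-up per-attachment cost
  const toolCallCost = p.toolCalls * 300;           // made-up per-tool-call cost
  return (p.tokens * modelWeight + attachmentCost + toolCallCost) * peakMultiplier;
}

// Same 1000 local tokens, very different server-side deltas:
console.log(illustrativeQuotaCost({ tokens: 1000, model: "sonnet", attachments: 0, toolCalls: 0, atPeakHours: false })); // 1000
console.log(illustrativeQuotaCost({ tokens: 1000, model: "opus", attachments: 2, toolCalls: 3, atPeakHours: true }));    // 11250
```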

Inputs that move the server number, but not your token log

What lands on five_hour.utilization but not on JSONL

Prompt tokens, attachments, tool calls, the model picked, and the peak-hour multiplier all feed one number: five_hour.utilization. That single number is what the Settings page bar renders, what the ClaudeMeter menu bar renders, and what trips the 429 at >= 1.0.

Reproduce the gap in 60 seconds

You do not need a new tool to confirm any of this. Run your local counter and the server endpoint side by side, then read off both percentages:

ccusage vs server-truth
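A hedged sketch of the side-by-side check. The org UUID and the Cookie header are yours to supply (both are visible in the browser dev tools with claude.ai/settings/usage open), jq is only there for readability, and the exact ccusage invocation may differ by version.

```sh
# Local ledger: tokens summed from ~/.claude/projects/**/*.jsonl
# (ccusage shown; ccburn or Claude-Code-Usage-Monitor work the same way)
npx ccusage

# Server ledger: the same JSON the Settings page fetches.
# Replace both placeholders with values from your logged-in browser session.
curl -s "https://claude.ai/api/organizations/<your-org-uuid>/usage" \
  -H "Cookie: <your claude.ai cookies>" | jq '.five_hour'
```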

The first time you do this during weekday peak hours with Opus and a couple of attachments in a session, the gap is usually wide enough to feel like a typo.

The 0-to-1 vs 0-to-100 trap

One small wrinkle if you call the endpoint yourself: utilization arrives on inconsistent scales. We have seen the same payload come back with five_hour.utilization at 0.97 and a sibling bucket at 94.0. ClaudeMeter normalizes with one clamp:

claude-meter/extension/popup.js
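This is not the verbatim extension source, just a sketch of the same idea: treat anything at or below 1 as a fraction, anything above it as a percent, and clamp to 0-100 either way.

```ts
// Sketch of the normalization idea (not the literal popup.js code).
function normalizeUtilization(raw: number): number {
  const pct = raw <= 1 ? raw * 100 : raw; // 0.97 -> 97, 94.0 -> 94
  return Math.min(100, Math.max(0, pct));
}
```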

Skip that clamp and a bucket at 0.97 renders as “less than 1 percent”, which is the exact failure mode that gives ccusage-style tools their false confidence on the next prompt.

The verification path, end to end

1

Run your favorite local counter

ccusage, ccburn, Claude-Code-Usage-Monitor: pick one. They all read ~/.claude/projects/**/*.jsonl, sum input_tokens + output_tokens, and compare to a chosen ceiling. Note the percent.

2

Open claude.ai/settings/usage at the same moment

This is the only first-party surface that renders the server-truth number. The page calls /api/organizations/{org_uuid}/usage and draws the bar from five_hour.utilization. Note the percent next to the 5-hour bar.

3

Compute the gap

Subtract local from server. A small gap (single percentage points) usually means you are off-peak, no attachments, Sonnet only. A large gap (tens of percentage points) usually means peak hours, Opus, attachments, or browser-chat usage that local logs do not see.

4

Decide which one to trust

If your goal is 'how much did Claude Code burn locally', trust the local counter. If your goal is 'will my next prompt 429', trust the server number. They are not interchangeable.

5

Project to the cap

From two consecutive server polls, compute Δu / Δt. At a positive burn rate, ETA_to_429 = (100 minus current_utilization) / burn_rate minutes. ClaudeMeter persists every poll to snapshots.json so you can compute this without a new tool.
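A sketch of that arithmetic, assuming snapshots that carry a normalized 0-100 utilization and a poll time; the actual snapshots.json layout may differ.

```ts
// Burn rate and ETA-to-429 from two consecutive polls.
// Assumes utilization is already normalized to the 0-100 scale.
type Snapshot = { utilizationPct: number; polledAt: Date };

function etaTo429Minutes(prev: Snapshot, curr: Snapshot): number | null {
  const minutes = (curr.polledAt.getTime() - prev.polledAt.getTime()) / 60_000;
  const burnPerMinute = (curr.utilizationPct - prev.utilizationPct) / minutes;
  if (burnPerMinute <= 0) return null; // idle, or the window just reset
  return (100 - curr.utilizationPct) / burnPerMinute;
}
```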

Server-truth quota vs local-token counter

Side by side. Same workload, different questions.

Feature | tokens summed from JSONL (ccusage) | five_hour.utilization (server)
Data source | ~/.claude/projects/**/*.jsonl (local) | GET /api/organizations/{org}/usage (server)
Question answered | how many tokens did Claude Code emit | how full is my server-side quota
Peak-hour multiplier | invisible | baked in
Attachment cost | invisible | baked in
Tool-call cost | text tokens only | baked in
Browser-chat usage on claude.ai | not counted | counted
Predicts 429 | no (different denominator) | yes
Refresh cadence | on tail of JSONL writes | every 60 seconds (live)

Numbers that matter

From the implementation. No invented benchmarks.

  • 1 — field the rate limiter actually checks (five_hour.utilization)
  • 5 — weights local logs cannot see
  • 60 s — ClaudeMeter poll cadence
  • 0 — cookies you have to paste

Common myths to drop

  • Myth: local tokens equal server quota
  • Myth: ccusage predicts 429s
  • Myth: attachments are free in token logs
  • Myth: peak hours are a separate field
  • Myth: Opus and Sonnet weigh the same
  • Myth: browser-chat prompts do not count

Predicting the cap mid-refactor

Once you stop asking the wrong tool, the prediction problem becomes simple. Two consecutive polls of five_hour.utilization give you a delta. Divide by the wall-clock minutes between them. That is your burn rate. From there:

ETA_to_429 = (100 - current_utilization) / burn_rate minutes

At 97 percent server utilization with a 3.2 percent per minute burn rate, you have under one minute. The local counter, sitting at 5 percent, gives you no signal at all. ClaudeMeter sits in the macOS menu bar and updates both the percent and the resets countdown every 60 seconds, so the cap is visible without breaking flow. ccusage stays useful for what it does well: telling you what Claude Code burned on disk.

When ccusage is the right answer

This is not a takedown of local token counters. They are excellent at the question they actually answer. If your goal is to attribute Claude Code spend per project, audit how a long agent loop fanned out, or feed token counts into a billing model, local JSONL is the right source. It is accurate to the byte, fast to read, and runs offline.

The mismatch only shows up when people press the local counter into service as a rate-limit predictor. It cannot do that job because the rate limiter does not look at the same numbers. ccusage and ClaudeMeter are complementary, not substitutes.

The honest caveat

The endpoint is internal and undocumented. The field names have been stable for many months but Anthropic could rename or reshape them in any release. ClaudeMeter deserializes the response into a strict Rust struct in src/models.rs, so when the shape changes the menu bar surfaces a parse error and a release ships the same day. Until then, this is the field, and it is the only one that matches what the rate limiter enforces.

Watch the server number, not just the local one

ClaudeMeter sits in your macOS menu bar and refreshes every 60 seconds. Free, MIT licensed, no cookie paste, reads the same JSON claude.ai/settings/usage reads. Pair it with ccusage for full coverage.

Install ClaudeMeter

Frequently asked questions

If ccusage says I am at 5 percent and claude.ai rate-limits me, who is wrong?

Neither. They are answering different questions. ccusage tots up token counts in your local Claude Code JSONL files under ~/.claude/projects and divides by a chosen ceiling. The claude.ai rate limiter trips on five_hour.utilization on the server, weighted by peak-hour multiplier, attachments, tool calls, model class, and any browser-chat usage. The local count and the server count agree only by coincidence. Treat them as two separate dashboards.

Why is the difference so big in practice, like 5 percent local vs 100 percent server?

Because the two numbers do not share a denominator and the server applies weights local logs cannot see. Five examples of weight that lives only on the server: a peak-hour multiplier on weekday US Pacific midday hours, a per-attachment cost that fires the moment you upload a PDF or image, a per-tool-call cost on code execution and web browsing, a per-model weight (Opus burns faster than Sonnet for the same prompt), and any prompt you sent on claude.ai in the browser, which never lands in the JSONL files at all. Stack a few of those up and the gap goes from cosmetic to existential.

Where is the server-truth number, exactly?

GET https://claude.ai/api/organizations/{your-org-uuid}/usage with your logged-in claude.ai cookies. The response is JSON with a five_hour object that contains a utilization float and a resets_at ISO timestamp. The bar drawn on claude.ai/settings/usage is rendered from the same two values. ClaudeMeter calls this endpoint once a minute via your existing browser session and surfaces the raw numbers. There is no documentation, but the field names have been stable for months.

Can I just trust ccusage if I never use attachments and never browse claude.ai?

Closer, but still no. Even with no attachments and no browser chat, the peak-hour multiplier still applies, the per-tool-call cost on Claude Code itself still applies, and the per-model weight still applies. ccusage is correct as a local token-flow measurement. It is not a faithful proxy for the rate limiter. If you are trying to predict whether your next prompt will 429, the only reliable signal is five_hour.utilization on the server.

What does a session-level rolling-window meter actually look like?

Three things on screen at once: the current utilization (server-truth, normalized to 0 to 100), the resets_at timestamp converted to a human countdown like '5h: 47m', and either the rate of change (Δu/Δt over the last poll) or a simple ETA to 100 percent. ClaudeMeter renders the first two in the macOS menu bar; the third you can compute yourself from the persisted snapshots.json file.
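If you want the countdown piece yourself, it is one subtraction against resets_at (field name as described above; the formatting is a matter of taste).

```ts
// Turn the five_hour.resets_at ISO timestamp into a rough "Xh Ym" countdown.
function countdownTo(resetsAtIso: string, now: Date = new Date()): string {
  const minutesLeft = Math.max(0, Math.floor((new Date(resetsAtIso).getTime() - now.getTime()) / 60_000));
  return `${Math.floor(minutesLeft / 60)}h ${minutesLeft % 60}m`;
}
```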

How do I predict the cap mid-refactor without a tool?

Open claude.ai/settings/usage in a tab, leave it open, and refresh every couple of minutes. Watch the bar move. Because the bar is rendered from five_hour.utilization, the same number the rate limiter checks, you will see a 429 coming roughly one minute before it lands. The downside is that you have to break flow to look at it. The upside is that no install is required and the number is exact.

Stuck reconciling local and server numbers?

If your numbers diverge in a shape we have not seen, send a snapshot. Happy to map it. 15 minutes is plenty.