The Claude Code rolling 5-hour wall is one float on one endpoint.

When Claude Code 429s your refactor at the 5-hour mark, the wall is literally a single f64 on a single server endpoint. Your local Claude Code logs cannot read it. ccusage cannot see it. /usage inside Code measures something else entirely. This page walks you through the exact field, the exact endpoint, and why three classes of local tools systematically under-report what the server actually enforces.

Matthew Diakonov · 6 min read
Direct answer (verified 2026-05-14)

The wall is the condition five_hour.utilization >= 1.0 on the cookie-authenticated GET /api/organizations/{org_uuid}/usage endpoint on claude.ai. When that float crosses 1.0, the next Claude Code prompt returns 429 with usage_limit_reached and stays 429 until enough of your oldest messages age off the rolling 5-hour sliding window. No API token is involved; the request rides your existing session cookie. The struct that decodes this payload is UsageResponse at src/models.rs in the MIT-licensed claude-meter repo.

1. The field that 429s your loop

One Rust struct decodes the response from /api/organizations/{org}/usage. The first field is the rolling 5-hour wall; the rest are the weekly caps. utilization is a fraction that starts at 0.0 and can exceed 1.0; resets_at is the ISO timestamp of the next message age-off.

claude-meter/src/models.rs
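The struct itself is not reproduced here, but the JSON it decodes looks roughly like this. The field names are taken from the prose on this page; the exact payload shape and the sample numbers are assumptions, not a copy of models.rs:

```javascript
// Hypothetical shape of the GET /api/organizations/{org}/usage payload,
// reconstructed from the field names discussed on this page.
const sampleUsage = {
  five_hour:            { utilization: 0.92, resets_at: "2026-05-14T18:05:00Z" },
  seven_day:            { utilization: 0.54, resets_at: "2026-05-17T09:00:00Z" },
  seven_day_opus:       { utilization: 0.37, resets_at: "2026-05-17T09:00:00Z" },
  seven_day_oauth_apps: { utilization: 0.83, resets_at: "2026-05-16T22:10:00Z" },
};

// The wall itself is one comparison on one field.
function wallHit(usage) {
  return usage.five_hour.utilization >= 1.0;
}
```

At 0.92 the sample is one heavy cold turn away from `wallHit` flipping true.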

2. Why your local tools said you had headroom

Three concrete reasons the JSONL files Claude Code writes to ~/.claude/projects systematically lag the server's five_hour.utilization.

  • Server-side cache reweighting. Cold-turn tokens bill at 1.25x base input. Cached prefix reads bill at 0.10x. The JSONL records raw token counts before this multiplication runs. A cold turn with 75,000 tokens of system prompt + MCP definitions shows up locally as 75,000 cache_creation tokens. On the server it is charged as ~93,750 input-equivalent tokens against five_hour.
  • claude.ai web chat fills the same bucket. Every prompt you send in the browser chat hits the same five_hour float. Claude Code never wrote those bytes to your JSONL. ccusage cannot see them. The server sums everything; the union is what fires the 429.
  • Opus thinking tokens are server-counted. Extended-thinking tokens on Opus 4.7 do not all land in the streamed JSONL the way input/output tokens do. The server's float reflects the full thinking spend; the local sum reflects what was streamed. On a heavy Opus session this gap alone is 10 to 20 percentage points.
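The reweighting in the first bullet is plain arithmetic. A sketch, assuming the 1.25x and 0.10x multipliers quoted above apply per-token:

```javascript
// Server-side input-equivalent cost, assuming the multipliers quoted above:
// cold (cache-creation) tokens bill at 1.25x, cached prefix reads at 0.10x.
function inputEquivalent(coldTokens, cachedReadTokens) {
  return coldTokens * 1.25 + cachedReadTokens * 0.10;
}

// The cold turn from the bullet above: 75,000 raw tokens in the local JSONL,
// ~93,750 input-equivalent tokens against five_hour on the server.
inputEquivalent(75_000, 0); // 93750
```

The same 75,000 tokens replayed as a warm cached read would count as only 7,500 input-equivalents, which is why cold turns move the bar so much faster than warm ones.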

3. One refactor, the float climbing the whole time

A real Claude Code session, traced moment by moment through its cache states. Each entry is a point where the server's five_hour.utilization moved more than the local-disk sum predicted, and why.

1. T+0:00 Refactor starts. five_hour at 18%.

You open Claude Code with three MCP servers loaded. five_hour.utilization is 0.18 on the server. ccusage's local sum says 4% because it has not seen the rolling tail from yesterday's session.

2. T+0:45 Sonnet to Opus 4.7 switch. five_hour at 41%.

The hard part of the refactor needs Opus. That switch is a cold cache turn: the full system prompt + MCP tool definitions re-bill at 1.25x. Server-side reweighting bumps the bucket faster than the JSONL token counts suggest. ccusage now says 12%.

3. T+1:20 /compact. Cold turn. five_hour at 63%.

Window pressure forces an auto-compact. Another cold cache turn. The Settings bar climbs past 60%. /usage inside Claude Code shows 8% of the CURRENT turn's context window used, which is true and irrelevant to the rolling bucket.

4. T+1:55 You open claude.ai chat in another tab. five_hour at 71%.

You ask claude.ai web chat a quick question while Code is thinking. That request hits the same five_hour bucket; the JSONL never sees it. ccusage stays at 18%. The server is at 71%.

5. T+2:30 five_hour at 99%. ClaudeMeter menu bar turns amber.

Server-side, you are one cold turn from the wall. /usage inside Code still shows current-turn context at 11%. The only signal that matches the server is the float from /api/organizations/{org}/usage.

6. T+2:33 429 mid-refactor. Loop stops.

Next Claude Code prompt returns 429 with usage_limit_reached. Your refactor is paused. The wall will not fully release for 5 hours after your last message, but resets_at on the /usage response shows the next age-off boundary, which is when the bar first steps down.

4. The 60-second poll that watches the wall

The whole loop is 30 lines of JavaScript. Cookie-authenticated; reuses your existing claude.ai session. One HTTPS request per minute. The JSON comes back from the same endpoint claude.ai/settings/usage renders, so the numbers match the Settings page exactly.

claude-meter/extension/background.js
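The actual background.js is not reproduced here. A minimal sketch of the same idea, assuming the endpoint and payload shape described above; the org UUID is a placeholder you would fill in from your own Settings network tab:

```javascript
// Sketch of a once-a-minute poll against the usage endpoint.
// ORG_UUID is a placeholder; the cookie comes from your existing claude.ai session.
const ORG_UUID = "your-org-uuid";

function usageUrl(orgUuid) {
  return `https://claude.ai/api/organizations/${orgUuid}/usage`;
}

// Pull out the only two values the menu bar needs.
function summarize(payload) {
  return {
    utilization: payload.five_hour.utilization,
    resetsAt: payload.five_hour.resets_at,
  };
}

async function pollOnce() {
  const res = await fetch(usageUrl(ORG_UUID), { credentials: "include" });
  if (res.status === 429) return { utilization: 1.0, resetsAt: null }; // already at the wall
  return summarize(await res.json());
}

// One HTTPS request per minute, same cadence as the extension:
// setInterval(() => pollOnce().then(console.log), 60_000);
```

Because the numbers come from the same endpoint the Settings page renders, this sketch agrees with claude.ai/settings/usage by construction.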

5. What that looks like in your terminal

One claude-meter status read of all four clocks at once. The 5-hour bar at 92% is the one worth watching when you are mid-refactor; the OAuth-apps weekly bucket at 83% is the one that bites first across a full week.

claude-meter --once
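The real --once output is not reproduced here. As a sketch of the same read, here is one way to render all four clocks from a single payload; the 92% and 83% figures come from the prose above, the other two numbers and the bar style are invented for illustration:

```javascript
// Render one utilization float as a ten-segment bar, one line per clock.
function renderBar(label, utilization) {
  const filled = Math.min(10, Math.round(utilization * 10));
  const bar = "#".repeat(filled) + "-".repeat(10 - filled);
  const pct = Math.round(utilization * 100).toString().padStart(3);
  return `${label.padEnd(21)} [${bar}] ${pct}%`;
}

// 5-hour at 92% and OAuth-apps weekly at 83% per the prose; middle two invented.
const clocks = {
  five_hour: 0.92,
  seven_day: 0.54,
  seven_day_opus: 0.37,
  seven_day_oauth_apps: 0.83,
};
for (const [name, util] of Object.entries(clocks)) {
  console.log(renderBar(name, util));
}
```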

6. The moment the wall fires

When five_hour.utilization crosses 1.0, the next Claude Code prompt returns 429 with usage_limit_reached. Three plays, in order of how fast they unblock you.

  • Enable extra usage. On claude.ai/settings/billing, flip extra-usage on and set a monthly cap. From that moment, post-wall prompts charge per token instead of 429ing. This is the fastest unblock for a refactor you want to ship today. ClaudeMeter shows the extra-usage balance in dollars alongside the 5-hour bar.
  • Switch to claude.ai web for tasks that do not need agentic loop. If the 5-hour wall is what you hit, web chat does not unblock you (same bucket). But if your real blocker is the OAuth-only weekly bucket (seven_day_oauth_apps), web chat is on a different bucket and keeps working.
  • Wait, but watch resets_at, not 5 hours from now. resets_at on the same endpoint is the next message age-off boundary. That is when the bar first steps down by however much your oldest message cost. The bar only returns to 0% five hours after your LAST message. Most people misread resets_at as a full reset and come back disappointed.


Frequently asked questions

What is the Claude Code rolling 5-hour wall, in one sentence?

It is the moment a single float, five_hour.utilization, returned by a cookie-authenticated GET to /api/organizations/{org_uuid}/usage on claude.ai, crosses 1.0. From that instant every Claude Code prompt 429s until enough of your oldest messages age off the rolling 5-hour window. The wall is server-side; the JSONL files Claude Code writes to ~/.claude/projects are not the source of truth.

Why is it called rolling instead of resetting at a fixed hour?

Because each message you send has its own 5-hour age-off clock. The cost of message one drops off 5 hours after message one. The cost of message two drops off 5 hours after message two. The Settings page shows a single resets_at timestamp, but that is just when the OLDEST message in the window ages off, not when the bar returns to 0%. If you sent 10 messages over 2 hours and stop, the bucket drains in 10 steps over the next 5 hours, not in one drop at the resets_at time.
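That step-wise drain can be simulated directly. A sketch, assuming each message's cost ages off exactly 5 hours after it was sent:

```javascript
const FIVE_HOURS_MS = 5 * 60 * 60 * 1000;

// Bucket level at time t: the summed cost of messages younger than 5 hours.
// messages: [{ sentAt: epoch-ms, cost: number }]
function bucketAt(messages, t) {
  return messages
    .filter((m) => m.sentAt <= t && t - m.sentAt < FIVE_HOURS_MS)
    .reduce((sum, m) => sum + m.cost, 0);
}

// Ten unit-cost messages over roughly two hours, then silence.
const messages = Array.from({ length: 10 }, (_, i) => ({
  sentAt: i * 13 * 60 * 1000, // one message every 13 minutes
  cost: 1,
}));

bucketAt(messages, 2 * 60 * 60 * 1000); // all 10 still in the window
bucketAt(messages, FIVE_HOURS_MS + 1);  // oldest message aged off: 9 remain
```

The bucket steps down once per message age-off over the following two hours, rather than dropping to zero at the single resets_at timestamp.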

Why does ccusage say I have headroom right before I get 429ed?

ccusage sums tokens from your local ~/.claude/projects/*.jsonl logs. Three things make that sum diverge from the server's float. First, server-side cache reweighting: cold-turn tokens bill at 1.25x and cached prefixes read at 0.10x, but the JSONL records raw token counts. Second, claude.ai browser chat fills the same five_hour bucket but never writes to your local logs. Third, thinking tokens on Opus 4.7 are server-counted in ways the streamed JSONL undercounts. The result: on heavy days ccusage typically trails the server by 15 to 30 percentage points.

Does the /usage command inside Claude Code show the rolling 5-hour wall?

No. /usage inside Claude Code is a context-window meter for the current turn. It tells you how full your 200K input window is right now, broken down by system prompt, MCP tool definitions, file reads, and your prompts. It says nothing about the rolling 5-hour bucket on the server. Those are two different surfaces measuring two different things. The five_hour bucket is the one that fires 429s; the /usage local meter is the one that decides whether Claude Code auto-compacts.

When the wall fires mid-refactor, what is the fastest recovery?

Three plays, in order of how fast they unblock you. First, enable extra-usage (metered) credits: claude.ai/settings/billing turns it on, and from then on post-wall prompts charge per token instead of 429ing. Second, check which bucket actually fired: web chat hits the same five_hour bucket, so it will not dodge the 5-hour wall, but if your blocker is really the OAuth-only weekly bucket (seven_day_oauth_apps), claude.ai web chat is on a different bucket and keeps working. Third, if you must wait, look at resets_at on the same endpoint: it is the next age-off boundary, not the time the wall fully releases. A few percent typically drains every 20 to 30 minutes once your oldest messages start aging off.

Can I read five_hour.utilization myself without installing anything?

Yes. Sign in to claude.ai once in any Chromium browser, then open DevTools on /settings/usage and watch the network tab. The request is GET /api/organizations/{your-org-uuid}/usage, response is JSON, and the field is response.five_hour.utilization (0.0 to 1.0+). The same endpoint also returns seven_day, seven_day_opus, and seven_day_oauth_apps. Cookie-authenticated only; no API token or OAuth flow needed because you reuse your existing session.
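From the DevTools console the same read is a few lines. A sketch assuming the payload shape described above; ORG_UUID is a placeholder you copy from the /settings/usage network tab:

```javascript
// Pick the two fields worth watching out of the usage payload.
function readWall(payload) {
  return {
    fiveHour: payload.five_hour.utilization, // the wall: 0.0 to 1.0+
    nextAgeOff: payload.five_hour.resets_at, // next step-down, not a full reset
  };
}

// Browser-only: paste into the DevTools console while signed in to claude.ai.
if (typeof window !== "undefined") {
  const ORG_UUID = "your-org-uuid";
  fetch(`/api/organizations/${ORG_UUID}/usage`, { credentials: "include" })
    .then((r) => r.json())
    .then((u) => console.log(readWall(u)));
}
```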

Why does the wall sometimes fire when /usage shows 5% used?

Because /usage measures the CURRENT turn's context window, not the rolling 5-hour bucket. A long agentic loop can have 50 prior turns sitting in the rolling 5-hour bucket while the current turn's context is freshly compacted to 5%. The 5% number is honest about the current turn and useless about the rolling window. The Settings page is the canonical source for the 5-hour bar; the menu bar app polls the same endpoint every 60 seconds so you do not have to refresh manually.

What is the relationship between the 5-hour wall and the weekly cap?

Four independent rolling buckets, charged simultaneously: five_hour (5-hour sliding window, all clients), seven_day (168-hour cap, all clients), seven_day_opus (168-hour Opus-only cap), and seven_day_oauth_apps (168-hour OAuth-only cap, i.e. Claude Code + MCP only). Each is a separate utilization float with its own resets_at. Any one at 1.0 fires the next 429. Claude Code users in heavy agentic loops typically hit five_hour first inside a session and seven_day_oauth_apps first across a week, while heavy claude.ai chat users hit five_hour and seven_day. Anthropic doubled the five_hour limit on May 6 2026; the weekly caps were not doubled.
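Which of the four fires first is just a question of which float is closest to 1.0. A sketch, assuming the bucket names above; the numbers are illustrative:

```javascript
// The four independent buckets returned by the usage endpoint.
const BUCKETS = ["five_hour", "seven_day", "seven_day_opus", "seven_day_oauth_apps"];

// Report the bucket closest to its wall.
function firstToFire(payload) {
  return BUCKETS.reduce((worst, name) =>
    payload[name].utilization > payload[worst].utilization ? name : worst
  );
}

// A heavy agentic week, with illustrative numbers: the OAuth-only weekly
// bucket is the one about to bite, exactly the pattern described above.
firstToFire({
  five_hour:            { utilization: 0.41 },
  seven_day:            { utilization: 0.54 },
  seven_day_opus:       { utilization: 0.37 },
  seven_day_oauth_apps: { utilization: 0.83 },
});
```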

Where does ClaudeMeter read the wall from, exactly?

From the same endpoint claude.ai/settings/usage renders. The browser extension at extension/background.js polls /api/organizations/{org}/usage on claude.ai every 60 seconds with credentials: 'include' (your existing claude.ai cookie). It POSTs the JSON to the local menu bar app on 127.0.0.1:63762, which renders five_hour.utilization with its resets_at as a relative duration. The Rust UsageResponse struct that decodes the payload is at src/models.rs line 18 to 28 in github.com/m13v/claude-meter. MIT licensed. No telemetry. One HTTPS request per minute to claude.ai using your own session.
