
Claude Code Degradation 2026

Intermediate

Timeline, root causes, and workarounds for the March 2026 Claude Code quality regression and token cost inflation event.

What Happened — Timeline

| Date | Event |
| --- | --- |
| Early Feb 2026 | Opus 4.6 released with Adaptive Thinking (model decides per-turn reasoning budget) |
| 2026-02-12 | redact-thinking-2026-02-12 header rolled out — hides thinking traces from UI |
| 2026-03-03 | Default effort silently lowered from high to medium (effort=85). No changelog entry. |
| 2026-03-08 | Quality cliff in longitudinal session data (stop-hook violations spike, reads-per-edit collapses) |
| 2026-03-23+ | Cache invalidation bugs deployed in v2.1.100. Token costs inflate 10-20x silently. |
| 2026-03-26 | Anthropic confirms peak-hour throttling (5-11am PT weekdays) |
| 2026-03-31 | Anthropic public statement: "hitting limits way faster than expected" |
| 2026-04-02 | 6,852-session longitudinal analysis published. Median thinking depth: 2,200 → 600 chars (-67%). Reads-per-edit: 6.6 → 2.0. Team monthly Bedrock-equivalent cost: $345 → $42,121 (~122x). |
| 2026-04-09 | Community workaround guides for CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1 published |

Root Causes (Confirmed)

1. Effort Default Lowered (effort=85)

Anthropic lowered the default effort level from high to medium (internal value 85) on 2026-03-03. Confirmed by Anthropic engineering. Motivation: users "consuming too many tokens per task."

Effect: model allocates less thinking budget per turn. For simple tasks: negligible. For complex multi-step engineering: 67% reduction in thinking depth.

Adaptive thinking can allocate zero tokens on easy-seeming turns, producing fabricated commit SHAs, non-existent packages, and hallucinated API versions.

2. Cache Invalidation Bugs in v2.1.100

Two separate bugs reverse-engineered from Claude Code v2.1.100 binary:

Bug 1 — Billing sentinel string replacement:
When conversation history contains billing-related terms, a regex replacement hits the wrong byte position → breaks cache prefix → entire context re-billed as uncached. Effect: 10-20x token inflation.
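The mechanics are easy to demonstrate in miniature. The sketch below is hypothetical (not the actual v2.1.100 code): it models a prefix cache as the shared byte prefix of consecutive payloads, and shows how an in-place string replacement early in the history shifts every subsequent byte, leaving almost nothing cacheable.

```python
def cached_prefix_len(old_history: str, new_history: str) -> int:
    """Length of the shared byte prefix between two request payloads.
    A prefix cache only serves bytes before the first divergence."""
    n = 0
    for a, b in zip(old_history.encode(), new_history.encode()):
        if a != b:
            break
        n += 1
    return n

history = "system: rules...\nuser: check my usage limit please\n" + "x" * 1000
# A sanitizer rewriting "usage limit" near the top of the payload makes
# the two versions diverge almost immediately:
sanitized = history.replace("usage limit", "[redacted]")
print(cached_prefix_len(history, sanitized))  # → 32 (out of ~1050 bytes)
```

Everything after the divergence point is re-billed as uncached input, which is why a single sentinel string early in the history can re-price the entire context.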

Bug 2 — Resume/continue tool injection position:
Tool definitions injected at a different position than in fresh sessions → cache invalidated → full re-processing of all prior context on every --resume/--continue.

Combined effect: ~20,000 invisible tokens added to every request in affected sessions. Sessions drain quota ~40% faster.
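A back-of-envelope calculation shows how a broken prefix produces inflation on that scale. The per-token rates below are illustrative (roughly Opus-class published pricing), not exact billing figures:

```python
# Cost of one request over a 200k-token context, with the prefix cache
# working vs broken. Illustrative rates: $15/M uncached input,
# $1.50/M cache read.
UNCACHED_PER_M = 15.00
CACHE_READ_PER_M = 1.50
CONTEXT_TOKENS = 200_000

cost_cached = CONTEXT_TOKENS / 1e6 * CACHE_READ_PER_M  # prefix intact
cost_broken = CONTEXT_TOKENS / 1e6 * UNCACHED_PER_M    # prefix invalidated

print(f"cached: ${cost_cached:.2f}, broken: ${cost_broken:.2f}, "
      f"inflation: {cost_broken / cost_cached:.0f}x")
# → cached: $0.30, broken: $3.00, inflation: 10x
```

At these rates a broken prefix is a flat 10x per request; repeated cache re-creation and retry cycles push the observed multiple higher.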

Measured Impact

| Metric | Pre-cliff (Feb) | Post-cliff (March) |
| --- | --- | --- |
| Median thinking block length | 2,200 chars | 600 chars (-67%) |
| Reads-per-edit ratio | 6.6 | 2.0 |
| Stop-hook violations/day | ~0 | ~10 |
| API request count for equivalent task | baseline | ~80x higher |
| Monthly cost (team, Bedrock-equivalent) | $345 | $42,121 (~122x) |

Workarounds

Tier 1 — Officially Supported

# Force maximum reasoning effort persistently
export CLAUDE_CODE_EFFORT_LEVEL=max

# Disable adaptive thinking — force fixed reasoning budget per turn
export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1

In-session: /effort max or /effort high. Per-turn trigger: ultrathink keyword.

settings.json:

{
  "effort": "max",
  "disableAdaptiveThinking": true
}

API users — explicit thinking budget (not adaptive):

{"thinking": {"type": "enabled", "budget_tokens": 32000}}
# NOT {"type": "adaptive"} — adaptive silently nullifies maxThinkingTokens
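As a concrete sketch, a minimal request body with a pinned budget might look like this (the model id is a placeholder, and field names should be checked against current API docs; note the API requires the thinking budget to be smaller than max_tokens):

```python
# Messages API request body with an explicit, fixed thinking budget.
# "claude-opus-4-6" is a placeholder model id, not a confirmed one.
request = {
    "model": "claude-opus-4-6",
    "max_tokens": 40_000,            # must exceed the thinking budget
    "thinking": {
        "type": "enabled",           # fixed budget — NOT "adaptive"
        "budget_tokens": 32_000,
    },
    "messages": [{"role": "user", "content": "Refactor the parser."}],
}
print(request["thinking"])
```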

Tier 2 — Community Workarounds

Cache hygiene (Bug 2 mitigation):
  • Avoid --resume / --continue until the v2.1.100 cache bugs are confirmed fixed
  • Start fresh sessions for long workflows
  • Avoid billing-related strings in conversation history (Bug 1 trigger): "billing", "usage limit", "quota"

CLAUDE.md prefix stability (Bug 1 mitigation):
  • No timestamps near the top of CLAUDE.md (breaks cache on every session)
  • No dynamic content in the system prompt prefix
  • Define all tools once; use state machines instead of swapping tool definitions
  • Preserve error messages in context (reduces retry cycles ~40%)
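One way to enforce the timestamp and dynamic-content rules is a small lint step run before committing CLAUDE.md. Everything below is a hypothetical sketch: the patterns and the 20-line prefix window are arbitrary choices, not a published tool.

```python
import re

# Patterns that indicate dynamic content; any changed byte near the top
# of CLAUDE.md invalidates the cache prefix for every later session.
DYNAMIC_PATTERNS = [
    r"\b20\d{2}-\d{2}-\d{2}\b",     # ISO dates
    r"\b\d{1,2}:\d{2}(:\d{2})?\b",  # clock times
    r"(?i)^status:",                # rolling project-status lines
]

def lint_prefix(text: str, prefix_lines: int = 20) -> list[str]:
    """Return the patterns found in the first `prefix_lines` lines."""
    head = "\n".join(text.splitlines()[:prefix_lines])
    return [p for p in DYNAMIC_PATTERNS if re.search(p, head, re.M)]

print(lint_prefix("# Rules\nUpdated: 2026-04-14\nAlways run tests."))
```

A non-empty result means the prefix will differ between sessions and should be moved to the end of the file.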

Token monitoring:
  • Track cache_read_input_tokens vs cache_creation_input_tokens in API responses
  • Target: >80% cache hit rate on long sessions
  • Below 80% → something is breaking the prefix
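That check can be automated with a few lines, assuming the standard usage fields returned alongside each API response (the sample numbers are made up for illustration):

```python
def cache_hit_rate(usage: dict) -> float:
    """Fraction of prompt tokens served from the cache, computed from
    the usage object attached to an API response."""
    read = usage.get("cache_read_input_tokens", 0)
    created = usage.get("cache_creation_input_tokens", 0)
    fresh = usage.get("input_tokens", 0)
    total = read + created + fresh
    return read / total if total else 0.0

# Healthy long session: almost all prompt tokens are cache reads.
healthy = {"input_tokens": 900, "cache_read_input_tokens": 180_000,
           "cache_creation_input_tokens": 2_000}
# Broken prefix: the cache is rebuilt from scratch on every turn.
broken = {"input_tokens": 900, "cache_read_input_tokens": 0,
          "cache_creation_input_tokens": 182_000}

print(f"{cache_hit_rate(healthy):.1%} vs {cache_hit_rate(broken):.1%}")
```

Logging this rate per turn makes the v2.1.100 failure mode visible immediately instead of at the monthly bill.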

Version rollback:

# Rollback to v2.1.87 (pre-regression):
curl -fsSL https://claude.ai/install.sh | bash -s 2.1.87
# Use the native installer (which accepts a version argument), NOT npm

Tier 3 — Alternative Models

| Model | SWE-bench | Cost vs Opus 4.6 | Notes |
| --- | --- | --- | --- |
| DeepSeek V3.2 | ~90% of GPT-5.4 | ~20x cheaper | Strong agentic coding |
| Gemini 3.1 Pro | 80.6% | ~50% cheaper | Comparable on coding |
| Qwen3.6-Plus | Beats Opus 4.5 on terminal | Via Tongyi subscription | 1M context |
| GLM-5.1 | Agentic-tuned | Free (self-host) | Open weights |

Pragmatic 2026 stance: Start with open-weight (DeepSeek/Qwen/GLM) for routine tasks; use Opus 4.6 only for subtasks where the gap matters.

What Anthropic Has Not Addressed (as of 2026-04-14)

  • Whether v2.1.100 cache bugs are fixed
  • Exact mapping of effort=85 / medium / high / max to thinking-token budgets
  • Why the effort-default change was not in any release note
  • Whether model weights changed between Opus 4.5 and 4.6 (denied, not independently verifiable)

KV-Cache Best Practices (Permanent)

Regardless of Anthropic changes, cache-friendly CLAUDE.md structure reduces costs:

CLAUDE.md structure (cache-friendly):
  1. Static invariant rules (never changes between sessions)
  2. Project structure (changes rarely)
  3. Per-session context (if any, goes at END)

Never at the top:
  - Current timestamps
  - Dynamic project status
  - Session-specific context

See context window management for full KV-cache optimization guide.

Gotchas

  • Issue: /effort max in-session resets after compaction or a new session without persisted settings. → Fix: Set "effort": "max" in settings.json AND export CLAUDE_CODE_EFFORT_LEVEL=max in your shell profile.
  • Issue: Adaptive thinking allocates zero tokens on a turn → model produces confident but fabricated output (commit SHAs, package names, API versions). → Fix: CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1 forces a minimum thinking budget per turn. Verify critical outputs against the actual filesystem/APIs.
  • Issue: Cache hit-rate monitoring shows low hits even with a stable prefix → legacy v2.1.100 tool injection bug. → Fix: Upgrade to the latest Claude Code via the native installer (not npm). Check cache_creation_input_tokens vs cache_read_input_tokens in the API response to confirm.

See Also