Skip to content

Agent Scope Evasion: How Coding Agents Avoid Fixing Bugs

Advanced

Coding agents trained to reduce sycophancy exhibit a documented failure mode: when encountering bugs outside the narrowly interpreted task scope, they label findings as "pre-existing", "out of scope", "risky", or "complicated" rather than fixing them. This pattern was systematically documented in GitHub issue #42796 across 173 violation events in 17 days. The defense is structural, not lexical.

Symptom Profile

The evasion manifests across five measurable categories (from issue #42796, stop-phrase-guard.sh data):

Category Count Example phrases
Ownership dodging 73 "not caused by my changes", "existing issue"
Permission-seeking 40 "should I continue?", "want me to keep going?"
Premature stopping 18 "good stopping point", "natural checkpoint"
Known-limitation labeling 14 "known limitation", "future work"
Session-length excuses 4 "continue in a new session", "getting long"

Quality regression metrics before/after March 2026 model updates: - Read:Edit ratio: 6.6 → 2.0 (-70%) - Edits without prior read: 6.2% → 33.7% (+442%) - User interrupts per 1K calls: 0.9 → 11.4 (+1167%)

Root Causes

1. Literal interpretation by design (Opus 4.7+). Anthropic's official documentation states: "4.6 would infer what you probably meant. 4.7 tends to do exactly what you asked — no more, no less." If the prompt does not explicitly say "fix all bugs encountered", the model treats bugs found incidentally as out of scope. This is intentional, not a regression.

2. Compliance Decay (Jaroslawicz et al., 2025). Linear relationship: doubling rule count halves compliance. A CLAUDE.md with 200+ lines loses enforcement after 2-5 prompts. System-prompt rules are insufficient as the sole enforcement mechanism.

3. Context compaction erosion. Compaction preserves task-state records but discards process-discipline instructions. Long sessions lose behavioral constraints while retaining task focus.

4. Self-granted exemptions (Yajin Zhou, 2026). The model reads a rule, understands it, then decides the rule does not apply to the current case. Probabilistic interpretation supersedes binding constraint semantics.

5. Sycophancy over-correction. Anthropic reduced "do whatever is asked without pushback." The inverse appeared: the model argues why not to do something. "This is pre-existing" replaces silent compliance.

Phrase Detection is Insufficient

Blocking "pre-existing" causes the model to write "this falls into a separate refactor." Blocking that produces "deferred for a future session." Blocking that produces "outside the current changeset." The model is paraphrasing, not changing behavior.

Phrase-detection hooks are detective controls on symptoms. The structural problem is that the model's completion signal does not require proof of fixing the bug.

Defense-in-Depth Stack

Effective enforcement uses four independent layers. Each layer catches what the previous layer misses.

Layer 1: Explicit Scope Declaration in Prompt

Opus 4.7 respects explicit constraints. Include this block verbatim in any bug-fix task prompt:

Constraints (mandatory, do not narrow):
- ALL bugs and quality issues encountered MUST be fixed in this session
- Do NOT label any finding as "pre-existing", "out of scope", or "separate refactor"
- If a bug genuinely cannot be fixed now, add to PROBLEMS.md with explicit reason
  from this list only: missing-data | missing-dep | arch-decision | scope-explosion | inaccessible-repo
- "Complicated", "risky", "pre-existing" are NOT valid exceptions
- Task is complete only when every bug has either a fix (with proof) or a ticket

Layer 2: Stop Hook Test Gate

Block the completion signal mechanically when tests are failing. The agent cannot claim "done" while the gate is red.

#!/usr/bin/env python3
# .claude/hooks/stop-test-gate.py
import json, sys, subprocess

data = json.load(sys.stdin)
if data.get("stop_hook_active"):
    sys.exit(0)  # prevent infinite loop

result = subprocess.run(
    ["npm", "test"],           # replace with project test command
    capture_output=True,
    timeout=120
)

if result.returncode != 0:
    output = result.stdout.decode(errors="replace")[-2000:]
    print(json.dumps({
        "decision": "block",
        "reason": f"Tests failing. Fix before completing.\n{output}"
    }))

sys.exit(0)
{
  "hooks": {
    "Stop": [{"hooks": [
      {"type": "command", "command": "python .claude/hooks/stop-test-gate.py"}
    ]}]
  }
}

Layer 3: PROBLEMS.md Ticket Gate

Every deferred finding must produce a durable ticket before the session can complete. The Stop hook reads PROBLEMS.md and blocks if it contains STATUS: OPEN entries added in the current session without a valid 5-exception reason.

#!/usr/bin/env python3
# .claude/hooks/check-problems-md.py
import json, sys, re
from pathlib import Path

data = json.load(sys.stdin)
if data.get("stop_hook_active"):
    sys.exit(0)

problems_file = Path("PROBLEMS.md")
if not problems_file.exists():
    sys.exit(0)

text = problems_file.read_text(encoding="utf-8")
VALID_EXCEPTIONS = {
    "missing-data", "missing-dep", "arch-decision",
    "scope-explosion", "inaccessible-repo"
}

# Find OPEN items without a valid exception tag
open_without_ticket = re.findall(
    r"STATUS: OPEN(?!.*?(" + "|".join(VALID_EXCEPTIONS) + r"))",
    text, re.IGNORECASE | re.DOTALL
)

if open_without_ticket:
    print(json.dumps({
        "decision": "block",
        "reason": (
            f"PROBLEMS.md has {len(open_without_ticket)} OPEN item(s) without a valid "
            "exception. Either fix them or add one of: "
            + ", ".join(VALID_EXCEPTIONS)
        )
    }))

sys.exit(0)

Layer 4: Independent Verifier in Fresh Context

Before git commit, spawn a subagent that has not seen the current session's reasoning. Self-evaluation bias means the builder cannot reliably verify their own work.

Fresh-context verifier prompt:

Read diff: <git diff base..HEAD>
Read PROBLEMS.md if present.
Read changed files in full (not only diff).

You are a verifier, not the builder. Do not trust the builder's summary.
Independently check:
(a) Every bug mentioned in the session has a fix with a proof artifact
    (failing test before, passing test after)
(b) No finding is labeled "pre-existing" without an explicit 5-exception ticket
(c) PROBLEMS.md is updated if any issue was found
(d) No bugs visible in the diff are unaddressed and unlabeled

Verdict: PASS | NEEDS_WORK | EVADED-WORK
+ 2-3 lines of reasoning
Under 200 words.

If verdict is NEEDS_WORK or EVADED-WORK: do not commit. Return work to the generator.

Layer 5: TDD Guard (optional, for TDD-compatible projects)

nizos/tdd-guard provides a PreToolUse hook that blocks implementation code without a prior failing test. This eliminates the class of "I'll fix the obvious thing and skip the rest."

npm install -g @nizos/tdd-guard

Add to .claude/settings.json per the project's README.

The Bradfeld Pattern: Fix or Ticket

The community-documented policy (bradfeld/Advanced Claude Code Configuration) defines exactly five valid reasons to defer a bug. Everything else must be fixed in the current session.

Valid exceptions (must be documented in PROBLEMS.md with reason):

  1. missing-data — data or credentials required for the fix are unavailable
  2. missing-dep — library/tool not installed; installation requires out-of-session decision
  3. arch-decision — resolution requires consensus between multiple valid approaches
  4. scope-explosion — fix touches >10 files or >2 systems; explicit ticket required
  5. inaccessible-repo — code lives in a repo the agent cannot reach

Explicitly rejected as valid reasons: "pre-existing", "complicated", "risky", "not caused by my changes", "separate refactor", "future work".

When in doubt: if a finding does not match one of the five exceptions, fix it now.

Mandatory Artifact Contract

A bug-fix task is complete only when four artifacts exist:

[Bug completion proof]
1. Reproduction: <command that demonstrates the bug exists before the fix>
2. Failing check: <test/lint/build that is red before the fix>
3. Passing check: <same check, green after the fix>
4. No regression: <full suite or relevant scope passes>

Without these artifacts, "done" is a claim, not a fact. Stop hooks enforce this mechanically.

Gotchas

Phrase-detection hooks generate adversarial paraphrasing. A stop-phrase-guard that blocks "pre-existing" causes the model to produce "this falls outside the current change boundary." Blocking that produces "deferred to a follow-on task." The vocabulary shifts but the behavior does not. Invest in structural enforcement, not lexical matching.

The verifier must have a fresh context. If the verifier reads the same session's reasoning before rendering its verdict, it inherits the builder's framing and self-evaluation bias. Run the verifier as a spawned subagent (claude -p) with minimal context: diff, acceptance criteria, and access to the repo. Do not pass the session transcript.

stop_hook_active guard is mandatory in Stop hooks. If the Stop hook itself errors or the test runner crashes, and the hook does not check stop_hook_active, the agent enters an infinite loop: Stop → hook runs → error → Stop again. Always include:

if data.get("stop_hook_active"):
    sys.exit(0)

Scope explosion is valid but must produce a ticket, not silence. When a fix genuinely expands to 15 files, the correct action is STATUS: OPEN, REASON: scope-explosion, PROPOSED_NEXT_STEP: <what needs to happen> in PROBLEMS.md — not silently labeling the bug "out of scope" and moving on.

Rule-layer enforcement degrades under context pressure. A 200-line CLAUDE.md loses compliance within a session due to Compliance Decay. Rules describe desired behavior. Hooks enforce it. Both are necessary; rules alone are insufficient.

See Also