
Autonomous Agent Evolution

Intermediate

Replaces fixed evolutionary search (agents as stateless workers) with long-lived autonomous agents that control the entire search process: what to retrieve, when to evaluate, and what to retain. The key innovation is combining workspace isolation, a shared knowledge layer, and periodic reflection.

Core Architecture

Isolated Workspaces

Each agent operates in a separate workspace (git worktree, container, or directory) to prevent interference. Agents can run different experiments in parallel without merge conflicts.

agent-0/          # git worktree for agent 0
agent-1/          # git worktree for agent 1
agent-2/          # git worktree for agent 2
.shared/          # shared knowledge (symlinked into each workspace)
  attempts/       # historical evaluations indexed by commit hash
  notes/          # markdown observations (hierarchical)
  skills/         # reusable procedures and scripts
# Setup per-agent worktrees from a base repo
git worktree add agent-0 -b agent-0-branch
git worktree add agent-1 -b agent-1-branch
# Symlink shared knowledge into each
ln -s $(pwd)/.shared agent-0/.shared
ln -s $(pwd)/.shared agent-1/.shared

Why worktrees over branches: agents need simultaneous filesystem access. Branch switching would serialize work. Worktrees give each agent a full working copy while sharing the git object store.

Shared Knowledge Layer

Three artifact types, all readable by every agent:

| Artifact | Format | Purpose |
| --- | --- | --- |
| attempts/ | {commit-hash}.json with score + metadata | Prevent re-evaluation of identical solutions |
| notes/ | Hierarchical markdown files | Observations, patterns, failed approaches |
| skills/ | Executable scripts + markdown docs | Reusable procedures discovered during search |
// .shared/attempts/a3f2b1c.json
{
  "commit": "a3f2b1c",
  "agent": 2,
  "score": 0.3809,
  "approach": "greedy interval optimization with symmetry breaking",
  "timestamp": "2026-04-01T14:23:00Z",
  "parent_commit": "b7e4d2a",
  "delta_score": 0.0003
}
<!-- .shared/notes/optimization/symmetry-breaking.md -->
# Symmetry Breaking in Interval Placement

Forcing the first interval to start at 0 eliminates ~50% of the search
space without losing optimal solutions. Confirmed by agents 0 and 2
across 8 independent evaluations.

Related attempts: a3f2b1c, d4e5f6g
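A skills/ entry pairs a short doc with an executable helper other agents can reuse. A minimal sketch, assuming interval placements are lists of (start, end) tuples; the filename and function are hypothetical:

```python
# .shared/skills/symmetry_break.py (hypothetical skill script)
"""Anchor an interval placement at 0 to break translation symmetry.

See .shared/notes/optimization/symmetry-breaking.md for the observation
this skill was extracted from.
"""

def symmetry_break(intervals):
    """Translate all intervals so the earliest start is 0."""
    if not intervals:
        return []
    offset = min(start for start, _ in intervals)
    return [(start - offset, end - offset) for start, end in intervals]
```

Because skills are plain scripts, any agent can import or shell out to them from its own workspace without coordination.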

Heartbeat Mechanism

Three reflection types at different frequencies prevent tunnel vision:

Per-iteration reflection (every evaluation):

After eval #{n} with score {s}:
1. What did this change accomplish?
2. Was the score change expected?
3. Write 1-2 line observation to .shared/notes/

Periodic consolidation (every ~10 evaluations):

1. Review own progress over last 10 evals
2. Browse other agents' notes in .shared/notes/
3. Organize scattered observations into structured notes
4. Extract reusable patterns into .shared/skills/
5. Identify promising directions from other agents' work

Stagnation redirection (5 consecutive non-improving evaluations):

1. Forced reassessment: "Current approach is not working"
2. Read all recent notes from ALL agents
3. Identify unexplored directions
4. Pivot to fundamentally different approach
5. Log the pivot reason in notes

Without heartbeats, agents fixate on local optima and stop sharing knowledge. The consolidation step is critical: it forces cross-pollination between agents.
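The three frequencies can be wired into the agent loop with a small dispatcher. A sketch, where `reflect`, `consolidate`, and `pivot` are hypothetical callbacks standing in for the LLM prompts above:

```python
def heartbeat(eval_count, stagnant, reflect, consolidate, pivot,
              consolidate_every=10):
    """Run the right reflection(s) after evaluation number `eval_count`."""
    fired = ["reflect"]
    reflect()                                  # per-iteration reflection
    if eval_count % consolidate_every == 0:    # periodic consolidation
        consolidate()
        fired.append("consolidate")
    if stagnant:                               # stagnation redirection
        pivot()
        fired.append("pivot")
    return fired
```

The `stagnant` flag would come from a check like `check_stagnation` below, so the redirection fires exactly when the agent's recent evaluations stop improving.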

Implementation Patterns

File-Based Knowledge Sharing

The simplest approach for Claude Code / coding agent setups:

import json
from datetime import datetime, timezone
from pathlib import Path

SHARED = Path(".shared")

def log_attempt(commit: str, score: float, approach: str, agent_id: int,
                delta_score: float = 0.0) -> None:
    """Log an evaluation result to shared knowledge."""
    entry = {
        "commit": commit,
        "agent": agent_id,
        "score": score,
        "approach": approach,
        "delta_score": delta_score,
        # Timestamp is required: check_stagnation sorts attempts by it.
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    (SHARED / "attempts" / f"{commit}.json").write_text(json.dumps(entry, indent=2))

def get_best_score() -> float:
    """Read the current best score from shared attempts."""
    best = 0.0  # assumes scores are non-negative
    for f in (SHARED / "attempts").glob("*.json"):
        best = max(best, json.loads(f.read_text())["score"])
    return best

def check_stagnation(agent_id: int, window: int = 5) -> bool:
    """Detect whether this agent's last `window` evaluations failed to improve."""
    attempts = [json.loads(f.read_text())
                for f in (SHARED / "attempts").glob("*.json")]
    my_attempts = sorted(
        (a for a in attempts if a["agent"] == agent_id),
        key=lambda a: a["timestamp"],
    )
    if len(my_attempts) < window:
        return False
    return all(a.get("delta_score", 0) <= 0 for a in my_attempts[-window:])

Evaluation Deduplication

Avoid re-running expensive evaluations on identical solutions:

import hashlib

def solution_hash(code: str) -> str:
    """Content-based hash ignoring whitespace and comments."""
    lines = [line.strip() for line in code.splitlines()
             if line.strip() and not line.strip().startswith("#")]
    return hashlib.sha256("\n".join(lines).encode()).hexdigest()[:12]

def already_evaluated(code: str) -> bool:
    # Note: this check only works if attempts are logged under
    # solution_hash(code) rather than the commit hash, or if the
    # content hash is stored alongside the commit hash.
    h = solution_hash(code)
    return (SHARED / "attempts" / f"{h}.json").exists()
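In the agent loop, the content hash gates the expensive evaluation. A self-contained sketch, where `cache` (a dict of hash to score) stands in for the .shared/attempts/ directory and `run_eval` is a hypothetical expensive evaluator:

```python
import hashlib

def solution_hash(code: str) -> str:
    # Same normalization as above: strip comments and blank lines.
    lines = [line.strip() for line in code.splitlines()
             if line.strip() and not line.strip().startswith("#")]
    return hashlib.sha256("\n".join(lines).encode()).hexdigest()[:12]

def evaluate_once(code, cache, run_eval):
    """Run `run_eval` only for solutions not seen before (by content hash)."""
    h = solution_hash(code)
    if h in cache:
        return cache[h], True    # duplicate: reuse the stored score
    score = run_eval(code)
    cache[h] = score
    return score, False
```

Note that two solutions differing only in comments or whitespace hash identically, so the second one skips evaluation entirely.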

Comparison with Linear Approaches

| Aspect | Linear (autoresearch) | Parallel Evolution |
| --- | --- | --- |
| Agents | 1 | 3-8 |
| Search strategy | Sequential keep/discard | Parallel diverse exploration |
| Knowledge sharing | Git history only | Explicit shared knowledge layer |
| Stagnation handling | Manual | Automatic redirection |
| Reflection | Optional | Built-in heartbeat |
| Improvement rate (per eval) | ~9.5% | ~36.8% |
| Total evaluations needed (same quality) | 84 | 19 |
| Cost per cycle | ~$0.10 | ~$0.40 (4 agents) |
| Effective cost/improvement | Higher | Lower (3-4x) |

The parallel approach reaches better solutions with fewer total evaluations because agents explore different directions simultaneously and share discoveries.

Integration with Existing Patterns

With agent design patterns (Reflexion): heartbeat reflection is a formalized version of self-critique applied to the search process itself, not just individual outputs.

With multi agent systems (Shared State): the .shared/ knowledge layer is a concrete implementation of the shared-state communication protocol using the filesystem.

With context engineering: each agent maintains its own context focused on its current exploration direction. The shared knowledge layer acts as external memory, preventing context bloat from carrying all agents' history.

Gotchas

  • File locking on shared writes: multiple agents writing to .shared/ simultaneously can corrupt JSON files. Use atomic writes (write to temp file, then rename) or per-agent subdirectories with periodic merge
  • Note quality degrades without structure: agents generate vague notes ("tried X, didn't work") unless the heartbeat prompt explicitly requires structured observations with scores and hypotheses. Template the note format
  • Stagnation detection threshold matters: too sensitive (2-3 evals) causes premature pivots away from promising directions. Too loose (10+ evals) wastes compute. 5 consecutive non-improving evals is a reasonable default but should be tuned per task complexity
  • Shared skills can propagate bad patterns: if one agent writes a flawed skill to .shared/skills/, others will adopt it. Add a minimum score threshold before promoting observations to skills
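The file-locking gotcha can be handled with a write-to-temp-then-rename helper: `os.replace` is atomic on POSIX filesystems, so readers never observe a half-written JSON file. A sketch:

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON atomically: temp file in the same directory, then rename."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())    # ensure bytes hit disk before the rename
        os.replace(tmp, path)       # atomic; silently overwrites an old copy
    except BaseException:
        os.unlink(tmp)
        raise
```

Dropping this in for the `write_text` calls in `log_attempt` removes the corruption risk without needing file locks or per-agent subdirectories.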

See Also