Skip to content

Forgetting Strategies

Intermediate

When and how to remove information from LLM agent memory. Unbounded memory growth degrades retrieval quality - irrelevant results dilute relevant ones. Strategic forgetting is as important as strategic remembering.

Key Facts

  • Memory without pruning degrades retrieval precision over time - more entries means more false positives
  • "Forgetting" usually means archival (move to cold storage), not deletion
  • Token cost scales with memory size - even if retrieval is fast, injecting too much context is expensive
  • Compaction (summarization) is a form of lossy forgetting - details are permanently lost
  • The optimal memory size depends on retrieval method: index navigation degrades at ~500 entries, vector search degrades more gradually
  • Never delete data that hasn't been archived first

Forgetting Mechanisms

Mechanism What It Does Reversible Best For
TTL (Time-to-Live) Auto-expire after fixed period No (unless archived) Temporary state, session data
Relevance scoring Score entries, prune lowest Yes (if archived) Large memory with mixed quality
Compaction Summarize, replace details No Conversation history
Archival Move to cold storage Yes Old but potentially useful data
Deduplication Merge redundant entries Partially Overlapping information
Explicit deletion Remove on user request No Privacy, corrections

TTL-Based Expiry

Assign a time-to-live based on the type of information:

TTL_CONFIG = {
    "session_state": timedelta(hours=24),       # task-specific, short-lived
    "project_status": timedelta(days=30),        # changes monthly
    "tool_output": timedelta(days=7),            # raw outputs are transient
    "decision": timedelta(days=365),             # decisions have long relevance
    "preference": timedelta(days=730),           # preferences are stable
    "identity": None,                            # never expires
}

def should_expire(entry: MemoryEntry, now: datetime) -> bool:
    ttl = TTL_CONFIG.get(entry.type)
    if ttl is None:
        return False
    return (now - entry.created_at) > ttl

Don't delete on expiry - archive. Move expired entries to a separate store. If they're needed later, they can be retrieved from archive.

Relevance Scoring

Score each memory entry by utility. Factors:

def relevance_score(entry: MemoryEntry, now: datetime) -> float:
    # Recency: newer is more relevant
    age_days = (now - entry.created_at).days
    recency = 1.0 / (1 + age_days / 30)  # 30-day half-life

    # Access frequency: more-retrieved is more relevant
    access_freq = min(entry.access_count / 10, 1.0)

    # Freshness: recently accessed is more relevant
    last_access_days = (now - entry.last_accessed).days
    freshness = 1.0 / (1 + last_access_days / 7)

    # Type weight: some types are inherently more valuable
    type_weights = {
        "decision": 1.0,
        "gotcha": 0.9,
        "preference": 0.8,
        "finding": 0.7,
        "observation": 0.5,
        "tool_output": 0.3,
    }
    type_weight = type_weights.get(entry.type, 0.5)

    return (0.3 * recency + 0.3 * access_freq + 0.2 * freshness + 0.2 * type_weight)

Prune entries below a threshold. Suggested: archive entries scoring < 0.2.

Compaction

Lossy summarization of detailed history into compressed form. See context window management for conversation compaction specifics.

def compact_memory_topic(topic: str, entries: list[MemoryEntry]) -> MemoryEntry:
    """Replace multiple entries with one summary."""
    full_text = "\n".join(e.text for e in entries)

    summary = llm.complete(f"""Summarize these {len(entries)} memory entries about '{topic}'.
Preserve: key decisions, current state, gotchas, preferences.
Discard: intermediate steps, superseded information, raw outputs.

{full_text}""")

    return MemoryEntry(
        text=summary,
        type="compacted_summary",
        created_at=max(e.created_at for e in entries),
        metadata={"compacted_from": len(entries), "topic": topic}
    )

Warning: Compaction is irreversible. Archive the original entries before compacting.

Archival

Move old entries from active memory to cold storage:

def archive_old_entries(active_store, archive_store, max_age_days: int = 90):
    now = datetime.utcnow()
    to_archive = []

    for entry in active_store.all():
        if should_archive(entry, now, max_age_days):
            to_archive.append(entry)

    for entry in to_archive:
        archive_store.add(entry)
        active_store.remove(entry.id)

    return len(to_archive)

def should_archive(entry: MemoryEntry, now: datetime, max_age: int) -> bool:
    # Don't archive: identity, active decisions, high-relevance entries
    if entry.type in ("identity", "active_decision"):
        return False
    if relevance_score(entry, now) > 0.5:
        return False
    return (now - entry.last_accessed).days > max_age

Active memory stays fast and focused. Archive is searchable but with higher latency.

Deduplication

Multiple conversations often produce overlapping memories:

def deduplicate(entries: list[MemoryEntry], similarity_threshold: float = 0.92):
    """Merge near-duplicate entries, keeping the most recent."""
    unique = []
    for entry in sorted(entries, key=lambda e: e.created_at, reverse=True):
        is_dup = False
        for existing in unique:
            sim = cosine_similarity(
                embed(entry.text), embed(existing.text)
            )
            if sim > similarity_threshold:
                # Keep existing (more recent), update metadata
                existing.metadata["also_from"] = entry.id
                is_dup = True
                break
        if not is_dup:
            unique.append(entry)
    return unique

Memory Size Targets

Memory Type Target Size Pruning Trigger
Identity (L0) <50 tokens Never prune
Critical facts (L1) <200 tokens Manual review only
Active project memory <5K tokens When project completes
Historical memory <50K tokens Monthly archival
Archive Unlimited Annual review

Patterns

Graduated Retention

0-7 days:   Keep everything (verbatim)
7-30 days:  Keep decisions, gotchas, preferences. Archive raw outputs
30-90 days: Compact findings into summaries. Archive details
90+ days:   Archive to cold storage. Keep only identity + decisions

Access-Triggered Preservation

def on_memory_access(entry_id: str):
    """Every time an entry is retrieved, extend its life."""
    entry = store.get(entry_id)
    entry.last_accessed = datetime.utcnow()
    entry.access_count += 1
    store.update(entry)
    # Frequently accessed entries never get archived

Gotchas

  • Forgetting without archival is data destruction. Always archive before pruning. The cost of storing old data in cold storage is negligible compared to the cost of losing information that turns out to be needed later
  • Compaction accumulates information loss. If you compact a compacted summary, quality degrades exponentially. Track compaction depth and never compact more than twice. If memory is too large after two compactions, archive instead
  • Deduplication threshold is tricky to tune. Too aggressive (>0.85 similarity = duplicate) merges genuinely different entries. Too conservative (<0.95) leaves redundancy. Start at 0.92 and adjust based on false positive rate. Always log what was merged for review

See Also