Forgetting Strategies¶
When and how to remove information from LLM agent memory. Unbounded memory growth degrades retrieval quality - irrelevant results dilute relevant ones. Strategic forgetting is as important as strategic remembering.
Key Facts¶
- Memory without pruning degrades retrieval precision over time - more entries means more false positives
- "Forgetting" usually means archival (move to cold storage), not deletion
- Token cost scales with memory size - even if retrieval is fast, injecting too much context is expensive
- Compaction (summarization) is a form of lossy forgetting - details are permanently lost
- The optimal memory size depends on retrieval method: index navigation degrades at ~500 entries, vector search degrades more gradually
- Never delete data that hasn't been archived first
Forgetting Mechanisms¶
| Mechanism | What It Does | Reversible | Best For |
|---|---|---|---|
| TTL (Time-to-Live) | Auto-expire after fixed period | No (unless archived) | Temporary state, session data |
| Relevance scoring | Score entries, prune lowest | Yes (if archived) | Large memory with mixed quality |
| Compaction | Summarize, replace details | No | Conversation history |
| Archival | Move to cold storage | Yes | Old but potentially useful data |
| Deduplication | Merge redundant entries | Partially | Overlapping information |
| Explicit deletion | Remove on user request | No | Privacy, corrections |
TTL-Based Expiry¶
Assign a time-to-live based on the type of information:
TTL_CONFIG = {
"session_state": timedelta(hours=24), # task-specific, short-lived
"project_status": timedelta(days=30), # changes monthly
"tool_output": timedelta(days=7), # raw outputs are transient
"decision": timedelta(days=365), # decisions have long relevance
"preference": timedelta(days=730), # preferences are stable
"identity": None, # never expires
}
def should_expire(entry: MemoryEntry, now: datetime) -> bool:
ttl = TTL_CONFIG.get(entry.type)
if ttl is None:
return False
return (now - entry.created_at) > ttl
Don't delete on expiry - archive. Move expired entries to a separate store. If they're needed later, they can be retrieved from archive.
Relevance Scoring¶
Score each memory entry by utility. Factors:
def relevance_score(entry: MemoryEntry, now: datetime) -> float:
# Recency: newer is more relevant
age_days = (now - entry.created_at).days
recency = 1.0 / (1 + age_days / 30) # 30-day half-life
# Access frequency: more-retrieved is more relevant
access_freq = min(entry.access_count / 10, 1.0)
# Freshness: recently accessed is more relevant
last_access_days = (now - entry.last_accessed).days
freshness = 1.0 / (1 + last_access_days / 7)
# Type weight: some types are inherently more valuable
type_weights = {
"decision": 1.0,
"gotcha": 0.9,
"preference": 0.8,
"finding": 0.7,
"observation": 0.5,
"tool_output": 0.3,
}
type_weight = type_weights.get(entry.type, 0.5)
return (0.3 * recency + 0.3 * access_freq + 0.2 * freshness + 0.2 * type_weight)
Prune entries below a threshold. Suggested: archive entries scoring < 0.2.
Compaction¶
Lossy summarization of detailed history into compressed form. See context window management for conversation compaction specifics.
def compact_memory_topic(topic: str, entries: list[MemoryEntry]) -> MemoryEntry:
"""Replace multiple entries with one summary."""
full_text = "\n".join(e.text for e in entries)
summary = llm.complete(f"""Summarize these {len(entries)} memory entries about '{topic}'.
Preserve: key decisions, current state, gotchas, preferences.
Discard: intermediate steps, superseded information, raw outputs.
{full_text}""")
return MemoryEntry(
text=summary,
type="compacted_summary",
created_at=max(e.created_at for e in entries),
metadata={"compacted_from": len(entries), "topic": topic}
)
Warning: Compaction is irreversible. Archive the original entries before compacting.
Archival¶
Move old entries from active memory to cold storage:
def archive_old_entries(active_store, archive_store, max_age_days: int = 90):
now = datetime.utcnow()
to_archive = []
for entry in active_store.all():
if should_archive(entry, now, max_age_days):
to_archive.append(entry)
for entry in to_archive:
archive_store.add(entry)
active_store.remove(entry.id)
return len(to_archive)
def should_archive(entry: MemoryEntry, now: datetime, max_age: int) -> bool:
# Don't archive: identity, active decisions, high-relevance entries
if entry.type in ("identity", "active_decision"):
return False
if relevance_score(entry, now) > 0.5:
return False
return (now - entry.last_accessed).days > max_age
Active memory stays fast and focused. Archive is searchable but with higher latency.
Deduplication¶
Multiple conversations often produce overlapping memories:
def deduplicate(entries: list[MemoryEntry], similarity_threshold: float = 0.92):
"""Merge near-duplicate entries, keeping the most recent."""
unique = []
for entry in sorted(entries, key=lambda e: e.created_at, reverse=True):
is_dup = False
for existing in unique:
sim = cosine_similarity(
embed(entry.text), embed(existing.text)
)
if sim > similarity_threshold:
# Keep existing (more recent), update metadata
existing.metadata["also_from"] = entry.id
is_dup = True
break
if not is_dup:
unique.append(entry)
return unique
Memory Size Targets¶
| Memory Type | Target Size | Pruning Trigger |
|---|---|---|
| Identity (L0) | <50 tokens | Never prune |
| Critical facts (L1) | <200 tokens | Manual review only |
| Active project memory | <5K tokens | When project completes |
| Historical memory | <50K tokens | Monthly archival |
| Archive | Unlimited | Annual review |
Patterns¶
Graduated Retention¶
0-7 days: Keep everything (verbatim)
7-30 days: Keep decisions, gotchas, preferences. Archive raw outputs
30-90 days: Compact findings into summaries. Archive details
90+ days: Archive to cold storage. Keep only identity + decisions
Access-Triggered Preservation¶
def on_memory_access(entry_id: str):
"""Every time an entry is retrieved, extend its life."""
entry = store.get(entry_id)
entry.last_accessed = datetime.utcnow()
entry.access_count += 1
store.update(entry)
# Frequently accessed entries never get archived
Gotchas¶
- Forgetting without archival is data destruction. Always archive before pruning. The cost of storing old data in cold storage is negligible compared to the cost of losing information that turns out to be needed later
- Compaction accumulates information loss. If you compact a compacted summary, quality degrades exponentially. Track compaction depth and never compact more than twice. If memory is too large after two compactions, archive instead
- Deduplication threshold is tricky to tune. Too aggressive (>0.85 similarity = duplicate) merges genuinely different entries. Too conservative (<0.95) leaves redundancy. Start at 0.92 and adjust based on false positive rate. Always log what was merged for review