Knowledge Graph Memory for AI Agents¶
Structured knowledge graphs as persistent agent memory - beyond flat note files. The agent continuously extracts entities from incoming data (email, meetings, transcripts), resolves them to canonical records, and maintains a bidirectionally linked Obsidian-compatible vault.
Reference implementation: Rowboat (YC S24, Apache 2.0).
Architecture: Two-Layer Storage¶
```
raw/                  - append-only sources (email, transcripts, voice memos);
                        never modified by the agent, only read
knowledge/
  People/             - person entities with roles, orgs, emails
  Organizations/
  Projects/           - status, stakeholders, timeline
  Topics/             - keywords, related notes
  Meetings/           - attendees, decisions, action items
  Agent Notes/        - user model (identity, preferences, style)
  Today.md            - auto-generated daily brief (inline task)
```
Core invariant: raw data is immutable source of truth; knowledge/ is the agent's processed, cross-linked world model.
Three-Stage Pipeline¶
Stage 1: Signal Filtering (for email/feeds)¶
Before building the graph, filter noise. An LLM applies YAML frontmatter labels to each source item:

```yaml
---
labels:
  relationship: [investor, customer, team]
  topics: [fundraising, hiring, legal]
  type: "intro" | "followup" | ""
  filter: [cold-outreach, newsletter, spam]
  action: "action-required" | "urgent" | "waiting" | ""
processed: true
labeled_at: "2026-04-11T10:00:00Z"
---
```
Noise-first principle: identify what to SKIP before deciding what to keep. If the `filter` array is non-empty, the item is excluded from graph building regardless of its other tags.
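The hard-filter rule can be sketched as follows, assuming the frontmatter has been parsed into a dict (`is_noise` and the exact key layout are illustrative, not Rowboat's API):

```python
def is_noise(frontmatter: dict) -> bool:
    """Hard filter: any entry in the `filter` array excludes the item outright,
    no matter what other labels it carries."""
    labels = frontmatter.get("labels", {})
    return bool(labels.get("filter"))


# A cold-outreach email is excluded even though it has a relationship tag.
item = {"labels": {"relationship": ["investor"], "filter": ["cold-outreach"]}}
assert is_noise(item)
```

The key property is that the check runs first and is absolute, not a weighted signal combined with the other labels.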
Strictness levels control how aggressively notes are created:
- high - only explicit interactions create notes; emails only update existing contacts
- medium - personalized business emails create notes; filters out consumer services
- low - most human senders create notes; minimal filtering
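The strictness gate might look like this sketch (`should_create_person_note` and its flags are hypothetical names, not from the implementation):

```python
from enum import Enum


class Strictness(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def should_create_person_note(strictness: Strictness, *,
                              explicit_interaction: bool,
                              personalized: bool,
                              human_sender: bool) -> bool:
    """Decide whether an email sender warrants a new People/ note."""
    if strictness is Strictness.HIGH:
        # Only explicit interactions create notes; emails merely update contacts.
        return explicit_interaction
    if strictness is Strictness.MEDIUM:
        # Personalized business email is enough; consumer services are filtered.
        return explicit_interaction or personalized
    # LOW: most human senders get a note.
    return human_sender
```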
Stage 2: Graph Building¶
Processes filtered sources into entity notes. Before each batch:
- Rebuild the knowledge index - scan all `knowledge/**/*.md` files and extract entities
- Inject the index into the prompt - the agent receives a structured table of known entities
- Process the batch - the agent writes/updates notes, resolving names to canonical entities
Change detection uses a hybrid approach: modification time as a cheap pre-check, a content hash to confirm a real change.
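A minimal sketch of such a hybrid check (`changed` and the cache shape are illustrative, not the reference implementation):

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed(path: Path, seen: dict) -> bool:
    """mtime is a cheap pre-check; the content hash confirms a real change."""
    prev = seen.get(str(path))
    mtime = path.stat().st_mtime
    if prev and prev["mtime"] == mtime:
        return False                      # untouched since last scan
    digest = file_digest(path)            # mtime moved: hash to confirm
    dirty = prev is None or prev["hash"] != digest
    seen[str(path)] = {"mtime": mtime, "hash": digest}
    return dirty
```

A `touch` or identical re-save bumps the mtime, but the hash comparison reports no change.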
This eliminates false positives from timestamp-only changes.

Stage 3: Tagging¶
Applies categorical tags to generated knowledge notes. Tag categories: relationship (12), topic (11), email-type (2), noise (13), action (3), status (5), source (6).
Entity Resolution¶
Critical pattern: before processing each batch, rebuild an in-memory index of all known entities:
```typescript
interface KnowledgeIndex {
  people: { file: string; name: string; email?: string; aliases: string[]; organization?: string; role?: string }[];
  organizations: { file: string; name: string; domain?: string; aliases: string[] }[];
  projects: { file: string; name: string; status?: string; aliases: string[] }[];
  topics: { file: string; name: string; keywords: string[]; aliases: string[] }[];
}
```
The agent receives this as a formatted markdown table. When it encounters "John Smith" in source data, it resolves to the existing [[People/John Smith]] rather than creating a duplicate.
Without this: agents create `John Smith.md`, `J. Smith.md`, and `Smith, John.md` for the same person.
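Rendering the index into a prompt table can be as simple as this sketch (`index_to_table` is illustrative; the field names follow the interface above):

```python
def index_to_table(people: list[dict]) -> str:
    """Render the people index as a markdown table for the prompt."""
    rows = ["| File | Name | Aliases |", "|---|---|---|"]
    for p in people:
        rows.append(f"| {p['file']} | {p['name']} | {', '.join(p.get('aliases', []))} |")
    return "\n".join(rows)
```

The agent then matches incoming names against the `Name` and `Aliases` columns before writing any link.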
Note Templates¶
People Note¶
```markdown
# Person Name

## Info
- **Role:**
- **Organization:** [[Organizations/OrgName]]
- **Email:**

## Summary
## Connected to
## Activity
## Key facts
## Open items
```
Wiki-Link Conventions¶
- Syntax: `[[Folder/Canonical Name]]` - absolute path within `knowledge/`
- Always bidirectional: if a Person links to an Org, the Org must link back to the Person
- The agent resolves name variants to canonical names via the knowledge index before writing links
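Bidirectionality can be checked at write time with a sketch like this, assuming each note lives at `knowledge/<Folder>/<Name>.md` (`missing_backlinks` is a hypothetical helper):

```python
import re
from pathlib import Path

WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\]")


def missing_backlinks(vault: Path, note: Path) -> list[str]:
    """Return wiki-link targets in `note` that do not link back to it."""
    backlink = f"[[{note.relative_to(vault).with_suffix('').as_posix()}]]"
    missing = []
    for target in WIKI_LINK.findall(note.read_text()):
        target_file = vault / f"{target}.md"
        if target_file.exists() and backlink not in target_file.read_text():
            missing.append(target)
    return missing
```

Running this after every note write surfaces one-way links before they become orphaned nodes.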
Agent Notes - Structured User Model¶
Separate knowledge/Agent Notes/ directory for the AI's understanding of the user:
```
Agent Notes/
  user.md          - identity facts only (roles, companies, location)
  preferences.md   - explicit rules ("no meetings before 11am")
  style/
    email.md       - writing patterns by recipient type
```
Data sources: user's sent messages, inbox items (manual notes), conversation logs.
Processing: deduplicates aggressively, timestamps facts for staleness detection.
This is distinct from the main knowledge graph - it's the agent's model of WHO it works for, not WHAT it knows about the world.
Inline Tasks (Live Notes)¶
Executable JSON blocks embedded in Markdown. Notes with live_note: true frontmatter are polled on a schedule:
---
live_note: true
---
# Today
```task
{
"instruction": "Create a daily brief for me",
"schedule": { "type": "cron", "expression": "*/15 * * * *" },
"lastRunAt": "2026-04-11T10:30:00Z",
"targetId": "dailybrief"
}
```

Results are written back into the note automatically.
Schedule types: `cron`, `window` (time-constrained execution), `one-shot`.
**Effect:** static knowledge files become active agents. The graph is not just a store - it runs.
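A sketch of how a runner might pull task specs out of a live note (the regex-based `extract_tasks` is illustrative; the real parser may differ):

```python
import json
import re

# Match fenced ```task blocks and capture the JSON body.
TASK_BLOCK = re.compile(r"```task\s*\n(.*?)```", re.DOTALL)


def extract_tasks(markdown: str) -> list[dict]:
    """Parse every executable task spec embedded in a live note."""
    return [json.loads(body) for body in TASK_BLOCK.findall(markdown)]
```

The runner would then compare each task's `schedule` and `lastRunAt` against the current time to decide whether it is due.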
Version History¶
Git-track the knowledge vault using isomorphic-git:
- Only `.md` files are tracked
- Commit after each processing batch
- Full restoration from any commit
- Audit trail of knowledge evolution
```typescript
import * as git from 'isomorphic-git';
import fs from 'fs';

// After each batch completes, commit the vault
await git.commit({
  fs,
  dir: knowledgeDir,
  message: `Processed batch: ${files.join(', ')}`,
  author: { name: 'agent', email: 'agent@local' }
});
```
Scaling Patterns¶
| Scale | Retrieval Strategy |
|---|---|
| <200 notes | Index file navigation only |
| 200-1000 | Domain-specific indexes + grep search |
| 1000-5000 | BM25 with LLM reranking |
| 5000+ | Full RAG pipeline + vector database |
For moderate scale (the common case), a structured index + Unix tools (grep, find) outperforms vector search in latency and cost.
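For the grep tier, a thin subprocess wrapper is often all that's needed (a sketch; `grep_notes` is not part of any library):

```python
import subprocess


def grep_notes(pattern: str, knowledge_dir: str) -> list[str]:
    """Case-insensitive recursive grep over the vault's .md files;
    returns the paths of matching notes."""
    result = subprocess.run(
        ["grep", "-r", "-i", "-l", "--include=*.md", pattern, knowledge_dir],
        capture_output=True, text=True,
    )
    return result.stdout.splitlines()
```

No index to maintain, millisecond latency, zero per-query cost - which is why it holds up well into the hundreds of notes.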
Applying These Patterns Without the Full Stack¶
Entity Resolution for File-Based Memory¶
Before creating new memory entries, scan existing files for the same entity:
```python
from pathlib import Path
from typing import Optional


def resolve_entity(name: str, memory_dir: Path) -> Optional[Path]:
    """Find an existing file for an entity before creating a duplicate."""
    for f in memory_dir.rglob("*.md"):
        content = f.read_text()
        if name.lower() in content.lower():
            # Naive substring match; also check aliases in frontmatter or body
            return f
    return None
```
Noise-First Filtering for Any Ingestion¶
Define explicit skip rules before extract rules:
```python
SKIP_PATTERNS = ["newsletter", "no-reply", "unsubscribe", "automated"]


def should_skip(source: dict) -> bool:
    """Noise-first: skip known noise categories before any extraction."""
    return any(p in str(source).lower() for p in SKIP_PATTERNS)
```
Two-Layer Raw/Processed Separation¶
```
sessions/raw/  - immutable session transcripts, never modified
memory/        - extracted, consolidated knowledge;
                 derived from raw but not tied to it
```
Allows re-processing raw data with improved extraction logic without data loss.
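Re-processing can be sketched as a single pass over `raw/` with a pluggable extractor (`rebuild_memory` and `extract` are illustrative names):

```python
from pathlib import Path


def rebuild_memory(raw_dir: Path, memory_dir: Path, extract) -> None:
    """Re-derive memory/ from immutable raw/ using a (possibly improved)
    extraction function. raw/ is only ever read."""
    for src in sorted(raw_dir.rglob("*.md")):
        note = extract(src.read_text())
        if note:
            out = memory_dir / src.relative_to(raw_dir)
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_text(note)
```

Because `raw/` is never mutated, swapping in a better `extract` and rerunning is always safe.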
Gotchas¶
- Entity resolution must run BEFORE each batch, not once at startup. The knowledge graph grows during processing; a batch at step 5 may need entities created in step 2. Rebuild the index before every batch, not just on initialization.
- Skip tags must be hard filters. If noise filtering is advisory (weighted), spam leaks into the graph over time. Make skip/filter tags override all create signals - a cold-outreach email never creates a person note, regardless of relationship tags.
- Wiki-links without bidirectional enforcement diverge. One-way links create orphaned nodes the agent can't navigate to. Enforce bidirectionality at write time: when the agent writes `[[People/Alice]]` in an org note, check that Alice's note links back to that org.
- Inline tasks in live notes require polling, not file watchers. File-system events don't carry schedule context. A polling loop (every 15s) reading `live_note: true` frontmatter is more reliable across OS platforms than inotify/FSEvents.
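The polling gotcha can be sketched as a simple loop (`is_live_note` and `poll_live_notes` are hypothetical helpers; frontmatter parsing is simplified to a substring check):

```python
import time
from pathlib import Path


def is_live_note(text: str) -> bool:
    """Check the YAML frontmatter block for `live_note: true`."""
    if not text.startswith("---"):
        return False
    parts = text.split("---", 2)
    return len(parts) == 3 and "live_note: true" in parts[1]


def poll_live_notes(vault: Path, run_due_tasks, interval: float = 15.0) -> None:
    """Polling loop: re-scan the vault on a fixed interval instead of
    relying on inotify/FSEvents, which carry no schedule context."""
    while True:
        for note in vault.rglob("*.md"):
            if is_live_note(note.read_text()):
                run_due_tasks(note)   # caller decides which tasks are due
        time.sleep(interval)
```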