The Problem
LLMs are stateless — every call starts with a blank slate. Without explicit memory management, an agent forgets what the user said three turns ago, can’t reference previous conclusions, and gives inconsistent answers across a session. Naively dumping the entire conversation into every prompt hits the context window limit quickly.
The Solution
Define a Memory interface with Add(), Recent(), Search(), and Clear(). Concrete implementations swap the storage strategy: InMemoryStore keeps a fixed-size circular buffer of recent entries, EpisodicStore maintains a full append-only log grouped by session, and a VectorStore stub shows how semantic search fits the same interface. The agent calls memory.Recent(n) before each LLM call to hydrate the prompt, and memory.Add() after each turn to record the exchange.
Structure
Hydrating the Prompt
Before each LLM call the agent calls memory.Recent(n) to retrieve the last n turns. The returned slice is formatted into the prompt as conversation history, giving the model context without overwhelming the token budget.
```mermaid
flowchart LR
    Agent["Agent"]
    Memory["Memory interface"]
    InMem["InMemoryStore (circular buffer)"]
    Episodic["EpisodicStore (session log)"]
    Vector["VectorStore (semantic search)"]
    Prompt["PromptBuilder"]
    LLM["LLM"]
    Agent -->|"Recent(10)"| Memory
    Memory --> Prompt
    Prompt --> LLM
    LLM -->|"response"| Agent
    Agent -->|"Add(entry)"| Memory
    InMem -.->|"implements"| Memory
    Episodic -.->|"implements"| Memory
    Vector -.->|"implements"| Memory
```
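A minimal sketch of that hydrate-then-record loop, assuming the Memory interface and MemoryEntry type defined under Implementation, plus imports of fmt, strings, and time. callLLM is a hypothetical stand-in for whatever LLM client the agent actually uses:

```go
// runTurn executes one conversational turn: hydrate, call, record.
// callLLM is a hypothetical stand-in for the agent's actual LLM client.
func runTurn(mem Memory, sessionID, userInput string) string {
	// Hydrate: format the last 10 turns as conversation history.
	var sb strings.Builder
	for _, e := range mem.Recent(10) {
		fmt.Fprintf(&sb, "%s: %s\n", e.Role, e.Content)
	}
	prompt := sb.String() + "user: " + userInput

	response := callLLM(prompt)

	// Record: store both sides of the exchange for future turns.
	now := time.Now()
	mem.Add(MemoryEntry{Role: RoleUser, Content: userInput, Timestamp: now, SessionID: sessionID})
	mem.Add(MemoryEntry{Role: RoleAssistant, Content: response, Timestamp: now, SessionID: sessionID})
	return response
}
```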
Implementation
```go
package main

import "time"

// Role identifies who produced a memory entry.
type Role string

const (
	RoleUser      Role = "user"
	RoleAssistant Role = "assistant"
	RoleSystem    Role = "system"
)

// MemoryEntry is one recorded turn in the agent's history.
type MemoryEntry struct {
	Role      Role
	Content   string
	Timestamp time.Time
	SessionID string
}

// Memory is a pluggable store for agent conversation history.
type Memory interface {
	Add(entry MemoryEntry)
	Recent(n int) []MemoryEntry
	Search(query string) []MemoryEntry
	Clear()
}
```
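The listing defines only the interface. One plausible InMemoryStore is sketched below; the field names and constructor are illustrative, Search() falls back to keyword matching as suggested under Best Practices, and the sketch assumes imports of strings and sync alongside the listing above:

```go
// InMemoryStore: a sketch of the fixed-size circular buffer.
// The sync.RWMutex guard follows the best practice noted below.
type InMemoryStore struct {
	mu      sync.RWMutex
	entries []MemoryEntry
	max     int // buffer capacity
}

func NewInMemoryStore(max int) *InMemoryStore {
	return &InMemoryStore{max: max}
}

func (s *InMemoryStore) Add(entry MemoryEntry) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries = append(s.entries, entry)
	if len(s.entries) > s.max {
		// Drop the oldest entry once capacity is exceeded.
		s.entries = s.entries[1:]
	}
}

func (s *InMemoryStore) Recent(n int) []MemoryEntry {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if n <= 0 {
		return nil
	}
	if n > len(s.entries) {
		n = len(s.entries)
	}
	// Copy so callers cannot mutate the internal slice.
	out := make([]MemoryEntry, n)
	copy(out, s.entries[len(s.entries)-n:])
	return out
}

func (s *InMemoryStore) Search(query string) []MemoryEntry {
	s.mu.RLock()
	defer s.mu.RUnlock()
	var hits []MemoryEntry
	for _, e := range s.entries {
		if strings.Contains(strings.ToLower(e.Content), strings.ToLower(query)) {
			hits = append(hits, e)
		}
	}
	return hits
}

func (s *InMemoryStore) Clear() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries = nil
}
```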
Real-World Analogy
A doctor’s consultation notes: during the appointment (InMemoryStore), the doctor remembers the last few things discussed. After the visit, notes go into the patient’s full record (EpisodicStore). When a specialist is needed, they search the record by symptom keyword (VectorStore). Each retrieval mode serves a different clinical need from the same underlying record.
Pros and Cons
| Pros | Cons |
|---|---|
| Single Memory interface lets you swap backends without changing agent code | Managing token budget across memory and response requires care |
| InMemoryStore has zero dependencies for simple use cases | Circular buffer discards old entries — fine for recency, bad for recall |
| EpisodicStore enables full session replay for debugging | Growing history increases prompt cost on every turn |
| VectorStore enables semantic search across thousands of entries | Embedding and vector search add infrastructure complexity |
Best Practices
- Always bound the context you inject: call Recent(n) with a fixed n and measure the token count before sending to the LLM.
- Use RWMutex in all memory implementations; agents that run tools concurrently will read and write simultaneously.
- Tag each MemoryEntry with a SessionID from the start; retrofitting session isolation into a flat store is painful.
- Summarize old episodes periodically rather than discarding them; a summary entry preserves semantics without burning tokens (see the sketch after this list).
- Write a Search() implementation even if it's just keyword matching initially; the interface makes upgrading to semantic search a drop-in replacement.
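A minimal sketch of that summarization step, assuming the Memory interface above, imports of fmt, strings, and time, and a clear-and-rebuild approach that is one possible strategy rather than the pattern's prescribed one. The summarize helper is a hypothetical stand-in for an LLM call that condenses text:

```go
// compactSession folds a session's older turns into one summary entry,
// keeping the most recent turns verbatim. Names here are illustrative.
func compactSession(mem Memory, sessionID string, keepRecent int) {
	history := mem.Recent(1 << 16) // assume this returns the full window
	if len(history) <= keepRecent {
		return
	}
	old, recent := history[:len(history)-keepRecent], history[len(history)-keepRecent:]

	var sb strings.Builder
	for _, e := range old {
		fmt.Fprintf(&sb, "%s: %s\n", e.Role, e.Content)
	}
	summary := MemoryEntry{
		Role:      RoleSystem,
		Content:   "Summary of earlier turns: " + summarize(sb.String()),
		Timestamp: time.Now(),
		SessionID: sessionID,
	}

	// Rebuild the store: one summary entry followed by the recent turns.
	mem.Clear()
	mem.Add(summary)
	for _, e := range recent {
		mem.Add(e)
	}
}

// summarize is a hypothetical stub; a real implementation would
// call the LLM to condense the text rather than truncate it.
func summarize(text string) string {
	r := []rune(text)
	if len(r) > 200 {
		return string(r[:200]) + "..."
	}
	return text
}
```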
When to Use
- Any multi-turn conversational agent that needs to reference earlier messages.
- Long-running assistants that persist across sessions.
- Agents that must recall specific facts from a large history without re-sending everything.
When NOT to Use
- Single-turn request/response agents with no conversational state.
- Workflows where the full context always fits in one LLM call and memory management adds unnecessary overhead.