The Problem
LLMs are stateless — every call starts with a blank slate. Without explicit memory management, an agent forgets what the user said three turns ago, can’t reference previous conclusions, and gives inconsistent answers across a session. Naively dumping the entire conversation into every prompt hits the context window limit quickly.
The Solution
Define a Memory interface with Add(), Recent(), Search(), and Clear(). Concrete implementations swap the storage strategy: InMemoryStore keeps a fixed-size circular buffer of recent entries, EpisodicStore maintains a full append-only log grouped by session, and a VectorStore stub shows how semantic search fits the same interface. The agent calls memory.Recent(n) before each LLM call to hydrate the prompt, and memory.Add() after each turn to record the exchange.
Structure
Hydrating the Prompt
Before each LLM call the agent calls memory.Recent(n) to retrieve the last n turns. The returned slice is formatted into the prompt as conversation history, giving the model context without overwhelming the token budget.
```mermaid
flowchart LR
    Agent["Agent"]
    Memory["Memory interface"]
    InMem["InMemoryStore (circular buffer)"]
    Episodic["EpisodicStore (session log)"]
    Vector["VectorStore (semantic search)"]
    Prompt["PromptBuilder"]
    LLM["LLM"]
    Agent -->|"Recent(10)"| Memory
    Memory --> Prompt
    Prompt --> LLM
    LLM -->|"response"| Agent
    Agent -->|"Add(entry)"| Memory
    InMem -.->|"implements"| Memory
    Episodic -.->|"implements"| Memory
    Vector -.->|"implements"| Memory
```
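A minimal sketch of that hydrate-then-record loop, assuming the Memory interface and MemoryEntry type defined under Implementation, plus imports of fmt, strings, and time. callLLM is a hypothetical stand-in for whatever LLM client the agent actually uses:

```go
// runTurn executes one conversational turn: hydrate, call, record.
// callLLM is a hypothetical stand-in for the agent's actual LLM client.
func runTurn(mem Memory, sessionID, userInput string) string {
	// Hydrate: format the last 10 turns as conversation history.
	var sb strings.Builder
	for _, e := range mem.Recent(10) {
		fmt.Fprintf(&sb, "%s: %s\n", e.Role, e.Content)
	}
	prompt := sb.String() + "user: " + userInput

	response := callLLM(prompt)

	// Record: store both sides of the exchange for future turns.
	now := time.Now()
	mem.Add(MemoryEntry{Role: RoleUser, Content: userInput, Timestamp: now, SessionID: sessionID})
	mem.Add(MemoryEntry{Role: RoleAssistant, Content: response, Timestamp: now, SessionID: sessionID})
	return response
}
```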
Implementation
```go
package main

import "time"

// Role identifies who produced a memory entry.
type Role string

const (
	RoleUser      Role = "user"
	RoleAssistant Role = "assistant"
	RoleSystem    Role = "system"
)

// MemoryEntry is one recorded turn in the agent's history.
type MemoryEntry struct {
	Role      Role
	Content   string
	Timestamp time.Time
	SessionID string
}

// Memory is a pluggable store for agent conversation history.
type Memory interface {
	Add(entry MemoryEntry)
	Recent(n int) []MemoryEntry
	Search(query string) []MemoryEntry
	Clear()
}
```
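The listing defines only the interface. One plausible InMemoryStore is sketched below; the field names and constructor are illustrative, Search() falls back to keyword matching as suggested under Best Practices, and the sketch assumes imports of strings and sync alongside the listing above:

```go
// InMemoryStore: a sketch of the fixed-size circular buffer.
// The sync.RWMutex guard follows the best practice noted below.
type InMemoryStore struct {
	mu      sync.RWMutex
	entries []MemoryEntry
	max     int // buffer capacity
}

func NewInMemoryStore(max int) *InMemoryStore {
	return &InMemoryStore{max: max}
}

func (s *InMemoryStore) Add(entry MemoryEntry) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries = append(s.entries, entry)
	if len(s.entries) > s.max {
		// Drop the oldest entry once capacity is exceeded.
		s.entries = s.entries[1:]
	}
}

func (s *InMemoryStore) Recent(n int) []MemoryEntry {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if n <= 0 {
		return nil
	}
	if n > len(s.entries) {
		n = len(s.entries)
	}
	// Copy so callers cannot mutate the internal slice.
	out := make([]MemoryEntry, n)
	copy(out, s.entries[len(s.entries)-n:])
	return out
}

func (s *InMemoryStore) Search(query string) []MemoryEntry {
	s.mu.RLock()
	defer s.mu.RUnlock()
	var hits []MemoryEntry
	for _, e := range s.entries {
		if strings.Contains(strings.ToLower(e.Content), strings.ToLower(query)) {
			hits = append(hits, e)
		}
	}
	return hits
}

func (s *InMemoryStore) Clear() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries = nil
}
```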
Real-World Analogy
A doctor’s consultation notes: during the appointment (InMemoryStore), the doctor remembers the last few things discussed. After the visit, notes go into the patient’s full record (EpisodicStore). When a specialist is needed, they search the record by symptom keyword (VectorStore). Each retrieval mode serves a different clinical need from the same underlying record.
Pros and Cons
| Pros | Cons |
|---|---|
| Single Memory interface lets you swap backends without changing agent code | Managing token budget across memory and response requires care |
| InMemoryStore has zero dependencies for simple use cases | Circular buffer discards old entries — fine for recency, bad for recall |
| EpisodicStore enables full session replay for debugging | Growing history increases prompt cost on every turn |
| VectorStore enables semantic search across thousands of entries | Embedding and vector search add infrastructure complexity |
Best Practices
- Always bound the context you inject: call Recent(n) with a fixed n and measure the token count before sending to the LLM.
- Use RWMutex in all memory implementations; agents that run tools concurrently will read and write simultaneously.
- Tag each MemoryEntry with a SessionID from the start; retrofitting session isolation into a flat store is painful.
- Summarize old episodes periodically rather than discarding them; a summary entry preserves semantics without burning tokens (see the sketch after this list).
- Write a Search() implementation even if it's just keyword matching initially; the interface makes upgrading to semantic search a drop-in replacement.
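A minimal sketch of that summarization step, assuming the Memory interface above, imports of fmt, strings, and time, and a clear-and-rebuild approach that is one possible strategy rather than the pattern's prescribed one. The summarize helper is a hypothetical stand-in for an LLM call that condenses text:

```go
// compactSession folds a session's older turns into one summary entry,
// keeping the most recent turns verbatim. Names here are illustrative.
func compactSession(mem Memory, sessionID string, keepRecent int) {
	history := mem.Recent(1 << 16) // assume this returns the full window
	if len(history) <= keepRecent {
		return
	}
	old, recent := history[:len(history)-keepRecent], history[len(history)-keepRecent:]

	var sb strings.Builder
	for _, e := range old {
		fmt.Fprintf(&sb, "%s: %s\n", e.Role, e.Content)
	}
	summary := MemoryEntry{
		Role:      RoleSystem,
		Content:   "Summary of earlier turns: " + summarize(sb.String()),
		Timestamp: time.Now(),
		SessionID: sessionID,
	}

	// Rebuild the store: one summary entry followed by the recent turns.
	mem.Clear()
	mem.Add(summary)
	for _, e := range recent {
		mem.Add(e)
	}
}

// summarize is a hypothetical stub; a real implementation would
// call the LLM to condense the text rather than truncate it.
func summarize(text string) string {
	r := []rune(text)
	if len(r) > 200 {
		return string(r[:200]) + "..."
	}
	return text
}
```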
When to Use
- Any multi-turn conversational agent that needs to reference earlier messages.
- Long-running assistants that persist across sessions.
- Agents that must recall specific facts from a large history without re-sending everything.
When NOT to Use
- Single-turn request/response agents with no conversational state.
- Workflows where the full context always fits in one LLM call and memory management adds unnecessary overhead.