Mnemosyne vs Letta (MemGPT)

An honest, technical comparison for users running memory systems locally with Hermes Agent and OpenClaw.

Last updated: 2026-05-13 · Mnemosyne v2.8.0

TL;DR: Letta is a research-grade agent framework with OS-inspired virtual context management, self-editing memory, and a 42-table PostgreSQL schema. Mnemosyne is a lightweight memory layer with 3 SQLite tables, in-process operation, and near-zero infrastructure. Letta is what you deploy when context windows are the bottleneck; Mnemosyne is what you use when you want memory that just works without standing up a database cluster.


Architecture

| Dimension | Mnemosyne | Letta Self-Hosted |
|---|---|---|
| Process model | In-process Python library | Separate Docker containers (Letta server + PostgreSQL + pgvector) |
| IPC overhead | Zero (direct function calls) | REST API + JSON serialization to Letta server |
| Database | SQLite (single file, WAL mode) | PostgreSQL + pgvector (42-table schema, Alembic migrations) |
| Embedding model | fastembed ONNX -- BAAI/bge-small-en-v1.5 (~67MB) | Configurable -- OpenAI, HuggingFace, or local embedding endpoints |
| Vector search | sqlite-vec (int8/bit/float32) or numpy fallback | pgvector HNSW (mature, optimized) |
| Cold start | Instant (if models cached locally) | ~10–30s (Docker container boot + PostgreSQL init + agent creation) |
| Runtime memory | ~10–20MB per session (SQLite + ONNX) | ~200–500MB (PostgreSQL pool + Letta server + agent runtime) |
| Stars / community | New (v2.8.0) | ~21.8K GitHub stars, active research community |
| Pricing | Free (MIT) | OSS free (Letta Code), Cloud API $20–$750/mo + usage |

Memory Model

Mnemosyne: BEAM (Bilevel Episodic-Associative Memory)

Three SQLite tables:

| Tier | Purpose | Behavior |
|---|---|---|
| Working memory | Hot, recent context | Auto-injected into prompts. TTL-based eviction (default 24h). Max 10,000 items. FTS5 indexed. |
| Episodic memory | Long-term consolidated storage | Populated by sleep() consolidation. Hybrid vector + FTS5 search. |
| Scratchpad | Temporary agent workspace | Not searchable, not consolidated. Cleared explicitly. Max 1,000 items. |

Additional: TripleStore -- temporal knowledge graph with valid_from/valid_until for point-in-time queries.

Core operations: remember(), recall(), sleep() -- intentionally simple.

Letta: OS-Inspired Virtual Context Management

Letta models an agent's memory like an operating system, with two distinct tiers and active context paging:

| Tier | Letta name | OS analogy | Behavior |
|---|---|---|---|
| Main context | Core memory | RAM | Fixed-size block injected into every LLM call. Agent self-edits via core_memory_append and core_memory_replace. Limited to the model's context window. |
| External context | Archival memory | Disk | Unlimited long-term storage. Agent searches via archival_memory_insert and archival_memory_search. Text + embedding pairs stored in pgvector. |
| Context paging | MemGPT paging | Virtual memory | When context fills up, Letta autonomously pages data between core and archival memory. The agent decides what to evict, what to keep, and when to fetch from "disk." |

Core operations: core_memory_append, core_memory_replace, archival_memory_insert, archival_memory_search -- the agent itself calls these as tool functions.

Key difference: Mnemosyne's sleep() is an explicit consolidation step called by the developer. Letta's paging is autonomous -- the agent decides when and how to manage its own context, which is powerful but requires a capable frontier model to do well.


Context Management: Letta's Defining Feature

This is where Letta is genuinely unique among memory systems. No other open-source memory layer provides OS-style virtual context paging with autonomous agent-driven memory management.

| Feature | Mnemosyne | Letta |
|---|---|---|
| Self-editing memory | No -- sleep() is developer-triggered; the agent writes to memory via tools but cannot restructure its own memory layout | Yes -- agent calls core_memory_replace to rewrite its own working context, archival_memory_insert to persist to long-term storage |
| Context overflow handling | TTL-based eviction on working memory. Developer controls what stays via max_items. | Autonomous paging: agent detects context is full, decides what to evict to archival, fetches relevant archival data back into core context as needed |
| Unlimited context illusion | No -- working memory is bounded. Episodic memory is searchable but not context-aware. | Yes -- from the LLM's perspective, Letta agents appear to have unlimited context because they page data in and out autonomously |
| Model requirements | Works with any local or remote model | Best results require GPT-4-class frontier models. Local models struggle with autonomous memory management. |

Honest assessment: Letta's virtual context management is impressive research (backed by the MemGPT paper). For workloads where context windows are the primary bottleneck -- long-form writing, research agents, multi-session conversations -- it solves a real problem. But it comes at a cost: complex infrastructure and expensive model requirements.


Retrieval

| Feature | Mnemosyne | Letta Self-Hosted |
|---|---|---|
| Vector search | sqlite-vec (cosine distance) | pgvector HNSW (via archival_memory_search) |
| Keyword search | SQLite FTS5 | PostgreSQL full-text (passage-based, embedded in archival_memory_search) |
| Graph search | TripleStore (subject-predicate-object, temporal) | No native knowledge graph |
| Temporal search | temporal_weight + temporal_halflife params on recall() | No native temporal search. Metadata filtering via tags but no time-based decay. |
| Scoring formula | vec_weight × vec + fts_weight × fts + importance_weight × importance, then × recency decay | Embedding cosine distance. Passage-based chunks returned with scores. |
| Default weights | 50% vector, 30% FTS, 20% importance | Single-strategy (embedding similarity) |
| Configurable? | Yes -- per-query vec_weight, fts_weight, importance_weight params | Limited -- archival_memory_search accepts query + optional limit/offset |
| Reranking | None (single-pass) | None. Relies on embedding quality + LLM's own judgment when results enter core context. |
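Mnemosyne's scoring formula is simple enough to write out. A minimal sketch -- the exponential half-life form of the recency decay is an assumption, since the docs only state "then × recency decay":

```python
def hybrid_score(vec_sim, fts_score, importance, age_hours,
                 temporal_halflife=24.0,
                 vec_weight=0.5, fts_weight=0.3, importance_weight=0.2):
    """Blend the three signals, then apply recency decay.
    Default weights match the table: 50% vector, 30% FTS, 20% importance.
    The half-life curve below is an assumed decay shape, not the
    library's documented formula."""
    base = (vec_weight * vec_sim
            + fts_weight * fts_score
            + importance_weight * importance)
    decay = 0.5 ** (age_hours / temporal_halflife)
    return base * decay
```

With this shape, a perfect match loses half its score every `temporal_halflife` hours, while raising `importance_weight` per-query lets pinned facts outrank fresher but weaker matches.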

Entity Extraction

| Feature | Mnemosyne | Letta |
|---|---|---|
| Method | Regex patterns + pure Python Levenshtein distance | LLM-driven (agent extracts entities as part of memory editing, no separate pipeline) |
| Patterns | @mentions, #hashtags, "quoted phrases", capitalized sequences (2–5 words) | No dedicated entity extraction. Entities live in core memory blocks as agent-managed structured text. |
| Fuzzy matching | Levenshtein distance with prefix/substring bonuses | Not applicable -- core_memory is agent-written, not automatically normalized |
| Storage | TripleStore triples: (memory_id, "mentions", "entity_name") | Stored as structured text in core memory blocks (e.g., "User: Alice, Age: 30, Location: NY") |
| Speed | ~0.01ms per extraction | N/A -- entity tracking is agent-managed, not a pipeline |
| Opt-in? | extract_entities=True on remember() | Always on (agents maintain their own structured context) |
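The regex half of this pipeline is easy to picture. A hedged sketch -- these patterns mirror the four pattern classes listed above but are not the module's actual regexes:

```python
import re

# Illustrative stand-ins for the four documented pattern classes;
# the real module's expressions may differ.
PATTERNS = {
    "mention": re.compile(r"@(\w+)"),
    "hashtag": re.compile(r"#(\w+)"),
    "quoted": re.compile(r'"([^"]+)"'),
    # 2-5 consecutive Capitalized words
    "proper": re.compile(r"\b(?:[A-Z][a-z]+\s+){1,4}[A-Z][a-z]+\b"),
}

def extract_entities(text):
    """Return (kind, surface_form) pairs for every pattern hit."""
    found = []
    for kind, pat in PATTERNS.items():
        for m in pat.finditer(text):
            found.append((kind, m.group(1) if pat.groups else m.group(0)))
    return found
```

Deduplicating near-identical hits ("NYC" vs "N.Y.C.") is where the Levenshtein step with prefix/substring bonuses would come in; that part is omitted here.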

Verdict: Mnemosyne provides automatic entity extraction with fuzzy matching. Letta delegates entity tracking entirely to the agent -- it is more flexible (agents can structure memory however they want) but less reliable (depends on model quality and consistency).


Fact Extraction

| Feature | Mnemosyne | Letta |
|---|---|---|
| Method | LLM-driven: sends text to LLM, parses 2–5 factual statements | LLM-driven via core_memory editing. Agents decide what is fact-worthy and restructure their own context blocks. |
| Fallback chain | Remote OpenAI-compatible API → local ctransformers GGUF → skip (graceful) | No fallback chain. Relies on the configured LLM provider. |
| Storage | TripleStore: (memory_id, "fact", fact_text) | Stored in core memory blocks as agent-edited structured text. Not separately queryable. |
| Opt-in? | extract=True on remember() | Always on (agent-managed). No separate fact pipeline. |
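The fallback chain is a few lines of control flow. A sketch with placeholder callables -- `remote_client` and `local_model` stand in for the real backends and are not Mnemosyne's actual internals:

```python
def extract_facts(text, remote_client=None, local_model=None):
    """Graceful degradation: remote API -> local GGUF -> skip.
    Each backend is any callable taking text and returning a list of
    fact strings; a raised exception falls through to the next one."""
    for backend in (remote_client, local_model):
        if backend is None:
            continue
        try:
            return backend(text)
        except Exception:
            continue  # backend unavailable; try the next tier
    return []  # skip: fact extraction is best-effort, never fatal
```

The "skip" terminus is the key design choice: a memory write never fails just because no LLM is reachable.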

Integrations

MCP (Model Context Protocol)

Mnemosyne provides an MCP server with 6 tools and 2 transports:

| Tool | Description |
|---|---|
| mnemosyne_remember | Store a memory (supports entity extraction, fact extraction, bank selection) |
| mnemosyne_recall | Search memories with hybrid scoring and configurable weights |
| mnemosyne_sleep | Run consolidation cycle |
| mnemosyne_scratchpad_read | Read agent scratchpad |
| mnemosyne_scratchpad_write | Write to scratchpad |
| mnemosyne_get_stats | Get memory statistics |
```shell
mnemosyne mcp                              # stdio transport (Claude Desktop, etc.)
mnemosyne mcp --transport sse --port 8080  # SSE transport (web clients)
mnemosyne mcp --bank project_a             # scoped to a specific bank
```

Letta Integration

Letta provides a REST API and Python SDK (letta-client). Agents are created server-side and communicate via the API. The agent itself handles memory -- external tools access core/archival memory through the agent's tool interface.

| Integration | Mnemosyne | Letta |
|---|---|---|
| Hermes | Native (in-process, no serialization) | REST API client to Letta server |
| OpenClaw | Planned (adapter not yet built) | REST API client (custom integration required) |
| MCP | 6 tools, stdio + SSE | Not MCP-native. Tool calls happen server-side inside the agent. |
| Cross-machine | Export/import JSON only | REST API -- any machine with HTTP access to the Letta server |
| Python SDK | from mnemosyne import Mnemosyne (direct import) | pip install letta-client → REST client to remote server |

Memory Banks

| Feature | Mnemosyne | Letta |
|---|---|---|
| Named banks | BankManager -- create, list, delete, rename banks | Agents as isolation units. Each agent has its own core + archival memory. |
| Isolation | Per-bank SQLite file under data_dir/banks/&lt;name&gt;/ | Per-agent PostgreSQL rows. Core memory blocks and archival passages are scoped to the agent. |
| Usage | Mnemosyne(bank="work") or mnemosyne mcp --bank work | Create a new agent per project/user. No "bank" concept -- agents are the isolation boundary. |
| Multi-tenancy | No access control | Cloud API provides organization-level isolation. Self-hosted: agent-level isolation only. |
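Because each bank is its own SQLite file, isolation is a filesystem property rather than a row-level one. A runnable sketch of the idea using the documented data_dir/banks/ layout (the memory.db file name and one-table schema are assumptions, not Mnemosyne's real schema):

```python
import sqlite3
import tempfile
from pathlib import Path

def open_bank(data_dir: Path, name: str) -> sqlite3.Connection:
    """One SQLite file per bank under data_dir/banks/<name>/,
    mirroring Mnemosyne's documented layout."""
    bank_dir = data_dir / "banks" / name
    bank_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(str(bank_dir / "memory.db"))
    conn.execute("CREATE TABLE IF NOT EXISTS memories (content TEXT)")
    return conn

data_dir = Path(tempfile.mkdtemp())
work = open_bank(data_dir, "work")
personal = open_bank(data_dir, "personal")
work.execute("INSERT INTO memories VALUES ('quarterly report due')")
work.commit()
# The 'personal' bank sees nothing from 'work': isolation is file-level.
count = personal.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
```

This is also why there is no access control to speak of: whoever can read the file can read the bank, which is fine for single-user local agents and a non-starter for SaaS.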

Additional Features

Mnemosyne-specific (not in Letta)

| Feature | Module | Description |
|---|---|---|
| Streaming | core/streaming.py | MemoryStream with push (callbacks) and pull (iterator) patterns. Thread-safe event buffer. |
| Delta sync | core/streaming.py | DeltaSync -- incremental synchronization between Mnemosyne instances with checkpointed resume. |
| Pattern detection | core/patterns.py | PatternDetector -- temporal (hour/weekday), content (keyword frequency, co-occurrence), sequence patterns. |
| Memory compression | core/patterns.py | MemoryCompressor -- dictionary-based, RLE, and semantic compression strategies. |
| Plugin system | core/plugins.py | MnemosynePlugin base class with 4 lifecycle hooks. Discovers plugins from ~/.hermes/mnemosyne/plugins/. |
| Diagnostics | diagnose.py | PII-safe health check -- dependencies, database state, vector readiness. No memory content or API keys. |
| Temporal knowledge graph | TripleStore | Subject-predicate-object triples with valid_from/valid_until for point-in-time queries. |
| Hybrid retrieval | recall() | Three-signal hybrid (vector + FTS5 + importance) with configurable weights and recency decay. |

Letta-specific (not in Mnemosyne)

| Feature | Description |
|---|---|
| Virtual context paging | OS-inspired memory management: agents autonomously page data between core (RAM) and archival (disk) memory. Unique among open-source memory systems. |
| Self-editing memory | Agents call core_memory_append, core_memory_replace, archival_memory_insert as tool functions -- they restructure their own memory layout at runtime. |
| Research pedigree | Backed by the MemGPT paper (UC Berkeley). Published at NeurIPS 2023 workshops. Active academic interest and citations. |
| Agent templates | Pre-built agent types for different use cases (MemGPT, MemGPT+extras, custom). Each template defines base personality, memory blocks, and tool set. |
| Multi-agent orchestration | Letta server manages multiple agents simultaneously. Agents can be composed into pipelines. |
| Cloud API | Hosted Letta with managed infrastructure. Pay-as-you-go pricing. No PostgreSQL to manage. |
| Human-in-the-loop | Agents can pause and request human input mid-conversation. Useful for approval workflows. |

Performance Characteristics

| Metric | Mnemosyne | Letta Self-Hosted |
|---|---|---|
| Recall latency (10K corpus) | ~2–10ms -- in-process SQLite + sqlite-vec, no HTTP overhead | ~50–300ms -- REST API round-trip + PostgreSQL + pgvector |
| IPC model | Direct Python function call | REST API to Letta server → JSON serialization → agent-side tool execution |
| Storage footprint | ~50–100MB SQLite file per 10K memories | ~500MB–1GB PostgreSQL + pgvector index per agent (42 tables, dense schema) |
| Model download | One-time ~67MB (fastembed ONNX) | None required locally (embeddings via API), or variable (HuggingFace models if self-hosting embeddings) |
| Runtime memory | ~10–20MB per session | ~200–500MB (PostgreSQL pool + Letta server + per-agent context) |
| LLM calls per operation | 0–1 (recall is local; fact extraction calls LLM once) | 1+ per tool call (agent may chain multiple memory operations in a single turn) |
| Startup time | Instant | ~10–30s (Docker + PostgreSQL + agent creation) |

Important caveat: Letta's latency is higher per operation because the agent reasons about memory management. This is a feature, not a bug -- the agent decides what to store, how to structure it, and when to page data. Mnemosyne's operations are deterministic and fast by design, but they do not adapt to context the way Letta's agents do.


When to Choose What

Choose Mnemosyne if:

  • You want pip install with zero containers and no PostgreSQL
  • You need the fastest possible recall latency for interactive agent loops
  • You're running on a resource-constrained environment (VPS, ephemeral VM, CI)
  • You're building a single-user, single-machine agent (Hermes, Claude Desktop, etc.)
  • You want an MCP-compatible memory layer (stdio + SSE)
  • You want deterministic, predictable memory behavior without autonomous agent decisions
  • You want hybrid retrieval (vector + keyword + importance) with configurable weights
  • You want memory banks with per-bank SQLite isolation without standing up PostgreSQL

Choose Letta if:

  • Context windows are your primary bottleneck and you need agents that handle arbitrarily long conversations
  • You want agents that autonomously manage their own memory (self-editing, context paging)
  • You need multi-agent orchestration with a single server managing multiple agent instances
  • You are building research prototypes or experimenting with OS-inspired agent architectures
  • You are comfortable with Docker + PostgreSQL + pgvector as infrastructure requirements
  • You have budget for frontier models (GPT-4 class) that can reliably manage autonomous memory
  • You want pre-built agent templates with defined personalities and memory structures
  • You need a hosted cloud option (Letta Cloud API) to avoid managing infrastructure

Neither is "better." They solve fundamentally different problems. Letta addresses the context window bottleneck with autonomous memory management. Mnemosyne provides fast, deterministic, low-infrastructure memory. Choose based on whether you need an agent that manages its own memory, or a memory layer that your agent controls directly.


Known Gaps in Mnemosyne (honest list)

| Gap | Severity | Workaround |
|---|---|---|
| No autonomous context paging | Low for short sessions, high for long-form agents | sleep() does explicit consolidation. For long conversations, call sleep() periodically or use max_items limits carefully. |
| No self-editing memory | Medium | Agent writes to memory via remember(), scratchpad_write() -- but cannot restructure existing memory blocks. Use invalidate() + re-write for restructuring. |
| No multi-agent orchestration | Low for single-agent setups, medium for multi-agent | Run multiple Mnemosyne instances with per-agent banks. No built-in agent-to-agent communication. |
| No cross-machine network API | Medium for multi-machine setups | Export/import JSON; same-machine sharing via shared SQLite file |
| No cross-encoder reranking | Low for most queries | Hybrid scoring with configurable weights covers common cases |
| No automatic conflict detection | Medium | Manual invalidate(memory_id, replacement_id=new_id) |
| No multi-tenancy / access control | High for SaaS use cases | Use per-bank SQLite isolation for domain separation |
| No pre-built agent templates / personalities | Low | Agents using Mnemosyne handle personality in their own system prompt -- memory is just the storage layer |
| No cloud-hosted option | Low for self-hosted users | Mnemosyne is designed for local/self-hosted use. No plans for a hosted service. |
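The periodic-sleep() workaround for the first gap is a one-loop pattern. A sketch that works against any object exposing remember() and sleep() (a real Mnemosyne instance would slot in; the counting stub below exists only to make the call pattern visible):

```python
def run_with_consolidation(turns, memory, consolidate_every=20):
    """Call sleep() every N turns so working memory never grows
    unbounded during long conversations. `memory` is any object
    with remember()/sleep() -- Mnemosyne's documented operations."""
    for i, turn in enumerate(turns, start=1):
        memory.remember(turn)
        if i % consolidate_every == 0:
            memory.sleep()  # explicit consolidation into episodic storage

class CountingStub:
    """Stand-in for a Mnemosyne instance; only counts the calls."""
    def __init__(self):
        self.remembers = 0
        self.sleeps = 0
    def remember(self, text):
        self.remembers += 1
    def sleep(self):
        self.sleeps += 1
```

Tuning `consolidate_every` against max_items is the developer's job here; that trade-off is exactly what Letta's autonomous paging automates away.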

Every feature listed for Mnemosyne has been verified against the v2.8.0 source code. Letta features are based on the open-source repository and documentation as of May 2026. If anything here is wrong, please open an issue -- we'll fix it.