Mnemosyne vs Letta (MemGPT)
An honest, technical comparison for users running memory systems locally with Hermes Agent and OpenClaw.
Last updated: 2026-05-13 · Mnemosyne v2.8.0
TL;DR: Letta is a research-grade agent framework with OS-inspired virtual context management, self-editing memory, and a 42-table PostgreSQL schema. Mnemosyne is a lightweight memory layer with 3 SQLite tables, in-process operation, and near-zero infrastructure. Letta is what you deploy when context windows are the bottleneck; Mnemosyne is what you use when you want memory that just works without standing up a database cluster.
Architecture
| Dimension | Mnemosyne | Letta Self-Hosted |
|---|---|---|
| Process model | In-process Python library | Separate Docker containers (Letta server + PostgreSQL + pgvector) |
| IPC overhead | Zero (direct function calls) | REST API + JSON serialization to Letta server |
| Database | SQLite (single file, WAL mode) | PostgreSQL + pgvector (42-table schema, Alembic migrations) |
| Embedding model | fastembed ONNX -- BAAI/bge-small-en-v1.5 (~67MB) | Configurable -- OpenAI, HuggingFace, or local embedding endpoints |
| Vector search | sqlite-vec (int8/bit/float32) or numpy fallback | pgvector HNSW (mature, optimized) |
| Cold start | Instant (if models cached locally) | ~10–30s (Docker container boot + PostgreSQL init + agent creation) |
| Runtime memory | ~10–20MB per session (SQLite + ONNX) | ~200–500MB (PostgreSQL pool + Letta server + agent runtime) |
| Stars / community | New (v2.8.0) | ~21.8K GitHub stars, active research community |
| Pricing | Free (MIT) | OSS free (Letta Code), Cloud API $20–$750/mo + usage |
Memory Model
Mnemosyne: BEAM (Bilevel Episodic-Associative Memory)
Three SQLite tables:
| Tier | Purpose | Behavior |
|---|---|---|
| Working memory | Hot, recent context | Auto-injected into prompts. TTL-based eviction (default 24h). Max 10,000 items. FTS5 indexed. |
| Episodic memory | Long-term consolidated storage | Populated by sleep() consolidation. Hybrid vector + FTS5 search. |
| Scratchpad | Temporary agent workspace | Not searchable, not consolidated. Cleared explicitly. Max 1,000 items. |
Additional: TripleStore -- temporal knowledge graph with valid_from/valid_until for point-in-time queries.
Core operations: remember(), recall(), sleep() -- intentionally simple.
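The working-memory tier's eviction rules (TTL plus item cap) can be sketched in a few lines of plain Python. This is an illustration of the behavior described in the table above, not Mnemosyne's actual SQLite-backed implementation; method names mirror the API for readability only:

```python
import time

class WorkingMemory:
    """Toy sketch of the working-memory tier: TTL-based eviction
    (default 24h) plus a hard item cap. The real tier is a SQLite
    table with FTS5 indexing."""

    def __init__(self, ttl_seconds=24 * 3600, max_items=10_000):
        self.ttl = ttl_seconds
        self.max_items = max_items
        self.items = []  # list of (timestamp, text), oldest first

    def remember(self, text, now=None):
        now = time.time() if now is None else now
        self.items.append((now, text))
        self._evict(now)

    def _evict(self, now):
        # Drop entries older than the TTL, then trim to the item cap.
        self.items = [(t, x) for t, x in self.items if now - t <= self.ttl]
        if len(self.items) > self.max_items:
            self.items = self.items[-self.max_items:]

    def recent(self, now=None):
        now = time.time() if now is None else now
        self._evict(now)
        return [x for _, x in self.items]

wm = WorkingMemory(ttl_seconds=3600, max_items=3)
wm.remember("a", now=0)
wm.remember("b", now=10)
wm.remember("c", now=20)
wm.remember("d", now=30)    # cap of 3 evicts "a"
print(wm.recent(now=30))    # ['b', 'c', 'd']
print(wm.recent(now=5000))  # [] -- everything past the 1h TTL
```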
Letta: OS-Inspired Virtual Context Management
Letta models an agent's memory like an operating system, with two distinct tiers and active context paging:
| Tier | Letta name | OS analogy | Behavior |
|---|---|---|---|
| Main context | Core memory | RAM | Fixed-size block injected into every LLM call. Agent self-edits via core_memory_append and core_memory_replace. Limited to the model's context window. |
| External context | Archival memory | Disk | Unlimited long-term storage. Agent searches via archival_memory_insert and archival_memory_search. Text + embedding pairs stored in pgvector. |
| Context paging | MemGPT paging | Virtual memory | When context fills up, Letta autonomously pages data between core and archival memory. The agent decides what to evict, what to keep, and when to fetch from "disk." |
Core operations: core_memory_append, core_memory_replace, archival_memory_insert, archival_memory_search -- the agent itself calls these as tool functions.
Key difference: Mnemosyne's sleep() is an explicit consolidation step called by the developer. Letta's paging is autonomous -- the agent decides when and how to manage its own context, which is powerful but requires a capable frontier model to do well.
Context Management: Letta's Defining Feature
This is where Letta is genuinely unique among memory systems. No other open-source memory layer provides OS-style virtual context paging with autonomous agent-driven memory management.
| Feature | Mnemosyne | Letta |
|---|---|---|
| Self-editing memory | No -- sleep() is developer-triggered, agent writes to memory via tools but cannot restructure its own memory layout | Yes -- agent calls core_memory_replace to rewrite its own working context, archival_memory_insert to persist to long-term storage |
| Context overflow handling | TTL-based eviction on working memory. Developer controls what stays via max_items. | Autonomous paging: agent detects context is full, decides what to evict to archival, fetches relevant archival data back into core context as needed |
| Unlimited context illusion | No -- working memory is bounded. Episodic memory is searchable but not context-aware. | Yes -- from the LLM's perspective, Letta agents appear to have unlimited context because they page data in and out autonomously |
| Model requirements | Works with any local or remote model | Best results require GPT-4-class frontier models. Local models struggle with autonomous memory management. |
Honest assessment: Letta's virtual context management is impressive research (backed by the MemGPT paper). For workloads where context windows are the primary bottleneck -- long-form writing, research agents, multi-session conversations -- it solves a real problem. But it comes at a cost: complex infrastructure and expensive model requirements.
Retrieval
| Feature | Mnemosyne | Letta Self-Hosted |
|---|---|---|
| Vector search | sqlite-vec (cosine distance) | pgvector HNSW (via archival_memory_search) |
| Keyword search | SQLite FTS5 | PostgreSQL full-text (passage-based, embedded in archival_memory_search) |
| Graph search | TripleStore (subject-predicate-object, temporal) | No native knowledge graph |
| Temporal search | temporal_weight + temporal_halflife params on recall() | No native temporal search. Metadata filtering via tags but no time-based decay. |
| Scoring formula | vec_weight × vec + fts_weight × fts + importance_weight × importance, then × recency decay | Embedding cosine distance. Passage-based chunks returned with scores. |
| Default weights | 50% vector, 30% FTS, 20% importance | Single-strategy (embedding similarity) |
| Configurable? | Yes -- per-query vec_weight, fts_weight, importance_weight params | Limited -- archival_memory_search accepts query + optional limit/offset |
| Reranking | None (single-pass) | None. Relies on embedding quality + LLM's own judgment when results enter core context. |
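The scoring formula from the table above can be written out directly. One assumption to flag: the docs say only "recency decay" with a temporal_halflife parameter, so the exponential half-life form below is an inferred interpretation, and signal values are assumed normalized to [0, 1]:

```python
def hybrid_score(vec, fts, importance, age_hours,
                 vec_weight=0.5, fts_weight=0.3, importance_weight=0.2,
                 halflife_hours=168.0):
    """Three-signal hybrid score (defaults: 50% vector, 30% FTS,
    20% importance) multiplied by an exponential recency decay."""
    base = (vec_weight * vec
            + fts_weight * fts
            + importance_weight * importance)
    decay = 0.5 ** (age_hours / halflife_hours)  # half-life decay (assumed form)
    return base * decay

# A fresh, relevant memory outscores an old one with identical signals.
fresh = hybrid_score(vec=0.9, fts=0.6, importance=0.8, age_hours=1)
stale = hybrid_score(vec=0.9, fts=0.6, importance=0.8, age_hours=1000)
print(round(fresh, 3), round(stale, 3))  # 0.787 0.013
```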
Entity Extraction
| Feature | Mnemosyne | Letta |
|---|---|---|
| Method | Regex patterns + pure Python Levenshtein distance | LLM-driven (agent extracts entities as part of memory editing, no separate pipeline) |
| Patterns | @mentions, #hashtags, "quoted phrases", capitalized sequences (2–5 words) | No dedicated entity extraction. Entities live in core memory blocks as agent-managed structured text. |
| Fuzzy matching | Levenshtein distance with prefix/substring bonuses | Not applicable -- core_memory is agent-written, not automatically normalized |
| Storage | TripleStore triples: (memory_id, "mentions", "entity_name") | Stored as structured text in core memory blocks (e.g., "User: Alice, Age: 30, Location: NY") |
| Speed | ~0.01ms per extraction | N/A -- entity tracking is agent-managed, not a pipeline |
| Opt-in? | extract_entities=True on remember() | Always on (agents maintain their own structured context) |
Verdict: Mnemosyne provides automatic entity extraction with fuzzy matching. Letta delegates entity tracking entirely to the agent -- it is more flexible (agents can structure memory however they want) but less reliable (depends on model quality and consistency).
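The pattern set and fuzzy matcher described above are simple enough to sketch in stdlib Python. The regexes here are illustrative reconstructions of the four pattern classes (the ones in Mnemosyne's source may differ), and the edit distance omits the prefix/substring bonuses for brevity:

```python
import re

PATTERNS = [
    re.compile(r"@\w+"),                                        # @mentions
    re.compile(r"#\w+"),                                        # #hashtags
    re.compile(r'"([^"]+)"'),                                   # quoted phrases
    re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+){1,4}\b"),      # capitalized 2-5 word runs
]

def extract_entities(text):
    found = []
    for pat in PATTERNS:
        for m in pat.finditer(text):
            found.append(m.group(1) if m.groups() else m.group(0))
    return found

def levenshtein(a, b):
    # Classic dynamic-programming edit distance for fuzzy entity matching.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

print(extract_entities('Ping @alice about "vector search" in New York'))
# ['@alice', 'vector search', 'New York']
print(levenshtein("Alice", "Alise"))  # 1
```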
Fact Extraction
| Feature | Mnemosyne | Letta |
|---|---|---|
| Method | LLM-driven: sends text to LLM, parses 2–5 factual statements | LLM-driven via core_memory editing. Agents decide what is fact-worthy and restructure their own context blocks. |
| Fallback chain | Remote OpenAI-compatible API → local ctransformers GGUF → skip (graceful) | No fallback chain. Relies on the configured LLM provider. |
| Storage | TripleStore: (memory_id, "fact", fact_text) | Stored in core memory blocks as agent-edited structured text. Not separately queryable. |
| Opt-in? | extract=True on remember() | Always on (agent-managed). No separate fact pipeline. |
Integrations
MCP (Model Context Protocol)
Mnemosyne provides an MCP server with 6 tools and 2 transports:
| Tool | Description |
|---|---|
| mnemosyne_remember | Store a memory (supports entity extraction, fact extraction, bank selection) |
| mnemosyne_recall | Search memories with hybrid scoring and configurable weights |
| mnemosyne_sleep | Run consolidation cycle |
| mnemosyne_scratchpad_read | Read agent scratchpad |
| mnemosyne_scratchpad_write | Write to scratchpad |
| mnemosyne_get_stats | Get memory statistics |
```shell
mnemosyne mcp                              # stdio transport (Claude Desktop, etc.)
mnemosyne mcp --transport sse --port 8080  # SSE transport (web clients)
mnemosyne mcp --bank project_a             # scoped to a specific bank
```
Letta Integration
Letta provides a REST API and Python SDK (letta-client). Agents are created server-side and communicate via the API. The agent itself handles memory -- external tools access core/archival memory through the agent's tool interface.
| Integration | Mnemosyne | Letta |
|---|---|---|
| Hermes | Native (in-process, no serialization) | REST API client to Letta server |
| OpenClaw | Planned (adapter not yet built) | REST API client (custom integration required) |
| MCP | 6 tools, stdio + SSE | Not MCP-native. Tool calls happen server-side inside the agent. |
| Cross-machine | Export/import JSON only | REST API -- any machine with HTTP access to the Letta server |
| Python SDK | from mnemosyne import Mnemosyne (direct import) | pip install letta-client → REST client to remote server |
Memory Banks
| Feature | Mnemosyne | Letta |
|---|---|---|
| Named banks | BankManager -- create, list, delete, rename banks | Agents as isolation units. Each agent has its own core + archival memory. |
| Isolation | Per-bank SQLite file under data_dir/banks/<name>/ | Per-agent PostgreSQL rows. Core memory blocks and archival passages are scoped to the agent. |
| Usage | Mnemosyne(bank="work") or mnemosyne mcp --bank work | Create a new agent per project/user. No "bank" concept -- agents are the isolation boundary. |
| Multi-tenancy | No access control | Cloud API provides organization-level isolation. Self-hosted: agent-level isolation only. |
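The per-bank layout described above is easy to demonstrate with the stdlib sqlite3 module. The database filename and table schema below are illustrative assumptions, not Mnemosyne's actual schema; only the data_dir/banks/<name>/ layout and WAL mode come from the tables above:

```python
import sqlite3
import tempfile
from pathlib import Path

def open_bank(data_dir: Path, bank: str) -> sqlite3.Connection:
    """One SQLite file per bank under data_dir/banks/<name>/."""
    bank_dir = data_dir / "banks" / bank
    bank_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(bank_dir / "memory.db")  # filename is an assumption
    conn.execute("PRAGMA journal_mode=WAL")         # WAL mode, per the table above
    conn.execute("CREATE TABLE IF NOT EXISTS memories "
                 "(id INTEGER PRIMARY KEY, text TEXT)")
    return conn

data_dir = Path(tempfile.mkdtemp())
work = open_bank(data_dir, "work")
home = open_bank(data_dir, "home")
work.execute("INSERT INTO memories (text) VALUES (?)", ("standup notes",))
work.commit()

# Banks are fully isolated: the "home" bank sees nothing from "work".
print(work.execute("SELECT COUNT(*) FROM memories").fetchone()[0])  # 1
print(home.execute("SELECT COUNT(*) FROM memories").fetchone()[0])  # 0
```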
Additional Features
Mnemosyne-specific (not in Letta)
| Feature | Module | Description |
|---|---|---|
| Streaming | core/streaming.py | MemoryStream with push (callbacks) and pull (iterator) patterns. Thread-safe event buffer. |
| Delta sync | core/streaming.py | DeltaSync -- incremental synchronization between Mnemosyne instances with checkpointed resume. |
| Pattern detection | core/patterns.py | PatternDetector -- temporal (hour/weekday), content (keyword frequency, co-occurrence), sequence patterns. |
| Memory compression | core/patterns.py | MemoryCompressor -- dictionary-based, RLE, and semantic compression strategies. |
| Plugin system | core/plugins.py | MnemosynePlugin base class with 4 lifecycle hooks. Discovers plugins from ~/.hermes/mnemosyne/plugins/. |
| Diagnostics | diagnose.py | PII-safe health check -- dependencies, database state, vector readiness. No memory content or API keys. |
| Temporal knowledge graph | TripleStore | Subject-predicate-object triples with valid_from/valid_until for point-in-time queries. |
| Hybrid retrieval | recall() | Three-signal hybrid (vector + FTS5 + importance) with configurable weights and recency decay. |
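The TripleStore's point-in-time semantics can be sketched as interval filtering over subject-predicate-object rows. Field names mirror the valid_from/valid_until description above, but the class and helper below are illustrative, not Mnemosyne's schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Triple:
    subject: str
    predicate: str
    obj: str
    valid_from: int                     # e.g. unix timestamp
    valid_until: Optional[int] = None   # None = still valid

def as_of(triples, subject, predicate, t):
    """Return objects for (subject, predicate) that were valid at time t."""
    return [tr.obj for tr in triples
            if tr.subject == subject and tr.predicate == predicate
            and tr.valid_from <= t
            and (tr.valid_until is None or t < tr.valid_until)]

facts = [
    Triple("alice", "works_at", "AcmeCo", valid_from=100, valid_until=500),
    Triple("alice", "works_at", "Initech", valid_from=500),
]
print(as_of(facts, "alice", "works_at", 200))  # ['AcmeCo']
print(as_of(facts, "alice", "works_at", 800))  # ['Initech']
```

Superseding a fact is then an update to the old row's valid_until rather than a delete, which is what makes point-in-time queries possible.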
Letta-specific (not in Mnemosyne)
| Feature | Description |
|---|---|
| Virtual context paging | OS-inspired memory management: agents autonomously page data between core (RAM) and archival (disk) memory. Unique among open-source memory systems. |
| Self-editing memory | Agents call core_memory_append, core_memory_replace, archival_memory_insert as tool functions -- they restructure their own memory layout at runtime. |
| Research pedigree | Backed by the MemGPT paper (UC Berkeley). Published at NeurIPS 2023 workshops. Active academic interest and citations. |
| Agent templates | Pre-built agent types for different use cases (MemGPT, MemGPT+extras, custom). Each template defines base personality, memory blocks, and tool set. |
| Multi-agent orchestration | Letta server manages multiple agents simultaneously. Agents can be composed into pipelines. |
| Cloud API | Hosted Letta with managed infrastructure. Pay-as-you-go pricing. No PostgreSQL to manage. |
| Human-in-the-loop | Agents can pause and request human input mid-conversation. Useful for approval workflows. |
Performance Characteristics
| Metric | Mnemosyne | Letta Self-Hosted |
|---|---|---|
| Recall latency (10K corpus) | ~2–10ms -- in-process SQLite + sqlite-vec, no HTTP overhead | ~50–300ms -- REST API round-trip + PostgreSQL + pgvector |
| IPC model | Direct Python function call | REST API to Letta server → JSON serialization → agent-side tool execution |
| Storage footprint | ~50–100MB SQLite file per 10K memories | ~500MB–1GB PostgreSQL + pgvector index per agent (42 tables, dense schema) |
| Model download | One-time ~67MB (fastembed ONNX) | None required locally (embeddings via API), or variable (HuggingFace models if self-hosting embeddings) |
| Runtime memory | ~10–20MB per session | ~200–500MB (PostgreSQL pool + Letta server + per-agent context) |
| LLM calls per operation | 0–1 (recall is local; fact extraction calls LLM once) | 1+ per tool call (agent may chain multiple memory operations in a single turn) |
| Startup time | Instant | ~10–30s (Docker + PostgreSQL + agent creation) |
Important caveat: Letta's latency is higher per operation because the agent reasons about memory management. This is a feature, not a bug -- the agent decides what to store, how to structure it, and when to page data. Mnemosyne's operations are deterministic and fast by design, but they do not adapt to context the way Letta's agents do.
When to Choose What
Choose Mnemosyne if:
- You want pip install with zero containers and no PostgreSQL
- You need the fastest possible recall latency for interactive agent loops
- You're running in a resource-constrained environment (VPS, ephemeral VM, CI)
- You're building a single-user, single-machine agent (Hermes, Claude Desktop, etc.)
- You want an MCP-compatible memory layer (stdio + SSE)
- You want deterministic, predictable memory behavior without autonomous agent decisions
- You want hybrid retrieval (vector + keyword + importance) with configurable weights
- You want memory banks with per-bank SQLite isolation without standing up PostgreSQL
Choose Letta if:
- Context windows are your primary bottleneck and you need agents that handle arbitrarily long conversations
- You want agents that autonomously manage their own memory (self-editing, context paging)
- You need multi-agent orchestration with a single server managing multiple agent instances
- You are building research prototypes or experimenting with OS-inspired agent architectures
- You are comfortable with Docker + PostgreSQL + pgvector as infrastructure requirements
- You have budget for frontier models (GPT-4 class) that can reliably manage autonomous memory
- You want pre-built agent templates with defined personalities and memory structures
- You need a hosted cloud option (Letta Cloud API) to avoid managing infrastructure
Neither is "better." They solve fundamentally different problems. Letta addresses the context window bottleneck with autonomous memory management. Mnemosyne provides fast, deterministic, low-infrastructure memory. Choose based on whether you need an agent that manages its own memory, or a memory layer that your agent controls directly.
Known Gaps in Mnemosyne (honest list)
| Gap | Severity | Workaround |
|---|---|---|
| No autonomous context paging | Low for short sessions, high for long-form agents | sleep() does explicit consolidation. For long conversations, call sleep() periodically or use max_items limits carefully. |
| No self-editing memory | Medium | Agent writes to memory via remember(), scratchpad_write() -- but cannot restructure existing memory blocks. Use invalidate() + re-write for restructuring. |
| No multi-agent orchestration | Low for single-agent setups, medium for multi-agent | Run multiple Mnemosyne instances with per-agent banks. No built-in agent-to-agent communication. |
| No cross-machine network API | Medium for multi-machine setups | Export/import JSON; same-machine sharing via shared SQLite file |
| No cross-encoder reranking | Low for most queries | Hybrid scoring with configurable weights covers common cases |
| No automatic conflict detection | Medium | Manual invalidate(memory_id, replacement_id=new_id) |
| No multi-tenancy / access control | High for SaaS use cases | Use per-bank SQLite isolation for domain separation |
| No pre-built agent templates / personalities | Low | Agents using Mnemosyne handle personality in their own system prompt -- memory is just the storage layer |
| No cloud-hosted option | Low for self-hosted users | Mnemosyne is designed for local/self-hosted use. No plans for a hosted service. |
Every feature listed for Mnemosyne has been verified against the v2.8.0 source code. Letta features are based on the open-source repository and documentation as of May 2026. If anything here is wrong, please open an issue -- we'll fix it.