Mnemosyne vs Letta (MemGPT)

An honest, technical comparison for users running memory systems locally with Hermes Agent and OpenClaw.

Last updated: 2026-05-13 · Mnemosyne v2.8.0

TL;DR: Letta is a research-grade agent framework with OS-inspired virtual context management, self-editing memory, and a 42-table PostgreSQL schema. Mnemosyne is a lightweight memory layer with 3 SQLite tables, in-process operation, and near-zero infrastructure. Letta is what you deploy when context windows are the bottleneck; Mnemosyne is what you use when you want memory that just works without standing up a database cluster.


Architecture

| Dimension | Mnemosyne | Letta Self-Hosted |
|---|---|---|
| Process model | In-process Python library | Separate Docker containers (Letta server + PostgreSQL + pgvector) |
| IPC overhead | Zero (direct function calls) | REST API + JSON serialization to Letta server |
| Database | SQLite (single file, WAL mode) | PostgreSQL + pgvector (42-table schema, Alembic migrations) |
| Embedding model | fastembed ONNX -- BAAI/bge-small-en-v1.5 (~67MB) | Configurable -- OpenAI, HuggingFace, or local embedding endpoints |
| Vector search | sqlite-vec (int8/bit/float32) or numpy fallback | pgvector HNSW (mature, optimized) |
| Cold start | Instant (if models cached locally) | ~10–30s (Docker container boot + PostgreSQL init + agent creation) |
| Runtime memory | ~10–20MB per session (SQLite + ONNX) | ~200–500MB (PostgreSQL pool + Letta server + agent runtime) |
| Stars / community | New (v2.8.0) | ~21.8K GitHub stars, active research community |
| Pricing | Free (MIT) | OSS free (Letta Code), Cloud API $20–$750/mo + usage |

Memory Model

Mnemosyne: BEAM (Bilevel Episodic-Associative Memory)

Three SQLite tables:

| Tier | Purpose | Behavior |
|---|---|---|
| Working memory | Hot, recent context | Auto-injected into prompts. TTL-based eviction (default 24h). Max 10,000 items. FTS5 indexed. |
| Episodic memory | Long-term consolidated storage | Populated by sleep() consolidation. Hybrid vector + FTS5 search. |
| Scratchpad | Temporary agent workspace | Not searchable, not consolidated. Cleared explicitly. Max 1,000 items. |

Additional: TripleStore -- temporal knowledge graph with valid_from/valid_until for point-in-time queries.

Core operations: remember(), recall(), sleep() -- intentionally simple.

Letta: OS-Inspired Virtual Context Management

Letta models an agent's memory like an operating system, with two distinct tiers and active context paging:

| Tier | Letta name | OS analogy | Behavior |
|---|---|---|---|
| Main context | Core memory | RAM | Fixed-size block injected into every LLM call. Agent self-edits via core_memory_append and core_memory_replace. Limited to the model's context window. |
| External context | Archival memory | Disk | Unlimited long-term storage. Agent searches via archival_memory_insert and archival_memory_search. Text + embedding pairs stored in pgvector. |
| Context paging | MemGPT paging | Virtual memory | When context fills up, Letta autonomously pages data between core and archival memory. The agent decides what to evict, what to keep, and when to fetch from "disk." |

Core operations: core_memory_append, core_memory_replace, archival_memory_insert, archival_memory_search -- the agent itself calls these as tool functions.

Key difference: Mnemosyne's sleep() is an explicit consolidation step called by the developer. Letta's paging is autonomous -- the agent decides when and how to manage its own context, which is powerful but requires a capable frontier model to do well.


Context Management: Letta's Defining Feature

This is where Letta is genuinely unique among memory systems. No other open-source memory layer provides OS-style virtual context paging with autonomous agent-driven memory management.

| Feature | Mnemosyne | Letta |
|---|---|---|
| Self-editing memory | No -- sleep() is developer-triggered; the agent writes to memory via tools but cannot restructure its own memory layout | Yes -- agent calls core_memory_replace to rewrite its own working context, archival_memory_insert to persist to long-term storage |
| Context overflow handling | TTL-based eviction on working memory. Developer controls what stays via max_items. | Autonomous paging: agent detects context is full, decides what to evict to archival, fetches relevant archival data back into core context as needed |
| Unlimited context illusion | No -- working memory is bounded. Episodic memory is searchable but not context-aware. | Yes -- from the LLM's perspective, Letta agents appear to have unlimited context because they page data in and out autonomously |
| Model requirements | Works with any local or remote model | Best results require GPT-4-class frontier models. Local models struggle with autonomous memory management. |

Honest assessment: Letta's virtual context management is impressive research (backed by the MemGPT paper). For workloads where context windows are the primary bottleneck -- long-form writing, research agents, multi-session conversations -- it solves a real problem. But it comes at a cost: complex infrastructure and expensive model requirements.


Retrieval

| Feature | Mnemosyne | Letta Self-Hosted |
|---|---|---|
| Vector search | sqlite-vec (cosine distance) | pgvector HNSW (via archival_memory_search) |
| Keyword search | SQLite FTS5 | PostgreSQL full-text (passage-based, embedded in archival_memory_search) |
| Graph search | TripleStore (subject-predicate-object, temporal) | No native knowledge graph |
| Temporal search | temporal_weight + temporal_halflife params on recall() | No native temporal search. Metadata filtering via tags but no time-based decay. |
| Scoring formula | vec_weight × vec + fts_weight × fts + importance_weight × importance, then × recency decay | Embedding cosine distance. Passage-based chunks returned with scores. |
| Default weights | 50% vector, 30% FTS, 20% importance | Single-strategy (embedding similarity) |
| Configurable? | Yes -- per-query vec_weight, fts_weight, importance_weight params | Limited -- archival_memory_search accepts query + optional limit/offset |
| Reranking | None (single-pass) | None. Relies on embedding quality + LLM's own judgment when results enter core context. |
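Mnemosyne's scoring formula is simple enough to write out. A minimal sketch -- the exponential half-life form of the recency decay is an assumption, since the docs only state "then × recency decay":

```python
def hybrid_score(vec_sim, fts_score, importance, age_hours,
                 temporal_halflife=24.0,
                 vec_weight=0.5, fts_weight=0.3, importance_weight=0.2):
    """Blend the three signals, then apply recency decay.
    Default weights match the table: 50% vector, 30% FTS, 20% importance.
    The half-life curve below is an assumed decay shape, not the
    library's documented formula."""
    base = (vec_weight * vec_sim
            + fts_weight * fts_score
            + importance_weight * importance)
    decay = 0.5 ** (age_hours / temporal_halflife)
    return base * decay
```

With this shape, a perfect match loses half its score every `temporal_halflife` hours, while raising `importance_weight` per-query lets pinned facts outrank fresher but weaker matches.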

Entity Extraction

| Feature | Mnemosyne | Letta |
|---|---|---|
| Method | Regex patterns + pure Python Levenshtein distance | LLM-driven (agent extracts entities as part of memory editing, no separate pipeline) |
| Patterns | @mentions, #hashtags, "quoted phrases", capitalized sequences (2–5 words) | No dedicated entity extraction. Entities live in core memory blocks as agent-managed structured text. |
| Fuzzy matching | Levenshtein distance with prefix/substring bonuses | Not applicable -- core_memory is agent-written, not automatically normalized |
| Storage | TripleStore triples: (memory_id, "mentions", "entity_name") | Stored as structured text in core memory blocks (e.g., "User: Alice, Age: 30, Location: NY") |
| Speed | ~0.01ms per extraction | N/A -- entity tracking is agent-managed, not a pipeline |
| Opt-in? | extract_entities=True on remember() | Always on (agents maintain their own structured context) |
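The regex half of this pipeline is easy to picture. A hedged sketch -- these patterns mirror the four pattern classes listed above but are not the module's actual regexes:

```python
import re

# Illustrative stand-ins for the four documented pattern classes;
# the real module's expressions may differ.
PATTERNS = {
    "mention": re.compile(r"@(\w+)"),
    "hashtag": re.compile(r"#(\w+)"),
    "quoted": re.compile(r'"([^"]+)"'),
    # 2-5 consecutive Capitalized words
    "proper": re.compile(r"\b(?:[A-Z][a-z]+\s+){1,4}[A-Z][a-z]+\b"),
}

def extract_entities(text):
    """Return (kind, surface_form) pairs for every pattern hit."""
    found = []
    for kind, pat in PATTERNS.items():
        for m in pat.finditer(text):
            found.append((kind, m.group(1) if pat.groups else m.group(0)))
    return found
```

Deduplicating near-identical hits ("NYC" vs "N.Y.C.") is where the Levenshtein step with prefix/substring bonuses would come in; that part is omitted here.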

Verdict: Mnemosyne provides automatic entity extraction with fuzzy matching. Letta delegates entity tracking entirely to the agent -- it is more flexible (agents can structure memory however they want) but less reliable (depends on model quality and consistency).


Fact Extraction

| Feature | Mnemosyne | Letta |
|---|---|---|
| Method | LLM-driven: sends text to LLM, parses 2–5 factual statements | LLM-driven via core_memory editing. Agents decide what is fact-worthy and restructure their own context blocks. |
| Fallback chain | Remote OpenAI-compatible API → local ctransformers GGUF → skip (graceful) | No fallback chain. Relies on the configured LLM provider. |
| Storage | TripleStore: (memory_id, "fact", fact_text) | Stored in core memory blocks as agent-edited structured text. Not separately queryable. |
| Opt-in? | extract=True on remember() | Always on (agent-managed). No separate fact pipeline. |
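The fallback chain is a few lines of control flow. A sketch with placeholder callables -- `remote_client` and `local_model` stand in for the real backends and are not Mnemosyne's actual internals:

```python
def extract_facts(text, remote_client=None, local_model=None):
    """Graceful degradation: remote API -> local GGUF -> skip.
    Each backend is any callable taking text and returning a list of
    fact strings; a raised exception falls through to the next one."""
    for backend in (remote_client, local_model):
        if backend is None:
            continue
        try:
            return backend(text)
        except Exception:
            continue  # backend unavailable; try the next tier
    return []  # skip: fact extraction is best-effort, never fatal
```

The "skip" terminus is the key design choice: a memory write never fails just because no LLM is reachable.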

Integrations

MCP (Model Context Protocol)

Mnemosyne provides an MCP server with 6 tools and 2 transports:

| Tool | Description |
|---|---|
| mnemosyne_remember | Store a memory (supports entity extraction, fact extraction, bank selection) |
| mnemosyne_recall | Search memories with hybrid scoring and configurable weights |
| mnemosyne_sleep | Run consolidation cycle |
| mnemosyne_scratchpad_read | Read agent scratchpad |
| mnemosyne_scratchpad_write | Write to scratchpad |
| mnemosyne_get_stats | Get memory statistics |
```shell
mnemosyne mcp                              # stdio transport (Claude Desktop, etc.)
mnemosyne mcp --transport sse --port 8080  # SSE transport (web clients)
mnemosyne mcp --bank project_a             # scoped to a specific bank
```

Letta Integration

Letta provides a REST API and Python SDK (letta-client). Agents are created server-side and communicate via the API. The agent itself handles memory -- external tools access core/archival memory through the agent's tool interface.

| Integration | Mnemosyne | Letta |
|---|---|---|
| Hermes | Native (in-process, no serialization) | REST API client to Letta server |
| OpenClaw | Planned (adapter not yet built) | REST API client (custom integration required) |
| MCP | 6 tools, stdio + SSE | Not MCP-native. Tool calls happen server-side inside the agent. |
| Cross-machine | Export/import JSON only | REST API -- any machine with HTTP access to the Letta server |
| Python SDK | from mnemosyne import Mnemosyne (direct import) | pip install letta-client → REST client to remote server |

Memory Banks

| Feature | Mnemosyne | Letta |
|---|---|---|
| Named banks | BankManager -- create, list, delete, rename banks | Agents as isolation units. Each agent has its own core + archival memory. |
| Isolation | Per-bank SQLite file under data_dir/banks/&lt;name&gt;/ | Per-agent PostgreSQL rows. Core memory blocks and archival passages are scoped to the agent. |
| Usage | Mnemosyne(bank="work") or mnemosyne mcp --bank work | Create a new agent per project/user. No "bank" concept -- agents are the isolation boundary. |
| Multi-tenancy | No access control | Cloud API provides organization-level isolation. Self-hosted: agent-level isolation only. |
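Because each bank is its own SQLite file, isolation is a filesystem property rather than a row-level one. A runnable sketch of the idea using the documented data_dir/banks/ layout (the memory.db file name and one-table schema are assumptions, not Mnemosyne's real schema):

```python
import sqlite3
import tempfile
from pathlib import Path

def open_bank(data_dir: Path, name: str) -> sqlite3.Connection:
    """One SQLite file per bank under data_dir/banks/<name>/,
    mirroring Mnemosyne's documented layout."""
    bank_dir = data_dir / "banks" / name
    bank_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(str(bank_dir / "memory.db"))
    conn.execute("CREATE TABLE IF NOT EXISTS memories (content TEXT)")
    return conn

data_dir = Path(tempfile.mkdtemp())
work = open_bank(data_dir, "work")
personal = open_bank(data_dir, "personal")
work.execute("INSERT INTO memories VALUES ('quarterly report due')")
work.commit()
# The 'personal' bank sees nothing from 'work': isolation is file-level.
count = personal.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
```

This is also why there is no access control to speak of: whoever can read the file can read the bank, which is fine for single-user local agents and a non-starter for SaaS.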

Additional Features

Mnemosyne-specific (not in Letta)

| Feature | Module | Description |
|---|---|---|
| Streaming | core/streaming.py | MemoryStream with push (callbacks) and pull (iterator) patterns. Thread-safe event buffer. |
| Delta sync | core/streaming.py | DeltaSync -- incremental synchronization between Mnemosyne instances with checkpointed resume. |
| Pattern detection | core/patterns.py | PatternDetector -- temporal (hour/weekday), content (keyword frequency, co-occurrence), sequence patterns. |
| Memory compression | core/patterns.py | MemoryCompressor -- dictionary-based, RLE, and semantic compression strategies. |
| Plugin system | core/plugins.py | MnemosynePlugin base class with 4 lifecycle hooks. Discovers plugins from ~/.hermes/mnemosyne/plugins/. |
| Diagnostics | diagnose.py | PII-safe health check -- dependencies, database state, vector readiness. No memory content or API keys. |
| Temporal knowledge graph | TripleStore | Subject-predicate-object triples with valid_from/valid_until for point-in-time queries. |
| Hybrid retrieval | recall() | Three-signal hybrid (vector + FTS5 + importance) with configurable weights and recency decay. |

Letta-specific (not in Mnemosyne)

| Feature | Description |
|---|---|
| Virtual context paging | OS-inspired memory management: agents autonomously page data between core (RAM) and archival (disk) memory. Unique among open-source memory systems. |
| Self-editing memory | Agents call core_memory_append, core_memory_replace, archival_memory_insert as tool functions -- they restructure their own memory layout at runtime. |
| Research pedigree | Backed by the MemGPT paper (UC Berkeley). Published at NeurIPS 2023 workshops. Active academic interest and citations. |
| Agent templates | Pre-built agent types for different use cases (MemGPT, MemGPT+extras, custom). Each template defines base personality, memory blocks, and tool set. |
| Multi-agent orchestration | Letta server manages multiple agents simultaneously. Agents can be composed into pipelines. |
| Cloud API | Hosted Letta with managed infrastructure. Pay-as-you-go pricing. No PostgreSQL to manage. |
| Human-in-the-loop | Agents can pause and request human input mid-conversation. Useful for approval workflows. |

Performance Characteristics

| Metric | Mnemosyne | Letta Self-Hosted |
|---|---|---|
| Recall latency (10K corpus) | ~2–10ms -- in-process SQLite + sqlite-vec, no HTTP overhead | ~50–300ms -- REST API round-trip + PostgreSQL + pgvector |
| IPC model | Direct Python function call | REST API to Letta server → JSON serialization → agent-side tool execution |
| Storage footprint | ~50–100MB SQLite file per 10K memories | ~500MB–1GB PostgreSQL + pgvector index per agent (42 tables, dense schema) |
| Model download | One-time ~67MB (fastembed ONNX) | None required locally (embeddings via API), or variable (HuggingFace models if self-hosting embeddings) |
| Runtime memory | ~10–20MB per session | ~200–500MB (PostgreSQL pool + Letta server + per-agent context) |
| LLM calls per operation | 0–1 (recall is local; fact extraction calls LLM once) | 1+ per tool call (agent may chain multiple memory operations in a single turn) |
| Startup time | Instant | ~10–30s (Docker + PostgreSQL + agent creation) |

Important caveat: Letta's latency is higher per operation because the agent reasons about memory management. This is a feature, not a bug -- the agent decides what to store, how to structure it, and when to page data. Mnemosyne's operations are deterministic and fast by design, but they do not adapt to context the way Letta's agents do.


When to Choose What

Choose Mnemosyne if:

  • You want pip install with zero containers and no PostgreSQL
  • You need the fastest possible recall latency for interactive agent loops
  • You're running on a resource-constrained environment (VPS, ephemeral VM, CI)
  • You're building a single-user, single-machine agent (Hermes, Claude Desktop, etc.)
  • You want an MCP-compatible memory layer (stdio + SSE)
  • You want deterministic, predictable memory behavior without autonomous agent decisions
  • You want hybrid retrieval (vector + keyword + importance) with configurable weights
  • You want memory banks with per-bank SQLite isolation without standing up PostgreSQL

Choose Letta if:

  • Context windows are your primary bottleneck and you need agents that handle arbitrarily long conversations
  • You want agents that autonomously manage their own memory (self-editing, context paging)
  • You need multi-agent orchestration with a single server managing multiple agent instances
  • You are building research prototypes or experimenting with OS-inspired agent architectures
  • You are comfortable with Docker + PostgreSQL + pgvector as infrastructure requirements
  • You have budget for frontier models (GPT-4 class) that can reliably manage autonomous memory
  • You want pre-built agent templates with defined personalities and memory structures
  • You need a hosted cloud option (Letta Cloud API) to avoid managing infrastructure

Neither is "better." They solve fundamentally different problems. Letta addresses the context window bottleneck with autonomous memory management. Mnemosyne provides fast, deterministic, low-infrastructure memory. Choose based on whether you need an agent that manages its own memory, or a memory layer that your agent controls directly.


Known Gaps in Mnemosyne (honest list)

| Gap | Severity | Workaround |
|---|---|---|
| No autonomous context paging | Low for short sessions, high for long-form agents | sleep() does explicit consolidation. For long conversations, call sleep() periodically or use max_items limits carefully. |
| No self-editing memory | Medium | Agent writes to memory via remember(), scratchpad_write() -- but cannot restructure existing memory blocks. Use invalidate() + re-write for restructuring. |
| No multi-agent orchestration | Low for single-agent setups, medium for multi-agent | Run multiple Mnemosyne instances with per-agent banks. No built-in agent-to-agent communication. |
| No cross-machine network API | Medium for multi-machine setups | Export/import JSON; same-machine sharing via shared SQLite file |
| No cross-encoder reranking | Low for most queries | Hybrid scoring with configurable weights covers common cases |
| No automatic conflict detection | Medium | Manual invalidate(memory_id, replacement_id=new_id) |
| No multi-tenancy / access control | High for SaaS use cases | Use per-bank SQLite isolation for domain separation |
| No pre-built agent templates / personalities | Low | Agents using Mnemosyne handle personality in their own system prompt -- memory is just the storage layer |
| No cloud-hosted option | Low for self-hosted users | Mnemosyne is designed for local/self-hosted use. No plans for a hosted service. |
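The periodic-sleep() workaround for the first gap is a one-loop pattern. A sketch that works against any object exposing remember() and sleep() (a real Mnemosyne instance would slot in; the counting stub below exists only to make the call pattern visible):

```python
def run_with_consolidation(turns, memory, consolidate_every=20):
    """Call sleep() every N turns so working memory never grows
    unbounded during long conversations. `memory` is any object
    with remember()/sleep() -- Mnemosyne's documented operations."""
    for i, turn in enumerate(turns, start=1):
        memory.remember(turn)
        if i % consolidate_every == 0:
            memory.sleep()  # explicit consolidation into episodic storage

class CountingStub:
    """Stand-in for a Mnemosyne instance; only counts the calls."""
    def __init__(self):
        self.remembers = 0
        self.sleeps = 0
    def remember(self, text):
        self.remembers += 1
    def sleep(self):
        self.sleeps += 1
```

Tuning `consolidate_every` against max_items is the developer's job here; that trade-off is exactly what Letta's autonomous paging automates away.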

Every feature listed for Mnemosyne has been verified against the v2.8.0 source code. Letta features are based on the open-source repository and documentation as of May 2026. If anything here is wrong, please open an issue -- we'll fix it.