Mnemosyne vs Honcho

An honest, technical comparison for users running memory systems locally with Hermes Agent and OpenClaw.

Last updated: 2026-05-13 · Mnemosyne v2.8.0

TL;DR: Honcho is the only memory system with genuine reasoning about memory -- not just store-and-retrieve. It uses a background reasoning model (Neuromancer) that "dreams" about stored data to build representations, summaries, and logical deductions. Mnemosyne is a lightweight, deterministic memory layer. Honcho is what you choose when you want the system to think about what it knows; Mnemosyne is what you choose when you want fast, predictable storage and retrieval without standing up a reasoning pipeline.


Architecture

| Dimension | Mnemosyne | Honcho Self-Hosted |
|---|---|---|
| Process model | In-process Python library | Docker containers (FastAPI + PostgreSQL + background Neuromancer worker) |
| IPC overhead | Zero (direct function calls) | HTTP + JSON serialization to Honcho API |
| Database | SQLite (single file, WAL mode) | PostgreSQL + pgvector |
| Embedding model | fastembed ONNX -- BAAI/bge-small-en-v1.5 (~67MB) | OpenAI-compatible embedding endpoint (configurable) |
| Vector search | sqlite-vec (int8/bit/float32) or numpy fallback | pgvector (cosine similarity) |
| Cold start | Instant (if models cached locally) | ~10--20s (Docker containers + PostgreSQL init + Neuromancer warm-up) |
| Runtime memory | ~10--20MB per session (SQLite + ONNX) | ~200--500MB (PostgreSQL pool + FastAPI + Neuromancer model runtime) |
| Stars / community | New (v2.8.0) | ~800 GitHub stars, Plastic Labs team |
| License | MIT | AGPL-3.0 (restrictive for commercial use) |
| Pricing | Free (MIT) | Managed service: $100 free credits, $2/million tokens ingestion, reasoning fees per call. Self-hosted: free (but you pay for 3+ LLM provider keys) |

Memory Model

Mnemosyne: BEAM (Bilevel Episodic-Associative Memory)

Three SQLite tables:

| Tier | Purpose | Behavior |
|---|---|---|
| Working memory | Hot, recent context | Auto-injected into prompts. TTL-based eviction (default 24h). Max 10,000 items. FTS5 indexed. |
| Episodic memory | Long-term consolidated storage | Populated by sleep() consolidation. Hybrid vector + FTS5 search. |
| Scratchpad | Temporary agent workspace | Not searchable, not consolidated. Cleared explicitly. Max 1,000 items. |

Additional: TripleStore -- temporal knowledge graph with valid_from/valid_until for point-in-time queries.

Core operations: remember(), recall(), sleep() -- intentionally simple.
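
The three-tier flow can be sketched with a toy in-memory stand-in. Everything below is illustrative: `ToyBEAM`, its keyword-overlap scoring, and the method signatures are invented for this sketch; real Mnemosyne backs these tiers with SQLite, FTS5, and vector search.

```python
import time

class ToyBEAM:
    """Toy stand-in for Mnemosyne's BEAM tiers -- not the real implementation."""
    def __init__(self):
        self.working = []    # hot, recent items (TTL-evicted in the real system)
        self.episodic = []   # long-term items, populated by sleep()
        self.scratchpad = [] # not searchable, not consolidated

    def remember(self, text, importance=0.5):
        self.working.append({"text": text, "importance": importance, "ts": time.time()})

    def recall(self, query, limit=5):
        # Crude keyword overlap standing in for the real hybrid vector+FTS5 scoring.
        q = set(query.lower().split())
        scored = []
        for item in self.working + self.episodic:
            overlap = len(q & set(item["text"].lower().split()))
            if overlap:
                scored.append((overlap * item["importance"], item["text"]))
        return [text for _, text in sorted(scored, reverse=True)[:limit]]

    def sleep(self):
        # Consolidation: drain working memory into episodic storage.
        self.episodic.extend(self.working)
        self.working.clear()

mem = ToyBEAM()
mem.remember("user prefers dark mode", importance=0.9)
mem.remember("meeting moved to friday", importance=0.4)
print(mem.recall("dark mode"))  # matches only the first memory
mem.sleep()                     # working tier drains into episodic
```

After `sleep()`, recall still finds the items, now served from the episodic tier rather than working memory.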

Honcho: Entity-Centric Peer Paradigm

Honcho models all participants -- both humans and AIs -- as Peers. Memory is organized around who said what, in which session.

| Concept | Purpose | Behavior |
|---|---|---|
| Workspaces | Top-level container | Groups peers and sessions. Typically one per application or project. |
| Peers | Humans and AIs | First-class entities. Both users and agents are peers with their own identity. Enables group chat memory out of the box. |
| Sessions | Conversations | Messages grouped into sessions. A session has multiple peers participating. |
| Messages | Raw interaction data | The atomic unit of memory. Each message belongs to a peer within a session. |

Background reasoning: Honcho runs Neuromancer -- a dedicated reasoning model -- as a background worker. Neuromancer "dreams" about stored data during idle time, producing:

  • Representations: Dense summaries and embeddings of conversational data
  • Summaries: Compressed versions of long exchanges
  • Logical deductions: Inferred facts and relationships that were never explicitly stated

Core operations: create_message(), get_chat_response() (NL Q&A), search(), get_context() (token-budgeted retrieval).

Key difference: Mnemosyne stores what you give it and retrieves it on demand. Honcho reasons about what you give it in the background, building a derived understanding that goes beyond the raw data. Honcho is the only open-source memory system with genuine deductive reasoning.


Retrieval

| Feature | Mnemosyne | Honcho Self-Hosted |
|---|---|---|
| Vector search | sqlite-vec (cosine distance) | pgvector (cosine similarity) |
| Keyword search | SQLite FTS5 | PostgreSQL full-text (via Search API) |
| NL Q&A | Not built-in (LLM caller constructs prompt from recall results) | Chat API -- natural language question answering over stored memory. Honcho formulates queries and synthesizes answers. |
| Token-budgeted context | Not built-in. Caller controls how many recall results to include. | Context API -- returns relevant memory chunks trimmed to fit a specified token budget. Ideal for injecting into LLM prompts. |
| Semantic search | Built into recall() via vector scoring | Dedicated Search API with configurable similarity thresholds |
| Graph search | TripleStore (subject-predicate-object, temporal) | No explicit knowledge graph. Neuromancer's logical deductions serve a similar role for inferred relationships. |
| Temporal search | temporal_weight + temporal_halflife params on recall() | Message-level timestamps. Session-based grouping. No explicit temporal decay scoring. |
| Scoring formula | vec_weight × vec + fts_weight × fts + importance_weight × importance, then × recency decay | Cosine similarity for Search API. Chat API uses LLM reasoning (not raw scoring) to determine relevance. |
| Reranking | None (single-pass) | Neuromancer reasoning acts as implicit reranking -- the model evaluates relevance semantically rather than scoring vectors. |
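
Mnemosyne's scoring formula can be written out directly. One caveat on this sketch: the exponential half-life form of the recency decay and the default weight values are assumptions for illustration, not Mnemosyne's verified defaults.

```python
def hybrid_score(vec, fts, importance, age_hours,
                 vec_weight=0.5, fts_weight=0.3, importance_weight=0.2,
                 temporal_halflife=24.0):
    """Hybrid score: weighted sum of the three signals, times recency decay.

    The half-life decay shape and the default weights here are illustrative
    assumptions, not Mnemosyne's actual defaults.
    """
    base = vec_weight * vec + fts_weight * fts + importance_weight * importance
    decay = 2.0 ** (-age_hours / temporal_halflife)  # halves every halflife hours
    return base * decay

fresh = hybrid_score(vec=0.9, fts=0.5, importance=0.8, age_hours=0)
stale = hybrid_score(vec=0.9, fts=0.5, importance=0.8, age_hours=24)
print(fresh, stale)  # the 24h-old memory scores exactly half the fresh one
```

With these weights a perfect vector match can still lose to a keyword-heavy, high-importance memory if it is much older, which is the point of tuning vec_weight, fts_weight, importance_weight, and temporal_halflife per workload.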

Reasoning: Honcho's Defining Feature

This is where Honcho is genuinely unique. No other open-source memory system runs a background reasoning model that derives new knowledge from stored data.

| Feature | Mnemosyne | Honcho |
|---|---|---|
| Background reasoning | No. sleep() is developer-triggered summarization only. | Yes. Neuromancer runs continuously as a background worker, "dreaming" about stored data. |
| Logical deduction | No. Mnemosyne stores facts you give it. It does not infer new ones. | Yes. Neuromancer derives inferred relationships -- e.g., "User prefers morning meetings" from scattered scheduling messages. |
| Token efficiency | Caller controls what goes into prompts via recall limits. | Neuromancer achieves 60--90% token savings by compressing raw messages into dense representations. |
| SOTA benchmarks | Not benchmarked on memory reasoning tasks. | 90.4% on LongMem, 89.9% on LoCoMo (state-of-the-art for memory reasoning). |
| Group chat handling | No native concept of participants. Caller can tag memories by agent/user manually. | Peer paradigm handles multi-party conversations natively. Each message is attributed to a peer in a session. |
| Model requirements | Works with any local or remote model for optional fact extraction. | Requires 3 LLM provider keys: (1) embedding model, (2) chat/completion model for Chat API, (3) Neuromancer reasoning model for background dreaming. |

Honest assessment: Honcho's reasoning capability is genuinely novel. It is the only system that can tell you something it was never explicitly told -- it deduces it. This is powerful for long-running agents that need to build a model of the user over time. But it comes at a significant infrastructure cost: PostgreSQL + pgvector + background worker + three separate LLM API keys. For many use cases, deterministic store-and-retrieve is sufficient.


Entity and Fact Handling

| Feature | Mnemosyne | Honcho |
|---|---|---|
| Entity model | Regex patterns + Levenshtein fuzzy matching. Entities extracted into TripleStore as (memory_id, relation, entity) triples. | Peers are first-class entities with identity. Humans and AIs are treated symmetrically. No automatic entity extraction from message text. |
| Fact extraction | LLM-driven via extract=True on remember(). Parses 2--5 factual statements, stores in TripleStore. | Neuromancer deduces facts during background reasoning. Facts are derived, not extracted -- they emerge from reasoning over message history. |
| Conflict handling | Manual via invalidate(). No automatic contradiction detection. | Neuromancer can detect contradictions during reasoning and resolve them. |
| Provenance | Memory-level: each fact links back to its source memory via memory_id. | Message-level: all deductions trace back to the messages that produced them. |
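
Mnemosyne's manual conflict handling pairs naturally with the TripleStore's valid_from/valid_until model: invalidating a fact closes its validity window rather than deleting it, so point-in-time queries still work. This toy sketch mirrors that model; the function names and flat-list storage are invented here, not Mnemosyne's actual schema or API.

```python
from datetime import datetime, timezone

# Toy TripleStore-style temporal facts; illustrative only.
facts = []

def assert_fact(subject, predicate, obj, when):
    facts.append({"s": subject, "p": predicate, "o": obj,
                  "valid_from": when, "valid_until": None})

def invalidate(subject, predicate, when):
    # Manual conflict handling: close the currently-valid fact's window.
    for f in facts:
        if f["s"] == subject and f["p"] == predicate and f["valid_until"] is None:
            f["valid_until"] = when

def as_of(subject, predicate, when):
    # Point-in-time query: which value was valid at `when`?
    for f in facts:
        if (f["s"] == subject and f["p"] == predicate
                and f["valid_from"] <= when
                and (f["valid_until"] is None or when < f["valid_until"])):
            return f["o"]
    return None

t1 = datetime(2026, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2026, 3, 1, tzinfo=timezone.utc)
assert_fact("user", "employer", "Acme", t1)
invalidate("user", "employer", t2)          # close the old fact as of March
assert_fact("user", "employer", "Globex", t2)
print(as_of("user", "employer", datetime(2026, 2, 1, tzinfo=timezone.utc)))  # Acme
```

Because the superseded fact is closed rather than deleted, a February query still returns "Acme" while an April query returns "Globex".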

Integrations

Mnemosyne Integration

Mnemosyne provides an MCP server with 6 tools and 2 transports:

| Tool | Description |
|---|---|
| mnemosyne_remember | Store a memory (supports entity extraction, fact extraction, bank selection) |
| mnemosyne_recall | Search memories with hybrid scoring and configurable weights |
| mnemosyne_sleep | Run consolidation cycle |
| mnemosyne_scratchpad_read | Read agent scratchpad |
| mnemosyne_scratchpad_write | Write to scratchpad |
| mnemosyne_get_stats | Get memory statistics |

```shell
mnemosyne mcp                              # stdio transport (Claude Desktop, etc.)
mnemosyne mcp --transport sse --port 8080  # SSE transport (web clients)
mnemosyne mcp --bank project_a             # scoped to a specific bank
```

Honcho Integration

Honcho provides a REST API and Python SDK. Integration is via HTTP calls to the Honcho server. The Neuromancer background worker runs independently.

| Integration | Mnemosyne | Honcho |
|---|---|---|
| Hermes | Native (in-process, no serialization) | HTTP client to Honcho REST API |
| OpenClaw | Planned (adapter not yet built) | HTTP client (custom integration required) |
| MCP | 6 tools, stdio + SSE | Not MCP-native. REST API only. |
| Cross-machine | Export/import JSON only | REST API -- any machine with HTTP access to the Honcho server |
| Python SDK | from mnemosyne import Mnemosyne (direct import) | pip install honcho -- REST client to Honcho server |

Memory Banks / Workspaces

| Feature | Mnemosyne | Honcho |
|---|---|---|
| Named isolation | BankManager -- create, list, delete, rename banks | Workspaces -- create, list, delete. Each workspace is a top-level container. |
| Isolation | Per-bank SQLite file under data_dir/banks/<name>/ | Per-workspace PostgreSQL rows. Peers and sessions are scoped to a workspace. |
| Usage | Mnemosyne(bank="work") or mnemosyne mcp --bank work | API-level workspace selection. Each API call specifies a workspace_id. |
| Multi-tenancy | No access control | Workspace-level isolation. No built-in access control for self-hosted. |
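
The practical effect of bank isolation is that a recall in one bank can never surface memories from another. A toy model of that contract (`ToyBanks` and its substring matching are invented for this sketch; Mnemosyne isolates banks with separate SQLite files):

```python
from collections import defaultdict

class ToyBanks:
    """Toy model of per-bank isolation: each bank is an independent store."""
    def __init__(self):
        self._banks = defaultdict(list)  # bank name -> list of memories

    def remember(self, bank, text):
        self._banks[bank].append(text)

    def recall(self, bank, query):
        # Search is scoped to one bank; other banks are invisible.
        return [t for t in self._banks[bank] if query.lower() in t.lower()]

banks = ToyBanks()
banks.remember("work", "standup moved to 9am")
banks.remember("personal", "dentist on tuesday")
print(banks.recall("work", "standup"))      # hits only the work bank
print(banks.recall("personal", "standup"))  # empty: banks are isolated
```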

Additional Features

Mnemosyne-specific (not in Honcho)

| Feature | Module | Description |
|---|---|---|
| Streaming | core/streaming.py | MemoryStream with push (callbacks) and pull (iterator) patterns. Thread-safe event buffer. |
| Delta sync | core/streaming.py | DeltaSync -- incremental synchronization between Mnemosyne instances with checkpointed resume. |
| Pattern detection | core/patterns.py | PatternDetector -- temporal (hour/weekday), content (keyword frequency, co-occurrence), sequence patterns. |
| Memory compression | core/patterns.py | MemoryCompressor -- dictionary-based, RLE, and semantic compression strategies. |
| Plugin system | core/plugins.py | MnemosynePlugin base class with 4 lifecycle hooks. Discovers plugins from ~/.hermes/mnemosyne/plugins/. |
| Diagnostics | diagnose.py | PII-safe health check -- dependencies, database state, vector readiness. No memory content or API keys. |
| Temporal knowledge graph | TripleStore | Subject-predicate-object triples with valid_from/valid_until for point-in-time queries. |
| Hybrid retrieval | recall() | Three-signal hybrid (vector + FTS5 + importance) with configurable weights and recency decay. |

Honcho-specific (not in Mnemosyne)

| Feature | Description |
|---|---|
| Background reasoning | Neuromancer model "dreams" about stored data during idle time. Produces representations, summaries, and logical deductions. Unique among open-source memory systems. |
| Logical deduction | Infers facts and relationships never explicitly stated. E.g., "User is vegetarian" from messages about ordering tofu and avoiding meat restaurants. |
| Peer paradigm | Humans and AIs are both first-class peers. Handles group chats and multi-party conversations natively. |
| Token-budgeted context | Context API delivers relevant chunks trimmed to a specific token count. Ideal for prompt injection with predictable token usage. |
| NL Q&A | Chat API accepts natural language questions and returns synthesized answers, not just search results. |
| SOTA benchmarks | 90.4% on LongMem, 89.9% on LoCoMo -- leading results for memory reasoning tasks. |
| Token savings | Neuromancer representations achieve 60--90% token reduction compared to raw message history. |
| Managed cloud | Plastic Labs provides a hosted service with $100 free credits. No PostgreSQL to manage. |

Performance Characteristics

| Metric | Mnemosyne | Honcho Self-Hosted |
|---|---|---|
| Recall latency (10K corpus) | ~2--10ms -- in-process SQLite + sqlite-vec, no HTTP overhead | ~50--200ms -- HTTP round-trip + PostgreSQL + pgvector |
| Chat Q&A latency | N/A (not built-in) | ~1--5s -- LLM call for natural language reasoning over retrieved context |
| Reasoning latency | N/A (no background reasoning) | Asynchronous -- Neuromancer runs in background. Results available when complete, not on-demand. |
| IPC model | Direct Python function call | HTTP POST to Honcho server -> JSON serialization -> response parsing |
| Storage footprint | ~50--100MB SQLite file per 10K memories | ~500MB--1GB PostgreSQL + pgvector per 10K messages (includes Neuromancer representations) |
| Model download | One-time ~67MB (fastembed ONNX) | No local models required (all LLM calls are API-based). Embedding model configured via provider. |
| Runtime memory | ~10--20MB per session | ~200--500MB (PostgreSQL pool + FastAPI + background worker) |
| LLM calls per operation | 0--1 (recall is local; fact extraction calls LLM once) | 2+ per Chat API query (embedding + completion). Background: continuous Neuromancer calls during idle. |
| Startup time | Instant | ~10--20s (Docker + PostgreSQL + Neuromancer worker init) |

Important caveat: Honcho's higher latency and cost come from its reasoning capability. Every Chat API call involves at least one LLM inference. Every stored message may eventually be processed by Neuromancer. This is not overhead -- it is the product. If you do not need reasoning about memory, you are paying for something you will not use.


When to Choose What

Choose Mnemosyne if:

  • You want pip install with zero containers and no PostgreSQL
  • You need the fastest possible recall latency for interactive agent loops
  • You're running on a resource-constrained environment (VPS, ephemeral VM, CI)
  • You're building a single-user, single-machine agent (Hermes, Claude Desktop, etc.)
  • You want an MCP-compatible memory layer (stdio + SSE)
  • You want deterministic, predictable memory behavior without autonomous reasoning
  • You want hybrid retrieval (vector + keyword + importance) with configurable weights
  • You want memory banks with per-bank SQLite isolation without standing up PostgreSQL
  • You are building a commercial product and need MIT licensing

Choose Honcho if:

  • You need genuine reasoning about memory -- the system should tell you things it was never explicitly told
  • You are building long-running agents that need to build a deep, inferred model of the user over time
  • You need group chat or multi-party conversation support with first-class peer identities
  • You want token-budgeted context injection for predictable LLM prompt costs
  • You want natural language Q&A over stored memory (not just search results)
  • You are okay with the infrastructure cost: PostgreSQL + pgvector + 3 LLM provider keys + background worker
  • You are comfortable with AGPL-3.0 licensing (or willing to pay for the managed service)
  • You have budget for ongoing LLM API costs (embedding, completion, and Neuromancer reasoning)
  • You want SOTA memory reasoning performance (90.4% LongMem, 89.9% LoCoMo)

Neither is "better." They solve fundamentally different problems. Honcho is a reasoning memory system that derives knowledge. Mnemosyne is a storage memory system that retrieves what you put in. Choose Honcho if you want the system to think about what it remembers. Choose Mnemosyne if you want fast, simple, predictable storage and retrieval.


Known Gaps in Mnemosyne (honest list)

| Gap | Severity | Workaround |
|---|---|---|
| No background reasoning or deduction | Medium for long-running agents | sleep() does summarization but no logical inference. Combine Mnemosyne with an LLM that reasons over recall results. |
| No natural language Q&A over memory | Low for most use cases | Caller constructs prompt from recall() results and passes to LLM. Not a built-in API but functionally equivalent. |
| No token-budgeted context output | Low | Caller controls how many recall results to include. Use limit param on recall() and trim if needed. |
| No peer/participant identity model | Medium for group chat scenarios | Tag memories with user IDs manually. Use context_label or structured content. No native multi-party support. |
| No cross-machine network API | Medium for multi-agent setups | Export/import JSON; same-machine sharing via shared SQLite file |
| No SOTA memory reasoning benchmarks | Low | Mnemosyne is not designed as a reasoning system. Different product category. |
| No cross-encoder reranking | Low for most queries | Hybrid scoring with configurable weights covers common cases |
| No multi-tenancy / access control | High for SaaS use cases | Use per-bank SQLite isolation for domain separation |
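
The NL Q&A and token-budget workarounds amount to a few lines of caller-side code: build a prompt from recall results, trim to budget, and hand it to whatever LLM client you already have. In this sketch, `build_qa_prompt` is a hypothetical helper, `llm_complete` is a placeholder for your own completion call, and the ~4-characters-per-token estimate is a rough assumption, not a real tokenizer.

```python
def build_qa_prompt(question, memories, token_budget=500):
    """Assemble a Q&A prompt from recall() results, trimmed to a token budget."""
    kept, used = [], 0
    for m in memories:
        cost = len(m) // 4 + 1  # crude ~4-chars-per-token estimate (assumption)
        if used + cost > token_budget:
            break  # stop before exceeding the budget
        kept.append(m)
        used += cost
    context = "\n".join(f"- {m}" for m in kept)
    return (
        "Answer the question using only these memories:\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

memories = ["User prefers morning meetings", "User is based in Berlin"]
prompt = build_qa_prompt("When should I schedule the call?", memories)
# answer = llm_complete(prompt)  # plug in your own LLM client here
print(prompt.splitlines()[0])
```

A real tokenizer (or the model provider's token counter) should replace the character heuristic if predictable prompt cost matters.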

Every feature listed for Mnemosyne has been verified against the v2.8.0 source code. Honcho features are based on the open-source repository and Plastic Labs documentation as of May 2026. If anything here is wrong, please open an issue -- we'll fix it.