Mnemosyne vs Honcho
An honest, technical comparison for users running memory systems locally with Hermes Agent and OpenClaw.
Last updated: 2026-05-13 · Mnemosyne v2.8.0
TL;DR: Honcho is the only memory system with genuine reasoning about memory -- not just store-and-retrieve. It uses a background reasoning model (Neuromancer) that "dreams" about stored data to build representations, summaries, and logical deductions. Mnemosyne is a lightweight, deterministic memory layer. Honcho is what you choose when you want the system to think about what it knows; Mnemosyne is what you choose when you want fast, predictable storage and retrieval without standing up a reasoning pipeline.
Architecture
| Dimension | Mnemosyne | Honcho Self-Hosted |
|---|---|---|
| Process model | In-process Python library | Docker containers (FastAPI + PostgreSQL + background Neuromancer worker) |
| IPC overhead | Zero (direct function calls) | HTTP + JSON serialization to Honcho API |
| Database | SQLite (single file, WAL mode) | PostgreSQL + pgvector |
| Embedding model | fastembed ONNX -- BAAI/bge-small-en-v1.5 (~67MB) | OpenAI-compatible embedding endpoint (configurable) |
| Vector search | sqlite-vec (int8/bit/float32) or numpy fallback | pgvector (cosine similarity) |
| Cold start | Instant (if models cached locally) | ~10--20s (Docker containers + PostgreSQL init + Neuromancer warm-up) |
| Runtime memory | ~10--20MB per session (SQLite + ONNX) | ~200--500MB (PostgreSQL pool + FastAPI + Neuromancer model runtime) |
| Stars / community | New (v2.8.0) | ~800 GitHub stars, Plastic Labs team |
| License | MIT | AGPL-3.0 (restrictive for commercial use) |
| Pricing | Free (MIT) | Managed service: $100 free credits, $2/million tokens ingestion, reasoning fees per call. Self-hosted: free (but you pay for 3+ LLM provider keys) |
Memory Model
Mnemosyne: BEAM (Bilevel Episodic-Associative Memory)
Three SQLite tables:
| Tier | Purpose | Behavior |
|---|---|---|
| Working memory | Hot, recent context | Auto-injected into prompts. TTL-based eviction (default 24h). Max 10,000 items. FTS5 indexed. |
| Episodic memory | Long-term consolidated storage | Populated by sleep() consolidation. Hybrid vector + FTS5 search. |
| Scratchpad | Temporary agent workspace | Not searchable, not consolidated. Cleared explicitly. Max 1,000 items. |
Additional: TripleStore -- temporal knowledge graph with valid_from/valid_until for point-in-time queries.
Core operations: remember(), recall(), sleep() -- intentionally simple.
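The temporal TripleStore described above can be illustrated with a minimal, self-contained sketch. The class and method names here (`Triple`, `assert_fact`, `query_at`) are ours for illustration, not Mnemosyne's actual API; the idea is the valid_from/valid_until pattern for point-in-time queries:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Triple:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_until: Optional[datetime] = None  # None = still valid

class TripleStore:
    def __init__(self):
        self.triples: list[Triple] = []

    def assert_fact(self, s: str, p: str, o: str, when: datetime):
        # Close out any still-open triple this new fact supersedes.
        for t in self.triples:
            if t.subject == s and t.predicate == p and t.valid_until is None:
                t.valid_until = when
        self.triples.append(Triple(s, p, o, when))

    def query_at(self, s: str, p: str, when: datetime):
        # Point-in-time lookup: return the object valid at `when`.
        for t in self.triples:
            if (t.subject == s and t.predicate == p
                    and t.valid_from <= when
                    and (t.valid_until is None or when < t.valid_until)):
                return t.obj
        return None

store = TripleStore()
store.assert_fact("alice", "works_at", "AcmeCorp", datetime(2024, 1, 1))
store.assert_fact("alice", "works_at", "Initech", datetime(2025, 6, 1))
print(store.query_at("alice", "works_at", datetime(2024, 12, 1)))  # AcmeCorp
print(store.query_at("alice", "works_at", datetime(2025, 7, 1)))   # Initech
```

Asserting a new value closes the previous one rather than overwriting it, which is what makes "where did Alice work in December 2024?" answerable after the fact.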
Honcho: Entity-Centric Peer Paradigm
Honcho models all participants -- both humans and AIs -- as Peers. Memory is organized around who said what, in which session.
| Tier | Purpose | Behavior |
|---|---|---|
| Workspaces | Top-level container | Groups peers and sessions. Typically one per application or project. |
| Peers | Humans and AIs | First-class entities. Both users and agents are peers with their own identity. Enables group chat memory out of the box. |
| Sessions | Conversations | Messages grouped into sessions. A session has multiple peers participating. |
| Messages | Raw interaction data | The atomic unit of memory. Each message belongs to a peer within a session. |
Background reasoning: Honcho runs Neuromancer -- a dedicated reasoning model -- as a background worker. Neuromancer "dreams" about stored data during idle time, producing:
- Representations: Dense summaries and embeddings of conversational data
- Summaries: Compressed versions of long exchanges
- Logical deductions: Inferred facts and relationships that were never explicitly stated
Core operations: create_message(), get_chat_response() (NL Q&A), search(), get_context() (token-budgeted retrieval).
Key difference: Mnemosyne stores what you give it and retrieves it on demand. Honcho reasons about what you give it in the background, building a derived understanding that goes beyond the raw data. Honcho is the only open-source memory system with genuine deductive reasoning.
Retrieval
| Feature | Mnemosyne | Honcho Self-Hosted |
|---|---|---|
| Vector search | sqlite-vec (cosine distance) | pgvector (cosine similarity) |
| Keyword search | SQLite FTS5 | PostgreSQL full-text (via Search API) |
| NL Q&A | Not built-in (LLM caller constructs prompt from recall results) | Chat API -- natural language question answering over stored memory. Honcho formulates queries and synthesizes answers. |
| Token-budgeted context | Not built-in. Caller controls how many recall results to include. | Context API -- returns relevant memory chunks trimmed to fit a specified token budget. Ideal for injecting into LLM prompts. |
| Semantic search | Built into recall() via vector scoring | Dedicated Search API with configurable similarity thresholds |
| Graph search | TripleStore (subject-predicate-object, temporal) | No explicit knowledge graph. Neuromancer's logical deductions serve a similar role for inferred relationships. |
| Temporal search | temporal_weight + temporal_halflife params on recall() | Message-level timestamps. Session-based grouping. No explicit temporal decay scoring. |
| Scoring formula | vec_weight × vec + fts_weight × fts + importance_weight × importance, then × recency decay | Cosine similarity for Search API. Chat API uses LLM reasoning (not raw scoring) to determine relevance. |
| Reranking | None (single-pass) | Neuromancer reasoning acts as implicit reranking -- the model evaluates relevance semantically rather than scoring vectors. |
Reasoning: Honcho's Defining Feature
This is where Honcho is genuinely unique. No other open-source memory system runs a background reasoning model that derives new knowledge from stored data.
| Feature | Mnemosyne | Honcho |
|---|---|---|
| Background reasoning | No. sleep() is developer-triggered summarization only. | Yes. Neuromancer runs continuously as a background worker, "dreaming" about stored data. |
| Logical deduction | No. Mnemosyne stores facts you give it. It does not infer new ones. | Yes. Neuromancer derives inferred relationships -- e.g., "User prefers morning meetings" from scattered scheduling messages. |
| Token efficiency | Caller controls what goes into prompts via recall limits. | Neuromancer achieves 60--90% token savings by compressing raw messages into dense representations. |
| SOTA benchmarks | Not benchmarked on memory reasoning tasks. | 90.4% on LongMem, 89.9% on LoCoMo (state-of-the-art for memory reasoning). |
| Group chat handling | No native concept of participants. Caller can tag memories by agent/user manually. | Peer paradigm handles multi-party conversations natively. Each message is attributed to a peer in a session. |
| Model requirements | Works with any local or remote model for optional fact extraction. | Requires 3 LLM provider keys: (1) embedding model, (2) chat/completion model for Chat API, (3) Neuromancer reasoning model for background dreaming. |
Honest assessment: Honcho's reasoning capability is genuinely novel. It is the only system that can tell you something it was never explicitly told -- it deduces it. This is powerful for long-running agents that need to build a model of the user over time. But it comes at a significant infrastructure cost: PostgreSQL + pgvector + background worker + three separate LLM API keys. For many use cases, deterministic store-and-retrieve is sufficient.
Entity and Fact Handling
| Feature | Mnemosyne | Honcho |
|---|---|---|
| Entity model | Regex patterns + Levenshtein fuzzy matching. Entities extracted into TripleStore as (memory_id, relation, entity) triples. | Peers are first-class entities with identity. Humans and AIs are treated symmetrically. No automatic entity extraction from message text. |
| Fact extraction | LLM-driven via extract=True on remember(). Parses 2--5 factual statements, stores in TripleStore. | Neuromancer deduces facts during background reasoning. Facts are derived, not extracted -- they emerge from reasoning over message history. |
| Conflict handling | Manual via invalidate(). No automatic contradiction detection. | Neuromancer can detect contradictions during reasoning and resolve them. |
| Provenance | Memory-level: each fact links back to its source memory via memory_id. | Message-level: all deductions trace back to the messages that produced them. |
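Mnemosyne's Levenshtein-based fuzzy entity matching (first row of the table) can be sketched as follows. The edit-distance implementation is the standard dynamic-programming one; the relative-distance threshold in `fuzzy_match` is an illustrative assumption, not the library's default:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def fuzzy_match(mention: str, known_entities: list[str],
                max_ratio: float = 0.25):
    """Return the closest known entity if the edit distance is small
    relative to the entity's length, else None."""
    best, best_dist = None, None
    for ent in known_entities:
        d = levenshtein(mention.lower(), ent.lower())
        if best_dist is None or d < best_dist:
            best, best_dist = ent, d
    if best is not None and best_dist <= max_ratio * max(len(best), 1):
        return best
    return None

entities = ["PostgreSQL", "Mnemosyne", "Neuromancer"]
print(fuzzy_match("Mnemosine", entities))   # close: one substitution away
print(fuzzy_match("Kubernetes", entities))  # too far from anything: None
```

Normalizing the distance by entity length matters: one typo in a three-letter name is a different signal than one typo in a fifteen-letter name.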
Integrations
Mnemosyne Integration
Mnemosyne provides an MCP server with 6 tools and 2 transports:
| Tool | Description |
|---|---|
| mnemosyne_remember | Store a memory (supports entity extraction, fact extraction, bank selection) |
| mnemosyne_recall | Search memories with hybrid scoring and configurable weights |
| mnemosyne_sleep | Run consolidation cycle |
| mnemosyne_scratchpad_read | Read agent scratchpad |
| mnemosyne_scratchpad_write | Write to scratchpad |
| mnemosyne_get_stats | Get memory statistics |
```shell
mnemosyne mcp                              # stdio transport (Claude Desktop, etc.)
mnemosyne mcp --transport sse --port 8080  # SSE transport (web clients)
mnemosyne mcp --bank project_a             # scoped to a specific bank
```
Honcho Integration
Honcho provides a REST API and Python SDK. Integration is via HTTP calls to the Honcho server. The Neuromancer background worker runs independently.
| Integration point | Mnemosyne | Honcho |
|---|---|---|
| Hermes | Native (in-process, no serialization) | HTTP client to Honcho REST API |
| OpenClaw | Planned (adapter not yet built) | HTTP client (custom integration required) |
| MCP | 6 tools, stdio + SSE | Not MCP-native. REST API only. |
| Cross-machine | Export/import JSON only | REST API -- any machine with HTTP access to the Honcho server |
| Python SDK | from mnemosyne import Mnemosyne (direct import) | pip install honcho -- REST client to Honcho server |
Memory Banks / Workspaces
| Feature | Mnemosyne | Honcho |
|---|---|---|
| Named isolation | BankManager -- create, list, delete, rename banks | Workspaces -- create, list, delete. Each workspace is a top-level container. |
| Isolation | Per-bank SQLite file under data_dir/banks/<name>/ | Per-workspace PostgreSQL rows. Peers and sessions are scoped to a workspace. |
| Usage | Mnemosyne(bank="work") or mnemosyne mcp --bank work | API-level workspace selection. Each API call specifies a workspace_id. |
| Multi-tenancy | No access control | Workspace-level isolation. No built-in access control for self-hosted. |
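The per-bank isolation described above amounts to one SQLite file per bank under `data_dir/banks/<name>/`, so isolation is just filesystem isolation. A minimal sketch (the `open_bank` helper and the `memory.db` filename are our assumptions for illustration, not Mnemosyne's internals):

```python
from pathlib import Path
import sqlite3

def open_bank(data_dir: str, bank: str) -> sqlite3.Connection:
    """One SQLite file per bank: data_dir/banks/<name>/memory.db."""
    bank_dir = Path(data_dir) / "banks" / bank
    bank_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(bank_dir / "memory.db")
    conn.execute("PRAGMA journal_mode=WAL")  # WAL mode, as Mnemosyne uses
    return conn

work = open_bank("/tmp/mnemosyne-demo", "work")
personal = open_bank("/tmp/mnemosyne-demo", "personal")

# Writes to one bank never touch the other bank's file.
work.execute("CREATE TABLE IF NOT EXISTS memories "
             "(id INTEGER PRIMARY KEY, content TEXT)")
work.execute("INSERT INTO memories (content) VALUES ('bank-scoped note')")
work.commit()
```

Because each bank is a separate file, "delete bank" is a directory removal and "back up bank" is a file copy; there is no shared database to partition.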
Additional Features
Mnemosyne-specific (not in Honcho)
| Feature | Module | Description |
|---|---|---|
| Streaming | core/streaming.py | MemoryStream with push (callbacks) and pull (iterator) patterns. Thread-safe event buffer. |
| Delta sync | core/streaming.py | DeltaSync -- incremental synchronization between Mnemosyne instances with checkpointed resume. |
| Pattern detection | core/patterns.py | PatternDetector -- temporal (hour/weekday), content (keyword frequency, co-occurrence), sequence patterns. |
| Memory compression | core/patterns.py | MemoryCompressor -- dictionary-based, RLE, and semantic compression strategies. |
| Plugin system | core/plugins.py | MnemosynePlugin base class with 4 lifecycle hooks. Discovers plugins from ~/.hermes/mnemosyne/plugins/. |
| Diagnostics | diagnose.py | PII-safe health check -- dependencies, database state, vector readiness. No memory content or API keys. |
| Temporal knowledge graph | TripleStore | Subject-predicate-object triples with valid_from/valid_until for point-in-time queries. |
| Hybrid retrieval | recall() | Three-signal hybrid (vector + FTS5 + importance) with configurable weights and recency decay. |
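The push/pull streaming pattern in the table above is a common design; here is a rough, self-contained illustration of the idea, with class and method names that are ours rather than Mnemosyne's actual MemoryStream internals:

```python
import queue
import threading

class MemoryStream:
    """Thread-safe event buffer supporting both delivery styles:
    push (registered callbacks fire on emit) and pull (drain an iterator)."""

    def __init__(self):
        self._events = queue.Queue()
        self._callbacks = []
        self._lock = threading.Lock()

    def subscribe(self, fn):
        with self._lock:
            self._callbacks.append(fn)

    def emit(self, event):
        self._events.put(event)           # buffered for pull consumers
        with self._lock:
            callbacks = list(self._callbacks)
        for fn in callbacks:              # pushed to subscribers
            fn(event)

    def drain(self):
        """Pull pattern: yield everything buffered so far, then stop."""
        while True:
            try:
                yield self._events.get_nowait()
            except queue.Empty:
                return

stream = MemoryStream()
seen = []
stream.subscribe(seen.append)            # push: callback per event
stream.emit({"type": "remember", "id": 1})
stream.emit({"type": "sleep"})
buffered = list(stream.drain())          # pull: iterate the buffer
```

Supporting both styles lets a UI react immediately (push) while a batch consumer such as a sync job catches up on its own schedule (pull) from the same buffer.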
Honcho-specific (not in Mnemosyne)
| Feature | Description |
|---|---|
| Background reasoning | Neuromancer model "dreams" about stored data during idle time. Produces representations, summaries, and logical deductions. Unique among open-source memory systems. |
| Logical deduction | Infers facts and relationships never explicitly stated. E.g., "User is vegetarian" from messages about ordering tofu and avoiding meat restaurants. |
| Peer paradigm | Humans and AIs are both first-class peers. Handles group chats and multi-party conversations natively. |
| Token-budgeted context | Context API delivers relevant chunks trimmed to a specific token count. Ideal for prompt injection with predictable token usage. |
| NL Q&A | Chat API accepts natural language questions and returns synthesized answers, not just search results. |
| SOTA benchmarks | 90.4% on LongMem, 89.9% on LoCoMo -- leading results for memory reasoning tasks. |
| Token savings | Neuromancer representations achieve 60--90% token reduction compared to raw message history. |
| Managed cloud | Plastic Labs provides a hosted service with $100 free credits. No PostgreSQL to manage. |
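Token-budgeted context delivery (the Context API row above) reduces, at its core, to greedily packing the most relevant chunks into a fixed budget. This sketch is our illustration of the technique, not Honcho's implementation, and the 4-characters-per-token estimate is a crude assumption rather than how Honcho counts tokens:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def budgeted_context(chunks: list[tuple[float, str]], budget: int) -> list[str]:
    """Given (relevance, text) chunks, pack the most relevant ones
    that fit into the token budget, most relevant first."""
    selected, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            selected.append(text)
            used += cost
    return selected

chunks = [
    (0.92, "User said they are vegetarian and avoid meat restaurants."),
    (0.81, "User prefers morning meetings, based on past scheduling."),
    (0.40, "User once mentioned liking the color blue."),
]
context = budgeted_context(chunks, budget=25)
```

The payoff is the predictability the table mentions: whatever the corpus size, the prompt's memory section never exceeds the budget you set.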
Performance Characteristics
| Metric | Mnemosyne | Honcho Self-Hosted |
|---|---|---|
| Recall latency (10K corpus) | ~2--10ms -- in-process SQLite + sqlite-vec, no HTTP overhead | ~50--200ms -- HTTP round-trip + PostgreSQL + pgvector |
| Chat Q&A latency | N/A (not built-in) | ~1--5s -- LLM call for natural language reasoning over retrieved context |
| Reasoning latency | N/A (no background reasoning) | Asynchronous -- Neuromancer runs in background. Results available when complete, not on-demand. |
| IPC model | Direct Python function call | HTTP POST to Honcho server -> JSON serialization -> response parsing |
| Storage footprint | ~50--100MB SQLite file per 10K memories | ~500MB--1GB PostgreSQL + pgvector per 10K messages (includes Neuromancer representations) |
| Model download | One-time ~67MB (fastembed ONNX) | No local models required (all LLM calls are API-based). Embedding model configured via provider. |
| Runtime memory | ~10--20MB per session | ~200--500MB (PostgreSQL pool + FastAPI + background worker) |
| LLM calls per operation | 0--1 (recall is local; fact extraction calls LLM once) | 2+ per Chat API query (embedding + completion). Background: continuous Neuromancer calls during idle. |
| Startup time | Instant | ~10--20s (Docker + PostgreSQL + Neuromancer worker init) |
Important caveat: Honcho's higher latency and cost come from its reasoning capability. Every Chat API call involves at least one LLM inference. Every stored message may eventually be processed by Neuromancer. This is not overhead -- it is the product. If you do not need reasoning about memory, you are paying for something you will not use.
When to Choose What
Choose Mnemosyne if:
- You want pip install with zero containers and no PostgreSQL
- You need the fastest possible recall latency for interactive agent loops
- You're running on a resource-constrained environment (VPS, ephemeral VM, CI)
- You're building a single-user, single-machine agent (Hermes, Claude Desktop, etc.)
- You want an MCP-compatible memory layer (stdio + SSE)
- You want deterministic, predictable memory behavior without autonomous reasoning
- You want hybrid retrieval (vector + keyword + importance) with configurable weights
- You want memory banks with per-bank SQLite isolation without standing up PostgreSQL
- You are building a commercial product and need MIT licensing
Choose Honcho if:
- You need genuine reasoning about memory -- the system should tell you things it was never explicitly told
- You are building long-running agents that need to build a deep, inferred model of the user over time
- You need group chat or multi-party conversation support with first-class peer identities
- You want token-budgeted context injection for predictable LLM prompt costs
- You want natural language Q&A over stored memory (not just search results)
- You are okay with the infrastructure cost: PostgreSQL + pgvector + 3 LLM provider keys + background worker
- You are comfortable with AGPL-3.0 licensing (or willing to pay for the managed service)
- You have budget for ongoing LLM API costs (embedding, completion, and Neuromancer reasoning)
- You want SOTA memory reasoning performance (90.4% LongMem, 89.9% LoCoMo)
Neither is "better." They solve fundamentally different problems. Honcho is a reasoning memory system that derives knowledge. Mnemosyne is a storage memory system that retrieves what you put in. Choose Honcho if you want the system to think about what it remembers. Choose Mnemosyne if you want fast, simple, predictable storage and retrieval.
Known Gaps in Mnemosyne (honest list)
| Gap | Severity | Workaround |
|---|---|---|
| No background reasoning or deduction | Medium for long-running agents | sleep() does summarization but no logical inference. Combine Mnemosyne with an LLM that reasons over recall results. |
| No natural language Q&A over memory | Low for most use cases | Caller constructs prompt from recall() results and passes to LLM. Not a built-in API but functionally equivalent. |
| No token-budgeted context output | Low | Caller controls how many recall results to include. Use limit param on recall() and trim if needed. |
| No peer/participant identity model | Medium for group chat scenarios | Tag memories with user IDs manually. Use context_label or structured content. No native multi-party support. |
| No cross-machine network API | Medium for multi-agent setups | Export/import JSON; same-machine sharing via shared SQLite file |
| No SOTA memory reasoning benchmarks | Low | Mnemosyne is not designed as a reasoning system. Different product category. |
| No cross-encoder reranking | Low for most queries | Hybrid scoring with configurable weights covers common cases |
| No multi-tenancy / access control | High for SaaS use cases | Use per-bank SQLite isolation for domain separation |
Every feature listed for Mnemosyne has been verified against the v2.8.0 source code. Honcho features are based on the open-source repository and Plastic Labs documentation as of May 2026. If anything here is wrong, please open an issue -- we'll fix it.