Mnemosyne vs Honcho
An honest, technical comparison for users running memory systems locally with Hermes Agent and OpenClaw.
Last updated: 2026-05-13 · Mnemosyne v2.8.0
TL;DR: Honcho is the only memory system with genuine reasoning about memory -- not just store-and-retrieve. It uses a background reasoning model (Neuromancer) that "dreams" about stored data to build representations, summaries, and logical deductions. Mnemosyne is a lightweight, deterministic memory layer. Honcho is what you choose when you want the system to think about what it knows; Mnemosyne is what you choose when you want fast, predictable storage and retrieval without standing up a reasoning pipeline.
Architecture
| Dimension | Mnemosyne | Honcho Self-Hosted |
|---|---|---|
| Process model | In-process Python library | Docker containers (FastAPI + PostgreSQL + background Neuromancer worker) |
| IPC overhead | Zero (direct function calls) | HTTP + JSON serialization to Honcho API |
| Database | SQLite (single file, WAL mode) | PostgreSQL + pgvector |
| Embedding model | fastembed ONNX -- BAAI/bge-small-en-v1.5 (~67MB) | OpenAI-compatible embedding endpoint (configurable) |
| Vector search | sqlite-vec (int8/bit/float32) or numpy fallback | pgvector (cosine similarity) |
| Cold start | Instant (if models cached locally) | ~10--20s (Docker containers + PostgreSQL init + Neuromancer warm-up) |
| Runtime memory | ~10--20MB per session (SQLite + ONNX) | ~200--500MB (PostgreSQL pool + FastAPI + Neuromancer model runtime) |
| Stars / community | New (v2.8.0) | ~800 GitHub stars, Plastic Labs team |
| License | MIT | AGPL-3.0 (restrictive for commercial use) |
| Pricing | Free (MIT) | Managed service: $100 free credits, $2/million tokens ingestion, reasoning fees per call. Self-hosted: free (but you pay for 3+ LLM provider keys) |
Memory Model
Mnemosyne: BEAM (Bilevel Episodic-Associative Memory)
Three SQLite tables:
| Tier | Purpose | Behavior |
|---|---|---|
| Working memory | Hot, recent context | Auto-injected into prompts. TTL-based eviction (default 24h). Max 10,000 items. FTS5 indexed. |
| Episodic memory | Long-term consolidated storage | Populated by sleep() consolidation. Hybrid vector + FTS5 search. |
| Scratchpad | Temporary agent workspace | Not searchable, not consolidated. Cleared explicitly. Max 1,000 items. |
Additional: TripleStore -- temporal knowledge graph with valid_from/valid_until for point-in-time queries.
Core operations: remember(), recall(), sleep() -- intentionally simple.
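The temporal TripleStore described above can be illustrated with a minimal, self-contained sketch. The class and method names here (`Triple`, `assert_fact`, `query_at`) are ours for illustration, not Mnemosyne's actual API; the idea is the valid_from/valid_until pattern for point-in-time queries:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Triple:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_until: Optional[datetime] = None  # None = still valid

class TripleStore:
    def __init__(self):
        self.triples: list[Triple] = []

    def assert_fact(self, s: str, p: str, o: str, when: datetime):
        # Close out any still-open triple this new fact supersedes.
        for t in self.triples:
            if t.subject == s and t.predicate == p and t.valid_until is None:
                t.valid_until = when
        self.triples.append(Triple(s, p, o, when))

    def query_at(self, s: str, p: str, when: datetime):
        # Point-in-time lookup: return the object valid at `when`.
        for t in self.triples:
            if (t.subject == s and t.predicate == p
                    and t.valid_from <= when
                    and (t.valid_until is None or when < t.valid_until)):
                return t.obj
        return None

store = TripleStore()
store.assert_fact("alice", "works_at", "AcmeCorp", datetime(2024, 1, 1))
store.assert_fact("alice", "works_at", "Initech", datetime(2025, 6, 1))
print(store.query_at("alice", "works_at", datetime(2024, 12, 1)))  # AcmeCorp
print(store.query_at("alice", "works_at", datetime(2025, 7, 1)))   # Initech
```

Asserting a new value closes the previous one rather than overwriting it, which is what makes "where did Alice work in December 2024?" answerable after the fact.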
Honcho: Entity-Centric Peer Paradigm
Honcho models all participants -- both humans and AIs -- as Peers. Memory is organized around who said what, in which session.
| Tier | Purpose | Behavior |
|---|---|---|
| Workspaces | Top-level container | Groups peers and sessions. Typically one per application or project. |
| Peers | Humans and AIs | First-class entities. Both users and agents are peers with their own identity. Enables group chat memory out of the box. |
| Sessions | Conversations | Messages grouped into sessions. A session has multiple peers participating. |
| Messages | Raw interaction data | The atomic unit of memory. Each message belongs to a peer within a session. |
Background reasoning: Honcho runs Neuromancer -- a dedicated reasoning model -- as a background worker. Neuromancer "dreams" about stored data during idle time, producing:
- Representations: Dense summaries and embeddings of conversational data
- Summaries: Compressed versions of long exchanges
- Logical deductions: Inferred facts and relationships that were never explicitly stated
Core operations: create_message(), get_chat_response() (NL Q&A), search(), get_context() (token-budgeted retrieval).
Key difference: Mnemosyne stores what you give it and retrieves it on demand. Honcho reasons about what you give it in the background, building a derived understanding that goes beyond the raw data. Honcho is the only open-source memory system with genuine deductive reasoning.
Retrieval
| Feature | Mnemosyne | Honcho Self-Hosted |
|---|---|---|
| Vector search | sqlite-vec (cosine distance) | pgvector (cosine similarity) |
| Keyword search | SQLite FTS5 | PostgreSQL full-text (via Search API) |
| NL Q&A | Not built-in (LLM caller constructs prompt from recall results) | Chat API -- natural language question answering over stored memory. Honcho formulates queries and synthesizes answers. |
| Token-budgeted context | Not built-in. Caller controls how many recall results to include. | Context API -- returns relevant memory chunks trimmed to fit a specified token budget. Ideal for injecting into LLM prompts. |
| Semantic search | Built into recall() via vector scoring | Dedicated Search API with configurable similarity thresholds |
| Graph search | TripleStore (subject-predicate-object, temporal) | No explicit knowledge graph. Neuromancer's logical deductions serve a similar role for inferred relationships. |
| Temporal search | temporal_weight + temporal_halflife params on recall() | Message-level timestamps. Session-based grouping. No explicit temporal decay scoring. |
| Scoring formula | vec_weight × vec + fts_weight × fts + importance_weight × importance, then × recency decay | Cosine similarity for Search API. Chat API uses LLM reasoning (not raw scoring) to determine relevance. |
| Reranking | None (single-pass) | Neuromancer reasoning acts as implicit reranking -- the model evaluates relevance semantically rather than scoring vectors. |
Reasoning: Honcho's Defining Feature
This is where Honcho is genuinely unique. No other open-source memory system runs a background reasoning model that derives new knowledge from stored data.
| Feature | Mnemosyne | Honcho |
|---|---|---|
| Background reasoning | No. sleep() is developer-triggered summarization only. | Yes. Neuromancer runs continuously as a background worker, "dreaming" about stored data. |
| Logical deduction | No. Mnemosyne stores facts you give it. It does not infer new ones. | Yes. Neuromancer derives inferred relationships -- e.g., "User prefers morning meetings" from scattered scheduling messages. |
| Token efficiency | Caller controls what goes into prompts via recall limits. | Neuromancer achieves 60--90% token savings by compressing raw messages into dense representations. |
| SOTA benchmarks | Not benchmarked on memory reasoning tasks. | 90.4% on LongMem, 89.9% on LoCoMo (state-of-the-art for memory reasoning). |
| Group chat handling | No native concept of participants. Caller can tag memories by agent/user manually. | Peer paradigm handles multi-party conversations natively. Each message is attributed to a peer in a session. |
| Model requirements | Works with any local or remote model for optional fact extraction. | Requires 3 LLM provider keys: (1) embedding model, (2) chat/completion model for Chat API, (3) Neuromancer reasoning model for background dreaming. |
Honest assessment: Honcho's reasoning capability is genuinely novel. It is the only system that can tell you something it was never explicitly told -- it deduces it. This is powerful for long-running agents that need to build a model of the user over time. But it comes at a significant infrastructure cost: PostgreSQL + pgvector + background worker + three separate LLM API keys. For many use cases, deterministic store-and-retrieve is sufficient.
Entity and Fact Handling
| Feature | Mnemosyne | Honcho |
|---|---|---|
| Entity model | Regex patterns + Levenshtein fuzzy matching. Entities extracted into TripleStore as (memory_id, relation, entity) triples. | Peers are first-class entities with identity. Humans and AIs are treated symmetrically. No automatic entity extraction from message text. |
| Fact extraction | LLM-driven via extract=True on remember(). Parses 2--5 factual statements, stores in TripleStore. | Neuromancer deduces facts during background reasoning. Facts are derived, not extracted -- they emerge from reasoning over message history. |
| Conflict handling | Manual via invalidate(). No automatic contradiction detection. | Neuromancer can detect contradictions during reasoning and resolve them. |
| Provenance | Memory-level: each fact links back to its source memory via memory_id. | Message-level: all deductions trace back to the messages that produced them. |
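Mnemosyne's Levenshtein-based fuzzy entity matching (first row of the table) can be sketched as follows. The edit-distance implementation is the standard dynamic-programming one; the relative-distance threshold in `fuzzy_match` is an illustrative assumption, not the library's default:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def fuzzy_match(mention: str, known_entities: list[str],
                max_ratio: float = 0.25):
    """Return the closest known entity if the edit distance is small
    relative to the entity's length, else None."""
    best, best_dist = None, None
    for ent in known_entities:
        d = levenshtein(mention.lower(), ent.lower())
        if best_dist is None or d < best_dist:
            best, best_dist = ent, d
    if best is not None and best_dist <= max_ratio * max(len(best), 1):
        return best
    return None

entities = ["PostgreSQL", "Mnemosyne", "Neuromancer"]
print(fuzzy_match("Mnemosine", entities))   # close: one substitution away
print(fuzzy_match("Kubernetes", entities))  # too far from anything: None
```

Normalizing the distance by entity length matters: one typo in a three-letter name is a different signal than one typo in a fifteen-letter name.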
Integrations
Mnemosyne Integration
Mnemosyne provides an MCP server with 6 tools and 2 transports:
| Tool | Description |
|---|---|
| mnemosyne_remember | Store a memory (supports entity extraction, fact extraction, bank selection) |
| mnemosyne_recall | Search memories with hybrid scoring and configurable weights |
| mnemosyne_sleep | Run consolidation cycle |
| mnemosyne_scratchpad_read | Read agent scratchpad |
| mnemosyne_scratchpad_write | Write to scratchpad |
| mnemosyne_get_stats | Get memory statistics |
```shell
mnemosyne mcp                              # stdio transport (Claude Desktop, etc.)
mnemosyne mcp --transport sse --port 8080  # SSE transport (web clients)
mnemosyne mcp --bank project_a             # scoped to a specific bank
```
Honcho Integration
Honcho provides a REST API and Python SDK. Integration is via HTTP calls to the Honcho server. The Neuromancer background worker runs independently.
| Integration point | Mnemosyne | Honcho |
|---|---|---|
| Hermes | Native (in-process, no serialization) | HTTP client to Honcho REST API |
| OpenClaw | Planned (adapter not yet built) | HTTP client (custom integration required) |
| MCP | 6 tools, stdio + SSE | Not MCP-native. REST API only. |
| Cross-machine | Export/import JSON only | REST API -- any machine with HTTP access to the Honcho server |
| Python SDK | from mnemosyne import Mnemosyne (direct import) | pip install honcho -- REST client to Honcho server |
Memory Banks / Workspaces
| Feature | Mnemosyne | Honcho |
|---|---|---|
| Named isolation | BankManager -- create, list, delete, rename banks | Workspaces -- create, list, delete. Each workspace is a top-level container. |
| Isolation | Per-bank SQLite file under data_dir/banks/<name>/ | Per-workspace PostgreSQL rows. Peers and sessions are scoped to a workspace. |
| Usage | Mnemosyne(bank="work") or mnemosyne mcp --bank work | API-level workspace selection. Each API call specifies a workspace_id. |
| Multi-tenancy | No access control | Workspace-level isolation. No built-in access control for self-hosted. |
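The per-bank isolation described above amounts to one SQLite file per bank under `data_dir/banks/<name>/`, so isolation is just filesystem isolation. A minimal sketch (the `open_bank` helper and the `memory.db` filename are our assumptions for illustration, not Mnemosyne's internals):

```python
from pathlib import Path
import sqlite3

def open_bank(data_dir: str, bank: str) -> sqlite3.Connection:
    """One SQLite file per bank: data_dir/banks/<name>/memory.db."""
    bank_dir = Path(data_dir) / "banks" / bank
    bank_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(bank_dir / "memory.db")
    conn.execute("PRAGMA journal_mode=WAL")  # WAL mode, as Mnemosyne uses
    return conn

work = open_bank("/tmp/mnemosyne-demo", "work")
personal = open_bank("/tmp/mnemosyne-demo", "personal")

# Writes to one bank never touch the other bank's file.
work.execute("CREATE TABLE IF NOT EXISTS memories "
             "(id INTEGER PRIMARY KEY, content TEXT)")
work.execute("INSERT INTO memories (content) VALUES ('bank-scoped note')")
work.commit()
```

Because each bank is a separate file, "delete bank" is a directory removal and "back up bank" is a file copy; there is no shared database to partition.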
Additional Features
Mnemosyne-specific (not in Honcho)
| Feature | Module | Description |
|---|---|---|
| Streaming | core/streaming.py | MemoryStream with push (callbacks) and pull (iterator) patterns. Thread-safe event buffer. |
| Delta sync | core/streaming.py | DeltaSync -- incremental synchronization between Mnemosyne instances with checkpointed resume. |
| Pattern detection | core/patterns.py | PatternDetector -- temporal (hour/weekday), content (keyword frequency, co-occurrence), sequence patterns. |
| Memory compression | core/patterns.py | MemoryCompressor -- dictionary-based, RLE, and semantic compression strategies. |
| Plugin system | core/plugins.py | MnemosynePlugin base class with 4 lifecycle hooks. Discovers plugins from ~/.hermes/mnemosyne/plugins/. |
| Diagnostics | diagnose.py | PII-safe health check -- dependencies, database state, vector readiness. No memory content or API keys. |
| Temporal knowledge graph | TripleStore | Subject-predicate-object triples with valid_from/valid_until for point-in-time queries. |
| Hybrid retrieval | recall() | Three-signal hybrid (vector + FTS5 + importance) with configurable weights and recency decay. |
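The push/pull streaming pattern in the table above is a common design; here is a rough, self-contained illustration of the idea, with class and method names that are ours rather than Mnemosyne's actual MemoryStream internals:

```python
import queue
import threading

class MemoryStream:
    """Thread-safe event buffer supporting both delivery styles:
    push (registered callbacks fire on emit) and pull (drain an iterator)."""

    def __init__(self):
        self._events = queue.Queue()
        self._callbacks = []
        self._lock = threading.Lock()

    def subscribe(self, fn):
        with self._lock:
            self._callbacks.append(fn)

    def emit(self, event):
        self._events.put(event)           # buffered for pull consumers
        with self._lock:
            callbacks = list(self._callbacks)
        for fn in callbacks:              # pushed to subscribers
            fn(event)

    def drain(self):
        """Pull pattern: yield everything buffered so far, then stop."""
        while True:
            try:
                yield self._events.get_nowait()
            except queue.Empty:
                return

stream = MemoryStream()
seen = []
stream.subscribe(seen.append)            # push: callback per event
stream.emit({"type": "remember", "id": 1})
stream.emit({"type": "sleep"})
buffered = list(stream.drain())          # pull: iterate the buffer
```

Supporting both styles lets a UI react immediately (push) while a batch consumer such as a sync job catches up on its own schedule (pull) from the same buffer.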
Honcho-specific (not in Mnemosyne)
| Feature | Description |
|---|---|
| Background reasoning | Neuromancer model "dreams" about stored data during idle time. Produces representations, summaries, and logical deductions. Unique among open-source memory systems. |
| Logical deduction | Infers facts and relationships never explicitly stated. E.g., "User is vegetarian" from messages about ordering tofu and avoiding meat restaurants. |
| Peer paradigm | Humans and AIs are both first-class peers. Handles group chats and multi-party conversations natively. |
| Token-budgeted context | Context API delivers relevant chunks trimmed to a specific token count. Ideal for prompt injection with predictable token usage. |
| NL Q&A | Chat API accepts natural language questions and returns synthesized answers, not just search results. |
| SOTA benchmarks | 90.4% on LongMem, 89.9% on LoCoMo -- leading results for memory reasoning tasks. |
| Token savings | Neuromancer representations achieve 60--90% token reduction compared to raw message history. |
| Managed cloud | Plastic Labs provides a hosted service with $100 free credits. No PostgreSQL to manage. |
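Token-budgeted context delivery (the Context API row above) reduces, at its core, to greedily packing the most relevant chunks into a fixed budget. This sketch is our illustration of the technique, not Honcho's implementation, and the 4-characters-per-token estimate is a crude assumption rather than how Honcho counts tokens:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def budgeted_context(chunks: list[tuple[float, str]], budget: int) -> list[str]:
    """Given (relevance, text) chunks, pack the most relevant ones
    that fit into the token budget, most relevant first."""
    selected, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            selected.append(text)
            used += cost
    return selected

chunks = [
    (0.92, "User said they are vegetarian and avoid meat restaurants."),
    (0.81, "User prefers morning meetings, based on past scheduling."),
    (0.40, "User once mentioned liking the color blue."),
]
context = budgeted_context(chunks, budget=25)
```

The payoff is the predictability the table mentions: whatever the corpus size, the prompt's memory section never exceeds the budget you set.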
Performance Characteristics
| Metric | Mnemosyne | Honcho Self-Hosted |
|---|---|---|
| Recall latency (10K corpus) | ~2--10ms -- in-process SQLite + sqlite-vec, no HTTP overhead | ~50--200ms -- HTTP round-trip + PostgreSQL + pgvector |
| Chat Q&A latency | N/A (not built-in) | ~1--5s -- LLM call for natural language reasoning over retrieved context |
| Reasoning latency | N/A (no background reasoning) | Asynchronous -- Neuromancer runs in background. Results available when complete, not on-demand. |
| IPC model | Direct Python function call | HTTP POST to Honcho server -> JSON serialization -> response parsing |
| Storage footprint | ~50--100MB SQLite file per 10K memories | ~500MB--1GB PostgreSQL + pgvector per 10K messages (includes Neuromancer representations) |
| Model download | One-time ~67MB (fastembed ONNX) | No local models required (all LLM calls are API-based). Embedding model configured via provider. |
| Runtime memory | ~10--20MB per session | ~200--500MB (PostgreSQL pool + FastAPI + background worker) |
| LLM calls per operation | 0--1 (recall is local; fact extraction calls LLM once) | 2+ per Chat API query (embedding + completion). Background: continuous Neuromancer calls during idle. |
| Startup time | Instant | ~10--20s (Docker + PostgreSQL + Neuromancer worker init) |
Important caveat: Honcho's higher latency and cost come from its reasoning capability. Every Chat API call involves at least one LLM inference. Every stored message may eventually be processed by Neuromancer. This is not overhead -- it is the product. If you do not need reasoning about memory, you are paying for something you will not use.
When to Choose What
Choose Mnemosyne if:
- You want pip install with zero containers and no PostgreSQL
- You need the fastest possible recall latency for interactive agent loops
- You're running on a resource-constrained environment (VPS, ephemeral VM, CI)
- You're building a single-user, single-machine agent (Hermes, Claude Desktop, etc.)
- You want an MCP-compatible memory layer (stdio + SSE)
- You want deterministic, predictable memory behavior without autonomous reasoning
- You want hybrid retrieval (vector + keyword + importance) with configurable weights
- You want memory banks with per-bank SQLite isolation without standing up PostgreSQL
- You are building a commercial product and need MIT licensing
Choose Honcho if:
- You need genuine reasoning about memory -- the system should tell you things it was never explicitly told
- You are building long-running agents that need to build a deep, inferred model of the user over time
- You need group chat or multi-party conversation support with first-class peer identities
- You want token-budgeted context injection for predictable LLM prompt costs
- You want natural language Q&A over stored memory (not just search results)
- You are okay with the infrastructure cost: PostgreSQL + pgvector + 3 LLM provider keys + background worker
- You are comfortable with AGPL-3.0 licensing (or willing to pay for the managed service)
- You have budget for ongoing LLM API costs (embedding, completion, and Neuromancer reasoning)
- You want SOTA memory reasoning performance (90.4% LongMem, 89.9% LoCoMo)
Neither is "better." They solve fundamentally different problems. Honcho is a reasoning memory system that derives knowledge. Mnemosyne is a storage memory system that retrieves what you put in. Choose Honcho if you want the system to think about what it remembers. Choose Mnemosyne if you want fast, simple, predictable storage and retrieval.
Known Gaps in Mnemosyne (honest list)
| Gap | Severity | Workaround |
|---|---|---|
| No background reasoning or deduction | Medium for long-running agents | sleep() does summarization but no logical inference. Combine Mnemosyne with an LLM that reasons over recall results. |
| No natural language Q&A over memory | Low for most use cases | Caller constructs prompt from recall() results and passes to LLM. Not a built-in API but functionally equivalent. |
| No token-budgeted context output | Low | Caller controls how many recall results to include. Use limit param on recall() and trim if needed. |
| No peer/participant identity model | Medium for group chat scenarios | Tag memories with user IDs manually. Use context_label or structured content. No native multi-party support. |
| No cross-machine network API | Medium for multi-agent setups | Export/import JSON; same-machine sharing via shared SQLite file |
| No SOTA memory reasoning benchmarks | Low | Mnemosyne is not designed as a reasoning system. Different product category. |
| No cross-encoder reranking | Low for most queries | Hybrid scoring with configurable weights covers common cases |
| No multi-tenancy / access control | High for SaaS use cases | Use per-bank SQLite isolation for domain separation |
Every feature listed for Mnemosyne has been verified against the v2.8.0 source code. Honcho features are based on the open-source repository and Plastic Labs documentation as of May 2026. If anything here is wrong, please open an issue -- we'll fix it.