BEAM Architecture Overview
BEAM (Biologically-inspired Episodic-Associative Memory) is the core architecture powering Mnemosyne. It models human memory organization with three tiers plus a TripleStore, each optimized for different retention and retrieval patterns.
The Three Memory Tiers + TripleStore
```mermaid
graph TB
    subgraph Input["Agent Input"]
        I1[Observation]
        I2[Conversation]
        I3[Tool Result]
    end
    subgraph BEAM["BEAM Memory System"]
        WM["Working Memory<br/>Short-term, up to 10K entries<br/>&lt;100ms access"]
        EM["Episodic Memory<br/>Long-term experiences<br/>Hybrid retrieval"]
        SP["Scratchpad<br/>Temporary workspace<br/>Up to 1K entries"]
    end
    I1 -->|remember| WM
    I2 -->|remember| WM
    I3 -->|remember| WM
    WM -->|sleep consolidation| EM
    EM -->|recall| Agent
    style WM fill:#e0f2fe,stroke:#0284c7
    style EM fill:#fef3c7,stroke:#d97706
    style SP fill:#faf5ff,stroke:#9333ea
```
Working Memory
The agent's immediate context window. Holds recent observations, conversation history, and active tool results. Fastest access (<100ms), limited capacity (up to 10,000 entries), and subject to eviction via sleep consolidation.
Key characteristics:
- Latency: Sub-100ms median
- Capacity: Up to 10,000 entries (configurable via `MNEMOSYNE_WM_MAX_ITEMS`, default 10,000)
- TTL: 24 hours (configurable via `MNEMOSYNE_WM_TTL_HOURS`, default 24)
- Persistence: Survives process restarts via SQLite
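The capacity and TTL behavior above can be illustrated with a minimal in-memory sketch. This is not the real implementation (which persists to SQLite); the class and field names here are made up for illustration, with defaults mirroring the documented env vars:

```python
import time
from collections import OrderedDict

class WorkingMemorySketch:
    """Toy model of the Working Memory tier: bounded size plus TTL.

    Illustrative only -- the real tier is SQLite-backed.
    """

    def __init__(self, max_items=10_000, ttl_hours=24):
        self.max_items = max_items
        self.ttl_seconds = ttl_hours * 3600
        self._entries = OrderedDict()  # key -> (content, inserted_at)

    def remember(self, key, content, now=None):
        now = time.time() if now is None else now
        self._entries[key] = (content, now)
        # Oldest entries are dropped first once capacity is exceeded.
        while len(self._entries) > self.max_items:
            self._entries.popitem(last=False)

    def expired(self, now=None):
        """Keys past their TTL -- candidates for sleep consolidation."""
        now = time.time() if now is None else now
        return [k for k, (_, t) in self._entries.items()
                if now - t > self.ttl_seconds]
```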
Episodic Memory
Long-term storage for experiences, conversations, and events. Organized chronologically with rich metadata. Uses hybrid retrieval combining vector similarity and full-text search.
Key characteristics:
- Retention: Permanent (until explicit deletion)
- Retrieval: Hybrid (vector + FTS5) with configurable scoring weights (`vec_weight`, `fts_weight`, `importance_weight`, defaulting to 50/30/20)
- Temporal decay: Tunable via the `temporal_weight` and `temporal_halflife` parameters on `recall()`
- Consolidation: Promoted from Working Memory via `sleep()`
TripleStore (Knowledge Graph)
Structured knowledge stored as subject-predicate-object triples. Managed by the separate TripleStore class (not part of the main Mnemosyne class).
Key characteristics:
- Format: Subject-predicate-object triples with confidence and validity timestamps
- Query: Filter by subject, predicate, or object via `TripleStore.query()`
- Access: Separate `TripleStore` class, or module-level `add_triple()`/`query_triples()` helpers
- Schema: Flexible, no predefined ontology
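A minimal sketch of the subject-predicate-object model, where `None` acts as a wildcard when filtering. The class and method names below are illustrative stand-ins, not the real `TripleStore` API, and only the confidence field from the characteristics above is modeled:

```python
from dataclasses import dataclass

@dataclass
class Triple:
    subject: str
    predicate: str
    obj: str
    confidence: float = 1.0

class TripleStoreSketch:
    """Illustrative subject-predicate-object store with wildcard queries."""

    def __init__(self):
        self._triples = []

    def add(self, subject, predicate, obj, confidence=1.0):
        self._triples.append(Triple(subject, predicate, obj, confidence))

    def query(self, subject=None, predicate=None, obj=None):
        """None matches any value for that position."""
        return [t for t in self._triples
                if (subject is None or t.subject == subject)
                and (predicate is None or t.predicate == predicate)
                and (obj is None or t.obj == obj)]
```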
Scratchpad
A temporary workspace for reasoning, planning, and intermediate computations. Analogous to human working memory's manipulation buffer.
Key characteristics:
- Capacity: Up to 1,000 entries (configurable via `MNEMOSYNE_SP_MAX`, default 1,000)
- Use case: Chain-of-thought, planning, scratch calculations
- API: `scratchpad_write()`, `scratchpad_read()`, `scratchpad_clear()`
- Lifetime: Session-bound
Data Flow
```mermaid
sequenceDiagram
    participant Agent
    participant WM as Working Memory
    participant SP as Scratchpad
    participant EM as Episodic Memory
    participant SM as TripleStore (Knowledge Graph)
    Agent->>WM: remember(content)
    WM->>WM: store + index
    Agent->>SP: scratchpad_write(thought)
    SP->>SP: store
    Agent->>EM: recall(query)
    EM->>Agent: return results
    Agent->>SM: add_triple(s, p, o)
    SM->>SM: store triple
    Agent->>SM: query_triples(s, p, o)
    SM->>Agent: return triples
```
Sleep Consolidation
Call `mem.sleep()` to run a consolidation cycle that:
- Selects Working Memory entries older than TTL/2 (12 hours by default)
- Groups candidates by source
- Summarizes each group via LLM or AAAK text substitution
- Promotes the summaries to Episodic Memory
- Evicts the original Working Memory entries
Sleep is triggered explicitly — it does not run automatically on a timer.
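The cycle above can be sketched as a pure function over timestamped entries. The entry field names (`source`, `content`, `age_hours`) are hypothetical, and the join-based summarizer is a naive placeholder for the LLM step:

```python
from collections import defaultdict

def sleep_sketch(entries, ttl_hours=24):
    """Illustrative consolidation: select, group, summarize, promote, evict.

    Returns (summaries_to_promote, surviving_entries).
    """
    cutoff = ttl_hours / 2  # entries older than TTL/2 are candidates
    candidates = [e for e in entries if e["age_hours"] > cutoff]
    survivors = [e for e in entries if e["age_hours"] <= cutoff]

    # Group candidates by source.
    groups = defaultdict(list)
    for e in candidates:
        groups[e["source"]].append(e["content"])

    # Placeholder summarizer: the real cycle calls an LLM here.
    summaries = {src: " | ".join(items) for src, items in groups.items()}
    return summaries, survivors
```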
V2 Features
Mnemosyne v2.8.0 builds on the core BEAM architecture with these capabilities:
Entity Extraction
remember() accepts extract_entities=True to automatically detect named entities (mentions, hashtags, proper nouns) using regex patterns and Levenshtein fuzzy matching. Extracted entities are stored as TripleStore triples.
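A rough sketch of what regex-based entity detection can look like. These three patterns are illustrative assumptions, not the library's actual patterns, and the Levenshtein fuzzy-matching stage is omitted:

```python
import re

# Illustrative patterns for the entity kinds mentioned above.
MENTION = re.compile(r"@([A-Za-z0-9_]+)")
HASHTAG = re.compile(r"#(\w+)")
# Naive proper-noun matcher; note it also catches sentence-initial words.
PROPER_NOUN = re.compile(r"\b([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)\b")

def extract_entities_sketch(text):
    return {
        "mentions": MENTION.findall(text),
        "hashtags": HASHTAG.findall(text),
        "proper_nouns": PROPER_NOUN.findall(text),
    }
```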
Fact Extraction
remember() accepts extract=True to send content through an LLM pipeline that extracts 2–5 structured factual statements, stored as TripleStore triples. The extraction pipeline gracefully falls back through remote API → local GGUF → skip.
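The remote API → local GGUF → skip fallback can be sketched as a simple chain; the callable parameters below are hypothetical stand-ins for the real pipeline stages:

```python
def extract_facts_with_fallback(content, remote_extract=None, local_extract=None):
    """Illustrative remote -> local -> skip fallback chain.

    Each extractor is a callable returning a list of fact strings, or it
    raises on failure; both parameter names are assumptions.
    """
    for extractor in (remote_extract, local_extract):
        if extractor is None:
            continue
        try:
            return extractor(content)
        except Exception:
            continue  # fall through to the next stage
    return []  # skip: store the memory without extracted facts
```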
Temporal Recall
recall() now supports temporal_weight, query_time, and temporal_halflife parameters for fine-grained control over how recency influences retrieval. Point-in-time queries allow retrieving memories as they existed at a specific timestamp.
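One plausible way these parameters interact, as a sketch: an exponential decay that halves every `temporal_halflife` hours, blended into the base score by `temporal_weight`. The parameter names mirror `recall()`'s, but this exact formula and the one-week default half-life are assumptions:

```python
def temporal_score(base_score, age_hours,
                   temporal_weight=0.3, temporal_halflife=168.0):
    """Blend a retrieval score with an exponential recency decay.

    Illustrative formula, not the documented implementation: the recency
    term halves every `temporal_halflife` hours.
    """
    decay = 0.5 ** (age_hours / temporal_halflife)
    return (1 - temporal_weight) * base_score + temporal_weight * decay
```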
Configurable Scoring
The hybrid scoring weights (vec_weight, fts_weight, importance_weight) are now configurable per-query instead of fixed at compile time. Default: 50/30/20.
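The 50/30/20 blend can be sketched as a weighted average of the three signals. This assumes each input is already normalized to [0, 1]; the real scorer's normalization is not documented here:

```python
def hybrid_score(vec_sim, fts_score, importance,
                 vec_weight=0.5, fts_weight=0.3, importance_weight=0.2):
    """Weighted blend of the three retrieval signals (defaults: 50/30/20).

    Dividing by the weight total keeps the result in [0, 1] even when
    callers pass unnormalized per-query weights.
    """
    total = vec_weight + fts_weight + importance_weight
    return (vec_weight * vec_sim
            + fts_weight * fts_score
            + importance_weight * importance) / total
```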
Memory Banks
Named banks provide isolated SQLite databases per project or context. Create via Mnemosyne(bank="project-a") or mnemosyne mcp --bank project-a.
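The one-database-per-bank idea reduces to opening a separate SQLite file keyed by bank name. The function, directory layout, and filename scheme below are assumptions for illustration, not Mnemosyne's actual paths:

```python
import sqlite3
from pathlib import Path

def open_bank(bank, root):
    """Open (creating if needed) an isolated SQLite database for `bank`.

    Hypothetical helper: one file per bank under `root`, so banks cannot
    see each other's data.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    return sqlite3.connect(root / f"{bank}.db")
```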
Streaming & Patterns
MemoryStream provides push/pull event patterns for real-time notifications. PatternDetector discovers temporal, content, and sequence patterns across stored memories. DeltaSync enables incremental synchronization between instances.
Plugin System
A plugin base class (MnemosynePlugin) with 4 lifecycle hooks. Built-in plugins: LoggingPlugin, MetricsPlugin, FilterPlugin. Discovers plugins from ~/.hermes/mnemosyne/plugins/.
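The shape of such a plugin API can be sketched as a base class with no-op hooks that subclasses override. The four hook names below are invented to show the pattern; they are not the documented `MnemosynePlugin` hooks:

```python
class MnemosynePluginSketch:
    """Illustrative plugin base class with four hypothetical lifecycle hooks.

    The real base class is MnemosynePlugin; these hook names are assumptions.
    """

    def on_load(self, mem):          # called once when the plugin is attached
        pass

    def on_remember(self, content):  # may transform content before storage
        return content

    def on_recall(self, results):    # may filter or rerank recall results
        return results

    def on_unload(self):             # called at shutdown
        pass

class StripWhitespaceFilter(MnemosynePluginSketch):
    """Toy plugin in the spirit of the built-in FilterPlugin."""

    def on_remember(self, content):
        return content.strip()
```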
Design Philosophy
Mnemosyne's architecture prioritizes:
- Locality: All data in SQLite — no network calls for retrieval
- Transparency: Every operation is inspectable and debuggable
- Efficiency: Sub-100ms retrieval for typical workloads
- Flexibility: Works standalone or as a Hermes plugin
- Privacy: No data leaves your machine by default
Unlike pure vector databases (Pinecone, Weaviate), Mnemosyne combines dense embeddings with structured storage, full-text search, and temporal relationships. This hybrid approach delivers more precise retrieval for agent use cases where exact facts matter as much as semantic similarity.