Hybrid Search
Mnemosyne's flagship retrieval capability — combining dense vector similarity with SQLite FTS5 full-text search for results that are both semantically relevant and precisely matched.
Why Hybrid?
| Approach | Strengths | Weaknesses |
|---|---|---|
| Vector only | Semantic similarity, fuzzy matching | Misses exact terms, no boolean logic |
| Text only | Exact matching, boolean queries | No semantic understanding, synonym blind |
| Hybrid | Best of both | Higher latency (65ms median vs. 35ms vector-only and 15ms FTS5-only) |
Fusion Algorithm
The recall() method performs hybrid search using a configurable scoring formula with tunable weights:
# Step 1: Weighted combination of scores (weights default to 50/30/20)
base_score = (vector_score * vec_weight +
              fts_score * fts_weight +
              importance * importance_weight) / (vec_weight + fts_weight + importance_weight)
# Step 2: Apply temporal recency decay
final_score = base_score * (1.0 - temporal_weight + temporal_weight * recency_decay)
Default weights:
- Vector (vec_weight): 50.0, semantic similarity via BAAI/bge-small-en-v1.5, 384 dims
- FTS5 (fts_weight): 30.0, BM25 text relevance
- Importance (importance_weight): 20.0, memory importance rating, 0.0–1.0
- Temporal (temporal_weight): 0.3, controls how much recency decay influences the final score
The recency decay is exponential with a configurable half-life (temporal_halflife, in hours; default 168 hours, i.e. 7 days).
All weights are configurable per-query — see usage examples below.
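To make the two scoring steps concrete, here is a minimal standalone sketch of the formula. It is not Mnemosyne's internal code: the function names `recency_decay` and `hybrid_score` are hypothetical, the exact decay form `0.5 ** (age / half-life)` is an assumption consistent with "exponential half-life", and the component scores are assumed to be normalized to 0.0–1.0.

```python
def recency_decay(age_hours: float, halflife_hours: float = 168.0) -> float:
    """Assumed decay form: 1.0 for a new memory, 0.5 after one half-life."""
    return 0.5 ** (max(age_hours, 0.0) / halflife_hours)


def hybrid_score(vector_score, fts_score, importance, age_hours,
                 vec_weight=50.0, fts_weight=30.0, importance_weight=20.0,
                 temporal_weight=0.3, temporal_halflife=168.0):
    """Illustrative re-implementation of the documented two-step formula."""
    # Step 1: weighted average of the three component scores.
    total = vec_weight + fts_weight + importance_weight
    base = (vector_score * vec_weight +
            fts_score * fts_weight +
            importance * importance_weight) / total
    # Step 2: blend in recency; temporal_weight=0 disables the decay entirely.
    decay = recency_decay(age_hours, temporal_halflife)
    return base * (1.0 - temporal_weight + temporal_weight * decay)


# A one-week-old memory with default weights:
# base = (0.8*50 + 0.6*30 + 0.5*20) / 100 = 0.68, recency factor = 0.85
print(round(hybrid_score(0.8, 0.6, 0.5, age_hours=168.0), 3))
```

Because Step 1 divides by the sum of the weights, only their ratios matter: 50/30/20 and 5/3/2 produce the same base score.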
Usage
from mnemosyne import Mnemosyne
mem = Mnemosyne()
# Default hybrid search — balanced 50/30/20 weights
results = mem.recall("What database does the project use?", top_k=5)
for r in results:
    print(f"Score: {r['score']:.3f}")
    print(f"  Content: {r['content'][:100]}")
    print(f"  Source: {r['source']}")
Tuning Retrieval Weights
Override the default weights per query to match your use case:
# Boost exact text matching for error codes, identifiers
results = mem.recall(
    "error code E501 in linter",
    vec_weight=20.0,
    fts_weight=60.0,
    importance_weight=20.0,
)
# Maximize semantic similarity for conceptual queries
results = mem.recall(
    "how does authentication work?",
    vec_weight=70.0,
    fts_weight=20.0,
    importance_weight=10.0,
)
# Emphasize recency for "what happened lately" queries
results = mem.recall(
    "what did we discuss this week?",
    temporal_weight=0.6,
    temporal_halflife=48.0,  # 48-hour half-life
)
# Point-in-time temporal query
results = mem.recall(
    "decisions from January",
    query_time="2026-01-31T23:59:59Z",
    temporal_weight=0.1,  # minimal recency bias
)
Result Format
[
    {
        "id": "mem_abc123",
        "content": "We decided to use PostgreSQL for the primary database...",
        "score": 0.92,
        "source": "decision",
        "importance": 0.8,
        "created_at": 1714060800,
    },
    ...
]
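Since each result is a plain dictionary, it can be post-processed with ordinary Python. The sketch below filters by score and renders created_at as a readable timestamp. It assumes created_at is a Unix epoch in seconds (the sample value is consistent with that, but the source does not say so). The helper `summarize` and the second sample entry are hypothetical.

```python
from datetime import datetime, timezone


def summarize(results, min_score=0.5):
    """Keep results at or above min_score; format created_at as ISO 8601 UTC."""
    rows = []
    for r in results:
        if r["score"] < min_score:
            continue
        # Assumption: created_at is seconds since the Unix epoch.
        created = datetime.fromtimestamp(r["created_at"], tz=timezone.utc)
        rows.append((r["id"], r["score"], created.isoformat()))
    return rows


sample = [
    {
        "id": "mem_abc123",
        "content": "We decided to use PostgreSQL for the primary database...",
        "score": 0.92,
        "source": "decision",
        "importance": 0.8,
        "created_at": 1714060800,
    },
    {
        # Hypothetical low-scoring entry, added only to show the filter.
        "id": "mem_example_low",
        "content": "Unrelated note",
        "score": 0.31,
        "source": "note",
        "importance": 0.2,
        "created_at": 1714060800,
    },
]

for row in summarize(sample):
    print(row)
```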
Working Memory Retrieval
When searching Working Memory (recent, non-consolidated entries), a keyword-based fallback is used with different weights than the episodic formula. This handles the case where vector embeddings may not yet be available for very recent entries.
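The source does not specify how the keyword fallback scores entries, so the following is only an illustration of the general idea, not Mnemosyne's implementation. The hypothetical function `keyword_score` rates an entry by the fraction of distinct query terms it contains, which needs no embedding at all.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase and split on non-alphanumeric characters."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_score(query: str, content: str) -> float:
    """Fraction of distinct query terms found in content (0.0 to 1.0)."""
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(content)) / len(query_terms)


# Two of the three query terms ("error", "e501") appear in the entry.
print(keyword_score("error code E501", "Linter raised error E501 on line 3"))
```

A score like this could stand in for the vector and FTS5 terms until an entry has been embedded and indexed.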
Performance
| Query Type | Median Latency | p99 Latency |
|---|---|---|
| Vector only | 35ms | 120ms |
| FTS5 only | 15ms | 45ms |
| Hybrid (default) | 65ms | 180ms |
Hybrid search requires both the vector and FTS5 indices to be built. These are created automatically on first use, and the initial build may take a few seconds for large datasets.
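Latency depends on dataset size and hardware, so it can be worth measuring on your own data. The sketch below is a generic timing harness, not part of Mnemosyne: the helper `measure_latency_ms` is hypothetical, and it uses a nearest-rank p99. A stand-in callable is timed here so the block runs on its own. In practice you would pass something like `lambda: mem.recall("your query", top_k=5)` after a warm-up call so that index creation is not included.

```python
import statistics
import time


def measure_latency_ms(fn, runs=200):
    """Call fn repeatedly; return (median, p99) wall-clock latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    median = statistics.median(samples)
    # Nearest-rank percentile: rank = ceil(0.99 * n), computed with integers.
    rank = (99 * len(samples) + 99) // 100
    p99 = samples[rank - 1]
    return median, p99


# Stand-in workload; replace with a recall() call to measure real queries.
median_ms, p99_ms = measure_latency_ms(lambda: sum(range(10_000)), runs=100)
print(f"median={median_ms:.3f}ms p99={p99_ms:.3f}ms")
```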