Hybrid Search

Hybrid search is Mnemosyne's flagship retrieval capability. It combines dense vector similarity with SQLite FTS5 full-text search, so results are both semantically relevant and precisely matched.

Why Hybrid?

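Approach    | Strengths                           | Weaknesses
------------|-------------------------------------|------------------------------------------
Vector only | Semantic similarity, fuzzy matching | Misses exact terms, no boolean logic
Text only   | Exact matching, boolean queries     | No semantic understanding, synonym-blind
Hybrid      | Best of both                        | Slightly higher latency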

Fusion Algorithm

The recall() method performs hybrid search by scoring each candidate with a two-step formula whose weights are tunable:

# Step 1: Weighted combination of scores (weights default to 50/30/20)
base_score = (vector_score * vec_weight +
              fts_score * fts_weight +
              importance * importance_weight) / (vec_weight + fts_weight + importance_weight)

# Step 2: Apply temporal recency decay
final_score = base_score * (1.0 - temporal_weight + temporal_weight * recency_decay)

Default weights:

  • Vector (vec_weight): 50.0 — semantic similarity via BAAI/bge-small-en-v1.5, 384 dims
  • FTS5 (fts_weight): 30.0 — BM25 text relevance
  • Importance (importance_weight): 20.0 — memory importance rating, 0.0–1.0
  • Temporal (temporal_weight): 0.3 — controls how much recency decay influences the final score

The recency decay is exponential and governed by a half-life (temporal_halflife, default 168 hours, i.e. 7 days): the decay term halves every temporal_halflife hours of memory age.
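
The two steps above can be sketched as a small standalone function. This is an illustration, not Mnemosyne's actual implementation: the helper names recency_decay and fused_score are hypothetical, the decay is assumed to take the standard half-life form 0.5 ** (age / halflife), and vector_score and fts_score are assumed to be already normalized to the 0.0–1.0 range.

```python
def recency_decay(age_hours: float, halflife_hours: float = 168.0) -> float:
    """Assumed decay form: the value halves every `halflife_hours`."""
    return 0.5 ** (max(age_hours, 0.0) / halflife_hours)


def fused_score(vector_score: float, fts_score: float, importance: float,
                age_hours: float,
                vec_weight: float = 50.0, fts_weight: float = 30.0,
                importance_weight: float = 20.0,
                temporal_weight: float = 0.3,
                temporal_halflife: float = 168.0) -> float:
    """Hypothetical re-implementation of the documented two-step formula."""
    # Step 1: weighted average of the three signals.
    total = vec_weight + fts_weight + importance_weight
    base = (vector_score * vec_weight +
            fts_score * fts_weight +
            importance * importance_weight) / total
    # Step 2: blend in recency; the multiplier stays within
    # [1 - temporal_weight, 1.0].
    decay = recency_decay(age_hours, temporal_halflife)
    return base * (1.0 - temporal_weight + temporal_weight * decay)


# A brand-new memory keeps its full base score; a week-old one is damped.
fresh = fused_score(0.8, 0.6, 0.5, age_hours=0.0)
week_old = fused_score(0.8, 0.6, 0.5, age_hours=168.0)
print(f"fresh={fresh:.3f} week_old={week_old:.3f}")
```

Under these assumptions, a memory exactly one half-life old has its base score multiplied by 1.0 - 0.3 + 0.3 * 0.5 = 0.85, and no memory, however old, drops below 70% of its base score at the default temporal_weight.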

All weights can be overridden per query; see the usage examples below.

Usage

from mnemosyne import Mnemosyne

mem = Mnemosyne()

# Default hybrid search — balanced 50/30/20 weights
results = mem.recall("What database does the project use?", top_k=5)

for r in results:
  print(f"Score: {r['score']:.3f}")
  print(f"  Content: {r['content'][:100]}")
  print(f"  Source: {r['source']}")

Tuning Retrieval Weights

Override the default weights per query to match your use case:

# Boost exact text matching for error codes, identifiers
results = mem.recall(
  "error code E501 in linter",
  vec_weight=20.0,
  fts_weight=60.0,
  importance_weight=20.0,
)

# Maximize semantic similarity for conceptual queries
results = mem.recall(
  "how does authentication work?",
  vec_weight=70.0,
  fts_weight=20.0,
  importance_weight=10.0,
)

# Emphasize recency for "what happened lately" queries
results = mem.recall(
  "what did we discuss this week?",
  temporal_weight=0.6,
  temporal_halflife=48.0,  # 48-hour half-life
)

# Point-in-time temporal query
results = mem.recall(
  "decisions from January",
  query_time="2026-01-31T23:59:59Z",
  temporal_weight=0.1,  # minimal recency bias
)
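
When choosing overrides, it may help to remember two consequences of the scoring formula: the three signal weights are divided by their sum, so only their ratio matters, and temporal_weight sets a floor of 1 - temporal_weight on the recency multiplier. The sketch below illustrates both points; effective_shares and recency_floor are hypothetical helpers written for this page, not part of the Mnemosyne API.

```python
def effective_shares(vec_weight: float, fts_weight: float,
                     importance_weight: float) -> tuple:
    """Fraction of the base score contributed by each signal."""
    total = vec_weight + fts_weight + importance_weight
    return (vec_weight / total, fts_weight / total, importance_weight / total)


def recency_floor(temporal_weight: float) -> float:
    """Smallest multiplier an arbitrarily old memory can receive."""
    return 1.0 - temporal_weight


# 20/60/20 and 1/3/1 describe the same weighting.
print(effective_shares(20.0, 60.0, 20.0))
print(effective_shares(1.0, 3.0, 1.0))

# Default recency bias versus the "what happened lately" override.
print(round(recency_floor(0.3), 2), round(recency_floor(0.6), 2))
```

Keeping the weights summing to 100, as the examples above do, is therefore a readability convention rather than a requirement of the formula.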

Result Format

[
  {
    "id": "mem_abc123",
    "content": "We decided to use PostgreSQL for the primary database...",
    "score": 0.92,
    "source": "decision",
    "importance": 0.8,
    "created_at": 1714060800,
  },
  ...
]
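
A minimal sketch of post-processing results in this shape follows. It assumes created_at is a Unix timestamp in seconds (the format above suggests this but does not state it), and it uses literal sample data rather than a live recall() call; the summarize helper and the second record are hypothetical.

```python
from datetime import datetime, timezone

# Sample data in the documented shape; the second entry is made up.
results = [
    {"id": "mem_abc123",
     "content": "We decided to use PostgreSQL for the primary database...",
     "score": 0.92, "source": "decision",
     "importance": 0.8, "created_at": 1714060800},
    {"id": "mem_example2",
     "content": "Lunch options near the office...",
     "score": 0.31, "source": "chat",
     "importance": 0.2, "created_at": 1714147200},
]


def summarize(results, min_score=0.5):
    """Drop low-scoring hits and render created_at as an ISO 8601 UTC string."""
    rows = []
    for r in results:
        if r["score"] < min_score:
            continue
        # Assumption: created_at is seconds since the Unix epoch.
        created = datetime.fromtimestamp(r["created_at"], tz=timezone.utc)
        rows.append((r["id"], r["score"], created.isoformat()))
    return rows


for row in summarize(results):
    print(row)
```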

Working Memory Retrieval

Searches over Working Memory (recent, non-consolidated entries) use a keyword-based fallback whose weights differ from the episodic formula above. The fallback covers very recent entries whose vector embeddings may not yet be available.

Performance

Query Type       | Median Latency | p99 Latency
-----------------|----------------|------------
Vector only      | 35 ms          | 120 ms
FTS5 only        | 15 ms          | 45 ms
Hybrid (default) | 65 ms          | 180 ms

Index Requirements

Hybrid search requires both the vector index and the FTS5 index to be built. Both are created automatically on first use. The initial build may take a few seconds for large datasets.