Hybrid Search

Mnemosyne's flagship retrieval capability — combining dense vector similarity with SQLite FTS5 full-text search for results that are both semantically relevant and precisely matched.

Why Hybrid?

Approach	Strengths	Weaknesses
Vector only	Semantic similarity, fuzzy matching	Misses exact terms, no boolean logic
Text only	Exact matching, boolean queries	No semantic understanding, synonym blind
Hybrid	Best of both	Slightly higher latency

Fusion Algorithm

The recall() method performs hybrid search, ranking each candidate with a two-step scoring formula whose weights are tunable:

# Step 1: Weighted combination of scores (weights default to 50/30/20)
base_score = (vector_score * vec_weight +
              fts_score * fts_weight +
              importance * importance_weight) / (vec_weight + fts_weight + importance_weight)

# Step 2: Apply temporal recency decay
final_score = base_score * (1.0 - temporal_weight + temporal_weight * recency_decay)

Default weights:

Vector (vec_weight): 50.0 — semantic similarity via BAAI/bge-small-en-v1.5, 384 dims
FTS5 (fts_weight): 30.0 — BM25 text relevance
Importance (importance_weight): 20.0 — memory importance rating, 0.0–1.0
Temporal (temporal_weight): 0.3 — controls how much recency decay influences the final score

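The recency decay uses an exponential half-life (temporal_halflife, default 168 hours / 7 days): the decay factor halves with every 168 hours of age. Provided recency_decay stays between 0 and 1, the default temporal_weight of 0.3 means that even a very old memory keeps at least 70% of its base score.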
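The two steps can be sketched in plain Python. This is a minimal illustration, not Mnemosyne's implementation: the function name hybrid_score and the age_hours parameter are hypothetical, the three input scores are assumed to be already normalized to 0.0–1.0, and recency_decay is assumed to take the form 0.5 ** (age_hours / temporal_halflife).

```python
def hybrid_score(
    vector_score: float,
    fts_score: float,
    importance: float,
    age_hours: float,
    vec_weight: float = 50.0,
    fts_weight: float = 30.0,
    importance_weight: float = 20.0,
    temporal_weight: float = 0.3,
    temporal_halflife: float = 168.0,
) -> float:
    """Hypothetical re-implementation of the documented scoring formula."""
    # Step 1: weighted average of the three component scores.
    total = vec_weight + fts_weight + importance_weight
    base_score = (
        vector_score * vec_weight
        + fts_score * fts_weight
        + importance * importance_weight
    ) / total

    # Assumed decay form: 1.0 for a brand-new memory, halving every half-life.
    recency_decay = 0.5 ** (age_hours / temporal_halflife)

    # Step 2: blend the decay in according to temporal_weight.
    return base_score * (1.0 - temporal_weight + temporal_weight * recency_decay)


fresh = hybrid_score(0.8, 0.6, 0.5, age_hours=0.0)
week_old = hybrid_score(0.8, 0.6, 0.5, age_hours=168.0)
print(f"fresh: {fresh:.3f}, one week old: {week_old:.3f}")
```

With these inputs the base score is (40 + 18 + 10) / 100 = 0.68. After one half-life the assumed decay is 0.5, so the final score is 0.68 * 0.85 = 0.578.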

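All weights are configurable per query; see the usage examples below. Because step 1 divides by the sum of the three weights, only their ratios matter: 50/30/20 and 0.5/0.3/0.2 produce the same ranking.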

Usage

from mnemosyne import Mnemosyne

mem = Mnemosyne()

# Default hybrid search — balanced 50/30/20 weights
results = mem.recall("What database does the project use?", top_k=5)

for r in results:
  print(f"Score: {r['score']:.3f}")
  print(f"  Content: {r['content'][:100]}")
  print(f"  Source: {r['source']}")

Tuning Retrieval Weights

Override the default weights per query to match your use case:

# Boost exact text matching for error codes, identifiers
results = mem.recall(
  "error code E501 in linter",
  vec_weight=20.0,
  fts_weight=60.0,
  importance_weight=20.0,
)

# Maximize semantic similarity for conceptual queries
results = mem.recall(
  "how does authentication work?",
  vec_weight=70.0,
  fts_weight=20.0,
  importance_weight=10.0,
)

# Emphasize recency for "what happened lately" queries
results = mem.recall(
  "what did we discuss this week?",
  temporal_weight=0.6,
  temporal_halflife=48.0,  # 48-hour half-life
)

# Point-in-time temporal query
results = mem.recall(
  "decisions from January",
  query_time="2026-01-31T23:59:59Z",
  temporal_weight=0.1,  # minimal recency bias
)
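One way to apply these overrides consistently is to choose a preset from the query text. The sketch below is illustrative only: pick_weights and its regular expression are hypothetical helpers, not part of Mnemosyne, and the heuristic is deliberately crude. The weight values are the ones used in the examples above.

```python
import re

# Identifier-like tokens such as "E501": capital letters followed by digits.
_IDENTIFIER = re.compile(r"\b[A-Z]+\d+\b")


def pick_weights(query: str) -> dict[str, float]:
    """Return keyword overrides for recall(); an empty dict keeps the defaults."""
    if _IDENTIFIER.search(query):
        # Exact tokens matter most, so lean on FTS5.
        return {"vec_weight": 20.0, "fts_weight": 60.0, "importance_weight": 20.0}
    if query.rstrip().endswith("?"):
        # Natural-language questions lean on semantic similarity.
        return {"vec_weight": 70.0, "fts_weight": 20.0, "importance_weight": 10.0}
    return {}


print(pick_weights("error code E501 in linter"))
print(pick_weights("how does authentication work?"))
```

The returned dictionary can be unpacked into the call, for example mem.recall(query, **pick_weights(query)).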

Result Format

[
  {
    "id": "mem_abc123",
    "content": "We decided to use PostgreSQL for the primary database...",
    "score": 0.92,
    "source": "decision",
    "importance": 0.8,
    "created_at": 1714060800,
  },
  ...
]
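Because each result is a plain dictionary, post-processing needs no special API. The sketch below reuses the sample record above, filters by a score threshold, and converts created_at to a datetime. It assumes created_at is a Unix timestamp in seconds (UTC), which this page does not state explicitly, and min_score is an arbitrary illustrative cutoff.

```python
from datetime import datetime, timezone

results = [
    {
        "id": "mem_abc123",
        "content": "We decided to use PostgreSQL for the primary database...",
        "score": 0.92,
        "source": "decision",
        "importance": 0.8,
        "created_at": 1714060800,
    },
]

min_score = 0.5  # illustrative cutoff, not a library default
strong = [r for r in results if r["score"] >= min_score]

for r in strong:
    # Assumption: created_at is seconds since the Unix epoch.
    created = datetime.fromtimestamp(r["created_at"], tz=timezone.utc)
    print(f"{r['id']} ({r['source']}, {created.isoformat()}): {r['score']:.2f}")
```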

Working Memory Retrieval

When searching Working Memory (recent, non-consolidated entries), recall() falls back to keyword-based scoring, with weights that differ from the episodic formula above. The fallback covers very recent entries whose vector embeddings may not yet be available.
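As a rough illustration of what a keyword-based fallback can look like, the sketch below scores entries by the fraction of query terms they contain. It is not Mnemosyne's actual Working Memory scorer, whose terms and weights are not documented on this page; keyword_score and its tokenizer are hypothetical.

```python
import re


def _terms(text: str) -> set[str]:
    """Lowercase alphanumeric tokens, deduplicated."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_score(query: str, content: str) -> float:
    """Fraction of distinct query terms that also appear in content (0.0 to 1.0)."""
    query_terms = _terms(query)
    if not query_terms:
        return 0.0
    return len(query_terms & _terms(content)) / len(query_terms)


entries = [
    "We decided to use PostgreSQL for the primary database",
    "Linter reports error code E501 on long lines",
]
ranked = sorted(entries, key=lambda e: keyword_score("primary database", e), reverse=True)
print(ranked[0])
```

A scorer like this needs no embeddings, which is why it suits entries that have not been vectorized yet; the trade-off is that it cannot match synonyms.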

Performance

Query Type	Median Latency	p99 Latency
Vector only	35ms	120ms
FTS5 only	15ms	45ms
Hybrid (default)	65ms	180ms
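Latency depends on hardware and dataset size, so it is worth measuring in your own environment. The helper below is a generic sketch using only the standard library; measure_latency is a hypothetical name, and the stand-in workload should be replaced with a real call such as lambda: mem.recall("query", top_k=5).

```python
import statistics
import time
from typing import Callable


def measure_latency(fn: Callable[[], object], runs: int = 200) -> tuple[float, float]:
    """Call fn repeatedly and return (median_ms, p99_ms)."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points; the last one is the 99th percentile.
    p99 = statistics.quantiles(samples, n=100)[-1]
    return statistics.median(samples), p99


# Stand-in workload; swap in a recall() call to measure real queries.
median_ms, p99_ms = measure_latency(lambda: sum(range(10_000)))
print(f"median: {median_ms:.3f} ms, p99: {p99_ms:.3f} ms")
```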

Index Requirements

Hybrid search requires both the vector index and the FTS5 index. Both are created automatically on first use; the initial build may take a few seconds for large datasets.
