Hybrid Search - Combining Sparse and Dense Retrieval
Core Principle
Hybrid search runs BM25 - Best Matching 25 Ranking Function and Embedding Models for Semantic Similarity in parallel on the same query, then fuses their result lists into a single ranked output. The insight is that keyword search and semantic search fail in complementary ways — BM25 misses paraphrases, embeddings miss exact terms — so combining them covers both failure modes.
Why Not Just Use One?
Each retrieval method has a blind spot the other covers:
- BM25 alone — searching “staying focused” won’t surface a note titled “Attention and Distraction Management” unless those exact words appear
- Embeddings alone — searching for a specific Docker container name or a person’s name may get diluted by semantically similar but wrong results, because embeddings compress meaning and lose exact tokens
Hybrid search ensures that a note containing the exact keyword and a note that’s semantically related both have a path into the candidate set.
Reciprocal Rank Fusion (RRF)
The standard fusion method is RRF, which avoids the problem of BM25 scores and cosine similarity scores being on completely different scales. Instead of combining raw scores, RRF only uses rank position:
For each document, sum 1 / (k + rank) across all retrieval methods, where k is a constant (typically 60). A document ranked #1 by both methods gets the highest fused score. A document ranked #1 by one and #50 by the other still gets credit for its strong showing.
This is more robust than score averaging because rank positions are comparable across methods while raw scores are not — BM25 scores are unbounded (often landing anywhere from 0 to 25+), while cosine similarity is confined to [-1, 1].
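The fusion step above can be sketched in a few lines of plain Python. This is a minimal illustration, not a specific library's API; the document IDs and ranked lists are made up for demonstration.

```python
# Reciprocal Rank Fusion: fuse best-first ranked lists of doc IDs
# using only rank positions, never raw scores.

def rrf_fuse(ranked_lists, k=60):
    """Return doc IDs sorted by summed 1 / (k + rank) across all lists."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the two retrievers:
bm25_hits = ["d3", "d1", "d7"]
dense_hits = ["d1", "d9", "d3"]
print(rrf_fuse([bm25_hits, dense_hits]))  # → ['d1', 'd3', 'd9', 'd7']
```

Note how `d1`, ranked near the top by both methods, wins the fused ranking, while `d9` and `d7` still survive into the candidate set on the strength of a single method.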
When Hybrid Outperforms
Hybrid search shows the largest gains over single-method retrieval when:
- The corpus contains a mix of jargon-heavy technical content and natural language prose
- Queries vary between exact lookups (“PERM green card”) and conceptual searches (“immigration process for work sponsorship”)
- Documents use inconsistent vocabulary for the same concepts (common in personal notes written over months/years)
Where It Fits in a Pipeline
Hybrid search is a first-stage retriever. Its output (typically the top 50-100 candidates) feeds into a Reranker - Cross-Encoder Rescoring stage for fine-grained relevance scoring. The consensus best pipeline as of 2026 is: hybrid retrieval → reranker → LLM for answer generation.
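The first stage of that pipeline can be sketched end to end on a toy corpus. Every component here is a deliberately trivial stand-in (exact-term overlap instead of real BM25, character-trigram overlap instead of embeddings); only the wiring — two retrievers in parallel, fused by RRF into one candidate set for the reranker — is the point.

```python
# Toy first-stage hybrid retriever; corpus and scoring are stand-ins.

CORPUS = {
    "d1": "attention and distraction management",
    "d2": "docker container restart policy",
    "d3": "notes on staying focused at work",
}

def sparse_search(query, top_n):
    # Stand-in for BM25: count exact query-term overlap.
    terms = set(query.lower().split())
    scores = {d: len(terms & set(t.split())) for d, t in CORPUS.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

def dense_search(query, top_n):
    # Stand-in for embedding similarity: character-trigram overlap.
    grams = lambda s: {s[i:i + 3] for i in range(len(s) - 2)}
    q = grams(query.lower())
    scores = {d: len(q & grams(t)) for d, t in CORPUS.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

def rrf_fuse(ranked_lists, k=60):
    # Rank-based fusion as described above.
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def hybrid_retrieve(query, top_n=2):
    # First stage only: the fused list would feed the cross-encoder reranker.
    return rrf_fuse([sparse_search(query, top_n),
                     dense_search(query, top_n)])[:top_n]

print(hybrid_retrieve("staying focused"))
```

In a real pipeline, `sparse_search` and `dense_search` would hit a BM25 index and a vector index respectively, `top_n` would be 50-100, and the fused candidates would be rescored by a cross-encoder before reaching the LLM.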
Related Ideas
- BM25 - Best Matching 25 Ranking Function
- Embedding Models for Semantic Similarity
- Reranker - Cross-Encoder Rescoring
- Reciprocal Rank Fusion
References
- Cormack, G. V., Clarke, C. L. A., & Buettcher, S. (2009). “Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods.” SIGIR.