BM25 - Best Matching 25 Ranking Function
Core Principle
BM25 is a sparse retrieval function that scores documents by how well their words match a query, building on two intuitions: a word that appears frequently in a document is more likely to be relevant (term frequency), and a word that appears rarely across the corpus is more informative (inverse document frequency). The name ("Best Matching") reflects that it was the 25th in a numbered series of ranking functions developed by Stephen Robertson and colleagues for the Okapi retrieval system in the early 1990s.
Why It Works (First Principles)
BM25 improves on raw TF-IDF through two key corrections:
- Term frequency saturation — the score contribution of a word diminishes after the first few occurrences. Mentioning “python” 50 times doesn’t make a document 50x more relevant than mentioning it once. BM25 applies a saturation curve controlled by the parameter k1 (typically 1.2-2.0).
- Document length normalization — longer documents naturally contain more word occurrences, which would bias raw TF in their favor. The parameter b (typically 0.75) controls how much to penalize documents for being longer than the corpus average.
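The effect of both parameters can be seen numerically. A minimal sketch of the term-frequency component (using the typical default values for k1 and b mentioned above):

```python
k1, b = 1.5, 0.75

def tf_component(tf, doc_len, avg_len):
    """Saturating TF weight: tf * (k1 + 1) / (tf + k1 * length_norm)."""
    norm = 1 - b + b * doc_len / avg_len  # length normalization factor
    return tf * (k1 + 1) / (tf + k1 * norm)

# Diminishing returns: 50 occurrences score nowhere near 50x one occurrence.
# The contribution approaches an asymptote of k1 + 1 = 2.5.
for tf in (1, 2, 5, 50):
    print(tf, round(tf_component(tf, doc_len=100, avg_len=100), 3))
```

For an average-length document, one occurrence contributes 1.0 and fifty occurrences only about 2.4; a document twice the average length gets a smaller weight for the same raw count.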
The formula at its core asks: for each query term, how much evidence does this document provide that it’s relevant, adjusted for how rare/informative that term is across the whole corpus?
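That question can be made concrete in a short, self-contained sketch of the scoring function (simplified: pre-tokenized input, and the common IDF variant from Robertson & Zaragoza 2009):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` (both lists of tokens)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # df: number of documents containing each term
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rarer terms across the corpus carry more weight
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating TF, normalized by document length
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [doc.split() for doc in [
    "python is a programming language",
    "the python snake is found in asia",
    "java is a programming language",
]]
print(bm25_scores("python programming".split(), docs))
```

The first document, which matches both query terms, scores highest; the other two each match only one term.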
Strengths
- No model to train or embeddings to compute — works directly on an inverted index
- Interpretable: you can trace exactly why a document scored high
- Excels at exact keyword matching, especially for jargon-dense or domain-specific corpora
- Extremely fast — query time scales with the posting lists of the query terms, not with corpus size
- Still competitive with dense retrieval on many benchmarks decades after invention
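The speed claim follows from the data structure: an inverted index maps each term to the documents containing it, so a query only ever touches the posting lists of its own terms. A toy sketch of the structure (the BM25 weighting is stubbed out as a plain count here):

```python
from collections import defaultdict

# Build an inverted index: term -> {doc_id: term frequency}
docs = {0: "fast exact keyword search",
        1: "keyword search with an inverted index",
        2: "dense vector search"}
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term in text.split():
        index[term][doc_id] = index[term].get(doc_id, 0) + 1

def candidates(query):
    """Work is proportional to the matched posting lists,
    never a scan over the whole corpus."""
    hits = defaultdict(int)
    for term in query.split():
        for doc_id, tf in index.get(term, {}).items():
            hits[doc_id] += tf  # placeholder for the real BM25 weight
    return dict(hits)

print(candidates("keyword search"))
```

This is also why BM25 is interpretable: each document's score decomposes into one additive contribution per matched query term, which you can read straight off the index.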
Limitations
- Purely lexical — “staying focused” will not match a document about “attention” unless that exact word appears
- No understanding of synonyms, paraphrasing, or semantic meaning
- Treats words independently (bag-of-words assumption, no phrase awareness)
- Scoring depends on corpus statistics, so performance shifts if corpus composition changes
Where It Fits in a Modern Pipeline
BM25 is typically used as a fast first-pass retriever that narrows a large corpus to a candidate set of 50-100 documents. These candidates then get rescored by a Reranker - Cross-Encoder Rescoring or combined with Embedding Models for Semantic Similarity via Hybrid Search - Combining Sparse and Dense Retrieval.
Despite being 30+ years old, it remains a standard component in production search systems because no dense model has fully replaced its speed and exact-match reliability.
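The first-pass-then-rerank pattern can be sketched as follows. Both helper functions are hypothetical stand-ins: in a real pipeline `bm25_retrieve` would query an inverted index (e.g. Lucene-backed systems) and `cross_encoder_score` would be a learned model that reads the query-document pair jointly.

```python
def bm25_retrieve(query, corpus, k=100):
    # Stand-in for a real BM25 query; here just naive token overlap.
    q = set(query.split())
    scored = [(len(q & set(doc.split())), doc) for doc in corpus]
    return [doc for score, doc in sorted(scored, reverse=True)[:k] if score > 0]

def cross_encoder_score(query, doc):
    # Stand-in for a learned cross-encoder relevance score.
    return len(set(query.split()) & set(doc.split())) / (1 + len(doc.split()))

def search(query, corpus, k_first=100, k_final=10):
    candidates = bm25_retrieve(query, corpus, k=k_first)   # cheap, wide net
    reranked = sorted(candidates,                          # slow, precise pass
                      key=lambda d: cross_encoder_score(query, d),
                      reverse=True)
    return reranked[:k_final]
```

The design point is the asymmetry: the cheap sparse retriever bounds how many documents the expensive reranker ever sees, so total latency stays dominated by the index lookup.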
Related Ideas
- TF-IDF
- Embedding Models for Semantic Similarity
- Hybrid Search - Combining Sparse and Dense Retrieval
- Inverted Index
- SPLADE
References
- Robertson, S. E., & Zaragoza, H. (2009). “The Probabilistic Relevance Framework: BM25 and Beyond.” Foundations and Trends in Information Retrieval.
- Robertson, S. E., Walker, S., Jones, S., Hancock-Beaulieu, M., & Gatford, M. (1994). “Okapi at TREC-3.” TREC.