BM25 is a sparse retrieval function that scores documents by how well their words match a query. It builds on two intuitions. First, a document that mentions a query word often is more likely to be relevant to it (term frequency, TF). Second, a word that is rare across the corpus is more informative than a common one (inverse document frequency, IDF). BM stands for “Best Matching,” and the 25 marks it as the 25th iteration in a family of probabilistic ranking functions from the Okapi project (Robertson et al., early 1990s).
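The IDF intuition can be turned into a number in several ways. The sketch below shows one common variant, the BM25 IDF used by Lucene. Here N is the number of documents in the corpus and n(q) is the number that contain term q. The classic Robertson-Spärck Jones form omits the +1 inside the logarithm, so it goes negative for terms that appear in more than half the corpus.

```latex
\mathrm{IDF}(q) = \ln\!\left(1 + \frac{N - n(q) + 0.5}{n(q) + 0.5}\right)

N = 1000:\quad n(q) = 1 \;\Rightarrow\; \mathrm{IDF} = \ln\!\left(1 + \tfrac{999.5}{1.5}\right) \approx 6.50,
\qquad n(q) = 500 \;\Rightarrow\; \mathrm{IDF} = \ln 2 \approx 0.69
```

In this example a term found in a single document carries roughly nine times the weight of a term found in half the corpus.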

BM25 improves on raw TF-IDF with two corrections. The first is term frequency saturation: each additional occurrence of a word adds less to the score than the one before, so a term's contribution levels off after the first few occurrences. Parameter k1 (typically 1.2-2.0) controls how quickly it levels off. Mentioning “python” 50 times doesn’t make a document 50x more relevant than mentioning it once. The second is document length normalization. Longer documents naturally contain more occurrences, which would bias raw TF in their favor. Parameter b (typically 0.75) sets how strongly term frequency is normalized by document length relative to the corpus average: b = 0 disables length normalization, and b = 1 applies it fully. At its core, the formula asks, for each query term, how much evidence the document provides of relevance. It weights that evidence by how rare the term is across the corpus and sums the contributions.
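The two corrections can be seen in a minimal Python sketch of the full score. The score sums, over query terms q, IDF(q) · f·(k1+1) / (f + k1·(1 - b + b·|D|/avgdl)). In that expression, f is the term's count in document D, |D| is the document length in tokens, and avgdl is the mean document length. Several choices here are assumptions rather than anything the text fixes: pre-tokenized lists as input, the Lucene-style IDF, and counting each distinct query term once. The helper names `saturated_tf`, `idf`, and `bm25_score` are also hypothetical.

```python
import math
from collections import Counter


def saturated_tf(f, doc_len, avgdl, k1=1.2, b=0.75):
    """BM25 term-frequency component: grows with f but levels off toward k1 + 1."""
    # b interpolates between ignoring length (b=0) and full |D|/avgdl normalization (b=1).
    norm = k1 * (1 - b + b * doc_len / avgdl)
    return f * (k1 + 1) / (f + norm)


def idf(n_t, n_docs):
    """Lucene-style BM25 IDF (an assumed variant); the +1 keeps it positive."""
    return math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))


def bm25_score(query, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenized doc against a tokenized query using corpus statistics."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in set(query):  # assumption: each distinct query term counts once
        if tf[term] == 0:
            continue  # purely lexical: a term absent from the doc adds nothing
        n_t = sum(1 for d in corpus if term in d)  # document frequency of the term
        score += idf(n_t, n_docs) * saturated_tf(tf[term], len(doc), avgdl, k1, b)
    return score


corpus = [
    ["python", "tutorial"],
    ["python", "snake", "habitat"],
    ["java", "tutorial"],
]
print(bm25_score(["python", "tutorial"], corpus[0], corpus))
# Saturation for a document of average length: 1 occurrence vs 50 occurrences.
print(saturated_tf(1, 100, 100), saturated_tf(50, 100, 100))  # 1.0 and roughly 2.15
```

With k1 = 1.2, fifty occurrences earn only about 2.15 times the credit of a single occurrence. The TF component can never exceed k1 + 1 = 2.2, whatever the count.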

The trade-offs are sharp. Strengths: there is no model to train and no embeddings to compute. It runs directly off an inverted index and is interpretable, since you can trace exactly why a document scored high. It excels at exact keyword matching, especially in jargon-dense corpora. Its per-query cost scales with the postings lists of the query terms rather than with corpus size, because documents that share no term with the query are never touched. Limitations: it is purely lexical, so “staying focused” won’t match a document about “attention” unless one of the query’s exact words appears in it. It has no awareness of synonyms or paraphrases, and it treats words independently (the bag-of-words assumption). Finally, scores depend on corpus statistics (document frequencies and average document length), so performance shifts as the corpus changes.
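Both the index-driven cost and the lexical blind spot show up in a toy in-memory inverted index. The sketch below scores a query by walking only the postings of the query terms, using the BM25 formula with a Lucene-style IDF. The data layout (a dict from term to `{doc_id: count}`) and the names `build_index` and `search` are hypothetical. Production engines store postings far more compactly.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to a postings dict {doc_id: term count}."""
    index = defaultdict(dict)
    for doc_id, tokens in enumerate(docs):
        for term, count in Counter(tokens).items():
            index[term][doc_id] = count
    return index


def search(query, index, doc_lens, k1=1.2, b=0.75):
    """Accumulate BM25 scores by walking only the postings of the query terms."""
    n_docs = len(doc_lens)
    avgdl = sum(doc_lens) / n_docs
    scores = defaultdict(float)
    for term in set(query):
        postings = index.get(term, {})
        if not postings:
            continue  # the term never occurs, so no document gets credit for it
        n_t = len(postings)
        term_idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
        for doc_id, f in postings.items():
            norm = k1 * (1 - b + b * doc_lens[doc_id] / avgdl)
            scores[doc_id] += term_idf * f * (k1 + 1) / (f + norm)
    return dict(scores)


docs = [
    "tips for paying attention in long meetings".split(),
    "staying hydrated during long runs".split(),
]
index = build_index(docs)
doc_lens = [len(d) for d in docs]
# Only the "staying" postings are walked; the attention doc (id 0) is never scored.
print(search(["staying", "focused"], index, doc_lens))
```

The query about focus retrieves only the hydration document, which shares the word “staying.” The document about attention never enters the result.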

BM25 typically sits as the fast first-pass retriever in a modern search pipeline, narrowing a large corpus to 50-100 candidates that are then rescored by a Reranker or fused with dense retrieval via Hybrid Search. Despite being 30+ years old, it remains a standard component because no dense model has fully matched its speed and exact-match reliability. See Robertson and Zaragoza (2009), “The Probabilistic Relevance Framework: BM25 and Beyond”, for the canonical reference.
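The first-pass role can be sketched in a few lines. The code keeps the top-k BM25 candidates and then merges them with a dense retriever's ranking. For the merge step it uses Reciprocal Rank Fusion (RRF) with the commonly used constant 60, one common fusion method swapped in here as an assumption, since the text does not say how Hybrid Search combines the lists. The BM25 scores and the dense ranking below are hypothetical hard-coded inputs, and `top_k` and `reciprocal_rank_fusion` are illustrative names.

```python
import heapq


def top_k(scores, k=100):
    """First pass: keep the k highest-scoring doc ids from {doc_id: score}."""
    best = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
    return [doc_id for doc_id, _ in best]


def reciprocal_rank_fusion(rankings, c=60):
    """Fuse ranked lists: each list adds 1 / (c + rank) to a doc, ranks starting at 1."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (c + rank)
    return sorted(fused, key=fused.get, reverse=True)


bm25_scores = {"d1": 7.2, "d2": 5.1, "d3": 0.4, "d4": 3.3}  # hypothetical BM25 output
candidates = top_k(bm25_scores, k=3)  # ["d1", "d2", "d4"]
dense_ranking = ["d4", "d1", "d9"]  # hypothetical dense-retriever output
print(reciprocal_rank_fusion([candidates, dense_ranking]))  # ['d1', 'd4', 'd2', 'd9']
```

RRF uses only ranks, so it sidesteps the fact that BM25 scores and dense similarity scores sit on unrelated scales.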