BM25 - Best Matching 25 Ranking Function

Core Principle

BM25 is a sparse retrieval function that scores documents by how well their words match a query, building on two intuitions: a word that appears frequently in a document is more likely to be relevant (term frequency), and a word that appears rarely across the corpus is more informative (inverse document frequency). The “25” is its number in the Best Match (BM) series of ranking functions developed by Stephen Robertson and colleagues for the Okapi retrieval system in the 1990s.

Why It Works (First Principles)

BM25 improves on raw TF-IDF through two key corrections:

  1. Term frequency saturation — the score contribution of a word diminishes after the first few occurrences. Mentioning “python” 50 times doesn’t make a document 50x more relevant than mentioning it once. BM25 applies an asymptotic saturation curve, bounded above by k1 + 1, controlled by parameter k1 (typically 1.2-2.0).

  2. Document length normalization — longer documents naturally contain more word occurrences, which would bias raw TF in their favor. Parameter b (typically 0.75) controls how much to penalize documents for being longer than average.
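The saturation effect is easy to see numerically. A small sketch of the TF component, using k1 = 1.5 (a typical default) and omitting the length-normalization term for clarity:

```python
# Term-frequency saturation: tf * (k1 + 1) / (tf + k1), the curve BM25 uses.
# Length normalization is omitted here to isolate the saturation behavior.
def tf_saturation(tf: float, k1: float = 1.5) -> float:
    return tf * (k1 + 1) / (tf + k1)

for tf in (1, 5, 50):
    print(tf, round(tf_saturation(tf), 2))
# 1  → 1.0
# 5  → 1.92
# 50 → 2.43
```

The contribution can never exceed k1 + 1 (here 2.5), so going from 5 to 50 occurrences adds almost nothing — exactly the diminishing-returns behavior described above.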

The formula at its core asks: for each query term, how much evidence does this document provide that it’s relevant, adjusted for how rare/informative that term is across the whole corpus?
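That question can be written out as a minimal, self-contained scoring sketch. It uses a common smoothed IDF variant and the k1 = 1.5, b = 0.75 defaults; a real engine would read term statistics and document lengths from an inverted index rather than rescanning the corpus on every call:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_tokens, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a query. `corpus` is a list of token lists."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N  # average document length
    tf = Counter(doc_tokens)
    score = 0.0
    for term in query_terms:
        # df: number of documents in the corpus containing the term
        df = sum(1 for d in corpus if term in d)
        # smoothed IDF: rare terms get a large weight, common terms a small one
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        f = tf[term]
        # denominator applies both TF saturation (k1) and length normalization (b)
        denom = f + k1 * (1 - b + b * len(doc_tokens) / avgdl)
        score += idf * f * (k1 + 1) / denom
    return score

corpus = [
    "the quick brown fox".split(),
    "python is a programming language".split(),
    "python snakes are constrictors".split(),
]
# The second document matches both query terms, so it should outscore the third
print(bm25_score(["python", "language"], corpus[1], corpus))
```

Note that a term absent from the document contributes zero (f = 0), and a term appearing in every document gets a near-zero IDF — both fall directly out of the formula.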

Strengths

  • No model to train or embeddings to compute — works directly on an inverted index
  • Interpretable: you can trace exactly why a document scored high
  • Excels at exact keyword matching, especially for jargon-dense or domain-specific corpora
  • Extremely fast — query cost scales with the postings lists of the query terms, not with corpus size
  • Still competitive with dense retrieval on many benchmarks decades after invention

Limitations

  • Purely lexical — “staying focused” will not match a document about “attention” unless that exact word appears
  • No understanding of synonyms, paraphrasing, or semantic meaning
  • Treats words independently (bag-of-words assumption, no phrase awareness)
  • Scoring depends on corpus statistics, so performance shifts if corpus composition changes

Where It Fits in a Modern Pipeline

BM25 is typically used as a fast first-pass retriever that narrows a large corpus to a candidate set of 50-100 documents. These candidates then get rescored by a Reranker - Cross-Encoder Rescoring or combined with Embedding Models for Semantic Similarity via Hybrid Search - Combining Sparse and Dense Retrieval.

Despite being 30+ years old, it remains a standard component in production search systems because no dense model has fully replaced its speed and exact-match reliability.

References

  • Robertson, S. E., & Zaragoza, H. (2009). “The Probabilistic Relevance Framework: BM25 and Beyond.” Foundations and Trends in Information Retrieval.
  • Robertson, S. E., Walker, S., Jones, S., Hancock-Beaulieu, M., & Gatford, M. (1994). “Okapi at TREC-3.” TREC.