BM25 (Wikipedia), also known as Okapi BM25, is a ranking function used in information retrieval systems to estimate the relevance of documents to a given search query.

BM25Retriever uses the rank_bm25 package.
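To make the ranking function concrete, here is a minimal pure-Python sketch of one common Okapi BM25 formulation. It is an illustration only, not the rank_bm25 implementation: the function name bm25_scores, the defaults k1=1.5 and b=0.75, and the smoothed IDF are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each tokenized document in `corpus` against a tokenized `query`."""
    n_docs = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / n_docs
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in corpus for term in set(doc))

    scores = []
    for doc in corpus:
        tf = Counter(doc)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # unmatched terms contribute nothing
            # Smoothed IDF (assumption): stays positive even for common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


corpus = [text.split() for text in ["foo", "bar", "world", "hello", "foo bar"]]
scores = bm25_scores(["foo"], corpus)
```

For the query "foo", the one-word document "foo" scores higher than "foo bar" because of length normalization, and documents that do not contain the term score zero.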
Create a new retriever with texts
Create a new retriever with documents
You can now create a new retriever with the documents you created.

Use retriever
We can now use the retriever!

Preprocessing function
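As an illustration of what a word-level preprocessing function might look like, the sketch below lowercases the text and splits it into word tokens using only the standard library. The name simple_tokenize is hypothetical, and a dedicated tokenizer could be substituted.

```python
import re


def simple_tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    return re.findall(r"\w+", text.lower())


tokens = simple_tokenize("Hello, World! BM25 ranks documents.")
```

A function like this can be supplied when the retriever is built, for example BM25Retriever.from_documents(docs, preprocess_func=simple_tokenize). The retriever applies the same function to each query, so queries and indexed text are tokenized consistently.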
Pass a custom preprocessing function to the retriever to improve search results. Tokenizing text at the word level can enhance retrieval, especially when using vector stores like Chroma, Pinecone, or Faiss for chunked documents.

BM25Plus variant
- BM25Retriever also supports the BM25Plus variant, which is designed to reduce the bias against short documents present in standard BM25.
- BM25Plus ensures that matched terms always contribute a positive score, which can improve recall for short texts, passages, or chunked documents commonly used in retrieval-augmented generation (RAG) workflows.
By default, BM25Retriever uses standard BM25 (BM25Okapi). BM25Plus must be explicitly enabled.
Example: Using BM25Plus
BM25Plus is a good fit for:
- Short documents or passages
- Chunked text in RAG systems
- Corpora with highly variable document lengths