BM25, also known as Okapi BM25, is a ranking function used in information retrieval systems to estimate the relevance of documents to a given search query.
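As a rough illustration of that ranking function, here is a minimal pure-Python sketch of the Okapi BM25 score (toy code for intuition, not the `rank_bm25` implementation; `k1` and `b` are the usual free parameters):

```python
import math

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document in `corpus` against `query` with Okapi BM25.

    `corpus` is a list of token lists; `query` is a token list.
    """
    N = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / N
    # document frequency of each query term
    df = {term: sum(term in doc for doc in corpus) for term in query}
    # IDF with a "+ 1" inside the log to keep it non-negative
    idf = {t: math.log((N - n + 0.5) / (n + 0.5) + 1) for t, n in df.items()}
    scores = []
    for doc in corpus:
        s = 0.0
        for term in query:
            f = doc.count(term)  # term frequency in this document
            s += idf[term] * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

corpus = [t.split() for t in ["foo", "bar", "world", "hello", "foo bar"]]
scores = bm25_scores("foo".split(), corpus)
# the pure-"foo" document outscores "foo bar"; non-matching docs score 0
```

The length normalization in the denominator is why the longer "foo bar" document scores below the exact match "foo".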
`BM25Retriever` uses the `rank_bm25` package, which you can install with:

```shell
pip install -qU rank_bm25
```
```python
from langchain_community.retrievers import BM25Retriever
```
Create a new retriever with texts

```python
retriever = BM25Retriever.from_texts(["foo", "bar", "world", "hello", "foo bar"])
```
Create a new retriever with documents

You can also create a retriever from a list of `Document` objects.

```python
from langchain_core.documents import Document

retriever = BM25Retriever.from_documents(
    [
        Document(page_content="foo"),
        Document(page_content="bar"),
        Document(page_content="world"),
        Document(page_content="hello"),
        Document(page_content="foo bar"),
    ]
)
```
Use retriever

We can now use the retriever!

```python
result = retriever.invoke("foo")
result
```

```
[Document(metadata={}, page_content='foo'),
 Document(metadata={}, page_content='foo bar'),
 Document(metadata={}, page_content='hello'),
 Document(metadata={}, page_content='world')]
```
Preprocessing function

You can pass a custom `preprocess_func` to control how text is tokenized before BM25 scoring. Word-level tokenization (here with NLTK's `word_tokenize`) often retrieves better than the default whitespace split, especially for chunked documents. The `k` parameter caps the number of documents returned (the default is 4).
```python
import nltk

nltk.download("punkt_tab")
from nltk.tokenize import word_tokenize

retriever = BM25Retriever.from_documents(
    [
        Document(page_content="foo"),
        Document(page_content="bar"),
        Document(page_content="world"),
        Document(page_content="hello"),
        Document(page_content="foo bar"),
    ],
    k=2,
    preprocess_func=word_tokenize,
)
```
```python
result = retriever.invoke("bar")
result
```

```
[Document(metadata={}, page_content='bar'),
 Document(metadata={}, page_content='foo bar')]
```
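`preprocess_func` can be any callable mapping a string to a list of tokens, and it is applied both to the indexed texts and to each incoming query. A minimal hand-rolled alternative to `word_tokenize` (a sketch using only the standard library; the function name is illustrative) might lowercase and split on word characters:

```python
import re

def preprocess(text):
    """Lowercase and split on word characters, so 'Foo-Bar!' matches 'foo bar'."""
    return re.findall(r"\w+", text.lower())

tokens = preprocess("Foo-Bar, hello!")
# -> ['foo', 'bar', 'hello']
```

Lowercasing here makes matching case-insensitive, which the default whitespace split does not provide.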
BM25Plus variant

BM25Retriever also supports the BM25Plus variant, which is designed to reduce the bias against short documents present in standard BM25. BM25Plus ensures that matched terms always contribute a positive score, which can improve recall for short texts, passages, or chunked documents commonly used in retrieval-augmented generation (RAG) workflows.

By default, BM25Retriever uses standard BM25 (BM25Okapi); BM25Plus must be explicitly enabled.
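The long-document bias and its fix can be seen in the per-term formula BM25+ uses. Below is a pure-Python sketch of Lv and Zhai's formulation for intuition only (rank_bm25's `BM25Plus` also adjusts the IDF term, so its exact numbers differ):

```python
import math

def bm25_plus_term(f, doc_len, avgdl, idf, k1=1.5, b=0.75, delta=1.0):
    """Per-term BM25+ contribution.

    The `delta` shift guarantees every matching term adds at least
    idf * delta, no matter how long the document is.
    """
    tf = f * (k1 + 1) / (k1 * (1 - b + b * doc_len / avgdl) + f)
    return idf * (tf + delta)

# a single match in a very long document still scores above the idf * delta floor
long_doc = bm25_plus_term(f=1, doc_len=1000, avgdl=50, idf=2.0)
short_doc = bm25_plus_term(f=1, doc_len=10, avgdl=50, idf=2.0)
# short_doc > long_doc > 2.0
```

With `delta=0` this reduces to the standard BM25 term score, where the long document's contribution decays toward zero.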
Example: Using BM25Plus

```python
from langchain_community.retrievers import BM25Retriever
from langchain_core.documents import Document

docs = [
    Document(
        page_content=(
            "LangChain provides tools for building applications with large language models. "
            "It supports retrieval augmented generation and agents."
        )
    ),
    Document(page_content="LangChain retrieval augmented generation"),
]

retriever = BM25Retriever.from_documents(
    docs,
    bm25_variant="plus",
    bm25_params={"delta": 0.5},
)

result = retriever.invoke("retrieval augmented generation")
result
```
BM25Plus is particularly useful when working with:
- Short documents or passages
- Chunked text in RAG systems
- Corpora with highly variable document lengths
For long-form documents of relatively uniform length, standard BM25 may provide slightly higher precision.