NVIDIARetriever connects LangChain to a running NVIDIA RAG Blueprint server and retrieves relevant documents via the /v1/search endpoint. It supports sync and async retrieval, reranking, query rewriting, and metadata filtering.

Overview

Integration details

| Class | Package | Serializable |
| --- | --- | --- |
| NVIDIARetriever | langchain-nvidia-ai-endpoints | beta |

Setup

NVIDIARetriever requires a running NVIDIA RAG Blueprint server. Refer to the NVIDIA RAG Blueprint documentation for deployment instructions. By default the server listens on http://localhost:8081 and expects at least one ingested collection in its vector database. No API key is required for the retriever; authentication is handled by the RAG server.

Installation

pip install -qU langchain-nvidia-ai-endpoints

Instantiation

from langchain_nvidia_ai_endpoints import NVIDIARetriever

retriever = NVIDIARetriever(
    base_url="http://localhost:8081",
    k=4,
    collection_names=["my_collection"],
)
Key parameters:
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| base_url | str | — | Base URL of the RAG Blueprint server |
| k | int | 10 | Number of documents to return (0–25) |
| collection_names | list[str] | ["multimodal_data"] | Vector database collections to search |
| vdb_top_k | int | 100 | Results retrieved before reranking (0–400) |
| enable_reranker | bool | True | Enable reranking of retrieved results |
| enable_query_rewriting | bool | False | Enable query rewriting before search |
| confidence_threshold | float | 0.0 | Minimum relevance score (0.0–1.0) to include a document |
| timeout | float | 60 | HTTP request timeout in seconds |
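The confidence_threshold parameter drops documents whose relevance score falls below the cutoff. The filtering behavior can be sketched in plain Python; this is a simplified stand-in to illustrate the idea, not the library's internal code:

```python
# Simplified sketch of confidence-threshold filtering (illustrative only,
# not NVIDIARetriever's actual implementation): keep documents whose
# relevance score meets or exceeds the threshold.
def filter_by_confidence(scored_docs, threshold=0.0):
    """scored_docs: list of (text, score) pairs; score in [0.0, 1.0]."""
    return [text for text, score in scored_docs if score >= threshold]

results = [
    ("NIM overview", 0.92),
    ("Unrelated note", 0.15),
    ("NIM deployment", 0.55),
]
print(filter_by_confidence(results, threshold=0.5))
# ['NIM overview', 'NIM deployment']
```

With the default threshold of 0.0, every retrieved document passes through; raising it trades recall for precision.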

Usage

docs = retriever.invoke("What is NVIDIA NIM?")
for doc in docs:
    print(doc.page_content)
Async retrieval is also supported:
docs = await retriever.ainvoke("What is NVIDIA NIM?")

Use within a chain

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIARetriever

retriever = NVIDIARetriever(base_url="http://localhost:8081", k=4)
llm = ChatNVIDIA(model="meta/llama3-8b-instruct")

prompt = ChatPromptTemplate.from_template(
    "Answer the question based only on the following context:\n{context}\n\nQuestion: {question}"
)


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

chain.invoke("What is NVIDIA NIM?")
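The format_docs helper above collapses the retrieved documents into a single context string by joining their page contents with blank lines. Its behavior can be verified in isolation with a minimal stand-in for the Document objects a retriever returns (the Doc class below is illustrative, not langchain_core's actual Document):

```python
# Minimal stand-in for a retrieved Document, used only to demonstrate
# format_docs without a running retriever.
class Doc:
    def __init__(self, page_content):
        self.page_content = page_content


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


docs = [
    Doc("NVIDIA NIM is a set of inference microservices."),
    Doc("NIM containers expose standard APIs."),
]
print(format_docs(docs))
```

The resulting string is substituted for {context} in the prompt template before the LLM is called.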

API reference

For detailed documentation of all NVIDIARetriever features and configurations, head to the API reference.