NVIDIARetriever connects LangChain to a running NVIDIA RAG Blueprint server and retrieves relevant documents via the /v1/search endpoint. It supports sync and async retrieval, reranking, query rewriting, and metadata filtering.
Overview
Integration details
| Class | Package | Local | Serializable | JS support |
|---|---|---|---|---|
| NVIDIARetriever | langchain-nvidia-ai-endpoints | ✅ | beta | ❌ |
Setup
NVIDIARetriever requires a running NVIDIA RAG Blueprint server. Refer to the NVIDIA RAG Blueprint documentation for deployment instructions. By default the server listens on http://localhost:8081 and expects at least one ingested collection in its vector database.
No API key is required for the retriever; authentication is handled by the RAG server.
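Before constructing the retriever, it can help to confirm that something answers at the base URL. The following is a minimal sketch using only the Python standard library. The helper names `search_endpoint` and `server_reachable` are hypothetical and not part of the package, and the check assumes that any HTTP response, even an error status, means the server is running.

```python
import urllib.error
import urllib.request

DEFAULT_BASE_URL = "http://localhost:8081"  # default server address stated above


def search_endpoint(base_url: str = DEFAULT_BASE_URL) -> str:
    """Return the URL of the /v1/search endpoint for a given base URL."""
    return base_url.rstrip("/") + "/v1/search"


def server_reachable(base_url: str = DEFAULT_BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if anything answers HTTP at base_url.

    Assumption: any HTTP status, including an error status, counts as "up".
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is at least running.
        return True
    except OSError:
        # Connection refused, DNS failure, timeout, and similar failures
        # (urllib.error.URLError is a subclass of OSError).
        return False
```

This probe says nothing about whether a collection has been ingested; it only rules out the most common failure, a server that is not running or not reachable.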
Installation
NVIDIARetriever is provided by the langchain-nvidia-ai-endpoints package, which can be installed with pip: `pip install -U langchain-nvidia-ai-endpoints`.
Instantiation
| Parameter | Type | Default | Description |
|---|---|---|---|
| base_url | str | — | Base URL of the RAG Blueprint server |
| k | int | 10 | Number of documents to return (0–25) |
| collection_names | list[str] | ["multimodal_data"] | Vector database collections to search |
| vdb_top_k | int | 100 | Results retrieved before reranking (0–400) |
| enable_reranker | bool | True | Enable reranking of retrieved results |
| enable_query_rewriting | bool | False | Enable query rewriting before search |
| confidence_threshold | float | 0.0 | Minimum relevance score (0.0–1.0) to include a document |
| timeout | float | 60 | HTTP request timeout in seconds |
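The parameters above could be assembled and checked before instantiation along the following lines. This is a sketch, not the library's own validation: `retriever_kwargs` and `build_retriever` are hypothetical helpers, the defaults and ranges are copied from the table, and the top-level import of `NVIDIARetriever` from `langchain_nvidia_ai_endpoints` is an assumption based on the package name.

```python
from typing import Any

# Defaults copied from the parameter table.
DEFAULTS: dict[str, Any] = {
    "k": 10,
    "collection_names": ["multimodal_data"],
    "vdb_top_k": 100,
    "enable_reranker": True,
    "enable_query_rewriting": False,
    "confidence_threshold": 0.0,
    "timeout": 60,
}

# Inclusive ranges documented in the table.
RANGES = {"k": (0, 25), "vdb_top_k": (0, 400), "confidence_threshold": (0.0, 1.0)}


def retriever_kwargs(base_url: str, **overrides: Any) -> dict[str, Any]:
    """Merge overrides into the documented defaults and check documented ranges."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unknown parameter(s): {sorted(unknown)}")
    kwargs = {"base_url": base_url, **DEFAULTS, **overrides}
    # Copy the list so callers never share the module-level default.
    kwargs["collection_names"] = list(kwargs["collection_names"])
    for name, (low, high) in RANGES.items():
        if not low <= kwargs[name] <= high:
            raise ValueError(f"{name}={kwargs[name]!r} is outside {low}-{high}")
    return kwargs


def build_retriever(base_url: str = "http://localhost:8081", **overrides: Any):
    """Construct the retriever from validated keyword arguments."""
    # Assumption: the class is exported at the top level of the package.
    from langchain_nvidia_ai_endpoints import NVIDIARetriever

    return NVIDIARetriever(**retriever_kwargs(base_url, **overrides))
```

For example, `build_retriever(k=4, collection_names=["my_docs"])` would request four documents from a collection named `my_docs` (a placeholder name), keeping every other parameter at its documented default.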
Usage
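Like other LangChain retrievers, NVIDIARetriever is queried with `invoke` (sync) or `ainvoke` (async), and each call returns a list of documents. The sketch below shows both paths. So that it runs without a server, it uses a small stand-in class, `StubRetriever`, which is hypothetical; in practice an NVIDIARetriever instance would be passed instead. The `source` metadata key is also an assumption and may differ from what the RAG server returns.

```python
import asyncio
from types import SimpleNamespace


def summarize(docs, width: int = 80) -> list[str]:
    """One line per document: its source (if present) and a content preview."""
    return [
        f"{doc.metadata.get('source', 'unknown')}: {doc.page_content[:width]}"
        for doc in docs
    ]


def search(retriever, query: str) -> list[str]:
    """Synchronous retrieval through the standard invoke() method."""
    return summarize(retriever.invoke(query))


async def search_async(retriever, query: str) -> list[str]:
    """Asynchronous retrieval through the standard ainvoke() method."""
    return summarize(await retriever.ainvoke(query))


class StubRetriever:
    """Stand-in with the same invoke/ainvoke shape, for running without a server."""

    def invoke(self, query: str):
        doc = SimpleNamespace(
            page_content=f"stub result for {query}",
            metadata={"source": "stub.pdf"},
        )
        return [doc]

    async def ainvoke(self, query: str):
        return self.invoke(query)


if __name__ == "__main__":
    retriever = StubRetriever()  # replace with an NVIDIARetriever instance
    print(search(retriever, "What is in the quarterly report?"))
    print(asyncio.run(search_async(retriever, "What is in the quarterly report?")))
```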
Use within a chain
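A retriever can be composed with a prompt and a chat model using the LangChain Expression Language. The following is one possible sketch rather than the canonical chain for this integration: `build_chain` and `format_docs` are hypothetical helper names, the prompt wording is illustrative, and `llm` can be any LangChain chat model, such as ChatNVIDIA. The `langchain_core` imports are deferred into the function so that `format_docs` can be used on its own.

```python
def format_docs(docs) -> str:
    """Join the text of the retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_chain(retriever, llm):
    """Return a chain that maps a question string to an answer string."""
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.runnables import RunnablePassthrough

    prompt = ChatPromptTemplate.from_template(
        "Answer the question using only the context below.\n\n"
        "Context:\n{context}\n\n"
        "Question: {question}"
    )
    return (
        # The retriever fills {context}; the raw question passes through unchanged.
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
```

Calling `build_chain(retriever, llm).invoke("your question")` then runs retrieval, prompt formatting, and generation in one step, and requires both the RAG server and the chat model to be reachable.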
API reference
For detailed documentation of all NVIDIARetriever features and configurations, head to the API reference.
Related topics
- NVIDIA Provider Page
- ChatNVIDIA integration
- NVIDIAEmbeddings integration
- NVIDIA RAG Blueprint documentation

