> ## Documentation Index
> Fetch the complete documentation index at: https://docs.langchain.com/llms.txt
> Use this file to discover all available pages before exploring further.

# NVIDIA

> Integrate with NVIDIA using LangChain Python.

LangChain and NVIDIA have partnered to accelerate agents through four mechanisms:

1. [Components](#components)
2. [LangGraph acceleration primitives](#accelerate-langgraph-with-nvidia)
3. [NeMo Agent Toolkit optimizations](#nemo-agent-toolkit-optimizations-with-langsmith-telemetry)
4. [Full Stack blueprints](#full-stack-blueprints)

## Components

The `langchain-nvidia-ai-endpoints` package provides LangChain integrations for chat, embeddings, reranking, and retrieval powered by NVIDIA AI—including [Nemotron](https://www.nvidia.com/en-us/ai-data-science/foundation-models/nemotron/), NVIDIA's open model family built for agentic AI, and hundreds of community models on the [NVIDIA API Catalog](https://build.nvidia.com/).

Models run on NVIDIA NIM microservices: container images that expose a standard OpenAI-compatible API, optimized with TensorRT-LLM for peak throughput on NVIDIA hardware. They can be accessed via the hosted API Catalog or self-hosted on-premises.

| Component     | Class                                                 | Description                                                     |
| :------------ | :---------------------------------------------------- | :-------------------------------------------------------------- |
| Chat          | [`ChatNVIDIA`](#chat-chatnvidia)                      | Chat completions with any NVIDIA-hosted model or local NIM      |
| Chat (Dynamo) | [`ChatNVIDIADynamo`](#chat-chatnvidiadynamo)          | `ChatNVIDIA` with KV cache routing hints for Dynamo deployments |
| Embeddings    | [`NVIDIAEmbeddings`](#embeddings-nvidiaembeddings)    | Dense vector embeddings for semantic search and RAG             |
| Reranking     | [`NVIDIARerank`](#reranking-nvidiarerank)             | Document reranking by query relevance                           |
| Retrieval     | [`NVIDIARAGRetriever`](#retrieval-nvidiaragretriever) | Retrieval from an NVIDIA RAG Blueprint server                   |

### Chat: ChatNVIDIA

`ChatNVIDIA` provides chat completions over NVIDIA-hosted models and local NIM deployments. It supports tool calling, structured output, image inputs, and streaming.

#### Install

```bash theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
pip install -qU langchain-nvidia-ai-endpoints
```

#### Access the NVIDIA API Catalog

1. Create a free account on the [NVIDIA API Catalog](https://build.nvidia.com/) and log in.
2. Click your profile icon, then **API Keys** > **Generate API Key**.
3. Copy and save the key as `NVIDIA_API_KEY`.

```python theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
import getpass
import os

if os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    print("Valid NVIDIA_API_KEY already in environment. Delete to reset")
else:
    nvapi_key = getpass.getpass("NVAPI Key (starts with nvapi-): ")
    assert nvapi_key.startswith(
        "nvapi-"
    ), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key
```

#### Nemotron: featured models for agentic AI

[Nemotron](https://www.nvidia.com/en-us/ai-data-science/foundation-models/nemotron/) is NVIDIA's open model family designed for agentic AI. The models use a hybrid Mamba-Transformer mixture-of-experts architecture that delivers leading benchmark performance with high throughput and support for up to 1M token context windows. Nemotron model weights, training data, and implementation recipes are published openly under the NVIDIA Open Model License.

```python theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Nemotron 3 Super — efficient reasoning and agentic tasks
llm = ChatNVIDIA(model="nvidia/nemotron-3-super-120b-a12b")
result = llm.invoke("Plan a three-step research workflow for competitive analysis.")
print(result.content)
```

See the [`ChatNVIDIA` integration page](/oss/python/integrations/chat/nvidia_ai_endpoints) for full documentation including tool calling, multimodal inputs, and Nemotron-specific examples.

### Chat: ChatNVIDIADynamo

`ChatNVIDIADynamo` is a drop-in replacement for `ChatNVIDIA` for use with [NVIDIA Dynamo](https://developer.nvidia.com/dynamo) deployments. It automatically injects KV cache routing hints into every request, allowing the Dynamo scheduler to optimize memory allocation, load routing, and request priority.

```python theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
from langchain_nvidia_ai_endpoints import ChatNVIDIADynamo

llm = ChatNVIDIADynamo(
    base_url="http://localhost:8099/v1",
    model="nvidia/nemotron-3-super-120b-a12b",
    osl=512,             # expected output sequence length (tokens)
    iat=250,             # expected inter-arrival time (ms)
    latency_sensitivity=1.0,
    priority=1,
)
result = llm.invoke("Summarize KV cache routing in one sentence.")
print(result.content)
```

See the [`ChatNVIDIA` integration page](/oss/python/integrations/chat/nvidia_ai_endpoints#use-with-nvidia-dynamo) for the full `ChatNVIDIADynamo` reference including per-invocation overrides and streaming.

### Embeddings: NVIDIAEmbeddings

`NVIDIAEmbeddings` generates dense vector embeddings for use in semantic search and RAG pipelines.

```python theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

embedder = NVIDIAEmbeddings(model="NV-Embed-QA")
embedder.embed_query("What's the temperature today?")
```

See the [`NVIDIAEmbeddings` integration page](/oss/python/integrations/embeddings/nvidia_ai_endpoints) for full documentation.

### Reranking: NVIDIARerank

`NVIDIARerank` reranks a list of documents by relevance to a query using a NeMo Retriever reranking NIM.

```python theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
from langchain_core.documents import Document
from langchain_nvidia_ai_endpoints import NVIDIARerank

ranker = NVIDIARerank(model="nvidia/llama-3.2-nv-rerankqa-1b-v1")
docs = ranker.compress_documents(
    query="What is GPU memory bandwidth?",
    documents=[Document(page_content=p) for p in passages],
)
```

### Retrieval: NVIDIARAGRetriever

`NVIDIARAGRetriever` connects LangChain to a running [NVIDIA RAG Blueprint](https://docs.nvidia.com/rag/latest/index.html) server and retrieves relevant documents via the `/v1/search` endpoint. It supports reranking, query rewriting, and metadata filtering.

```python theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
from langchain_nvidia_ai_endpoints import NVIDIARAGRetriever

retriever = NVIDIARAGRetriever(base_url="http://localhost:8081", k=4)
docs = retriever.invoke("What is NVIDIA NIM?")
```

See the [`NVIDIARAGRetriever` integration page](/oss/python/integrations/retrievers/nvidia) for full documentation.

### Self-host with NVIDIA NIM Microservices

When you are ready to deploy your AI application, you can self-host models with NVIDIA NIM. For more information, refer to [NVIDIA NIM Microservices](https://www.nvidia.com/en-us/ai-data-science/products/nim-microservices/).

```python theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings, NVIDIARerank

# connect to a chat NIM running at localhost:8000, specifying a model
llm = ChatNVIDIA(base_url="http://localhost:8000/v1", model="nvidia/nemotron-3-super-120b-a12b")

# connect to an embedding NIM running at localhost:8080
embedder = NVIDIAEmbeddings(base_url="http://localhost:8080/v1")

# connect to a reranking NIM running at localhost:2016
ranker = NVIDIARerank(base_url="http://localhost:2016/v1")
```

## Accelerate LangGraph with NVIDIA

The `langchain-nvidia-langgraph` package provides NVIDIA-optimized execution strategies for LangGraph graphs. It offers two complementary optimizations applied at compile time:

* **Parallel execution**: independent nodes are automatically identified and run concurrently, eliminating unnecessary sequential bottlenecks.
* **Speculative execution**: both branches of a conditional edge run simultaneously; the wrong branch is discarded once the routing condition resolves.

Neither optimization requires changes to node logic or graph edges.

### Install

```bash theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
pip install -qU langchain-nvidia-langgraph
```

### Parallel execution

Replace `StateGraph` from LangGraph with `StateGraph` from `langchain_nvidia_langgraph.graph`. The rest of your graph definition stays the same.

```python theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
from langchain_nvidia_langgraph.graph import StateGraph, OptimizationConfig
from langgraph.graph import END
from typing import TypedDict

class AgentState(TypedDict):
  ...

graph = StateGraph(AgentState)
app = graph.compile(optimization=OptimizationConfig(enable_parallel=True))
```

Or wrap an existing `StateGraph`:

```
from langgraph.graph import StateGraph as LangGraphStateGraph
graph = LangGraphStateGraph(AgentState)
app = with_app_compile(graph).compile(optimization=OptimizationConfig(enable_parallel=True))
```

Decorators give explicit control over which nodes participate in optimization:

```python theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
from langchain_nvidia_langgraph.graph import sequential, depends_on, speculation_unsafe

# Prevent a node from being parallelized (e.g., it writes to shared state)
@sequential
def write_to_db(state):
    ...

# Declare a dependency not expressed in graph edges
@depends_on("write_to_db")
def next_action(state):
    ...
```

### Speculative execution

Enable speculation via `OptimizationConfig` at compile time. The executor runs conditional branches in parallel and keeps the result that matches the routing decision.

```python theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
app = graph.compile(optimization=OptimizationConfig(enable_speculation=True))
```

## NeMo Agent Toolkit Optimizations with LangSmith Telemetry

The NVIDIA NeMo Agent Toolkit is an open-source AI toolkit for building, profiling, and optimizing agents. Developers can use LangChain with NeMo Agent Toolkit with minimal code changes to enable profiling, evaluation, GPU capacity plans, and automated optimization. NeMo Agent Toolkit is interoperable with LangSmith.

* [Get Started with NeMo Agent Toolkit and LangChain](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/examples/frameworks/auto_wrapper/langchain_deep_research/langgraph_deep_research.ipynb)

* [Optimize LangChain with NeMo Agent Toolkit and LangSmith](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/run-workflows/observe/observe-workflow-with-langsmith.md)

## Full Stack Blueprints

NVIDIA and LangChain have collaborated on [full stack examples](https://github.com/langchain-ai/deepagents/tree/main/examples) showing how all these components are combined for two enterprise use cases, with focus on production readiness:

* [NVIDIA AI-Q](https://github.com/NVIDIA-AI-Blueprints/aiq/tree/develop) is a blueprint for deep research across enterprise data sources using LangChain Deep Agents
* [NVIDIA VSS](https://github.com/NVIDIA-AI-Blueprints/video-search-and-summarization) is a blueprint for video search and summarization using LangChain and LangGraph

## Additional Resources

* [`langchain-nvidia-ai-endpoints` package README](https://github.com/langchain-ai/langchain-nvidia/blob/main/libs/ai-endpoints/README.md)
* [`langchain-nvidia-langgraph` package](https://github.com/langchain-ai/langchain-nvidia/tree/main/libs/langgraph)
* [Nemotron model family](https://www.nvidia.com/en-us/ai-data-science/foundation-models/nemotron/)
* [Overview of NVIDIA NIM for Large Language Models (LLMs)](https://docs.nvidia.com/nim/large-language-models/latest/introduction.html)
* [Overview of NeMo Retriever Embedding NIM](https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/overview.html)
* [Overview of NeMo Retriever Reranking NIM](https://docs.nvidia.com/nim/nemo-retriever/text-reranking/latest/overview.html)
* [`ChatNVIDIA` Model](/oss/python/integrations/chat/nvidia_ai_endpoints)
* [`NVIDIAEmbeddings` Model for RAG Workflows](/oss/python/integrations/embeddings/nvidia_ai_endpoints)
* [`NVIDIARAGRetriever`](/oss/python/integrations/retrievers/nvidia)
* [NVIDIA Dynamo](https://developer.nvidia.com/dynamo)

***

<div className="source-links">
  <Callout icon="terminal-2">
    [Connect these docs](/use-these-docs) to Claude, VSCode, and more via MCP for real-time answers.
  </Callout>

  <Callout icon="edit">
    [Edit this page on GitHub](https://github.com/langchain-ai/docs/edit/main/src/oss/python/integrations/providers/nvidia.mdx) or [file an issue](https://github.com/langchain-ai/docs/issues/new/choose).
  </Callout>
</div>
