LangChain supports three ways to use Hugging Face embedding models:
- Local inference via HuggingFaceEmbeddings: downloads the model and runs it in-process with Sentence Transformers.
- Inference Providers and dedicated Inference Endpoints via HuggingFaceEndpointEmbeddings: serverless or dedicated hosted inference through Hugging Face.
- Self-hosted at scale via Text Embeddings Inference (TEI): Hugging Face's production inference server, also pointed at by HuggingFaceEndpointEmbeddings.

All three implement the same Embeddings interface, so you can start local and graduate to a hosted or self-hosted deployment without changing the rest of your application.
Setup
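Install the langchain-huggingface integration package; a minimal sketch covering all three options (sentence-transformers is only needed for local inference):

```bash
pip install langchain-huggingface sentence-transformers
```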
Local embeddings
Generate embeddings locally via sentence-transformers. This downloads the model weights the first time you run it.
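A minimal sketch (the model name all-MiniLM-L6-v2 is an example choice, not mandated by these docs):

```python
from langchain_huggingface import HuggingFaceEmbeddings

# Downloads the weights on first use, then embeds fully in-process.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

vector = embeddings.embed_query("What is LangChain?")
vectors = embeddings.embed_documents(["First document", "Second document"])
print(len(vector))  # embedding dimension, 384 for this model
```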
Hugging Face Inference Endpoints and Providers
If you prefer not to download models locally, you can access embedding models through Hugging Face Inference Providers or a dedicated Inference Endpoint. Both expose open-source embedding models on Hugging Face's scalable hosted infrastructure. First, get a token from your Hugging Face settings, then instantiate HuggingFaceEndpointEmbeddings:
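A minimal sketch, assuming your token is exported as the HUGGINGFACEHUB_API_TOKEN environment variable:

```python
from langchain_huggingface import HuggingFaceEndpointEmbeddings

# Reads HUGGINGFACEHUB_API_TOKEN from the environment; create a token at
# https://huggingface.co/settings/tokens.
embeddings = HuggingFaceEndpointEmbeddings(
    model="sentence-transformers/all-MiniLM-L6-v2",
)

vector = embeddings.embed_query("What is LangChain?")
```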
To route requests through a specific Inference Provider (e.g. hf-inference, sambanova, together), pass provider=:
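For example (a sketch; provider availability varies by model):

```python
from langchain_huggingface import HuggingFaceEndpointEmbeddings

embeddings = HuggingFaceEndpointEmbeddings(
    model="sentence-transformers/all-MiniLM-L6-v2",
    provider="hf-inference",  # or e.g. "sambanova", "together"
)
```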
Self-hosted with Text Embeddings Inference
For production-scale serving of Sentence Transformers models on your own infrastructure, use Text Embeddings Inference (TEI). TEI handles batching and GPU acceleration and exposes an OpenAI-compatible API. See the TEI integration guide for a walkthrough.
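Once a TEI server is running, point HuggingFaceEndpointEmbeddings at its URL. A sketch, assuming a local server on port 8080 (the Docker command and model id are illustrative):

```python
from langchain_huggingface import HuggingFaceEndpointEmbeddings

# Assumes a TEI server is already running, e.g.:
#   docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:latest \
#       --model-id BAAI/bge-base-en-v1.5
embeddings = HuggingFaceEndpointEmbeddings(model="http://localhost:8080")

vector = embeddings.embed_query("What is LangChain?")
```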

