
LangChain supports three ways to use Hugging Face embedding models:
  • Local inference via HuggingFaceEmbeddings: downloads the model and runs it in-process with Sentence Transformers.
  • Inference Providers and dedicated Inference Endpoints via HuggingFaceEndpointEmbeddings: serverless or dedicated hosted inference through Hugging Face.
  • Self-hosted at scale via Text Embeddings Inference (TEI): Hugging Face’s production inference server, accessed through the same HuggingFaceEndpointEmbeddings class.
All three use the same Embeddings interface, so you can start local and graduate to a hosted or self-hosted deployment without changing the rest of your application.
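For example, code written against the Embeddings interface doesn’t care which backend you pass in. A minimal sketch (build_index is a hypothetical helper, not part of LangChain):
from langchain_core.embeddings import Embeddings

def build_index(embeddings: Embeddings, texts: list[str]) -> list[list[float]]:
    # Works unchanged with HuggingFaceEmbeddings or HuggingFaceEndpointEmbeddings.
    return embeddings.embed_documents(texts)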

Setup

pip install -qU langchain-huggingface

Local embeddings

Generate embeddings locally via sentence-transformers. This downloads the model weights the first time you run it.
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2",
    encode_kwargs={"normalize_embeddings": True},  # unit-length vectors, so cosine similarity reduces to a dot product
)

query_result = embeddings.embed_query("This is a test document.")
doc_result = embeddings.embed_documents(["This is a test document."])
See the dedicated Sentence Transformers guide for model selection, GPU configuration, and query/document prompts.
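For example, to run the model on a GPU, pass the target device through model_kwargs, which is forwarded to the Sentence Transformers constructor (a sketch assuming a CUDA device is available):
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2",
    model_kwargs={"device": "cuda"},  # assumes a CUDA GPU; use "cpu" or "mps" otherwise
    encode_kwargs={"normalize_embeddings": True},
)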

Hugging Face Inference Endpoints and Providers

If you prefer not to download models locally, you can access embedding models through Hugging Face Inference Providers (serverless) or a dedicated Inference Endpoint (a managed deployment you provision). Both expose open-source embedding models hosted by Hugging Face. First, get a token from your Hugging Face settings:
import os
from getpass import getpass

os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass("Hugging Face API token: ")
Then use HuggingFaceEndpointEmbeddings:
from langchain_huggingface import HuggingFaceEndpointEmbeddings

embeddings = HuggingFaceEndpointEmbeddings(
    model="sentence-transformers/all-mpnet-base-v2"
)

query_result = embeddings.embed_query("This is a test document.")
To route through a specific Inference Provider (e.g., hf-inference, sambanova, together), pass provider=:
embeddings = HuggingFaceEndpointEmbeddings(
    model="BAAI/bge-m3",
    provider="hf-inference",
)
The full list of providers and their supported models is in the Inference Providers documentation.

Self-hosted with Text Embeddings Inference

For production-scale serving of Sentence Transformers models on your own infrastructure, use Text Embeddings Inference (TEI). TEI handles batching and GPU acceleration, and exposes an OpenAI-compatible API. See the TEI integration guide for a walkthrough.
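As a sketch, assuming a TEI container is already serving a model on localhost:8080, you can point HuggingFaceEndpointEmbeddings at its URL instead of a model ID:
from langchain_huggingface import HuggingFaceEndpointEmbeddings

# Assumes TEI is running locally, e.g. (image tag and model are illustrative):
#   docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
#       --model-id BAAI/bge-base-en-v1.5
embeddings = HuggingFaceEndpointEmbeddings(model="http://localhost:8080")
query_result = embeddings.embed_query("This is a test document.")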