LangChain supports three ways to use Hugging Face embedding models:
- Local inference via HuggingFaceEmbeddings: downloads the model and runs it in-process with Sentence Transformers.
- Inference Providers and dedicated Inference Endpoints via HuggingFaceEndpointEmbeddings: serverless or dedicated hosted inference through Hugging Face.
- Self-hosted at scale via Text Embeddings Inference (TEI): Hugging Face's production inference server, also pointed at by HuggingFaceEndpointEmbeddings.

All three implement the same Embeddings interface, so you can start local and graduate to a hosted or self-hosted deployment without changing the rest of your application.
Setup
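Install the langchain-huggingface integration package; a minimal sketch covering all three options (sentence-transformers is only needed for local inference):

```bash
pip install langchain-huggingface sentence-transformers
```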
Local embeddings
Generate embeddings locally via sentence-transformers. This downloads the model weights the first time you run it.
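A minimal sketch (the model name all-MiniLM-L6-v2 is an example choice, not mandated by these docs):

```python
from langchain_huggingface import HuggingFaceEmbeddings

# Downloads the weights on first use, then embeds fully in-process.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

vector = embeddings.embed_query("What is LangChain?")
vectors = embeddings.embed_documents(["First document", "Second document"])
print(len(vector))  # embedding dimension, 384 for this model
```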
Hugging Face Inference Endpoints and Providers
If you prefer not to download models locally, you can access embedding models through Hugging Face Inference Providers or a dedicated Inference Endpoint. Both expose open-source embedding models on Hugging Face's scalable hosted infrastructure. First, get a token from your Hugging Face settings, then instantiate HuggingFaceEndpointEmbeddings:
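A minimal sketch, assuming your token is exported as the HUGGINGFACEHUB_API_TOKEN environment variable:

```python
from langchain_huggingface import HuggingFaceEndpointEmbeddings

# Reads HUGGINGFACEHUB_API_TOKEN from the environment; create a token at
# https://huggingface.co/settings/tokens.
embeddings = HuggingFaceEndpointEmbeddings(
    model="sentence-transformers/all-MiniLM-L6-v2",
)

vector = embeddings.embed_query("What is LangChain?")
```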
To route requests through a specific Inference Provider (e.g. hf-inference, sambanova, together), pass provider=:
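For example (a sketch; provider availability varies by model):

```python
from langchain_huggingface import HuggingFaceEndpointEmbeddings

embeddings = HuggingFaceEndpointEmbeddings(
    model="sentence-transformers/all-MiniLM-L6-v2",
    provider="hf-inference",  # or e.g. "sambanova", "together"
)
```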
Self-hosted with Text Embeddings Inference
For production-scale serving of Sentence Transformers models on your own infrastructure, use Text Embeddings Inference (TEI). TEI handles batching and GPU acceleration and exposes an OpenAI-compatible API. See the TEI integration guide for a walkthrough.
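Once a TEI server is running, point HuggingFaceEndpointEmbeddings at its URL. A sketch, assuming a local server on port 8080 (the Docker command and model id are illustrative):

```python
from langchain_huggingface import HuggingFaceEndpointEmbeddings

# Assumes a TEI server is already running, e.g.:
#   docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:latest \
#       --model-id BAAI/bge-base-en-v1.5
embeddings = HuggingFaceEndpointEmbeddings(model="http://localhost:8080")

vector = embeddings.embed_query("What is LangChain?")
```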

