
Local Embeddings

You can generate embeddings locally using the HuggingFaceEmbeddings class. This uses the sentence-transformers library to download the model weights and run them directly on your machine. Let's load the HuggingFaceEmbeddings class.
pip install -qU langchain langchain-huggingface sentence-transformers
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
text = "This is a test document."
query_result = embeddings.embed_query(text)
query_result[:3]
[-0.04895168915390968, -0.03986193612217903, -0.021562768146395683]
doc_result = embeddings.embed_documents([text])
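Both embed_query and embed_documents return plain lists of floats, so you can compare them directly with cosine similarity. A minimal sketch using toy 3-dimensional vectors as stand-ins for the real 768-dimensional all-mpnet-base-v2 output:

```python
import math

def cosine_similarity(a, b):
    """dot(a, b) / (||a|| * ||b||) — higher means more semantically similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings of a query and a document.
query_vec = [0.1, 0.2, 0.3]
doc_vec = [0.1, 0.2, 0.3]
print(cosine_similarity(query_vec, doc_vec))  # identical vectors ≈ 1.0
```

In practice you would pass query_result and doc_result[0] from the code above instead of the toy vectors.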

Hugging Face Inference Endpoints (Serverless API)

If you prefer not to download models locally, you can access embedding models via Inference Endpoints, which let you run open-source models on Hugging Face's scalable serverless infrastructure. Ensure you have huggingface_hub installed; it is usually included with langchain-huggingface.
pip install -qU huggingface_hub
First, we need a Hugging Face access token with read permission.
import os
from getpass import getpass

os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass()
Now we can use the HuggingFaceEndpointEmbeddings class to run open-source embedding models remotely via the API.
from langchain_huggingface.embeddings import HuggingFaceEndpointEmbeddings
embeddings = HuggingFaceEndpointEmbeddings(
    model="sentence-transformers/all-MiniLM-L6-v2"
)
text = "This is a test document."
query_result = embeddings.embed_query(text)
query_result[:3]
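A common next step is ranking documents against a query by embedding similarity. A sketch of that pattern, using hypothetical precomputed toy vectors in place of live embed_query/embed_documents calls (the sentences and 2-D vectors below are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical precomputed embeddings — in real use these would come from
# embeddings.embed_documents([...]) and embeddings.embed_query(...).
doc_embeddings = {
    "Paris is the capital of France.": [0.9, 0.1],
    "The cat sat on the mat.": [0.1, 0.9],
}
query_embedding = [0.8, 0.2]  # stand-in for embedding "What is the capital of France?"

# Sort documents by similarity to the query, most similar first.
ranked = sorted(
    doc_embeddings,
    key=lambda doc: cosine_similarity(query_embedding, doc_embeddings[doc]),
    reverse=True,
)
print(ranked[0])  # the France sentence ranks first
```

This is the core of embedding-based retrieval; vector stores automate the storage and nearest-neighbor search steps.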