**IPEX-LLM** is a PyTorch library for running LLMs on Intel CPU and GPU (e.g., a local PC with iGPU, or a discrete GPU such as Arc, Flex or Max) with very low latency.

This example goes over how to use LangChain to conduct embedding tasks with `ipex-llm` optimizations on Intel CPU. This would be helpful in applications such as RAG, document QA, etc.
Under the hood, the embedding model is loaded with `sentence-transformers`.
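The setup can be sketched as below, assuming a pip-based environment; the `--pre --upgrade` flags and the CPU extra index follow ipex-llm's published install instructions, and exact package pins may differ in your environment:

```shell
# Install LangChain and the community integrations package
pip install -qU langchain langchain-community

# Install ipex-llm with CPU extras; the extra index serves CPU-only PyTorch wheels
pip install --pre --upgrade "ipex-llm[all]" --extra-index-url https://download.pytorch.org/whl/cpu

# sentence-transformers is used to load the embedding model
pip install sentence-transformers
```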
Note: For Windows users, the `--extra-index-url https://download.pytorch.org/whl/cpu` option is not required when installing `ipex-llm`.