IPEX-LLM is a PyTorch library for running LLMs on Intel CPU and GPU (e.g., local PC with iGPU, or discrete GPUs such as Arc, Flex and Max) with very low latency.

This example goes over how to use LangChain to conduct embedding tasks with `ipex-llm` optimizations on Intel GPU. This would be helpful in applications such as RAG, document QA, etc.
Note: It is recommended that only Windows users with an Intel Arc A-Series GPU (except for Intel Arc A300-Series or Pro A60) run this Jupyter notebook directly. For other cases (e.g. Linux users, Intel iGPU, etc.), it is recommended to run the code as Python scripts in a terminal for the best experience.
Install IPEX-LLM for running embeddings on Intel GPU, together with `sentence-transformers`.
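The installation step can be sketched as below; this is a sketch based on IPEX-LLM's published install instructions for Intel GPU (the `xpu` extra and the `us` index URL are assumptions to verify against the current release notes):

```shell
# Install IPEX-LLM with Intel GPU (XPU) support. The extra index URL hosts
# the Intel XPU wheels; swap it for the mirror noted below if needed.
pip install --pre --upgrade "ipex-llm[xpu]" --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

# sentence-transformers provides the BGE embedding model backend.
pip install sentence-transformers
```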
Note: You can also use https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/ as the extra-index-url.
Note: The first time each model runs on Intel iGPU / Intel Arc A300-Series or Pro A60, it may take several minutes to compile. For other GPU types, please refer to here for Windows users, and here for Linux users.
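As one way to mitigate that first-run compilation cost, IPEX-LLM's general runtime-configuration guidance suggests enabling the persistent SYCL kernel cache; the exact variable set varies by OS and GPU type (this snippet is an assumption drawn from that guidance, not from this section, so check the referenced pages for your setup):

```shell
# Linux example: keep JIT-compiled SYCL kernels cached across runs so the
# multi-minute compilation only happens once per model.
export SYCL_CACHE_PERSISTENT=1
```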
Setting `device` to `"xpu"` in `model_kwargs` when initializing `IpexLLMBgeEmbeddings` will put the embedding model on Intel GPU and benefit from IPEX-LLM optimizations:
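A minimal sketch of that initialization, assuming `langchain-community` and IPEX-LLM are installed and an Intel GPU is available; the BGE model name here is an illustrative choice, not fixed by this guide:

```python
from langchain_community.embeddings import IpexLLMBgeEmbeddings

# "BAAI/bge-large-en-v1.5" is an example; other BGE-family embedding
# models should work the same way.
embedding_model = IpexLLMBgeEmbeddings(
    model_name="BAAI/bge-large-en-v1.5",
    model_kwargs={"device": "xpu"},  # place the model on Intel GPU
    encode_kwargs={"normalize_embeddings": True},
)

# Standard LangChain Embeddings interface for queries and documents:
text = "IPEX-LLM is a PyTorch library for running LLMs on Intel CPU and GPU."
query_vector = embedding_model.embed_query(text)
doc_vectors = embedding_model.embed_documents([text])
```

`embed_query` returns a single vector while `embed_documents` returns one vector per input string, so the same model object can serve both sides of a RAG retrieval pipeline.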