Hugging Face models can be run locally through the HuggingFacePipeline class.
The Hugging Face Model Hub hosts over 120k models, 20k datasets, and 50k demo apps (Spaces), all open source and publicly available, in an online platform where people can easily collaborate and build ML together.
These can be called from LangChain either through this local pipeline wrapper or by calling their hosted inference endpoints through the HuggingFaceHub class.
To use, you should have the transformers python package installed, as well as pytorch. You can also install xformers for a more memory-efficient attention implementation.
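These prerequisites can be installed with pip. A sketch of the setup; the langchain-huggingface package name is an assumption for the current integration package (older LangChain releases expose HuggingFacePipeline from langchain_community instead):

```shell
# Core requirements: transformers and PyTorch
pip install transformers torch

# Optional: memory-efficient attention implementation
pip install xformers

# LangChain integration package (name assumed; see note above)
pip install langchain-huggingface
```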
Models can be loaded by specifying the model parameters using the from_model_id method.
They can also be constructed by passing in an existing transformers pipeline directly. To get responses without the prompt echoed back, you can use skip_prompt=True with the LLM.
Use the device=n parameter to put the model on a specific device. It defaults to -1 for CPU inference.
If you have multiple GPUs and/or the model is too large for a single GPU, you can specify device_map="auto", which requires and uses the Accelerate library to automatically determine how to load the model weights.
Note: device and device_map should not be specified together; doing so can lead to unexpected behavior.
backend="openvino"
parameter to trigger OpenVINO as backend inference framework.
If you have an Intel GPU, you can specify model_kwargs={"device": "GPU"}
to run inference on it.
When exporting a model to the OpenVINO IR format with the CLI, you can control the precision of the saved weights with --weight-format:
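For example (a sketch using the optimum-cli exporter; int8 is one of the supported precisions, alongside fp32, fp16, and int4, and the output folder name is chosen here for illustration):

```shell
# Export gpt2 to OpenVINO IR with 8-bit weight quantization
# into a local folder named ov_model_dir.
optimum-cli export openvino --model gpt2 --weight-format int8 ov_model_dir
```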
For fine-grained control over OpenVINO runtime settings, you can pass ov_config as follows: