Hugging Face models can be run locally through the `HuggingFacePipeline` class. To deploy a model with OpenVINO, you can specify the `backend="openvino"` parameter to trigger OpenVINO as the backend inference framework.
To use it, you should have the `optimum-intel` Python package installed with OpenVINO accelerator support.
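A minimal installation sketch, assuming the `optimum[openvino,nncf]` extras and the `langchain-huggingface` package as the current package names:

```python
# Assumed install command; exact extras may differ across optimum / langchain releases:
#   pip install --upgrade-strategy eager "optimum[openvino,nncf]" langchain-huggingface

# Quick sanity check that the OpenVINO backend of optimum-intel is importable.
from optimum.intel import OVModelForCausalLM  # noqa: F401
```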
Models can be loaded by specifying the model parameters using the `from_model_id` method.
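A minimal sketch of this call, assuming `gpt2` as an example model and the `langchain_huggingface` import path:

```python
from langchain_huggingface import HuggingFacePipeline

# Load a Hugging Face model and serve it through the OpenVINO backend on CPU.
ov_llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    backend="openvino",
    model_kwargs={"device": "CPU", "ov_config": {}},
    pipeline_kwargs={"max_new_tokens": 64},
)

print(ov_llm.invoke("What is deep learning?"))
```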
If you have an Intel GPU, you can specify `model_kwargs={"device": "GPU"}` to run inference on it.
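For example (a sketch; the device string is the name OpenVINO assigns to the Intel GPU):

```python
ov_llm_gpu = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    backend="openvino",
    model_kwargs={"device": "GPU", "ov_config": {}},  # run on the Intel GPU instead of CPU
    pipeline_kwargs={"max_new_tokens": 64},
)
```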
Models can also be loaded by passing in an existing `optimum-intel` pipeline directly.
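A sketch of wrapping an existing optimum-intel pipeline, assuming a transformers `pipeline` built on top of `OVModelForCausalLM`:

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer, pipeline
from langchain_huggingface import HuggingFacePipeline

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# export=True converts the original PyTorch checkpoint to OpenVINO IR on the fly.
ov_model = OVModelForCausalLM.from_pretrained(model_id, export=True)
ov_pipe = pipeline(
    "text-generation", model=ov_model, tokenizer=tokenizer, max_new_tokens=64
)

ov_llm = HuggingFacePipeline(pipeline=ov_pipe)
```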
To get the response without the prompt, you can bind `skip_prompt=True` with the LLM.
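A sketch of binding this flag inside a chain, assuming a simple prompt template and the `ov_llm` instance from above:

```python
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template("Question: {question}\n\nAnswer:")

# skip_prompt=True strips the prompt from the generated text, leaving only the answer.
chain = prompt | ov_llm.bind(skip_prompt=True)

print(chain.invoke({"question": "What is electroencephalography?"}))
```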
It is possible to export your model to the OpenVINO IR format with the CLI and load it from a local folder afterwards; to reduce inference latency and model footprint, you can apply weight quantization by specifying `--weight-format`:
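A sketch of an export with 8-bit weight quantization followed by loading the resulting folder; the CLI flags are assumptions, so check `optimum-cli export openvino --help` for your version:

```python
# Run in a shell first (assumed flags):
#   optimum-cli export openvino --model gpt2 --weight-format int8 ov_model_dir

# Then point from_model_id at the local folder containing the exported IR.
ov_llm = HuggingFacePipeline.from_model_id(
    model_id="ov_model_dir",
    task="text-generation",
    backend="openvino",
    model_kwargs={"device": "CPU"},
    pipeline_kwargs={"max_new_tokens": 64},
)
```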
For more fine-grained control over the OpenVINO runtime, you can pass an `ov_config` as follows:
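A sketch using a few standard OpenVINO runtime properties (the keys shown are assumptions based on common OpenVINO options):

```python
ov_config = {
    "PERFORMANCE_HINT": "LATENCY",  # optimize for single-request latency
    "NUM_STREAMS": "1",             # use a single inference stream
    "CACHE_DIR": "",                # disable the compiled-model cache
}

ov_llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    backend="openvino",
    model_kwargs={"device": "CPU", "ov_config": ov_config},
    pipeline_kwargs={"max_new_tokens": 64},
)
```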
You can use the `stream` method to get streaming LLM output.
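A minimal sketch, reusing the `ov_llm` instance from above:

```python
# Stream chunks as they are generated instead of waiting for the full completion.
for chunk in ov_llm.stream("What is deep learning?"):
    print(chunk, end="", flush=True)
```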