LLMs with the `Llama2Chat` wrapper to support the Llama-2 chat prompt format. Several `LLM` implementations in LangChain can be used as an interface to Llama-2 chat models. These include `ChatHuggingFace`, `LlamaCpp`, `GPT4All`, …, to mention a few examples.

`Llama2Chat` is a generic wrapper that implements `BaseChatModel` and can therefore be used in applications as a chat model. `Llama2Chat` converts a list of Messages into the required chat prompt format and forwards the formatted prompt as `str` to the wrapped `LLM`.
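To make the conversion concrete, here is a minimal sketch of the kind of transformation `Llama2Chat` performs. This is not LangChain's actual implementation; it only illustrates the Llama-2 chat prompt format (`[INST]`/`<<SYS>>` markers) that the wrapper produces from a message list:

```python
# Sketch of converting (role, content) messages into a Llama-2 chat prompt
# string. Illustrative only -- not LangChain's internal code.

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def to_llama2_prompt(messages):
    """messages: list of (role, content) tuples, roles 'system'/'human'/'ai'."""
    # An optional leading system message is folded into the first human turn.
    if messages and messages[0][0] == "system":
        system = B_SYS + messages[0][1] + E_SYS
        messages = messages[1:]
    else:
        system = ""
    prompt = ""
    for i, (role, content) in enumerate(messages):
        if role == "human":
            prefix = system if i == 0 else ""
            prompt += f"<s>{B_INST} {prefix}{content} {E_INST}"
        elif role == "ai":
            prompt += f" {content} </s>"
    return prompt

print(to_llama2_prompt([
    ("system", "You are a helpful assistant."),
    ("human", "Hi!"),
]))
```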
Both chains below use the same `prompt_template` for the conversation.
A `HuggingFaceTextGenInference` LLM encapsulates access to a text-generation-inference server. Set the `--num_shard` value to the number of GPUs available. The `HF_API_TOKEN` environment variable holds the Hugging Face API token.
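One way to start such a server locally is via the text-generation-inference Docker image. In this sketch, only `--num_shard` and `HF_API_TOKEN` come from the text above; the image tag, port mapping, model id, and shard count are assumptions:

```shell
# Illustrative launch command -- image tag, port, and model id are assumptions.
docker run --rm -it --gpus all -p 8080:80 \
    -e HF_API_TOKEN=${HF_API_TOKEN} \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id meta-llama/Llama-2-13b-chat-hf \
    --num_shard 4  # set to the number of GPUs available
```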
Create a `HuggingFaceTextGenInference` instance that connects to the local inference server and wrap it into `Llama2Chat`.
You can then use the chat `model` together with `prompt_template` and conversation memory in an `LLMChain`.
For using a Llama-2 chat model with a `LlamaCpp` LLM, install the llama-cpp-python library using these installation instructions. The following example uses a quantized llama-2-7b-chat.Q4_0.gguf model stored locally at `~/Models/llama-2-7b-chat.Q4_0.gguf`.
After creating a `LlamaCpp` instance, the `llm` is again wrapped into `Llama2Chat`.