Compatibility: Only available on Node.js.
Setup
You’ll need to install major version 3 of the node-llama-cpp module to communicate with your local model, for example with npm install node-llama-cpp@3.
node-llama-cpp is tuned for running on macOS with support for the Metal GPU on Apple M-series processors. If you need to turn this off, or need support for the CUDA architecture, refer to the node-llama-cpp documentation.
A note to LangChain.js contributors: if you want to run the tests associated with this module, you will need to put the path to your local model in the environment variable LLAMA_PATH.
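As a minimal sketch of how a test setup might read that variable, the snippet below checks that LLAMA_PATH is set and points at a regular file before anything tries to load it. The helper name resolveLlamaPath is hypothetical and not part of LangChain.js; only Node's built-in fs module is used.

```typescript
import { existsSync, statSync } from "node:fs";

// Hypothetical helper (not part of LangChain.js): reads LLAMA_PATH from the
// given environment and checks that it points at a regular file.
function resolveLlamaPath(
  env: Record<string, string | undefined> = process.env
): string {
  const modelPath = env.LLAMA_PATH;
  if (!modelPath) {
    throw new Error("LLAMA_PATH is not set; point it at your local model file.");
  }
  if (!existsSync(modelPath) || !statSync(modelPath).isFile()) {
    throw new Error(`LLAMA_PATH does not point to a file: ${modelPath}`);
  }
  return modelPath;
}
```

Failing early with a clear message tends to be easier to debug than a model-loading error raised deep inside a test run.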
Guide to installing Llama3
Getting a local Llama3 model running on your machine is a prerequisite, so this is a quick guide to getting and building Llama 3.1-8B (the smallest model) and then quantizing it so that it will run comfortably on a laptop. To do this you will need python3 on your machine (3.11 is recommended), as well as gcc and make so that llama.cpp can be built.
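One way to confirm these prerequisites from Node.js is sketched below. It assumes each tool can be started with a --version flag and exits successfully, which holds for common python3, gcc, and GNU make installs but may not hold for every variant. The helper names isAvailable and missingTools are illustrative only.

```typescript
import { spawnSync } from "node:child_process";

// Returns true when `command --version` can be started and exits with status 0.
// Assumption: the tool accepts a --version flag.
function isAvailable(command: string): boolean {
  const result = spawnSync(command, ["--version"], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}

// Returns the subset of tools that the probe reports as unavailable.
function missingTools(
  tools: string[],
  probe: (command: string) => boolean = isAvailable
): string[] {
  return tools.filter((tool) => !probe(tool));
}

const missing = missingTools(["python3", "gcc", "make"]);
console.log(
  missing.length === 0
    ? "All build prerequisites found."
    : `Missing prerequisites: ${missing.join(", ")}`
);
```

The probe is passed in as a parameter so the check can be exercised without the tools installed.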
Getting the Llama3 models
To get a copy of Llama3 you need to visit Meta AI and request access to their models. Once Meta AI grants you access, you will receive an email containing a unique URL for the files; you will need it in the next steps. Now create a directory to work in.
Next, go to the llama-models repo, which can be found here. In the repo, there are instructions to download the model of your choice, and you should use the unique URL that you received in your email.
The rest of the tutorial assumes that you have downloaded Llama3.1-8B, but any model from here on out should work. After downloading the model, make sure to save the model download path; it will be used later.
Converting and quantizing the model
In this step we need to use llama.cpp, so we need to download that repo.
Next, we need to build the llama.cpp tools and set up our python environment. In these steps it’s assumed that your install of python can be run using python3 and that the virtual environment can be called llama3; adjust accordingly for your own situation.
Once the environment is activated, you should see (llama3) prefixing your command prompt to let you know this is the active environment. Note: if you need to come back to build another model or re-quantize the model, don’t forget to activate the environment again. Also, if you update llama.cpp, you will need to rebuild the tools and possibly install new or updated dependencies! Now that we have an active python environment, we need to install the python dependencies.
Next, the downloaded model needs to be converted into a format that can be used by llama.cpp. A conversion to a Hugging Face model is needed, followed by a conversion to a GGUF model.
First, we need to locate the script convert_llama_weights_to_hf.py. Copy and paste this script into your current working directory. Note that using the script may require you to pip install extra dependencies; do so as needed.
Then we need to convert the model. Prior to the conversion, let’s create directories to store our Hugging Face conversion and our final model.
The result is a quantized model in the models\8B-GGUF directory, called gguf-llama3-Q4_0.bin. This is the model we can use with langchain. You can validate that this model is working by testing it using the llama.cpp tools.
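Before handing the file to LangChain, a lightweight sanity check is also possible from Node.js. GGUF files begin with the four ASCII bytes "GGUF", so the sketch below reads the first four bytes and compares them. It only confirms the container format, not that the model loads or produces sensible output, so it does not replace testing with the llama.cpp tools. The helper names hasGgufMagic and looksLikeGguf are hypothetical.

```typescript
import { closeSync, openSync, readSync } from "node:fs";

// True when the buffer starts with the ASCII bytes "GGUF" (0x47 0x47 0x55 0x46).
function hasGgufMagic(header: Uint8Array): boolean {
  return (
    header.length >= 4 &&
    header[0] === 0x47 &&
    header[1] === 0x47 &&
    header[2] === 0x55 &&
    header[3] === 0x46
  );
}

// Hypothetical helper: reads the first four bytes of the file at modelPath
// and checks them against the GGUF magic.
function looksLikeGguf(modelPath: string): boolean {
  const fd = openSync(modelPath, "r");
  try {
    const header = Buffer.alloc(4);
    const bytesRead = readSync(fd, header, 0, 4, 0);
    return bytesRead === 4 && hasGgufMagic(header);
  } finally {
    closeSync(fd);
  }
}
```

Because the check looks at the file contents rather than the file name, it works even though the quantized model here carries a .bin extension.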