This module is based on the node-llama-cpp Node.js bindings for llama.cpp, allowing you to work with a locally running LLM. This means you can use a much smaller quantized model capable of running on a laptop, which is ideal for testing and sketching out ideas without running up a bill!
Setup
You'll need to install major version 3 of the node-llama-cpp module to communicate with your local model.
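For example, with npm (a minimal sketch; it assumes the @langchain/community and @langchain/core packages, which provide this integration):

```bash
# Install major version 3 of the node-llama-cpp bindings
npm install -S node-llama-cpp@3

# Install the LangChain.js community integration and core packages
npm install @langchain/community @langchain/core
```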
node-llama-cpp is tuned for running on macOS with support for the Metal GPU on Apple M-series processors. If you need to turn this off, or need support for the CUDA architecture, refer to the node-llama-cpp documentation.
For advice on obtaining and preparing a Llama 3 model, see the documentation for the LLM version of this module.
A note to LangChain.js contributors: if you want to run the tests associated with this module, you will need to put the path to your local model in the LLAMA_PATH environment variable.
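For example (the model filename here is just a placeholder):

```bash
export LLAMA_PATH="/path/to/your/local/gguf-llama3-Q4_0.bin"
```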
Usage
Basic use
We need to provide a path to our local Llama 3 model; also, the embeddings property is always set to true in this module.
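A minimal sketch, assuming the LlamaCppEmbeddings class exported from @langchain/community/embeddings/llama_cpp and a placeholder model path:

```typescript
import { LlamaCppEmbeddings } from "@langchain/community/embeddings/llama_cpp";

// Replace with the path to your local GGUF model file
const llamaPath = "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin";

// Initialize the embeddings model from the local model file
const embeddings = await LlamaCppEmbeddings.initialize({
  modelPath: llamaPath,
});

// Embed a single query string and log the resulting vector
const res = await embeddings.embedQuery("Hello Llama!");
console.log(res);
```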
Document embedding
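Embedding several documents at once follows the same pattern; a sketch using embedDocuments, under the same assumptions as above:

```typescript
import { LlamaCppEmbeddings } from "@langchain/community/embeddings/llama_cpp";

// Replace with the path to your local GGUF model file
const llamaPath = "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin";

const documents = ["Hello World!", "Bye Bye!"];

const embeddings = await LlamaCppEmbeddings.initialize({
  modelPath: llamaPath,
});

// Embed each document; the result is one vector per input string
const res = await embeddings.embedDocuments(documents);
console.log(res);
```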
Related
- Embedding model conceptual guide
- Embedding model how-to guides

