Ollama allows you to run open-source large language models, such as gpt-oss, locally. Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile. It optimizes setup and configuration details, including GPU usage. For a complete list of supported models and model variants, see the Ollama model library.

Installation and Setup

Ollama installation

Follow these instructions to set up and run a local Ollama instance. Ollama starts as a background service automatically on supported platforms; if it is not running, start it with:
# export OLLAMA_HOST=127.0.0.1:11434 # optional: environment variable to set the host and port Ollama binds to
ollama serve
Once Ollama is running, use ollama pull <name-of-model> to download a model from the Ollama model library:
ollama pull gpt-oss:20b
  • This will download the default tagged version of the model. Typically, the default tag points to the latest, smallest variant of the model.
  • To view all pulled (downloaded) models, use ollama list
We’re now ready to install the langchain-ollama partner package and run a model.

Ollama LangChain partner package install

Install the integration package with:
pip install langchain-ollama

LLM

from langchain_ollama.llms import OllamaLLM
See the notebook example here.
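As a minimal sketch, assuming the gpt-oss:20b model has been pulled as above (any pulled model name works):

from langchain_ollama.llms import OllamaLLM

llm = OllamaLLM(model="gpt-oss:20b")  # model must already be pulled locally
print(llm.invoke("Why is the sky blue?"))  # returns the completion as a string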

Chat Models

Chat Ollama

from langchain_ollama.chat_models import ChatOllama
See the notebook example here.
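A minimal sketch, again assuming gpt-oss:20b has been pulled; ChatOllama returns a message object rather than a bare string:

from langchain_ollama.chat_models import ChatOllama

chat = ChatOllama(model="gpt-oss:20b")  # any pulled chat model works
response = chat.invoke("Tell me a one-line joke.")
print(response.content)  # the assistant's reply text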

Ollama tool calling

Ollama tool calling uses the OpenAI-compatible web server specification and works with the default BaseChatModel.bind_tools() method, as described here. Make sure to select an Ollama model that supports tool calling.
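A minimal sketch of binding a tool, assuming the chosen model (gpt-oss:20b here) supports tool calling; check the model library for tool support:

from langchain_core.tools import tool
from langchain_ollama.chat_models import ChatOllama

@tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

llm = ChatOllama(model="gpt-oss:20b").bind_tools([multiply])
msg = llm.invoke("What is 6 times 7?")
print(msg.tool_calls)  # parsed tool calls with name and arguments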

Embedding models

from langchain_ollama.embeddings import OllamaEmbeddings
See the notebook example here.
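A minimal sketch, assuming an embedding model such as nomic-embed-text has been pulled (ollama pull nomic-embed-text):

from langchain_ollama.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")  # assumes this embedding model is pulled
vector = embeddings.embed_query("Hello, world!")
print(len(vector))  # dimensionality of the embedding vector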