You are viewing the v1 docs for LangChain, which is currently under active development.
This page covers the core concepts and usage of models in LangChain. If you’re looking for provider-specific integrations, head over to the integrations page.

Overview

LLMs are powerful AI tools that can interpret and generate text like humans. They’re versatile enough to write content, translate languages, summarize, and answer questions without needing specialized training for each task. In addition to text generation, many models support:
  • Tool calling - calling external tools (like database queries or API calls) and using the results in their responses.
  • Structured output - where the model’s response is constrained to follow a defined format.
  • Multimodal - processing and returning data other than text, such as images, audio, and video.
  • Reasoning - models perform multi-step reasoning to arrive at a conclusion.

Basic usage

The easiest way to get started with a model in LangChain is to use initChatModel to initialize one from a provider of your choice.
Initialize a chat model
import { initChatModel } from "langchain";

const model = await initChatModel("openai:gpt-5-nano");
const response = await model.invoke("Why do parrots talk?");
See initChatModel for more detail.

Key methods

Invoke

The model takes messages as input and outputs messages after generating a complete response.

Stream

Invoke the model, but stream the output as it is generated in real-time.

Batch

Send multiple requests to a model in a batch for more efficient processing.
In addition to chat models, LangChain provides support for other adjacent technologies, such as embedding models and vector stores. See the integrations page for details.

Parameters

A chat model takes parameters that can be used to configure its behavior. The full set of supported parameters varies by model and provider, but standard ones include:
model
string
required
The name or identifier of the specific model you want to use with a provider.
apiKey
string
The key required for authenticating with the model’s provider. This is usually issued when you sign up for access to the model. Can often be accessed by setting an environment variable.
temperature
number
Controls the randomness of the model’s output. A higher number makes responses more creative; a lower one makes them more deterministic.
stop
string[]
A sequence of characters that indicates when the model should stop generating its output.
timeout
number
The maximum time (in seconds) to wait for a response from the model before canceling the request.
maxTokens
number
Limits the total number of tokens in the response, effectively controlling how long the output can be.
maxRetries
number
The maximum number of attempts the system will make to resend a request if it fails due to issues like network timeouts or rate limits.
To find all the parameters supported by a given chat model, head to the reference docs.
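To make the maxRetries description concrete, here is a minimal sketch of a retry loop in plain TypeScript. It is not LangChain's implementation: withRetries and flakyCall are hypothetical names, and the sketch assumes that maxRetries counts re-sends after the first attempt and that no delay is applied between attempts.

```typescript
// Hypothetical helper, not part of LangChain: retries an async call.
// Assumption: maxRetries counts re-sends, so the call runs at most
// 1 + maxRetries times in total.
async function withRetries<T>(
  call: () => Promise<T>,
  maxRetries: number
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await call();
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw lastError; // every attempt failed
}

// Stand-in for a provider request that fails twice, then succeeds.
let attempts = 0;
async function flakyCall(): Promise<string> {
  attempts += 1;
  if (attempts < 3) throw new Error("rate limited");
  return "ok";
}

const retryDemo: Promise<string> = withRetries(flakyCall, 2);
retryDemo.then((result) => console.log(result, attempts)); // ok 3
```

A production retry policy would usually also wait between attempts and retry only on transient errors; both are left out here to keep the sketch short.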

Invocation

A chat model must be invoked to generate an output. There are three primary invocation methods, each suited to different use cases.

Invoke

The most straightforward way to call a model is to use invoke() with a single message or a list of messages.
Single message
const response = await model.invoke("Why do parrots have colorful feathers?");
console.log(response);
A list of messages can be provided to a model to represent conversation history. Each message has a role that models use to indicate who sent the message in the conversation. See the messages guide for more detail on roles, types, and content.
Conversation history
import { HumanMessage, AIMessage, SystemMessage } from "langchain";

const conversation = [
    new SystemMessage("You are a helpful assistant that translates English to French."),
    new HumanMessage("Translate: I love programming."),
    new AIMessage("J'adore la programmation."),
    new HumanMessage("Translate: I love building applications.")
];

const response = await model.invoke(conversation);
console.log(response);  // AIMessage("J'adore créer des applications.")

Stream

Most models can stream their output content while it is being generated. By displaying output progressively, streaming significantly improves user experience, particularly for longer responses. Calling stream() returns an async iterator that yields output chunks as they are produced. You can use a for await loop to process each chunk in real time:
const stream = await model.stream("Why do parrots have colorful feathers?");
for await (const chunk of stream) {
    console.log(chunk.text)
}
As opposed to invoke(), which returns a single AIMessage after the model has finished generating its full response, stream() returns multiple AIMessageChunk objects, each containing a portion of the output text. Importantly, each chunk in a stream is designed to be gathered into a full message via concatenation:
Construct AIMessage
let full: AIMessageChunk | null = null;
for await (const chunk of stream) {
    full = full ? full.concat(chunk) : chunk;
    console.log(full.text);
}

// The
// The sky
// The sky is
// The sky is typically
// The sky is typically blue
// ...

console.log(full?.contentBlocks);  // full is null if the stream yielded no chunks
// [{"type": "text", "text": "The sky is typically blue..."}]
The resulting message can be treated the same as a message that was generated with invoke() - for example, it can be aggregated into a message history and passed back to the model as conversational context.
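The accumulation pattern can be tried without a model. The sketch below is a simplified stand-in rather than LangChain code: TextChunk is a hypothetical class that carries only text, and fakeStream is a plain array playing the role of the chunks a stream would yield.

```typescript
// Hypothetical stand-in for AIMessageChunk: it carries only text.
class TextChunk {
  readonly text: string;

  constructor(text: string) {
    this.text = text;
  }

  // Merging two chunks yields a new chunk holding both pieces in order.
  concat(other: TextChunk): TextChunk {
    return new TextChunk(this.text + other.text);
  }
}

// Simulated stream output: the pieces a model might emit one by one.
const fakeStream: TextChunk[] = ["The", " sky", " is", " blue"].map(
  (piece) => new TextChunk(piece)
);

let merged: TextChunk | null = null;
const snapshots: string[] = [];
for (const chunk of fakeStream) {
  merged = merged ? merged.concat(chunk) : chunk;
  snapshots.push(merged.text); // text accumulated so far
}

console.log(snapshots);
// snapshots holds: "The", "The sky", "The sky is", "The sky is blue"
```

Because concat returns a new object each time, earlier snapshots are never mutated, which is what makes it safe to display partial output while accumulation continues.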
Streaming only works if all steps in the program know how to process a stream of chunks. For instance, an application that isn’t streaming-capable would be one that needs to store the entire output in memory before it can be processed.

Batch

Batching a collection of independent requests to a model can significantly improve performance and reduce costs, as the processing can be done in parallel:
Batch
const responses = await model.batch([
    "Why do parrots have colorful feathers?",
    "How do airplanes fly?",
    "What is quantum computing?"
]);
for (const response of responses) {
    console.log(response);
}
When processing a large number of inputs using batch(), you may want to control the maximum number of parallel calls. This can be done by setting the maxConcurrency attribute in the RunnableConfig object.
Batch with max concurrency
const responses = await model.batch(
    listOfInputs,
    {
        maxConcurrency: 5,  // Limit to 5 parallel calls
    }
)
See the RunnableConfig reference for a full list of supported attributes.
For more details on batching, see the reference.
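To illustrate what a concurrency limit means in practice, here is a minimal sketch of a bounded parallel batch in plain TypeScript. It is an illustration under stated assumptions, not how LangChain implements batch(): batchWithLimit and fakeModel are hypothetical names, and the model call is simulated with a short timer.

```typescript
// Hypothetical helper, not LangChain's implementation: runs `worker` over
// `inputs` with at most `maxConcurrency` calls in flight, and returns the
// results in input order.
async function batchWithLimit<I, O>(
  inputs: I[],
  worker: (input: I) => Promise<O>,
  maxConcurrency: number
): Promise<O[]> {
  const results: O[] = new Array<O>(inputs.length);
  let next = 0; // index of the next input to claim

  // Each lane repeatedly claims the next unprocessed input.
  async function lane(): Promise<void> {
    while (next < inputs.length) {
      const index = next++;
      results[index] = await worker(inputs[index]);
    }
  }

  const laneCount = Math.max(1, Math.min(maxConcurrency, inputs.length));
  const lanes: Promise<void>[] = [];
  for (let i = 0; i < laneCount; i++) lanes.push(lane());
  await Promise.all(lanes);
  return results;
}

// Stand-in for a model call; tracks how many calls overlap.
let inFlight = 0;
let peakInFlight = 0;
async function fakeModel(prompt: string): Promise<string> {
  inFlight += 1;
  peakInFlight = Math.max(peakInFlight, inFlight);
  await new Promise<void>((resolve) => setTimeout(resolve, 5));
  inFlight -= 1;
  return prompt.toUpperCase();
}

const batchDemo = batchWithLimit(["a", "b", "c", "d", "e"], fakeModel, 2);
batchDemo.then((outputs) => console.log(outputs, peakInFlight));
```

Results are written by index, so the output order matches the input order even though individual calls may finish out of order.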

Tool calling

Models can request to call tools that perform tasks such as fetching data from a database, searching the web, or running code. Tools are pairings of:
  1. A schema, including the name of the tool, a description, and/or argument definitions (often a JSON schema)
  2. A function to execute.
You may hear the term “function calling”. We use this interchangeably with “tool calling”.
To make tools that you have defined available for use by a model, you must bind them using bindTools(). In subsequent invocations, the model can choose to call any of the bound tools as needed. Some model providers offer built-in tools that can be enabled via model parameters. Check the respective provider reference for details.
See the tools guide for details and other options for creating tools.
Binding user tools
import { tool } from "langchain";
import { z } from "zod";
import { ChatOpenAI } from "@langchain/openai";

const getWeather = tool(
    (input) => {
        return `It's sunny in ${input.location}.`
    },
    {
        name: "get_weather",
        description: "Get the weather at a location.",
        schema: z.object({
            location: z.string().describe("The location to get the weather for")
        })
    }
)

const model = new ChatOpenAI({ model: "gpt-4o" })
const modelWithTools = model.bindTools([getWeather])

const response = await modelWithTools.invoke("What's the weather like in Boston?")
const toolCalls = response.tool_calls || []
for (const tool_call of toolCalls) {
    // View tool calls made by the model
    console.log(`Tool: ${tool_call.name}`);
    console.log(`Args: ${JSON.stringify(tool_call.args)}`);  // args is an object, so serialize it
}
When binding user-defined tools, the model’s response includes a request to execute a tool. It is up to you to perform the requested action and return the result back to the model for use in subsequent reasoning.
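The "perform the requested action" step can be sketched as a lookup from tool name to function. The code below is a plain TypeScript illustration rather than LangChain's API: ToolCallRequest, toolRegistry, and runToolCalls are hypothetical names, and the call shape (id, name, args) is an assumption modeled on the tool calls printed in the example above.

```typescript
// Minimal local shape for a tool call. Assumption: it mirrors the
// id/name/args fields of the tool calls on a model response.
interface ToolCallRequest {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

type ToolImpl = (args: Record<string, unknown>) => string;

// Hypothetical registry mapping tool names to plain functions.
const toolRegistry = new Map<string, ToolImpl>([
  ["get_weather", (args) => `It's sunny in ${String(args["location"])}.`],
]);

// Runs each requested tool and pairs the result with the call id, so the
// result can later be matched to the request that produced it.
function runToolCalls(
  calls: ToolCallRequest[]
): { toolCallId: string; content: string }[] {
  return calls.map((call) => {
    const impl = toolRegistry.get(call.name);
    const content = impl ? impl(call.args) : `Unknown tool: ${call.name}`;
    return { toolCallId: call.id, content };
  });
}

const toolResults = runToolCalls([
  { id: "call_1", name: "get_weather", args: { location: "Boston" } },
]);
console.log(toolResults[0].content); // It's sunny in Boston.
```

Keeping the call id alongside each result matters because a single response may request several tools, and the model needs to know which result answers which request.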

Structured outputs

Models can be requested to provide their response in a format matching a given schema. This is useful for ensuring the output can be easily parsed and used in subsequent processing. LangChain supports multiple schema types and methods for enforcing structured outputs.
A zod schema is the preferred method of defining an output schema. Note that when a zod schema is provided, the model output will also be validated against the schema using zod’s parse methods.
import { z } from "zod";

const Movie = z.object({
    title: z.string().describe("The title of the movie"),
    year: z.number().describe("The year the movie was released"),
    director: z.string().describe("The director of the movie"),
    rating: z.number().describe("The movie's rating out of 10"),
});

const modelWithStructure = model.withStructuredOutput(Movie);

const response = await modelWithStructure.invoke("Provide details about the movie Inception");
console.log(response);
// {
//     title: "Inception",
//     year: 2010,
//     director: "Christopher Nolan",
//     rating: 8.8,
// }
Key considerations for structured outputs:
  • Method parameter: Some providers support different methods ('jsonSchema', 'functionCalling', 'jsonMode')
  • Include raw: Use includeRaw: true to get both the parsed output and the raw AI message
  • Validation: Zod schemas provide automatic validation, while JSON Schema requires manual validation
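As a rough idea of what manual validation involves when no zod schema is doing it for you, the sketch below checks a parsed response by hand. MovieRecord, isMovieRecord, and parseMovie are hypothetical names for illustration only; a real project would more likely use a JSON Schema validator library.

```typescript
// Local type matching the Movie fields used in the example above.
interface MovieRecord {
  title: string;
  year: number;
  director: string;
  rating: number;
}

// Hand-written check standing in for a schema validator.
function isMovieRecord(value: unknown): value is MovieRecord {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate["title"] === "string" &&
    typeof candidate["year"] === "number" &&
    typeof candidate["director"] === "string" &&
    typeof candidate["rating"] === "number"
  );
}

// Parses raw JSON text and rejects anything that does not match.
function parseMovie(raw: string): MovieRecord {
  const parsed: unknown = JSON.parse(raw);
  if (!isMovieRecord(parsed)) {
    throw new Error("Output does not match the Movie shape");
  }
  return parsed;
}

const movie = parseMovie(
  '{"title":"Inception","year":2010,"director":"Christopher Nolan","rating":8.8}'
);
console.log(movie.title, movie.year); // Inception 2010
```

Failing loudly on a mismatch keeps malformed output from reaching downstream code that assumes every field is present.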

Supported models

LangChain supports all major model providers, including OpenAI, Anthropic, Google, Azure, AWS Bedrock, and more. Each provider offers a variety of models with different capabilities. For a full list of supported models in LangChain, see the integrations page.

Advanced configuration

Multimodal

Certain models can process and return non-textual data such as images, audio, and video. You can pass non-textual data to a model by providing content blocks.
All LangChain chat models with underlying multimodal capabilities support:
  1. Data in the cross-provider standard format (see our messages guide)
  2. OpenAI chat completions format
  3. Any format that is native to that specific provider (e.g., Anthropic models accept Anthropic native format)
See the multimodal section of the messages guide for details. Some models can also return multimodal data as part of their response. In such cases, the resulting AIMessage will have content blocks with multimodal types.
Multimodal output
const response = await model.invoke("Create a picture of a cat");
console.log(response.contentBlocks);
// [
//   { type: "text", text: "Here's a picture of a cat" },
//   { type: "image", data: "...", mimeType: "image/jpeg" },
// ]
See the integrations page for details on specific providers.

Reasoning

Newer models are capable of performing multi-step reasoning to arrive at a conclusion. This involves breaking down complex problems into smaller, more manageable steps. If supported by the underlying model, you can surface this reasoning process to better understand how the model arrived at its final answer.
const stream = await model.stream("Why do parrots have colorful feathers?");
for await (const chunk of stream) {
    const reasoningSteps = chunk.contentBlocks.filter(b => b.type === "reasoning");
    console.log(reasoningSteps.length > 0 ? reasoningSteps : chunk.text);
}
Depending on the model, you can sometimes specify the level of effort it should put into reasoning. Alternatively, you can request that the model turn off reasoning entirely. This may take the form of categorical “tiers” of reasoning (e.g., 'low' or 'high') or integer token budgets. For details, see the relevant chat model in the integrations page.

Local models

LangChain supports running models locally on your own hardware. This is useful for scenarios where data privacy is critical, or when you want to avoid the cost of using a cloud-based model. Ollama is one of the easiest ways to run models locally. See the full list of local integrations on the integrations page.

Caching

Chat model APIs can be slow and expensive to call. To help mitigate this, LangChain provides an optional caching layer for chat model integrations.
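The idea behind a caching layer can be sketched as memoization keyed by the prompt. This is a conceptual illustration, not LangChain's caching layer: withCache and cachedModel are hypothetical names, the model call is simulated, and a real cache would presumably also key on the model and its parameters.

```typescript
// Hypothetical cache wrapper, not LangChain's caching layer. It memoizes
// responses by prompt so a repeated prompt skips the underlying call.
function withCache(
  callModel: (prompt: string) => string
): (prompt: string) => string {
  const responseCache = new Map<string, string>();
  return (prompt) => {
    const hit = responseCache.get(prompt);
    if (hit !== undefined) return hit; // served from the cache
    const fresh = callModel(prompt);
    responseCache.set(prompt, fresh);
    return fresh;
  };
}

// Stand-in for an expensive model call; counts real invocations.
let modelCalls = 0;
const cachedModel = withCache((prompt) => {
  modelCalls += 1;
  return `echo: ${prompt}`;
});

cachedModel("Tell me a joke");
cachedModel("Tell me a joke"); // cache hit, no second call
cachedModel("Tell me a fact");
console.log(modelCalls); // 2
```

Exact-match caching only helps when identical prompts recur, and with a nonzero temperature it trades response variety for speed and cost.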

Base URL or proxy

For many chat model integrations, you can configure the base URL for API requests, which allows you to use model providers that have OpenAI-compatible APIs or to use a proxy server.

Token usage

A number of model providers return token usage information as part of the invocation response. When available, this information will be included on the AIMessage objects produced by the corresponding model. For more details, see the messages guide.
Some provider APIs, notably OpenAI and Azure OpenAI chat completions, require users to opt in to receiving token usage data in streaming contexts. See the streaming usage metadata section of the integration guide for details.
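When usage data is present, a common follow-up is to total it across several responses. The sketch below does this over plain objects. UsageInfo and sumUsage are hypothetical names, and the field names (usage_metadata with input_tokens, output_tokens, total_tokens) are an assumption about the message shape, so check the messages guide before relying on them.

```typescript
// Local shape for usage data. Assumption: the field names follow the
// input/output/total token counts reported on AI messages.
interface UsageInfo {
  input_tokens: number;
  output_tokens: number;
  total_tokens: number;
}

// Adds up usage across responses; responses without usage are skipped.
function sumUsage(responses: { usage_metadata?: UsageInfo }[]): UsageInfo {
  const totals: UsageInfo = { input_tokens: 0, output_tokens: 0, total_tokens: 0 };
  for (const response of responses) {
    const usage = response.usage_metadata;
    if (!usage) continue; // provider did not report usage
    totals.input_tokens += usage.input_tokens;
    totals.output_tokens += usage.output_tokens;
    totals.total_tokens += usage.total_tokens;
  }
  return totals;
}

const usageTotals = sumUsage([
  { usage_metadata: { input_tokens: 10, output_tokens: 5, total_tokens: 15 } },
  {}, // a response with no usage information
  { usage_metadata: { input_tokens: 20, output_tokens: 8, total_tokens: 28 } },
]);
console.log(usageTotals.total_tokens); // 43
```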

Invocation config

When invoking a model, you can pass additional configuration through the config parameter using a RunnableConfig object. This provides run-time control over execution behavior, callbacks, and metadata tracking. Common configuration options include:
Invocation with config
const response = await model.invoke(
    "Tell me a joke",
    {
        runName: "joke_generation",      // Custom name for this run
        tags: ["humor", "demo"],          // Tags for categorization
        metadata: {"user_id": "123"},     // Custom metadata
        callbacks: [my_callback_handler], // Callback handlers
    }
)
These configuration values are particularly useful when:
  • Debugging with LangSmith tracing
  • Implementing custom logging or monitoring
  • Controlling resource usage in production
  • Tracking invocations across complex pipelines
For more information on all supported RunnableConfig attributes, see the RunnableConfig reference.