Use `initChatModel` to initialize a chat model from a provider of your choice. See the `initChatModel` reference for more detail.
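For example (a minimal sketch assuming the OpenAI provider and an API key available in the environment):

```ts
import { initChatModel } from "langchain/chat_models/universal";

// Initialize a chat model by name; extra fields are passed to the provider.
const model = await initChatModel("gpt-4o-mini", {
  modelProvider: "openai",
  temperature: 0,
});
```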
Call `invoke()` with a single message or a list of messages.
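A sketch of both forms, assuming the `model` created above:

```ts
import { HumanMessage, SystemMessage } from "@langchain/core/messages";

// A single string is treated as a human message.
const reply = await model.invoke("Why is the sky blue?");

// Or pass an explicit list of messages.
const reply2 = await model.invoke([
  new SystemMessage("You are a concise assistant."),
  new HumanMessage("Why is the sky blue?"),
]);
console.log(reply2.content);
```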
`stream()` returns an iterable stream that yields output chunks as they are produced. You can use a `for await` loop to process each chunk in real time:
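For example, assuming the same `model`:

```ts
const stream = await model.stream("Write a haiku about autumn.");
for await (const chunk of stream) {
  // Each chunk is an AIMessageChunk containing a portion of the output.
  console.log(chunk.content);
}
```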
Whereas `invoke()` returns a single `AIMessage` after the model has finished generating its full response, `stream()` returns multiple `AIMessageChunk` objects, each containing a portion of the output text. Importantly, each chunk in a stream is designed to be gathered into a full message via summation:
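In the JS API this summation is done with `concat()`; a minimal sketch:

```ts
import { AIMessageChunk } from "@langchain/core/messages";

let full: AIMessageChunk | undefined;
for await (const chunk of await model.stream("Tell me about streaming.")) {
  // concat() merges chunks into a progressively larger message.
  full = full === undefined ? chunk : full.concat(chunk);
}
console.log(full?.content);
```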
The aggregated result can be treated the same as the output of `invoke()` - for example, it can be appended to a message history and passed back to the model as conversational context.
Advanced streaming topics
"Auto-streaming" chat models
For example, you might call `model.invoke()` within nodes, but LangChain will automatically delegate to streaming if it is running in a streaming mode. When you `invoke()` a chat model, LangChain will automatically switch to an internal streaming mode if it detects that you are trying to stream the overall application. The result of the invocation will be the same as far as the code that called `invoke()` is concerned; however, while the chat model is being streamed, LangChain will take care of invoking `on_llm_new_token` events in LangChain's callback system. Callback events allow LangGraph `stream()` and `streamEvents()` to surface the chat model's output in real time.

Streaming events
You can additionally stream events using [`streamEvents()`][BaseChatModel.streamEvents]. This simplifies filtering based on event types and other metadata, and will aggregate the full message in the background. See below for an example, and the `streamEvents()` reference for event types and other details.
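A sketch of consuming `streamEvents()`, assuming the v2 event schema:

```ts
const eventStream = model.streamEvents("Why is the sky blue?", {
  version: "v2",
});

for await (const event of eventStream) {
  // Filter for token-level chat model events; other event types cover
  // chains, tools, retrievers, and more.
  if (event.event === "on_chat_model_stream") {
    console.log(event.data.chunk);
  }
}
```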
Batching

When processing many requests with `batch()`, you may want to control the maximum number of parallel calls. This can be done by setting the `maxConcurrency` attribute in the `RunnableConfig` object. See the `RunnableConfig` reference for a full list of supported attributes.
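For example, to cap concurrency while batching (a sketch assuming the `model` from earlier):

```ts
const responses = await model.batch(
  ["Hello!", "How are you?", "Tell me a joke."],
  { maxConcurrency: 2 } // at most two requests to the provider in flight
);
console.log(responses.map((r) => r.content));
```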
Tool calling

Bind tools to a chat model with `bindTools()`. In subsequent invocations, the model can choose to call any of the bound tools as needed.
Some model providers offer built-in tools that can be enabled via model parameters. Check the respective provider reference for details.
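A minimal sketch that binds a hypothetical `get_weather` tool defined with `tool()` and Zod:

```ts
import { tool } from "@langchain/core/tools";
import { z } from "zod";

// A hypothetical weather tool, for illustration only.
const getWeather = tool(
  async ({ city }) => `It is always sunny in ${city}.`,
  {
    name: "get_weather",
    description: "Get the current weather for a city.",
    schema: z.object({ city: z.string() }),
  }
);

const modelWithTools = model.bindTools([getWeather]);
const response = await modelWithTools.invoke("What's the weather in Paris?");
console.log(response.tool_calls);
```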
Tool execution loop
The `ToolMessage` returned by the tool includes a `tool_call_id` that matches the original tool call, helping the model correlate results with requests.
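A sketch of such a loop, reusing the hypothetical `getWeather` tool and `modelWithTools` from the previous example:

```ts
import { BaseMessage, HumanMessage } from "@langchain/core/messages";

const messages: BaseMessage[] = [
  new HumanMessage("What's the weather in Paris?"),
];

let aiMessage = await modelWithTools.invoke(messages);
messages.push(aiMessage);

while (aiMessage.tool_calls && aiMessage.tool_calls.length > 0) {
  for (const toolCall of aiMessage.tool_calls) {
    // In recent versions of @langchain/core, invoking a tool with a full
    // tool call returns a ToolMessage whose tool_call_id matches the request.
    const toolMessage = await getWeather.invoke(toolCall);
    messages.push(toolMessage);
  }
  aiMessage = await modelWithTools.invoke(messages);
  messages.push(aiMessage);
}

console.log(aiMessage.content);
```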
Forcing tool calls

Parallel tool calls
Streaming tool calls
When streaming, partial tool calls are surfaced as `ToolCallChunk` objects. This allows you to see tool calls as they're being generated rather than waiting for the complete response.
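A sketch, reusing the tool-bound model from above:

```ts
const toolStream = await modelWithTools.stream("What's the weather in Paris?");
for await (const chunk of toolStream) {
  // Partial tool calls arrive as tool_call_chunks on each AIMessageChunk.
  if (chunk.tool_call_chunks?.length) {
    console.log(chunk.tool_call_chunks);
  }
}
```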
Structured output

Depending on the provider, structured output can be generated via different methods ('jsonSchema', 'functionCalling', or 'jsonMode'). Set `includeRaw` to `true` to get both the parsed output and the raw AI message.

Example: Message output alongside parsed structure
Setting `includeRaw` to `true` returns the raw `AIMessage` object alongside the parsed representation, letting you access response metadata such as token counts:
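A sketch using a Zod schema with `includeRaw` enabled:

```ts
import { AIMessage } from "@langchain/core/messages";
import { z } from "zod";

const Movie = z.object({
  title: z.string(),
  year: z.number(),
});

const structuredModel = model.withStructuredOutput(Movie, {
  includeRaw: true,
});

const { parsed, raw } = await structuredModel.invoke("Name a famous 1994 movie.");

console.log(parsed); // { title: "...", year: ... }
// The raw AIMessage carries response metadata such as token counts.
console.log((raw as AIMessage).usage_metadata);
```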
Example: Nested structures

Multimodal

When a model returns multimodal output, the `AIMessage` will have content blocks with multimodal types.
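A sketch of inspecting content blocks on a multimodal response; the available block types depend on the provider:

```ts
const multimodalResponse = await model.invoke("Describe a sunset, then draw one.");

// For multimodal responses, content is an array of typed blocks
// rather than a plain string.
if (Array.isArray(multimodalResponse.content)) {
  for (const block of multimodalResponse.content) {
    console.log(block.type); // e.g. "text", "image_url", ...
  }
}
```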
Reasoning

Depending on the model, reasoning can be configured with categorical effort levels ('low' or 'high') or integer token budgets.
For details, see the relevant chat model in the integrations page.
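A sketch assuming an OpenAI reasoning model; the `reasoningEffort` option shown here is specific to `@langchain/openai`, and other providers use different parameters such as token budgets:

```ts
import { initChatModel } from "langchain/chat_models/universal";

// Extra fields are forwarded to the underlying provider integration.
const reasoningModel = await initChatModel("o3-mini", {
  modelProvider: "openai",
  reasoningEffort: "low",
});
```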
Enable caching for your model
In Memory Cache
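A sketch using the in-memory cache with a ChatOpenAI model (chosen here only for illustration):

```ts
import { InMemoryCache } from "@langchain/core/caches";
import { ChatOpenAI } from "@langchain/openai";

// Repeated identical prompts are served from the cache instead of
// hitting the provider again.
const cachedModel = new ChatOpenAI({
  model: "gpt-4o-mini",
  cache: new InMemoryCache(),
});

await cachedModel.invoke("Tell me a joke."); // calls the provider
await cachedModel.invoke("Tell me a joke."); // returned from the cache
```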
Base URL
Many providers expose OpenAI-compatible APIs. You can use `initChatModel` with these providers by specifying the appropriate `base_url` parameter:
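A sketch assuming an OpenAI-compatible endpoint; the `configuration.baseURL` option shown is the `@langchain/openai` client setting, forwarded through `initChatModel` to the underlying constructor:

```ts
import { initChatModel } from "langchain/chat_models/universal";

const compatModel = await initChatModel("my-hosted-model", {
  modelProvider: "openai",
  apiKey: "YOUR_API_KEY",
  configuration: { baseURL: "http://localhost:8000/v1" },
});
```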
Responses are returned as `AIMessage` objects produced by the corresponding model. For more details, see the messages guide.
Configuration

Invocation methods accept a `config` parameter in the form of a `RunnableConfig` object. This provides run-time control over execution behavior, callbacks, and metadata tracking.
Key configuration attributes

Common configuration options include:
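A representative sketch (attribute values are placeholders):

```ts
const result = await model.invoke("Hello!", {
  runName: "greeting",            // display name for this run in tracing
  tags: ["docs-example"],         // tags attached to the traced run
  metadata: { userId: "123" },    // arbitrary key-value metadata
  // callbacks: [...],            // callback handlers scoped to this call
  // maxConcurrency: 5,           // parallelism limit, mainly for batch()
});
```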
These attributes can be set on any invocation method, including `batch()`. For a full list of `RunnableConfig` attributes, see the `RunnableConfig` reference.