Use `init_chat_model` to initialize one from a model provider of your choice. See the `init_chat_model` reference for more detail.
Asynchronous methods are prefixed with `'a'`. For example: `ainvoke()`, `astream()`, `abatch()`. A full list of async methods can be found in the reference.

Call `invoke()` with a single message or a list of messages.
`stream()` returns an iterator that yields output chunks as they are produced. You can use a loop to process each chunk in real time:
Unlike `invoke()`, which returns a single `AIMessage` after the model has finished generating its full response, `stream()` returns multiple `AIMessageChunk` objects, each containing a portion of the output text. Importantly, each `AIMessageChunk` in a stream is designed to be gathered into a full message via summation:
"Auto-Streaming" Chat Models
You can write your code using `model.invoke()` within nodes, and LangChain will automatically delegate to streaming if the application is run in a streaming mode. When you `invoke()` a chat model, LangChain will automatically switch to an internal streaming mode if it detects that you are trying to stream the overall application. The result of the invocation will be the same as far as the code that called `invoke()` is concerned; however, while the chat model is being streamed, LangChain will take care of invoking `on_llm_new_token` events in its callback system. These callback events allow LangGraph `stream()` and `astream_events()` to surface the chat model's output in real time.
Stream events
Use `astream_events()` for more granular streaming. This simplifies filtering based on event types and other metadata, and will aggregate the full message in the background. See below for an example, and the `astream_events()` reference for event types and other details.
To run a set of independent requests in parallel, use `batch()`, which parallelizes model calls client-side. Note that this is distinct from the batch APIs supported by inference providers. By default, `batch()` will only return the final output for the entire batch. If you want to receive the output for each individual input as it finishes generating, you can stream results with `batch_as_completed()`:
When using `batch_as_completed()`, results may arrive out of order. Each includes the input index for matching to reconstruct the original order if needed.

When processing a large number of inputs with `batch()` or `batch_as_completed()`, you may want to control the maximum number of parallel calls. This can be done by setting the `max_concurrency` attribute in the `RunnableConfig` dictionary.
See the `RunnableConfig` reference for a full list of supported attributes.

You may encounter the term function calling; we use this term interchangeably with tool calling.

To make tools available to a model, bind them with `bind_tools()`.
In subsequent invocations, the model can choose to call any of the bound tools
as needed.
Some model providers offer built-in tools that can be enabled via model
parameters. Check the respective
provider reference for details.
Tool execution loop
The `ToolMessage` returned by the tool includes a `tool_call_id` that matches the original tool call, helping the model correlate results with requests.
Forcing tool calls
Parallel tool calls
Streaming tool calls
Streamed tool calls arrive incrementally as `ToolCallChunk` objects. This allows you to see tool calls as they're being generated rather than waiting for the complete response.

Key considerations for structured output:
- Method: the strategy to use (`'json_schema'`, `'function_calling'`, or `'json_mode'`)
- Set `include_raw=True` to get both the parsed output and the raw AI message
- `TypedDict` and JSON Schema require manual validation

Example: get `AIMessage` output alongside parsed structure
It can be useful to return the raw `AIMessage` object alongside the parsed representation, to access response metadata such as token counts and other information.
Example: nested structures
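A sketch of a nested schema. The Pydantic classes and sample payload are illustrative; with a real model you would pass the schema to `with_structured_output()` (optionally with `include_raw=True`) instead of validating by hand:

```python
from pydantic import BaseModel, Field

class Actor(BaseModel):
    name: str = Field(description="The actor's full name")
    role: str = Field(description="The character they played")

class Movie(BaseModel):
    title: str
    year: int
    cast: list[Actor]  # nested structure

# With a real model:
#   structured_model = model.with_structured_output(Movie)
#   result = structured_model.invoke("Describe the movie Inception")
# Here we validate a sample payload to show the nested shape that comes back.
result = Movie.model_validate(
    {
        "title": "Inception",
        "year": 2010,
        "cast": [{"name": "Leonardo DiCaprio", "role": "Cobb"}],
    }
)
```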
When a model returns multimodal output, the resulting `AIMessage` will have content blocks with multimodal types.

Reasoning effort is typically specified either as a categorical level (such as `'low'` or `'high'`) or as an integer token budget. For details, see the relevant chat model in the integrations page.
Enable caching for your model
In Memory Cache
SQLite Cache
Chat models accept a `rate_limiter` parameter that can be provided during initialization to control the rate at which requests are made.
Initialize and use a rate limiter
Base URL
You can use `init_chat_model` with these providers by specifying the appropriate `base_url` parameter:
Proxy configuration

Token-level log probabilities can be enabled by setting the `logprobs` parameter when initializing a model:
Log probabilities are included on the `AIMessage` objects produced by the corresponding model. For more details, see the messages guide.
You can pass a `config` parameter at invocation using a `RunnableConfig` dictionary. This provides run-time control over execution behavior, callbacks, and metadata tracking.
Common configuration options include:
Key configuration attributes
The `max_concurrency` attribute, for example, caps the number of parallel calls made by `batch()` or `batch_as_completed()`. For a full list of supported `RunnableConfig` attributes, see the `RunnableConfig` reference.
Create a runtime-configurable model by specifying `configurable_fields`. If you don't specify a model value, then `'model'` and `'model_provider'` will be configurable by default.
Configurable model with default values
Using a configurable model declaratively
We can call declarative operations like `bind_tools`, `with_structured_output`, `with_configurable`, etc. on a configurable model, and chain a configurable model in the same way that we would a regularly instantiated chat model object.