This guide provides a quick overview for getting started with OpenAI chat models. For detailed documentation of all ChatOpenAI features and configurations, head to the API reference.

OpenAI has several chat models. You can find information about their latest models and their costs, context windows, and supported input types in the OpenAI docs.
Now we can instantiate our model object and generate chat completions:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-5-nano",
    # stream_usage=True,
    # temperature=None,
    # max_tokens=None,
    # timeout=None,
    # reasoning_effort="low",
    # max_retries=2,
    # api_key="...",  # if you prefer to pass the API key directly instead of using env vars
    # base_url="...",
    # organization="...",
    # other params...
)
See the API Reference for the full set of available parameters.
messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
ai_msg
OpenAI’s Chat Completions API does not stream token usage statistics by default (see the API reference here). To recover token counts when streaming with ChatOpenAI or AzureChatOpenAI, set stream_usage=True as an initialization parameter or on invocation:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4.1-mini", stream_usage=True)
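You can also request usage metadata for a single streaming call rather than setting it at initialization; a minimal sketch (the chunk-aggregation pattern is just one way to consume the stream):

llm = ChatOpenAI(model="gpt-4.1-mini")

aggregate = None
for chunk in llm.stream("Hello", stream_usage=True):
    # Message chunks support addition, so we can accumulate the full response
    aggregate = chunk if aggregate is None else aggregate + chunk

# Token counts are attached to the aggregated message
print(aggregate.usage_metadata)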
OpenAI has a tool-calling API (we use "tool calling" and "function calling" interchangeably here) that lets you describe tools and their arguments and have the model return a JSON object with a tool to invoke and the inputs to that tool. Tool calling is extremely useful for building tool-using chains and agents, and for getting structured outputs from models more generally.
With ChatOpenAI.bind_tools, we can easily pass in Pydantic classes, dict schemas, LangChain tools, or even functions as tools to the model. Under the hood these are converted to OpenAI tool schemas, which look like:
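Roughly, each tool becomes a function definition of the following shape (an illustration with values elided, not a literal dump):

{
    "type": "function",
    "function": {
        "name": "...",
        "description": "...",
        "parameters": {...},  # JSON Schema for the tool's arguments
    },
}

For example, binding a Pydantic class: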
from pydantic import BaseModel, Field


class GetWeather(BaseModel):
    """Get the current weather in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")


llm_with_tools = llm.bind_tools([GetWeather])
ai_msg = llm_with_tools.invoke(
    "what is the weather like in San Francisco",
)
ai_msg
As of Aug 6, 2024, OpenAI supports a strict argument when calling tools that enforces that the model respects the tool argument schema. See more here: platform.openai.com/docs/guides/function-calling. Note: if strict=True, the tool definition will also be validated, and only a subset of JSON Schema is accepted. Crucially, the schema cannot have optional arguments (those with default values). Read the full docs on what types of schema are supported here: platform.openai.com/docs/guides/structured-outputs/supported-schemas.
llm_with_tools = llm.bind_tools([GetWeather], strict=True)
ai_msg = llm_with_tools.invoke(
    "what is the weather like in San Francisco",
)
ai_msg
OpenAI’s structured output feature can be used simultaneously with tool-calling. The model will either generate tool calls or a response adhering to a desired schema. See example below:
from langchain_openai import ChatOpenAI
from pydantic import BaseModel


def get_weather(location: str) -> str:
    """Get weather at a location."""
    return "It's sunny."


class OutputSchema(BaseModel):
    """Schema for response."""

    answer: str
    justification: str


llm = ChatOpenAI(model="gpt-4.1")

structured_llm = llm.bind_tools(
    [get_weather],
    response_format=OutputSchema,
    strict=True,
)

# Response contains tool calls:
tool_call_response = structured_llm.invoke("What is the weather in SF?")

# structured_response.additional_kwargs["parsed"] contains parsed output
structured_response = structured_llm.invoke(
    "What weighs more, a pound of feathers or a pound of gold?"
)
OpenAI supports the specification of a context-free grammar for custom tool inputs in lark or regex format. See OpenAI docs for details. The format parameter can be passed into @custom_tool as shown below:
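A minimal sketch (the import path for custom_tool and the exact structure of the grammar format are assumptions here; the grammar and tool body are illustrative only):

from langchain_core.tools import custom_tool  # assumed import path

# Hypothetical Lark grammar constraining the tool's free-form string input
grammar = """
start: NUMBER "+" NUMBER
%import common.NUMBER
%ignore " "
"""

# Assumed format structure: {"type": "grammar", "syntax": "lark" | "regex", "definition": ...}
format_spec = {"type": "grammar", "syntax": "lark", "definition": grammar}


@custom_tool(format=format_spec)
def do_math(input_string: str) -> str:
    """Evaluate a simple addition expression."""
    left, right = (part.strip() for part in input_string.split("+"))
    return str(float(left) + float(right))


llm_with_tools = llm.bind_tools([do_math])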
OpenAI supports a Responses API that is oriented toward building agentic applications. It includes a suite of built-in tools, including web and file search. It also supports management of conversation state, allowing you to continue a conversational thread without explicitly passing in previous messages, as well as the output from reasoning processes.

ChatOpenAI will route to the Responses API if one of these features is used. You can also specify use_responses_api=True when instantiating ChatOpenAI.
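For example, to opt in explicitly at initialization:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4.1-mini", use_responses_api=True)

The example below uses the built-in web search tool: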
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4.1-mini")

tool = {"type": "web_search_preview"}
llm_with_tools = llm.bind_tools([tool])

response = llm_with_tools.invoke("What was a positive news story from today?")
Note that the response includes structured content blocks that include both the text of the response and OpenAI annotations citing its sources. The output message will also contain information from any tool invocations:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4.1-mini")

tool = {"type": "image_generation", "quality": "low"}
llm_with_tools = llm.bind_tools([tool])

ai_message = llm_with_tools.invoke(
    "Draw a picture of a cute fuzzy cat with an umbrella"
)
import base64

from IPython.display import Image

image = next(
    item for item in ai_message.content_blocks if item["type"] == "image"
)
Image(base64.b64decode(image["base64"]), width=200)
To trigger a file search, pass a file search tool to the model as you would another tool. You will need to populate an OpenAI-managed vector store and include the vector store ID in the tool definition. See OpenAI documentation for more detail.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4.1-mini")

openai_vector_store_ids = [
    "vs_...",  # your IDs here
]

tool = {
    "type": "file_search",
    "vector_store_ids": openai_vector_store_ids,
}
llm_with_tools = llm.bind_tools([tool])

response = llm_with_tools.invoke("What is deep research by OpenAI?")
print(response.text)
Deep Research by OpenAI is...
As with web search, the response will include content blocks with citations:
for block in response.content_blocks:
    if block["type"] == "non_standard":
        print(block["value"].get("type"))
    else:
        print(block["type"])
file_search_call
text
text_block = next(block for block in response.content_blocks if block["type"] == "text")
text_block["annotations"][:2]
ChatOpenAI supports the "computer-use-preview" model, which is a specialized model for the built-in computer use tool. To enable, pass a computer use tool as you would pass another tool.

Currently, tool outputs for computer use are present in the message content field. To reply to the computer use tool call, construct a ToolMessage with {"type": "computer_call_output"} in its additional_kwargs. The content of the message will be a screenshot. Below, we demonstrate a simple example.

First, load two screenshots:
import base64


def load_png_as_base64(file_path):
    with open(file_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())
    return encoded_string.decode("utf-8")


screenshot_1_base64 = load_png_as_base64(
    "/path/to/screenshot_1.png"
)  # perhaps a screenshot of an application
screenshot_2_base64 = load_png_as_base64(
    "/path/to/screenshot_2.png"
)  # perhaps a screenshot of the Desktop
from langchain_openai import ChatOpenAI

# Initialize model
llm = ChatOpenAI(model="computer-use-preview", truncation="auto")

# Bind computer-use tool
tool = {
    "type": "computer_use_preview",
    "display_width": 1024,
    "display_height": 768,
    "environment": "browser",
}
llm_with_tools = llm.bind_tools([tool])

# Construct input message
input_message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": (
                "Click the red X to close and reveal my Desktop. "
                "Proceed, no confirmation needed."
            ),
        },
        {
            "type": "input_image",
            "image_url": f"data:image/png;base64,{screenshot_1_base64}",
        },
    ],
}

# Invoke model
response = llm_with_tools.invoke(
    [input_message],
    reasoning={
        "generate_summary": "concise",
    },
)
The response will include a call to the computer-use tool in its content:
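To continue, send the next screenshot back as described above. This is a minimal sketch: the assumption that the call appears as a "computer_call" item carrying a "call_id" in response.content is mine, so adapt it to the structure you actually receive:

from langchain_core.messages import ToolMessage

# Assumption: the computer-use call shows up in the content as a "computer_call"
# item with a "call_id" that must be echoed back.
computer_call = next(
    item for item in response.content if item.get("type") == "computer_call"
)

tool_message = ToolMessage(
    content=[
        {
            "type": "input_image",
            "image_url": f"data:image/png;base64,{screenshot_2_base64}",
        }
    ],
    tool_call_id=computer_call["call_id"],
    additional_kwargs={"type": "computer_call_output"},
)

# Send the original exchange plus the screenshot reply back to the model
next_response = llm_with_tools.invoke([input_message, response, tool_message])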
OpenAI implements a code interpreter tool to support the sandboxed generation and execution of code.

Example use:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4.1-mini")

llm_with_tools = llm.bind_tools(
    [
        {
            "type": "code_interpreter",
            # Create a new container
            "container": {"type": "auto"},
        }
    ]
)
response = llm_with_tools.invoke(
    "Write and run code to answer the question: what is 3^3?"
)
Note that the above command created a new container. We can also specify an existing container ID:
code_interpreter_calls = [
    item
    for item in response.content
    if item["type"] == "code_interpreter_call"
]
assert len(code_interpreter_calls) == 1
container_id = code_interpreter_calls[0]["extras"]["container_id"]

llm_with_tools = llm.bind_tools(
    [
        {
            "type": "code_interpreter",
            # Use an existing container
            "container": container_id,
        }
    ]
)
OpenAI implements a remote MCP tool that allows for model-generated calls to MCP servers.

Example use:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4.1-mini")

llm_with_tools = llm.bind_tools(
    [
        {
            "type": "mcp",
            "server_label": "deepwiki",
            "server_url": "https://mcp.deepwiki.com/mcp",
            "require_approval": "never",
        }
    ]
)
response = llm_with_tools.invoke(
    "What transport protocols does the 2025-03-26 version of the MCP "
    "spec (modelcontextprotocol/modelcontextprotocol) support?"
)
MCP Approvals
OpenAI will at times request approval before sharing data with a remote MCP server.

In the above command, we instructed the model to never require approval. We can also configure the model to always request approval, or to always request approval for specific tools:
llm_with_tools = llm.bind_tools(
    [
        {
            "type": "mcp",
            "server_label": "deepwiki",
            "server_url": "https://mcp.deepwiki.com/mcp",
            "require_approval": {
                "always": {
                    "tool_names": ["read_wiki_structure"]
                }
            },
        }
    ]
)
response = llm_with_tools.invoke(
    "What transport protocols does the 2025-03-26 version of the MCP "
    "spec (modelcontextprotocol/modelcontextprotocol) support?"
)
Responses may then include blocks with type "mcp_approval_request".

To submit approvals for an approval request, structure it into a content block in an input message:
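A minimal sketch of submitting approvals (the block keys "mcp_approval_response", "approve", and "approval_request_id", and the "id" field on the request block, are assumptions based on OpenAI's approval flow rather than something confirmed above):

# Approve every pending MCP approval request and send the result back
approval_message = {
    "role": "user",
    "content": [
        {
            "type": "mcp_approval_response",
            "approve": True,
            "approval_request_id": block["id"],  # assumed to carry the request ID
        }
        for block in response.content
        if block.get("type") == "mcp_approval_request"
    ],
}
next_response = llm_with_tools.invoke([response, approval_message])

The conversation-state examples below pick up from a first exchange that is not shown here. A hypothetical sketch of that first turn (the opening message and model choice are placeholders) would look something like:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4.1-mini", use_responses_api=True)

first_query = "Hi, I'm Bob."  # hypothetical opening message
messages = [{"role": "user", "content": first_query}]

response = llm.invoke(messages)
print(response.text)

which produces a reply like the one below: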
Hi Bob! Nice to meet you. How can I assist you today?
second_query = "What is my name?"

messages.extend(
    [
        response,
        {"role": "user", "content": second_query},
    ]
)
second_response = llm.invoke(messages)
print(second_response.text)
You mentioned that your name is Bob. How can I assist you further, Bob?
You can use LangGraph to manage conversational threads for you in a variety of backends, including in-memory and Postgres. See this tutorial to get started.
When using the Responses API, LangChain messages will include an "id" field in their metadata. Passing this ID to subsequent invocations will continue the conversation. Note that this is equivalent to manually passing in messages from a billing perspective.
second_response = llm.invoke(
    "What is my name?",
    previous_response_id=response.id,
)
print(second_response.text)
Your name is Bob. How can I help you today, Bob?
ChatOpenAI can also automatically specify previous_response_id using the last response in a message sequence:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4.1-mini",
    use_previous_response_id=True,
)
If we set use_previous_response_id=True, input messages up to the most recent response will be dropped from request payloads, and previous_response_id will be set using the ID of the most recent response.

That is,
from langchain_core.messages import AIMessage, HumanMessage

llm.invoke(
    [
        HumanMessage("Hello"),
        AIMessage("Hi there!", id="resp_123"),
        HumanMessage("How are you?"),
    ]
)
is equivalent to:
llm.invoke([HumanMessage("How are you?")], previous_response_id="resp_123")
Some OpenAI models will generate separate text content illustrating their reasoning process. See OpenAI’s reasoning documentation for details.

OpenAI can return a summary of the model’s reasoning (although it doesn’t expose the raw reasoning tokens). To configure ChatOpenAI to return this summary, specify the reasoning parameter. ChatOpenAI will automatically route to the Responses API if this parameter is set.
from langchain_openai import ChatOpenAI

reasoning = {
    "effort": "medium",  # 'low', 'medium', or 'high'
    "summary": "auto",  # 'detailed', 'auto', or None
}

llm = ChatOpenAI(model="gpt-5-nano", reasoning=reasoning)
response = llm.invoke("What is 3^3?")

# Output
response.text
'3³ = 3 × 3 × 3 = 27.'
# Reasoning
for block in response.content_blocks:
    if block["type"] == "reasoning":
        print(block["reasoning"])
**Calculating the power of three**

The user is asking about 3 raised to the power of 3. That's a pretty simple calculation! I know that 3^3 equals 27, so I can say, "3 to the power of 3 equals 27." I might also include a quick explanation that it's 3 multiplied by itself three times: 3 × 3 × 3 = 27. So, the answer is definitely 27.
You can call fine-tuned OpenAI models by passing in the corresponding model parameter. This generally takes the form of ft:{OPENAI_MODEL_NAME}:{ORG_NAME}::{MODEL_ID}. For example:
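A minimal sketch (the fine-tuned model identifier below is a placeholder following that pattern; substitute your own):

from langchain_openai import ChatOpenAI

# Placeholder fine-tuned model ID in the ft:{OPENAI_MODEL_NAME}:{ORG_NAME}::{MODEL_ID} form
fine_tuned_model = "ft:gpt-4o-mini-2024-07-18:my-org::abc123"

llm = ChatOpenAI(model=fine_tuned_model)
llm.invoke("Hello!")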
OpenAI has models that support multimodal inputs. You can pass in images, PDFs, or audio to these models. For more information on how to do this in LangChain, head to the multimodal inputs docs. You can see the list of models that support different modalities in OpenAI’s documentation.

For all modalities, LangChain supports both its cross-provider standard as well as OpenAI’s native content-block format. To pass multimodal data into ChatOpenAI, create a content block containing the data and incorporate it into a message, e.g., as below:
message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            # Update prompt as desired
            "text": "Describe the (image / PDF / audio...)",
        },
        content_block,
    ],
}
Note: OpenAI requires that file names be specified for PDF inputs. When using LangChain’s format, include the filename key. Read more here. Refer to examples in the how-to guide here.

In-line base64 data:
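A minimal sketch of an in-line base64 content block in LangChain’s cross-provider format (the key names such as source_type reflect my reading of that format, and the file path is a placeholder):

import base64

# Read a local PDF and embed it as in-line base64 data
with open("/path/to/document.pdf", "rb") as f:  # placeholder path
    pdf_base64 = base64.b64encode(f.read()).decode("utf-8")

content_block = {
    "type": "file",
    "source_type": "base64",
    "mime_type": "application/pdf",
    "data": pdf_base64,
    "filename": "document.pdf",  # OpenAI requires a filename for PDF inputs
}

This content_block then slots into the message structure shown above.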
Some OpenAI models (such as their gpt-4o and gpt-4o-mini series) support Predicted Outputs, which allow you to pass in a known portion of the LLM’s expected output ahead of time to reduce latency. This is useful for cases such as editing text or code, where only a small part of the model’s output will change.

Here’s an example:
from langchain_openai import ChatOpenAI

code = """
/// <summary>
/// Represents a user with a first name, last name, and username.
/// </summary>
public class User
{
    /// <summary>
    /// Gets or sets the user's first name.
    /// </summary>
    public string FirstName { get; set; }

    /// <summary>
    /// Gets or sets the user's last name.
    /// </summary>
    public string LastName { get; set; }

    /// <summary>
    /// Gets or sets the user's username.
    /// </summary>
    public string Username { get; set; }
}
"""

llm = ChatOpenAI(model="gpt-4o")

query = (
    "Replace the Username property with an Email property. "
    "Respond only with code, and with no markdown formatting."
)

response = llm.invoke(
    [{"role": "user", "content": query}, {"role": "user", "content": code}],
    prediction={"type": "content", "content": code},
)
print(response.content)
print(response.response_metadata)
/// <summary>
/// Represents a user with a first name, last name, and email.
/// </summary>
public class User
{
    /// <summary>
    /// Gets or sets the user's first name.
    /// </summary>
    public string FirstName { get; set; }

    /// <summary>
    /// Gets or sets the user's last name.
    /// </summary>
    public string LastName { get; set; }

    /// <summary>
    /// Gets or sets the user's email.
    /// </summary>
    public string Email { get; set; }
}
{'token_usage': {'completion_tokens': 226, 'prompt_tokens': 166, 'total_tokens': 392, 'completion_tokens_details': {'accepted_prediction_tokens': 49, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 107}, 'prompt_tokens_details': {'audio_tokens': None, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_45cf54deae', 'finish_reason': 'stop', 'logprobs': None}
Note that currently predictions are billed as additional tokens and may increase your usage and costs in exchange for this reduced latency.
and the format will be what was passed in model_kwargs['audio']['format'].

We can also pass this message with its audio data back to the model as part of a message history, before the OpenAI expires_at time is reached.
**Output audio is stored under the audio key in AIMessage.additional_kwargs, but input content blocks are typed with an input_audio type and key in HumanMessage.content lists.** For more information, see OpenAI’s audio docs.
history = [
    ("human", "Are you made by OpenAI? Just answer yes or no"),
    output_message,
    ("human", "And what is your name? Just give your name."),
]
second_output_message = llm.invoke(history)
OpenAI offers a variety of service tiers. The “flex” tier offers cheaper pricing for requests, with the trade-off that responses may take longer and resources might not always be available. This approach is best suited for non-critical tasks, including model testing, data enhancement, or jobs that can be run asynchronously.

To use it, initialize the model with service_tier="flex":
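A minimal sketch (the model name is a placeholder; flex processing is only available for certain models):

from langchain_openai import ChatOpenAI

# service_tier="flex" opts this model's requests into flex processing
llm = ChatOpenAI(model="o4-mini", service_tier="flex")

llm.invoke("Hello!")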