Integrate with the ChatGoogleGenerativeAI chat model using LangChain Python.
Access Google’s Generative AI models, including the Gemini family, via the Gemini Developer API or Vertex AI. The Gemini Developer API offers quick setup with API keys, ideal for individual developers. Vertex AI provides enterprise features and integrates with Google Cloud Platform. For information on the latest models, model IDs, their features, context windows, etc., head to the Google AI docs.
Vertex AI consolidation & compatibility

As of langchain-google-genai 4.0.0, this package uses the consolidated google-genai SDK instead of the legacy google-ai-generativelanguage SDK. This migration brings support for Gemini models both via the Gemini Developer API and the Gemini API in Vertex AI, superseding certain classes in langchain-google-vertexai, such as ChatVertexAI. Read the full announcement and migration guide.
API Reference

For detailed documentation of all features and configuration options, head to the ChatGoogleGenerativeAI API reference.
To access Google AI models you’ll need to create a Google Account, get a Google AI API key, and install the langchain-google-genai integration package.
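For example, a minimal setup sketch (this assumes the integration reads your key from the GOOGLE_API_KEY environment variable; you can also pass api_key directly to the constructor):

# pip install -U langchain-google-genai
import getpass
import os

# Assumes the API key is supplied via the GOOGLE_API_KEY environment variable
if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Google AI API key: ")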
Now we can instantiate our model object and generate responses:
Gemini Developer API:

from langchain_google_genai import ChatGoogleGenerativeAI

model = ChatGoogleGenerativeAI(
    model="gemini-3-pro-preview",
    temperature=1.0,  # Gemini 3.0+ defaults to 1.0
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # other params...
)

Vertex AI:

from langchain_google_genai import ChatGoogleGenerativeAI

model = ChatGoogleGenerativeAI(
    model="gemini-3-pro-preview",
    project="your-project-id",
    location="us-central1",  # Optional, defaults to us-central1
    temperature=1.0,  # Gemini 3.0+ defaults to 1.0
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # other params...
)
Providing project automatically selects the Vertex AI backend unless you explicitly set vertexai=False.
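For example, a brief sketch of opting out of that automatic selection (the project ID here is a placeholder):

from langchain_google_genai import ChatGoogleGenerativeAI

# Keep using the Gemini Developer API even though a project is provided
model = ChatGoogleGenerativeAI(
    model="gemini-3-pro-preview",
    project="your-project-id",
    vertexai=False,
)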
Temperature for Gemini 3.0+ models

If temperature is not explicitly set and the model is Gemini 3.0 or later, it will be automatically set to 1.0 instead of the default 0.7, per Google GenAI API best practices. Using 0.7 with Gemini 3.0+ can cause infinite loops, degraded reasoning performance, and failure on complex tasks.
See the ChatGoogleGenerativeAI API Reference for the full set of available model parameters.
messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = model.invoke(messages)
ai_msg
Certain models can generate text and images inline. See Gemini API docs for details.
import base64

from IPython.display import Image, display
from langchain.messages import AIMessage
from langchain_google_genai import ChatGoogleGenerativeAI

model = ChatGoogleGenerativeAI(model="gemini-2.5-flash-image")

response = model.invoke("Generate a photorealistic image of a cuddly cat wearing a hat.")

def _get_image_base64(response: AIMessage) -> str:
    image_block = next(
        block
        for block in response.content
        if isinstance(block, dict) and block.get("image_url")
    )
    return image_block["image_url"].get("url").split(",")[-1]

image_base64 = _get_image_base64(response)
display(Image(data=base64.b64decode(image_base64), width=300))
Use image_config to control image dimensions and quality (see genai.types.ImageConfig). It can be set at instantiation (applies to all calls) or at invocation (per-call override):
from langchain_google_genai import ChatGoogleGenerativeAI

# Set at instantiation (applies to all calls)
model = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash-image",
    image_config={"aspect_ratio": "16:9"},
)

# Or override per call
response = model.invoke(
    "Generate a photorealistic image of a cuddly cat wearing a hat.",
    image_config={"aspect_ratio": "1:1"},
)
By default, image generation models may return both text and images (e.g. “Ok! Here’s an image of a…”). You can request that the model only return images by setting the response_modalities parameter:
from langchain_google_genai import ChatGoogleGenerativeAI, Modality

model = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash-image",
    response_modalities=[Modality.IMAGE],
)

# All invocations will return only images
response = model.invoke("Generate a photorealistic image of a cuddly cat wearing a hat.")
Certain models can generate audio files. See Gemini API docs for details.
Vertex AI limitation

Audio generation models are currently in limited preview on Vertex AI and may require allowlist access. If you encounter an INVALID_ARGUMENT error when using TTS models with vertexai=True, your GCP project may need to be allowlisted. For more details, see this Google AI forum discussion.
from langchain_google_genai import ChatGoogleGenerativeAI

model = ChatGoogleGenerativeAI(model="gemini-2.5-flash-preview-tts")

response = model.invoke("Please say The quick brown fox jumps over the lazy dog")

# Base64 encoded binary data of the audio
wav_data = response.additional_kwargs.get("audio")
with open("output.wav", "wb") as f:
    f.write(wav_data)
Gemini models also support tool calling. Define tools, bind them to the model, and pass the tool results back for a final response:

from langchain.tools import tool
from langchain.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

# Define the tool
@tool(description="Get the current weather in a given location")
def get_weather(location: str) -> str:
    return "It's sunny."

# Initialize and bind (potentially multiple) tools to the model
model_with_tools = ChatGoogleGenerativeAI(model="gemini-3-pro-preview").bind_tools([get_weather])

# Step 1: Model generates tool calls
messages = [HumanMessage("What's the weather in Boston?")]
ai_msg = model_with_tools.invoke(messages)
messages.append(ai_msg)

# Check the tool calls in the response
print(ai_msg.tool_calls)

# Step 2: Execute tools and collect results
for tool_call in ai_msg.tool_calls:
    # Execute the tool with the generated arguments
    tool_result = get_weather.invoke(tool_call)
    messages.append(tool_result)

# Step 3: Pass results back to model for final response
final_response = model_with_tools.invoke(messages)
final_response
method="json_schema" (default): Uses Gemini’s native structured output. Recommended for better reliability, as it constrains the model’s generation process directly rather than relying on post-processing tool calls.
method="function_calling": Uses tool calling to extract structured data.
When using with_structured_output(method="function_calling"), do not pass additional tools (like Google Search) in the same call. To get structured output and search grounding in a single call, use .bind() with response_mime_type and response_schema instead of with_structured_output:
from langchain_google_genai import ChatGoogleGenerativeAI
from pydantic import BaseModel

class MatchResult(BaseModel):
    winner: str
    final_match_score: str
    scorers: list[str]

llm = ChatGoogleGenerativeAI(model="gemini-3-pro-preview")

llm_with_search = llm.bind(
    tools=[{"google_search": {}}],
    response_mime_type="application/json",
    response_schema=MatchResult.model_json_schema(),
)

response = llm_with_search.invoke(
    "Search for details of the latest Euro championship final match."
)
This uses Gemini’s native JSON schema mode for structuring the output while allowing tools like Google Search for grounding — all in a single LLM call.
Access token usage information from the response metadata.
from langchain_google_genai import ChatGoogleGenerativeAI

model = ChatGoogleGenerativeAI(model="gemini-3-pro-preview")

result = model.invoke("Explain the concept of prompt engineering in one sentence.")

print(result.content)
print("\nUsage Metadata:")
print(result.usage_metadata)
Prompt engineering is the art and science of crafting effective text prompts to elicit desired and accurate responses from large language models.

Usage Metadata:
{'input_tokens': 10, 'output_tokens': 24, 'total_tokens': 34, 'input_token_details': {'cache_read': 0}}
Certain Gemini models support configurable thinking depth. The parameter depends on the model version:
Model family   Parameter         Values
Gemini 3+      thinking_level    "minimal", "low", "medium", "high" (default for Pro)
Gemini 2.5     thinking_budget   0 (off), -1 (dynamic), or a positive integer (token limit)
from langchain_google_genai import ChatGoogleGenerativeAI

# Gemini 3+: use thinking_level
llm = ChatGoogleGenerativeAI(
    model="gemini-3-pro-preview",
    thinking_level="low",
)

response = llm.invoke("How many O's are in Google?")
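For Gemini 2.5 models, a comparable sketch uses thinking_budget instead (the budget value here is arbitrary):

from langchain_google_genai import ChatGoogleGenerativeAI

# Gemini 2.5: use thinking_budget (0 disables thinking, -1 lets the model decide)
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    thinking_budget=1024,
)

response = llm.invoke("How many O's are in Google?")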
To see a thinking model’s reasoning, set include_thoughts=True:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-3-pro-preview",
    include_thoughts=True,
)

response = llm.invoke("How many O's are in Google? How did you verify your answer?")

reasoning_tokens = response.usage_metadata["output_token_details"]["reasoning"]
print("Response:", response.content)
print("Reasoning tokens used:", reasoning_tokens)
Thought signatures are encrypted representations of the model’s reasoning. They enable Gemini to maintain thought context across multi-turn conversations, since the API is stateless.
Gemini 3 may raise 4xx errors if thought signatures are not passed back with tool call responses. Upgrade to langchain-google-genai >= 3.1.0 to ensure this is handled correctly.
Signatures appear in AIMessage responses. In text blocks, look for extras.signature within the content block.
For multi-turn conversations, pass the full AIMessage back to the model so signatures are preserved. This happens automatically when you append the AIMessage to your messages list (as shown in the tool calling example above).
Don’t reconstruct messages manually. If you create a new AIMessage instead of passing the original object, the signatures will be lost and the API may reject the request.
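If you need to check whether a response carries signatures, a rough sketch (this assumes the signature is exposed under extras.signature in the standard content blocks, as noted above):

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-3-pro-preview")
ai_msg = llm.invoke("What's 15% of 80?")

# Look for signatures attached to the returned content blocks
for block in ai_msg.content_blocks:
    signature = block.get("extras", {}).get("signature")
    if signature:
        print(f"{block['type']} block carries a thought signature")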
Certain models support grounding with Google Search, which lets responses cite live web results:

from langchain_google_genai import ChatGoogleGenerativeAI

model = ChatGoogleGenerativeAI(model="gemini-3-pro-preview")
model_with_search = model.bind_tools([{"google_search": {}}])

response = model_with_search.invoke("When is the next total solar eclipse in US?")
response.content_blocks
[{'type': 'text',
  'text': 'The next total solar eclipse visible in the contiguous United States will occur on...',
  'annotations': [{'type': 'citation',
    'id': 'abc123',
    'url': '<url for source 1>',
    'title': '<source 1 title>',
    'start_index': 0,
    'end_index': 99,
    'cited_text': 'The next total solar eclipse...',
    'extras': {'google_ai_metadata': {'web_search_queries': ['next total solar eclipse in US'],
      'grounding_chunk_index': 0,
      'confidence_scores': []}}}, ...
Certain models support grounding using Google Maps. Maps grounding connects Gemini’s generative capabilities with Google Maps’ current, factual location data. This enables location-aware applications that provide accurate, geographically specific responses. See the Gemini docs for details.
from langchain_google_genai import ChatGoogleGenerativeAI

model = ChatGoogleGenerativeAI(model="gemini-2.5-pro")
model_with_maps = model.bind_tools([{"google_maps": {}}])

response = model_with_maps.invoke(
    "What are some good Italian restaurants near the Eiffel Tower in Paris?"
)
The response will include grounding metadata with location information from Google Maps. You can optionally provide a specific location context using tool_config with lat_lng. This is useful when you want to ground queries relative to a specific geographic point.
from langchain_google_genai import ChatGoogleGenerativeAI

model = ChatGoogleGenerativeAI(model="gemini-2.5-pro")

# Provide location context (latitude and longitude)
model_with_maps = model.bind_tools(
    [{"google_maps": {}}],
    tool_config={
        "retrieval_config": {
            # Eiffel Tower
            "lat_lng": {
                "latitude": 48.858844,
                "longitude": 2.294351,
            }
        }
    },
)

response = model_with_maps.invoke(
    "What Italian restaurants are within a 5 minute walk from here?"
)
The URL context tool enables the model to access and analyze content from URLs you provide in your prompt. This is useful for tasks like summarizing web pages, extracting data from multiple sources, or answering questions about online content. See the Gemini docs for details and limitations.
from langchain_google_genai import ChatGoogleGenerativeAI

model = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
model_with_url_context = model.bind_tools([{"url_context": {}}])

response = model_with_url_context.invoke(
    "Summarize the content at https://docs.langchain.com"
)
The Gemini 2.5 Computer Use model (gemini-2.5-computer-use-preview-10-2025) can interact with browser environments to automate web tasks like clicking, typing, and scrolling.
Preview model limitations

The Computer Use model is in preview and may produce unexpected behavior. Always supervise automated tasks and avoid use with sensitive data or critical operations. See the Gemini API docs for safety best practices.
from langchain_google_genai import ChatGoogleGenerativeAI

model = ChatGoogleGenerativeAI(model="gemini-2.5-computer-use-preview-10-2025")

model_with_computer = model.bind_tools([{"computer_use": {}}])

response = model_with_computer.invoke("Please navigate to example.com")
response.content_blocks
You can configure the environment and exclude specific UI actions:
Advanced configuration
from langchain_google_genai import ChatGoogleGenerativeAI, Environment

model = ChatGoogleGenerativeAI(model="gemini-2.5-computer-use-preview-10-2025")

# Specify the environment (browser is default)
model_with_computer = model.bind_tools(
    [{"computer_use": {"environment": Environment.ENVIRONMENT_BROWSER}}]
)

# Exclude specific UI actions
model_with_computer = model.bind_tools(
    [
        {
            "computer_use": {
                "environment": Environment.ENVIRONMENT_BROWSER,
                "excludedPredefinedFunctions": [
                    "drag_and_drop",
                    "key_combination",
                ],
            }
        }
    ]
)

response = model_with_computer.invoke("Search for Python tutorials")
The model returns function calls for UI actions (like click_at, type_text_at, scroll) with normalized coordinates. You’ll need to implement the actual execution of these actions in your browser automation framework.
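As an illustration, here is a rough sketch of dispatching those function calls to a Playwright page. The action argument names and the 0-1000 coordinate normalization are assumptions based on the Gemini docs, so adapt them to the tool calls your application actually receives:

from playwright.sync_api import sync_playwright

SCREEN_WIDTH, SCREEN_HEIGHT = 1440, 900

def to_pixels(x: int, y: int) -> tuple[float, float]:
    # Assumption: action coordinates arrive normalized to a 0-1000 range
    return x / 1000 * SCREEN_WIDTH, y / 1000 * SCREEN_HEIGHT

def execute_action(page, tool_call: dict) -> None:
    # tool_call is a LangChain tool call dict: {"name": ..., "args": ..., "id": ...}
    name, args = tool_call["name"], tool_call["args"]
    if name == "click_at":
        page.mouse.click(*to_pixels(args["x"], args["y"]))
    elif name == "type_text_at":
        page.mouse.click(*to_pixels(args["x"], args["y"]))
        page.keyboard.type(args["text"])
    # ...handle scroll and any other actions your task needs

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    for tool_call in response.tool_calls:  # `response` from the invocation above
        execute_action(page, tool_call)
    browser.close()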
Gemini models have default safety settings that can be overridden. If you are receiving lots of 'Safety Warnings' from your models, you can try tweaking the safety_settings attribute of the model. For example, to turn off safety blocking for dangerous content, you can construct your LLM as follows:
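A sketch of that construction (this assumes the HarmCategory and HarmBlockThreshold enums are re-exported by langchain_google_genai; they can also be imported from the underlying Google SDK):

from langchain_google_genai import (
    ChatGoogleGenerativeAI,
    HarmBlockThreshold,
    HarmCategory,
)

# Turn off blocking for the dangerous-content category
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    safety_settings={
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    },
)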
Context caching allows you to store and reuse content (e.g., PDFs, images) for faster processing. The cached_content parameter accepts a cache name created via the Google Generative AI API.
Single file example
This caches a single file and queries it.
import time

from google import genai
from google.genai import types
from langchain.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

client = genai.Client()

# Upload file
file = client.files.upload(file="path/to/your/file")
while file.state.name == "PROCESSING":
    time.sleep(2)
    file = client.files.get(name=file.name)

# Create cache
model = "gemini-3-pro-preview"
cache = client.caches.create(
    model=model,
    config=types.CreateCachedContentConfig(
        display_name="Cached Content",
        system_instruction=(
            "You are an expert content analyzer, and your job is to answer "
            "the user's query based on the file you have access to."
        ),
        contents=[file],
        ttl="300s",
    ),
)

# Query with LangChain
llm = ChatGoogleGenerativeAI(
    model=model,
    cached_content=cache.name,
)

message = HumanMessage(content="Summarize the main points of the content.")
llm.invoke([message])
Multiple files example
This caches two files using Part and queries them together.
import time

from google import genai
from google.genai.types import CreateCachedContentConfig, Content, Part
from langchain.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

client = genai.Client()

# Upload files
file_1 = client.files.upload(file="./file1")
while file_1.state.name == "PROCESSING":
    time.sleep(2)
    file_1 = client.files.get(name=file_1.name)

file_2 = client.files.upload(file="./file2")
while file_2.state.name == "PROCESSING":
    time.sleep(2)
    file_2 = client.files.get(name=file_2.name)

# Create cache with multiple files
contents = [
    Content(
        role="user",
        parts=[
            Part.from_uri(file_uri=file_1.uri, mime_type=file_1.mime_type),
            Part.from_uri(file_uri=file_2.uri, mime_type=file_2.mime_type),
        ],
    )
]

model = "gemini-3-pro-preview"
cache = client.caches.create(
    model=model,
    config=CreateCachedContentConfig(
        display_name="Cached Contents",
        system_instruction=(
            "You are an expert content analyzer, and your job is to answer "
            "the user's query based on the files you have access to."
        ),
        contents=contents,
        ttl="300s",
    ),
)

# Query with LangChain
llm = ChatGoogleGenerativeAI(
    model=model,
    cached_content=cache.name,
)

message = HumanMessage(
    content="Provide a summary of the key information across both files."
)
llm.invoke([message])
See the Gemini API docs on context caching for more information.