You might need to rebuild your graph with a different configuration for a new run. For example, you might want to load different tools depending on the user’s credentials. This guide shows how you can do this using ServerRuntime.
In most cases, customization is best handled by conditioning on the config within individual nodes rather than dynamically changing the whole graph structure. This makes it easier to test and manage.
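For example, a single node can read per-run values from the config instead of the whole graph being rebuilt. A minimal sketch, where "model_name" is a hypothetical configurable key supplied with each run:
from langchain_core.runnables import RunnableConfig
from langgraph.graph import StateGraph, MessagesState, START

async def model(state: MessagesState, config: RunnableConfig):
    # Nodes can accept the run's config as a second argument and branch on it,
    # so the graph structure itself never changes.
    model_name = config.get("configurable", {}).get("model_name", "default")
    return {"messages": [{"role": "assistant", "content": f"Hi from {model_name}!"}]}

graph_workflow = StateGraph(MessagesState)
graph_workflow.add_node("model", model)
graph_workflow.add_edge(START, "model")
agent = graph_workflow.compile()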

Prerequisites

  • Make sure to check out this how-to guide on setting up your app for deployment first.
  • ServerRuntime requires langgraph-api >= 0.7.31 and langgraph-sdk >= 0.3.5. Prior to those versions, graph factories only accepted a single config: RunnableConfig argument.

Define graphs

Let’s say you have an app with a simple graph that calls an LLM and returns the response to the user. The app file directory looks like the following:
my-app/
|-- langgraph.json
|-- my_project/
|   |-- __init__.py
|   |-- agents.py     # code for your graph
|-- pyproject.toml
where the graph is defined in agents.py.

No rebuild

The most common way to deploy your Agent Server is to reference a compiled graph instance that’s defined at the top level of your file. An example is below:
# my_project/agents.py
from langgraph.graph import StateGraph, MessagesState, START

async def model(state: MessagesState):
    return {"messages": [{"role": "assistant", "content": "Hi, there!"}]}

graph_workflow = StateGraph(MessagesState)
graph_workflow.add_node("model", model)
graph_workflow.add_edge(START, "model")
agent = graph_workflow.compile()
To make the server aware of your graph, you need to specify a path to the variable that contains the CompiledStateGraph instance in your LangGraph API configuration (langgraph.json), e.g.:
{
    "$schema": "https://langgra.ph/schema.json",
    "dependencies": ["."],
    "graphs": {
        "chat_agent": "my_project.agents:agent",
    }
}

Rebuild

To rebuild your graph on each new run, provide a factory function that returns (or yields) a graph. The factory can optionally accept a ServerRuntime parameter or a RunnableConfig. The server inspects your function’s type annotations to determine which arguments to inject, so make sure to include the correct type hints. The server’s queue workers will call your factory function any time they need to process a run. The function will also be called for certain other endpoints to update state, read state, or to fetch assistant schemas. The ServerRuntime tells you which context triggered the call.
ServerRuntime is in beta and may change in future releases.

Simple factory

The simplest form is a plain async def that returns a compiled graph:
from langchain_openai import ChatOpenAI
from langgraph.graph import START, StateGraph
from langchain_core.runnables import RunnableConfig
from langgraph_sdk.runtime import ServerRuntime

from my_agent.utils.state import AgentState

model = ChatOpenAI(model="gpt-5.2")


def make_graph_for_user(user_id: str):
    """Build a graph customized per user."""
    graph_workflow = StateGraph(AgentState)

    async def call_model(state):
        return {"messages": [await model.ainvoke(state["messages"])]}

    graph_workflow.add_node("agent", call_model)
    graph_workflow.add_edge(START, "agent")
    return graph_workflow.compile()


async def make_graph(config: RunnableConfig, runtime: ServerRuntime):
    user = runtime.ensure_user()
    return make_graph_for_user(user.identity)
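Because the server only injects the arguments your type hints declare, a factory can also accept just one of the two parameters. A rough sketch of a config-only variant, where "user_id" is a hypothetical configurable key supplied with each run:
async def make_graph_from_config(config: RunnableConfig):
    # Only RunnableConfig is annotated, so that is all the server injects.
    user_id = config.get("configurable", {}).get("user_id", "anonymous")
    return make_graph_for_user(user_id)  # reuse the builder defined above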

Context manager factory

If you need to set up and tear down resources (database connections, load MCP tools, etc.), use an async context manager. Use runtime.execution_runtime to check whether the graph is being called for actual execution or just for introspection (schemas, visualization):
import contextlib

from langchain_openai import ChatOpenAI
from langgraph.graph import START, StateGraph
from langchain_core.runnables import RunnableConfig
from langgraph_sdk.runtime import ServerRuntime

from my_agent.utils.state import AgentState

model = ChatOpenAI(model="gpt-5.2")


def make_agent_graph(tools: list):
    """Make a simple LLM agent."""
    graph_workflow = StateGraph(AgentState)
    bound = model.bind_tools(tools)

    async def call_model(state):
        return {"messages": [await bound.ainvoke(state["messages"])]}

    graph_workflow.add_node("agent", call_model)
    graph_workflow.add_edge(START, "agent")
    return graph_workflow.compile()


@contextlib.asynccontextmanager
async def make_graph(runtime: ServerRuntime):
    if ert := runtime.execution_runtime:
        # Only set up expensive resources during actual execution.
        # Introspection calls (get_schema, get_graph, ...) skip this.
        mcp_tools = await connect_mcp(ert.ensure_user())  # your setup logic
        yield make_agent_graph(tools=mcp_tools)
        await disconnect_mcp()  # your teardown logic
    else:
        # For schema/state reads, return a graph with the same
        # topology but no expensive resource setup.
        yield make_agent_graph(tools=[])
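Note that code after the yield only runs if the graph executes successfully; if your teardown must always run, wrap the yield in try/finally. A sketch of the same factory with that change, reusing the hypothetical connect_mcp / disconnect_mcp helpers from above:
@contextlib.asynccontextmanager
async def make_graph(runtime: ServerRuntime):
    if ert := runtime.execution_runtime:
        mcp_tools = await connect_mcp(ert.ensure_user())
        try:
            yield make_agent_graph(tools=mcp_tools)
        finally:
            # Runs even if the graph execution raises.
            await disconnect_mcp()
    else:
        yield make_agent_graph(tools=[])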
Finally, specify the path to your factory in langgraph.json:
{
    "$schema": "https://langgra.ph/schema.json",
    "dependencies": ["."],
    "graphs": {
        "chat_agent": "my_project.agents:make_graph",
    }
}
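Once the server is running, creating a run against "chat_agent" triggers the factory. A rough sketch using the langgraph_sdk client, assuming a locally running server at the default http://localhost:2024:
import asyncio

from langgraph_sdk import get_client

client = get_client(url="http://localhost:2024")

async def main():
    # Creating a run makes a queue worker call make_graph with
    # access_context "threads.create_run".
    result = await client.runs.wait(
        None,          # no thread: a stateless run
        "chat_agent",  # the graph id registered in langgraph.json
        input={"messages": [{"role": "user", "content": "hello"}]},
    )
    print(result)

asyncio.run(main())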

ServerRuntime reference

Your factory function receives a ServerRuntime instance with the following attributes:
  • access_context (str): Why the factory was called, one of "threads.create_run", "threads.update", "threads.read", or "assistants.read".
  • user (BaseUser | None): The authenticated user, or None if no custom auth is configured.
  • store (BaseStore): The store instance for persistence and memory.
Methods:
  • ensure_user(): Returns the authenticated user. Raises PermissionError if no user is provided.
  • execution_runtime: Returns the execution runtime when access_context is "threads.create_run", or None otherwise. Use this to conditionally set up expensive resources only during execution.
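A rough sketch of how these fit together inside a factory; the ("preferences", ...) store namespace, the "agent" key, and load_tools are hypothetical, and make_agent_graph is the builder from the earlier example:
async def make_graph(runtime: ServerRuntime):
    user = runtime.ensure_user()  # raises PermissionError without an authenticated user
    # BaseStore.aget(namespace, key) returns an Item or None.
    prefs = await runtime.store.aget(("preferences", user.identity), "agent")
    tool_names = prefs.value.get("tools", []) if prefs else []
    return make_agent_graph(tools=load_tools(tool_names))  # load_tools: your own helper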

Access contexts

The server calls your factory in several contexts beyond just executing runs. In all contexts, the returned graph should have the same topology (nodes, edges, state schema). A mismatched topology in write contexts (threads.create_run, threads.update) can cause incorrect state updates. In read contexts (threads.read, assistants.read), a mismatch affects reported pending tasks, schemas, and visualizations but won’t corrupt data. Use execution_runtime to conditionally set up expensive resources without changing the graph structure.
  • threads.create_run: Full graph execution. execution_runtime is available.
  • threads.update: State update via aupdate_state. Does not execute node functions, but it can change the pending tasks.
  • threads.read: State reads via aget_state / aget_state_history.
  • assistants.read: Schema and graph introspection for visualization, MCP, A2A, etc.
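execution_runtime covers the common case, but a factory can also branch on access_context directly. A rough sketch, where load_user_tools is a hypothetical setup helper and make_agent_graph is the builder from the earlier example:
async def make_graph(runtime: ServerRuntime):
    if runtime.access_context == "threads.create_run":
        # Actual execution: expensive setup is worthwhile here.
        tools = await load_user_tools(runtime.ensure_user())
        return make_agent_graph(tools=tools)
    # threads.update / threads.read / assistants.read: same topology,
    # but skip the expensive setup.
    return make_agent_graph(tools=[])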
See the LangGraph API configuration file reference for more information.