Application structure
To deploy an Agent Server application, you need to specify the graph(s) you want to deploy, as well as any relevant configuration settings, such as dependencies and environment variables. Read the application structure guide to learn how to structure your LangGraph application for deployment.

LangSmith cloud manages the database for you. If you're deploying on your own infrastructure, you'll need to set it up yourself.
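As a minimal sketch, the standard LangGraph CLI configuration file (`langgraph.json`) declares the graphs, dependencies, and environment variables for a deployment; the paths and graph name below are illustrative:

```json
{
  "dependencies": ["."],
  "graphs": {
    "agent": "./src/agent.py:graph"
  },
  "env": ".env"
}
```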
Parts of a deployment
When you deploy Agent Server, you are deploying one or more graphs, a database for persistence, and a task queue.

Graphs
When you deploy a graph with Agent Server, you are deploying a "blueprint" for an Assistant. A graph most commonly implements an agent, but it does not have to. For example, a graph could implement a simple chatbot that only supports back-and-forth conversation, without the ability to influence any application control flow. In practice, as applications grow more complex, a graph often implements a larger flow that may use multiple agents working in tandem.

Graphs don't have to be written with LangGraph. You can also deploy agents built with other frameworks, such as Strands or Google ADK, using the LangGraph Functional API. For details, refer to Deploy other frameworks.

Graph loading and compilation
How and when your graph is compiled depends on how you register it in your application structure:
- Compiled graph (recommended): Export an already-compiled `CompiledGraph` instance. The server loads it once at container startup and reuses it for every run, with no compilation overhead per request.
- Factory function: Export a factory function that the server invokes each time it needs the graph. Use this only when you need per-run graph customization (for example, choosing different models or tools based on the assistant config). Keep factory functions lightweight, since they run on every invocation.
Persistence
Agent Server persists three types of data, all backed by PostgreSQL by default:
- Core resource data: assistants, threads, runs, and cron jobs. Always stored in PostgreSQL.
- Checkpoints (short-term memory): snapshots of graph execution state written at each step. They make runs durable: if a worker is interrupted, the run can resume from the last checkpoint rather than from the beginning. Durability mode controls checkpoint frequency: `async` (default) writes after each step; `exit` stores only the final state. LangSmith stores checkpoints in PostgreSQL by default, but you can switch to MongoDB or a custom implementation. For details, refer to Configure checkpointer backend.
- Store (long-term memory): memory that persists across threads, enabling agents to retain information between separate conversations. Stored in PostgreSQL by default, but can be replaced with a custom implementation. For details, refer to Add custom store.
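The checkpoint-and-resume behavior can be sketched in plain Python; all names here are hypothetical stand-ins, with a dict playing the role of the PostgreSQL checkpoint table:

```python
checkpoints: dict = {}  # stand-in for the PostgreSQL checkpoint table

def run_graph(thread_id, steps, fail_at=None):
    """Execute `steps` in order, resuming from the last checkpoint if one exists."""
    state = checkpoints.get(thread_id, {"step": 0, "value": 0})
    for i in range(state["step"], len(steps)):
        if fail_at == i:
            # Simulate a worker interruption mid-run.
            raise RuntimeError("worker interrupted")
        state = {"step": i + 1, "value": steps[i](state["value"])}
        # 'async' durability: a checkpoint is written after each step,
        # so a later worker can resume from here instead of step 0.
        checkpoints[thread_id] = state
    return state["value"]
```

Because a checkpoint is written after every step, a re-run of the same thread skips the already-completed steps rather than starting over.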
Task queue
When a client creates a run, the API server enqueues it and a queue worker picks it up for execution. Workers can also be signaled to cancel a run in progress, and they publish output events; open `/stream` connections forward these events to the client in real time.

Redis handles the signaling, cancellation, and streaming pub/sub between API servers and queue workers. It stores only ephemeral data: no user or run data persists in Redis. Run data itself is always read from and written to PostgreSQL.
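The enqueue-then-execute split can be sketched with stdlib primitives; the names are hypothetical, with a `queue.Queue` standing in for the durable task queue and a list standing in for a Redis pub/sub channel:

```python
import queue
import threading

runs = queue.Queue()    # stand-in for the durable task queue (PostgreSQL)
events = []             # stand-in for a Redis pub/sub channel

def api_server(run_id):
    # The API server only enqueues runs; it never executes graph code.
    runs.put(run_id)

def queue_worker():
    # A worker picks up the pending run and publishes output events,
    # which /stream connections would forward to the client.
    run_id = runs.get()
    events.append(f"{run_id}: started")
    events.append(f"{run_id}: done")
    runs.task_done()

api_server("run-1")
worker = threading.Thread(target=queue_worker)
worker.start()
worker.join()
```

The key property mirrored here is the separation of roles: request handling and graph execution happen in different processes, connected only by the queue and the event channel.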
For more information on how to set up and manage these components, review the hosting options guide.
Runtime architecture
Deployment modes
Agent Server supports three runtime configurations:
- Single host: The API server manages the task queue directly with no separate queue workers. This is the default for self-hosted deployments and is suitable for development and low-traffic use cases.
- Split API and queue: Dedicated queue workers handle run execution on separate hosts from the API server. For self-hosted deployments, enable this by setting `queue.enabled: true` in your configuration. Each tier scales independently: API servers scale on request volume, and queue workers scale on pending run count.
- Distributed runtime: The API and queue processes again run separately, but instead of a single queue process handling both the orchestration and execution of your graph, the distributed runtime uses one process for orchestration and one for execution. Use this for large-scale deployments with high concurrency requirements.
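For the split API and queue mode, the relevant setting looks like the following sketch of a self-hosted configuration file; the surrounding file layout may differ by install method:

```yaml
# Enable dedicated queue workers on separate hosts from the API server.
queue:
  enabled: true
```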
Container architecture
A typical deployment consists of two kinds of long-running containers, both built from the same Docker image (a base image with your project code installed on top):
- API servers handle client requests (creating runs, reading thread state, streaming results) but do not execute agent code themselves.
- Queue workers are the execution engine. They listen to the durable task queue, execute your graph code, and write checkpoints.
Run execution lifecycle
When you invoke a run, the request flows through several components:
- A client sends a request to an API server, which creates a pending run in the durable task queue.
- A queue worker picks up the run, acquires a lease on it, loads the appropriate graph, and begins execution. The queue enforces that at most one run executes for a given thread at a time.
- As the graph executes, the worker writes checkpoints to the persistence layer (the frequency depends on the durability mode) and broadcasts streaming events over the configured pub/sub provider.
- If the client opened a `/stream` connection, the API server subscribes to the pub/sub channel and forwards events to the client via server-sent events in real time.
- When execution completes, the worker updates the run status and releases its slot for the next run.
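The one-run-per-thread guarantee can be sketched with a per-thread lock; the names here are hypothetical, with a lock standing in for the queue's lease:

```python
import threading
from collections import defaultdict

# One lease (lock) per conversation thread; a second run on the same
# thread must wait until the first run releases it.
leases = defaultdict(threading.Lock)
order = []

def execute_run(thread_id, run_id):
    with leases[thread_id]:  # acquire the per-thread lease
        order.append(f"{run_id}: start")
        # ... graph execution for this thread would happen here ...
        order.append(f"{run_id}: end")

t1 = threading.Thread(target=execute_run, args=("thread-1", "run-1"))
t2 = threading.Thread(target=execute_run, args=("thread-1", "run-2"))
t1.start(); t1.join()
t2.start(); t2.join()
```

Runs on different threads proceed in parallel; only runs targeting the same thread are serialized.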
Each queue worker executes up to `N_JOBS_PER_WORKER` runs concurrently (default: 10), so a single worker container serves many runs in parallel. See Configure Agent Server for scale for tuning guidance.
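The per-worker concurrency cap behaves like a counting semaphore; this stdlib sketch (hypothetical names) shows 25 runs sharing 10 slots, never exceeding the cap:

```python
import threading

N_JOBS_PER_WORKER = 10
slots = threading.BoundedSemaphore(N_JOBS_PER_WORKER)
in_flight, peak = 0, 0
lock = threading.Lock()

def execute(run_id):
    global in_flight, peak
    with slots:  # the slot is released when the run finishes
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        # ... graph execution would happen here ...
        with lock:
            in_flight -= 1

threads = [threading.Thread(target=execute, args=(i,)) for i in range(25)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With more pending runs than slots, the excess runs simply wait for a slot rather than overloading the worker.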
Learn more
- Application Structure guide explains how to structure your application for deployment.
- The API Reference provides detailed information on the API endpoints and data models.

