Overview
Agents use information from memory and their execution environment to accomplish tasks. In production, there are a few primitives that determine how information is shared and accessed:

- Thread: a single conversation. Message history and scratch files are scoped to the thread by default and don’t carry over.
- User: someone interacting with your agent. Memory and files can be private to a user or shared across users. Identity and authorization comes from your auth layer.
- Assistant: a configured agent instance. Memory and files can be tied to one assistant or shared across all of them.
This page covers:
- LangSmith Deployments: managed infrastructure with auth, webhooks, and cron jobs
- Production considerations: multi-tenancy, async, and durability
- Memory: persist information across conversations
- Execution environment: file storage and code execution
- Guardrails: rate limiting, error handling, and data privacy
- Frontend: connect your UI to a deployed agent
LangSmith Deployments
The fastest way to get a Deep Agent into production is LangSmith Deployments. It provisions the infrastructure your agent needs (assistants, threads, runs, a store, and a checkpointer) so you don’t have to set these up yourself. It also gives you authentication, webhooks, cron jobs, and observability out of the box, and can expose your agent via MCP or A2A. For setup instructions, see the LangSmith Deployments quickstart. All code snippets on this page use the following `langgraph.json` unless otherwise specified:
langgraph.json
langgraph.json is the configuration file that tells the LangGraph platform how to build and run your application. It lives at the root of your project and is required for both local development (with langgraph dev) and production deployment. The key fields are:
| Field | Description |
|---|---|
| `dependencies` | Packages to install. `["."]` installs the current directory as a package (reads from `requirements.txt`, `pyproject.toml`, or `package.json`). |
| `graphs` | Maps graph IDs to their code locations. Each entry is `"<id>": "./<file>:<variable>"`, where `<id>` is the name you use to invoke the graph via the API, and `<variable>` is the compiled graph or constructor function exported from `<file>`. |
| `env` | Path to a `.env` file with environment variables (API keys, secrets). These are set at build time and available at runtime. |
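For reference, a minimal configuration matching the fields above might look like the following sketch (the file name `agent.py` and the exported variable `agent` are placeholders for your own module and graph):

```json
{
  "dependencies": ["."],
  "graphs": {
    "agent": "./agent.py:agent"
  },
  "env": ".env"
}
```

With this file, `langgraph dev` and LangSmith Deployments would serve the graph under the ID `agent`.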
Production considerations
Multi-tenancy
When your agent serves multiple users, you need to verify who each user is and control what they can access. LangSmith Deployments supports custom authentication to establish user identity and auth handlers to control access to threads and assistants. See Agent Auth to pass end-user credentials through to the agent for authenticated calls on the user’s behalf. How you scope memory and execution environments determines what information is shared between users; see the sections below for details.

Async
LLM-based applications are heavily I/O-bound: calling language models, databases, and external services. Async programming lets these operations run concurrently instead of blocking, improving throughput and responsiveness.

LangChain follows the convention of prefixing `a` to async method names (e.g., `ainvoke`, `abefore_agent`, `astream`). Sync and async variants live in the same class or namespace.

- Create async tools. LangChain runs sync tools in a separate thread to avoid blocking, but native async avoids the threading overhead entirely.
- Use async middleware methods. Custom middleware should implement async hooks (e.g., `abefore_agent` instead of `before_agent`).
- Use async for external resource lifecycle. Creating sandboxes or connecting to MCP servers involves network calls and should be awaited. This is why graph factories that provision these resources are async.
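To illustrate why async matters for I/O-bound work, here is a minimal sketch using plain asyncio (no LangChain dependency): three simulated service calls run concurrently instead of back to back.

```python
import asyncio
import time

async def call_service(name: str, delay: float) -> str:
    # Stand-in for an I/O-bound call (LLM, database, external API).
    await asyncio.sleep(delay)
    return f"{name}: ok"

async def gather_results() -> list[str]:
    start = time.perf_counter()
    # The three 0.1s calls overlap, so wall time is ~0.1s rather than ~0.3s.
    results = await asyncio.gather(
        call_service("llm", 0.1),
        call_service("db", 0.1),
        call_service("search", 0.1),
    )
    elapsed = time.perf_counter() - start
    assert elapsed < 0.25, "calls ran sequentially, not concurrently"
    return results

print(asyncio.run(gather_results()))
```

A sync tool making the same three calls would block for the sum of their latencies; this is the overhead native async avoids.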
Durability
Deep Agents run on LangGraph, which provides durable execution out of the box. The persistence layer checkpoints state at each step, so a run interrupted by a failure, timeout, or human-in-the-loop pause resumes from its last recorded state without reprocessing previous steps. For long-running deep agents that spawn many subagents, this means a mid-run failure doesn’t lose completed work. Checkpointing also enables:

- Indefinite interrupts. Human-in-the-loop workflows can pause for minutes or days and resume exactly where they left off.
- Time travel. Every checkpointed step is a snapshot you can rewind to, letting you replay from an earlier state if something goes wrong.
- Safe handling of sensitive operations. For workflows involving payments or other irreversible actions, checkpoints provide an audit trail and a recovery point to inspect the exact state that led to an action.
Memory
Without memory, every conversation starts from scratch. Memory lets your agent retain information across conversations (user preferences, learned instructions, past experiences) so it can personalize its behavior over time. For an overview of memory types, see the memory concepts guide.

Scoping
Memory is always persistent across conversations; the main question is how it’s scoped across user and assistant boundaries. The right scope depends on who should see and modify the data. The examples below are based on an organization running two assistants (an email writing assistant and a social media drafting assistant) serving multiple users.

| Scope | Namespace | Use case | Example |
|---|---|---|---|
| Private | `(assistant_id, user_id)` | Per-user preferences within one assistant | “Sign my emails ‘Best, Alex’” for the email assistant only |
| Assistant | `(assistant_id)` | Shared instructions for one assistant | “Cap posts at 280 characters” for the social media assistant |
| User | `(user_id)` | Global user profile across all assistants | User’s preferred language is Spanish |
| Organization | `()` | Read-only policies for all users and assistants | “Never disclose internal pricing” |
Configuration
In Deep Agents, memory is stored as files in a virtual filesystem. By default, files only last for a single conversation. To persist them, route a path like `/memories/` to a `StoreBackend` that writes to the LangGraph Store. Use a `CompositeBackend` to give the agent both ephemeral scratch space and persistent long-term memory.
- Private (most common)
- Assistant
- User
- Organization
Namespace by `(assistant_id, user_id)` so each user gets private memory within a given assistant.
Execution environment
Locally, agents can read and write files on disk and run shell commands directly. In production, you need to think about isolation and persistence. The right setup depends on whether your agent needs to execute code:

- Filesystem backends are enough if your agent only reads and writes files. Choose a backend that matches your persistence needs: ephemeral scratch space, persistent storage, or a mix of both.
- Sandboxes add an isolated container with an `execute` tool for running shell commands. Use a sandbox if your agent needs to run code, install packages, or do anything beyond file I/O.
Filesystem
Choose a backend based on what needs to persist:

- `StateBackend` (default): ephemeral scratch space, scoped to a single conversation. Checkpointed at every step, so avoid writing large files.
- `StoreBackend`: persistent storage that survives across conversations. Scope with a namespace factory.
- `CompositeBackend`: mix both. Ephemeral scratch space by default, with persistent routes for specific paths like `/memories/`.
Sandboxes
If your agent needs to run code (not just read and write files), use a sandbox. Sandboxes provide both a filesystem and an `execute` tool for running shell commands, all inside an isolated container. This isolation also protects your host: if the agent’s code exhausts memory or crashes, only the sandbox is affected; your server keeps running.
Lifecycle
The key decision is how long a sandbox lives: does each conversation get a fresh one, or do conversations share a persistent environment?

| Scope | Sandbox ID stored on | Lifecycle | Example use case |
|---|---|---|---|
| Thread-scoped | Thread metadata | Fresh per conversation, cleaned up on TTL | A data analysis bot where each conversation starts clean |
| Assistant-scoped | Assistant config | Shared across all conversations | A coding assistant that maintains a cloned repo across conversations |
The examples below use an async graph factory instead of a static graph. A graph factory receives the runtime config on each run, so it can read the `thread_id` or `assistant_id` (available only at invocation time), look up or create the correct sandbox, and build the agent around it. The factory is async because sandbox creation is an I/O-bound network operation.

- Thread-scoped (most common)
- Assistant-scoped
Each conversation gets its own sandbox. The graph factory reads `thread_id` from the config, so each thread automatically gets its own isolated environment; the provider’s label-based lookup handles deduplication across runs. The sandbox is cleaned up when its TTL expires.

Because the `agent` variable is an async function (not a compiled graph), the server treats it as a graph factory and calls it on each run, injecting the config. The factory looks up or creates the sandbox via the provider’s label-based search and returns a fresh agent graph wired to that sandbox.
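The factory pattern can be sketched as follows. The sandbox provider here is an in-memory stub standing in for a real sandbox SDK, and the returned dict stands in for the compiled agent graph; only the label-based lookup and config-driven scoping reflect the mechanism described above.

```python
import asyncio

_sandboxes: dict[str, str] = {}  # label -> sandbox ID (provider's label lookup)

async def get_or_create_sandbox(label: str) -> str:
    # Label-based lookup deduplicates across runs: reuse if present, else create.
    if label not in _sandboxes:
        await asyncio.sleep(0)  # stands in for the provider's network call
        _sandboxes[label] = f"sbx-{len(_sandboxes)}"
    return _sandboxes[label]

async def agent(config: dict) -> dict:
    # The server calls this factory on each run, injecting the runtime config.
    thread_id = config["configurable"]["thread_id"]
    sandbox_id = await get_or_create_sandbox(f"thread-{thread_id}")
    return {"sandbox": sandbox_id}  # stand-in for the compiled agent graph

# Runs on the same thread share one sandbox; a new thread gets a fresh one.
run1 = asyncio.run(agent({"configurable": {"thread_id": "t1"}}))
run2 = asyncio.run(agent({"configurable": {"thread_id": "t1"}}))
run3 = asyncio.run(agent({"configurable": {"thread_id": "t2"}}))
assert run1["sandbox"] == run2["sandbox"]
assert run1["sandbox"] != run3["sandbox"]
```

For assistant scope, the factory would key the label on `assistant_id` instead, so all threads share one environment.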
Once deployed with `langgraph deploy`, invoke the agent from your application code using the SDK. The client-side code is the same regardless of scope. The scoping is handled entirely in the agent factory above, but the behavior differs:
- Thread-scoped
- Assistant-scoped
Each thread gets its own sandbox. Follow-up messages within the same thread reuse the same sandbox, but a new thread always starts fresh with no leftover files or installed packages from previous conversations.
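A client might look like the following sketch, using the `langgraph_sdk` client. The deployment URL is a placeholder, and the graph ID `"agent"` assumes the `langgraph.json` shown earlier; adapt both to your deployment.

```python
import asyncio

DEPLOYMENT_URL = "https://example.langgraph.app"  # placeholder deployment URL

def build_payload(message: str) -> dict:
    # A generous recursion limit so subagent-heavy runs aren't cut off early.
    return {
        "input": {"messages": [{"role": "user", "content": message}]},
        "config": {"recursion_limit": 1000},
    }

async def main() -> None:
    # langgraph_sdk is the LangGraph platform client; "agent" must match the
    # graph ID declared in langgraph.json.
    from langgraph_sdk import get_client

    client = get_client(url=DEPLOYMENT_URL)
    thread = await client.threads.create()
    payload = build_payload("Analyze sales.csv and summarize monthly revenue")
    async for chunk in client.runs.stream(
        thread["thread_id"], "agent", **payload, stream_mode="updates"
    ):
        print(chunk.event, chunk.data)

# To run against a live deployment:
# asyncio.run(main())
```

Creating a new thread per conversation is what triggers a fresh sandbox under thread scoping; reusing a `thread_id` reuses the environment.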
File transfers
Sandboxes are isolated containers, so your application code can’t directly access files inside them. Use `upload_files()` and `download_files()` to move data across the sandbox boundary:
- Seed the sandbox before the agent runs: upload user files, skill scripts, configuration, or persistent memories so the agent has what it needs from the start
- Retrieve results after the agent finishes: download generated artifacts (reports, plots, exports) and sync updated memories back for future conversations
Example: syncing skills and memories with custom middleware
Skill scripts that the agent needs to execute must be uploaded into the sandbox before the agent runs. You may also want to sync memories so the agent can read and update them inside the container. Use custom middleware with `before_agent` and `after_agent` hooks to move files across the sandbox boundary:
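The shape of such middleware can be sketched as follows. The sandbox client is a stub and the hook signatures are simplified for illustration; only the seed-before / sync-after pattern reflects the mechanism described above.

```python
import asyncio

class StubSandbox:
    # Stand-in for a real sandbox client's file-transfer API.
    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def upload_files(self, files: dict[str, str]) -> None:
        self.files.update(files)

    def download_files(self, paths: list[str]) -> dict[str, str]:
        return {p: self.files[p] for p in paths if p in self.files}

class SandboxSyncMiddleware:
    # Hook names follow the async convention (abefore_agent / aafter_agent).
    def __init__(self, sandbox: StubSandbox) -> None:
        self.sandbox = sandbox

    async def abefore_agent(self, state: dict) -> None:
        # Seed the sandbox before the run: skill scripts and memories.
        self.sandbox.upload_files(state.get("seed_files", {}))

    async def aafter_agent(self, state: dict) -> dict[str, str]:
        # Pull updated memories back out for future conversations.
        return self.sandbox.download_files(["/memories/profile.md"])

async def demo() -> dict[str, str]:
    sandbox = StubSandbox()
    mw = SandboxSyncMiddleware(sandbox)
    await mw.abefore_agent({"seed_files": {
        "/skills/summarize.py": "print('summary')",
        "/memories/profile.md": "language: es",
    }})
    # The agent edits its memory file inside the sandbox during the run.
    sandbox.files["/memories/profile.md"] = "language: es\ntone: formal"
    return await mw.aafter_agent({})

assert asyncio.run(demo()) == {"/memories/profile.md": "language: es\ntone: formal"}
```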
Guardrails
Agents in production run autonomously, which means they can loop indefinitely, hit rate limits, or process user data that contains sensitive information. Deep Agents support middleware that wraps model and tool calls to handle these concerns.

Rate limiting
Rate limiting here refers to capping the agent’s own LLM and tool usage within a run, not API gateway rate limiting for incoming requests. Without limits, a confused agent can burn through your LLM API budget in minutes by looping on the same tool call or making hundreds of model calls. Set caps on both model calls and tool executions per run: use `run_limit` to cap calls within a single invocation (resets each turn), and `thread_limit` to cap calls across an entire conversation (requires a checkpointer). See `ModelCallLimitMiddleware` and `ToolCallLimitMiddleware` for the full configuration.
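The run-level mechanism amounts to a counter with a hard cap, as in this illustrative sketch (not the `ModelCallLimitMiddleware` implementation):

```python
class CallLimitExceeded(RuntimeError):
    pass

class RunCallLimiter:
    def __init__(self, run_limit: int) -> None:
        self.run_limit = run_limit
        self.calls = 0

    def record_call(self) -> None:
        # Count every model/tool call; raise once the per-run cap is exceeded.
        self.calls += 1
        if self.calls > self.run_limit:
            raise CallLimitExceeded(f"exceeded {self.run_limit} calls this run")

limiter = RunCallLimiter(run_limit=3)
for _ in range(3):
    limiter.record_call()  # the first three calls pass

try:
    limiter.record_call()  # the fourth exceeds the cap
    raise AssertionError("limit should have triggered")
except CallLimitExceeded:
    pass
```

A thread-level limit works the same way but persists the counter in the checkpointer so it survives across turns.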
Handling errors
Not all errors should be handled the same way. Transient failures (network timeouts, rate limits) should be retried automatically. Errors the LLM can recover from (bad tool output, parsing failures) should be fed back to the model. Errors that need human input should pause the agent. For the full breakdown with code examples, see Handle errors appropriately. Middleware handles the transient case: model calls and tool calls each have their own retry middleware with exponential backoff, and if your primary model provider goes down entirely, the fallback middleware switches to an alternative. Be selective about what you retry: a `read_file` that fails won’t benefit from a retry, but a web search that times out probably will. See `ModelRetryMiddleware` and `ModelFallbackMiddleware` for the full configuration.
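The retry-with-exponential-backoff pattern the middleware applies can be sketched in plain Python (illustrative only; the retry middleware handles this for you):

```python
import random
import time

def retry(fn, max_attempts: int = 4, base_delay: float = 0.01):
    for attempt in range(max_attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the transient error
            # Exponential backoff with jitter: ~0.01s, ~0.02s, ~0.04s, ...
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

attempts = 0
def flaky_search():
    # Simulates a web search that times out twice, then succeeds.
    global attempts
    attempts += 1
    if attempts < 3:
        raise TimeoutError("search timed out")
    return "results"

assert retry(flaky_search) == "results"
assert attempts == 3
```

Note the retry only catches `TimeoutError`: deterministic failures (a missing file, a bad argument) propagate immediately so the model can handle them instead.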
Data privacy
If your agent processes user input that might contain emails, credit card numbers, or other PII, you can detect and handle it before it reaches the model or gets stored in logs. Built-in strategies include `redact` (replace with `[REDACTED_EMAIL]`), `mask` (partial masking like `****-****-****-1234`), `hash` (deterministic hash), and `block` (raise an error). You can also write custom detectors for domain-specific patterns. See `PIIMiddleware` for the full configuration.
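As a concrete illustration of the `redact` strategy, here is a minimal email detector (a sketch; `PIIMiddleware` ships detectors and strategies so you don’t write this yourself):

```python
import re

# Simplified email pattern for illustration; production detectors are stricter.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact_emails(text: str) -> str:
    # Replace each detected email with a redaction token before logging
    # or sending the text to the model.
    return EMAIL.sub("[REDACTED_EMAIL]", text)

print(redact_emails("Contact alex@example.com for details"))
# -> Contact [REDACTED_EMAIL] for details
```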
For the complete list of available middleware, see prebuilt middleware.
Frontend
Deep Agents use `useStream` to connect your UI to the agent backend. `useStream` is a frontend hook (available for React, Vue, Svelte, and Angular) that streams messages, subagent progress, and custom state from your agent in real time.
Locally, useStream points at http://localhost:2024. In production, point it at your LangSmith Deployment and configure reconnection so users don’t lose progress if their connection drops.
`reconnectOnMount` picks up an in-progress run automatically: if a user refreshes while the agent is working, they’ll see it continue rather than a blank screen. `fetchStateHistory` loads the full conversation history for the thread, so returning users see previous messages.
For deep agent workflows that spawn many subagents, set a high `recursionLimit` when submitting to avoid cutting off long-running executions:
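Putting these options together, a production setup might look like this sketch (the URL and assistant ID are placeholders; field names follow the options described above, so check the `useStream` reference for exact casing in your SDK version). The objects are kept as plain values so they can be spread into `useStream({ ...streamOptions })` inside your React component:

```typescript
const streamOptions = {
  apiUrl: "https://example.langgraph.app", // your LangSmith Deployment (placeholder)
  assistantId: "agent",                    // graph ID from langgraph.json
  reconnectOnMount: true,   // resume an in-progress run after a refresh
  fetchStateHistory: true,  // load prior messages for returning users
};

// Options for stream.submit(...): a high recursion limit so subagent-heavy
// runs aren't cut off mid-execution.
function submitOptions(recursionLimit = 1000) {
  return { config: { recursionLimit } };
}

console.log(streamOptions.apiUrl, submitOptions().config.recursionLimit);
```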

