This guide covers considerations for taking a Deep Agent from a local prototype to a production deployment. It walks through scoping memory, configuring execution environments, adding guardrails, and connecting a frontend.

Overview

Agents use information from memory and their execution environment to accomplish tasks. In production, there are a few primitives that determine how information is shared and accessed:
  • Thread: a single conversation. Message history and scratch files are scoped to the thread by default and don’t carry over.
  • User: someone interacting with your agent. Memory and files can be private to a user or shared across users. Identity and authorization come from your auth layer.
  • Assistant: a configured agent instance. Memory and files can be tied to one assistant or shared across all of them.
This page covers:
  • LangSmith Deployments and the langgraph.json configuration
  • Production considerations: multi-tenancy, async, and durability
  • Memory scoping and configuration
  • Execution environments: filesystem backends and sandboxes
  • Guardrails: rate limiting, error handling, and data privacy
  • Connecting a frontend

LangSmith Deployments

The fastest way to get a Deep Agent into production is LangSmith Deployments. It provisions the infrastructure your agent needs (assistants, threads, runs, a store, and a checkpointer) so you don’t have to set these up yourself. It also gives you authentication, webhooks, cron jobs, and observability out of the box, and can expose your agent via MCP or A2A. For setup instructions, see the LangSmith Deployments quickstart. All code snippets on this page use the following langgraph.json unless otherwise specified:
langgraph.json
{
  "dependencies": ["."],
  "graphs": {
    "agent": "./src/agent.ts:agent"
  },
  "env": ".env"
}
langgraph.json is the configuration file that tells the LangGraph platform how to build and run your application. It lives at the root of your project and is required for both local development (with langgraph dev) and production deployment. The key fields are:
| Field | Description |
| --- | --- |
| dependencies | Packages to install. ["."] installs the current directory as a package (reads from requirements.txt, pyproject.toml, or package.json). |
| graphs | Maps graph IDs to their code locations. Each entry is "<id>": "./<file>:<variable>", where <id> is the name you use to invoke the graph via the API, and <variable> is the compiled graph or constructor function exported from <file>. |
| env | Path to a .env file with environment variables (API keys, secrets). These are set at build time and available at runtime. |
For the full set of configuration options (custom Docker steps, store indexing, auth handlers, and more), see application structure.

Production considerations

Multi-tenancy

When your agent serves multiple users, you need to verify who each user is and control what they can access. LangSmith Deployments supports custom authentication to establish user identity and auth handlers to control access to threads and assistants. See Agent Auth to pass end-user credentials through to the agent for authenticated calls on the user’s behalf. How you scope memory and execution environments determines what information is shared between users. See the sections below for details.

Async

LLM-based applications are heavily I/O-bound: calling language models, databases, and external services. Async programming lets these operations run concurrently instead of blocking, improving throughput and responsiveness.
LangChain follows the convention of prefixing async method names with a (e.g., ainvoke, abefore_agent, astream). Sync and async variants live in the same class or namespace.
When building for production:
  • Create async tools. LangChain runs sync tools in a separate thread to avoid blocking, but native async avoids the threading overhead entirely.
  • Use async middleware methods. Custom middleware should implement async hooks (e.g., abefore_agent instead of before_agent).
  • Use async for external resource lifecycle. Creating sandboxes or connecting to MCP servers involves network calls and should be awaited. This is why graph factories that provision these resources are async.
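The payoff of the points above is concurrency. Here is a minimal stdlib-only sketch (no LangChain APIs, the labels and delays are invented): three simulated I/O calls awaited together finish in roughly the time of the slowest one, not the sum.

```typescript
// Simulate an I/O-bound call (an LLM request, database query, or external API).
function fakeIo(label: string, ms: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve(label), ms));
}

// Await all three concurrently: wall time is ~max of the delays, not their sum.
export async function runConcurrently(): Promise<string[]> {
  return Promise.all([fakeIo("llm", 30), fakeIo("db", 30), fakeIo("api", 30)]);
}
```

A sync implementation would block on each call in turn; `Promise.all` is what the async tool and middleware variants make possible.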

Durability

Deep Agents run on LangGraph, which provides durable execution out of the box. The persistence layer checkpoints state at each step, so a run interrupted by a failure, timeout, or human-in-the-loop pause resumes from its last recorded state without reprocessing previous steps. For long-running deep agents that spawn many subagents, this means a mid-run failure doesn’t lose completed work. Checkpointing also enables:
  • Indefinite interrupts. Human-in-the-loop workflows can pause for minutes or days and resume exactly where they left off.
  • Time travel. Every checkpointed step is a snapshot you can rewind to, letting you replay from an earlier state if something goes wrong.
  • Safe handling of sensitive operations. For workflows involving payments or other irreversible actions, checkpoints provide an audit trail and a recovery point to inspect the exact state that led to an action.
LangSmith Deployments configure a persistent checkpointer automatically. If you are self-hosting, see persistence for setup instructions.
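The resume-from-last-checkpoint behavior can be illustrated with a toy, stdlib-only sketch (not the LangGraph implementation, and the step names are invented): each completed step is recorded, and a resumed run skips past recorded steps instead of redoing the work.

```typescript
type Checkpoint = { completed: string[] };

// Run named steps, recording each in the checkpoint as it finishes.
// A resumed run skips steps already recorded instead of re-executing them.
function runWithCheckpoints(steps: [string, () => void][], checkpoint: Checkpoint): void {
  for (const [name, fn] of steps) {
    if (checkpoint.completed.includes(name)) continue; // restored, skip
    fn();
    checkpoint.completed.push(name); // persist progress after each step
  }
}

export const executed: string[] = [];
export const checkpoint: Checkpoint = { completed: [] };
let attempts = 0;
const steps: [string, () => void][] = [
  ["plan", () => executed.push("plan")],
  ["research", () => executed.push("research")],
  ["report", () => {
    executed.push("report");
    attempts += 1;
    if (attempts === 1) throw new Error("transient failure"); // crash mid-run
  }],
];

try {
  runWithCheckpoints(steps, checkpoint); // first run fails at "report"
} catch {
  // the run died, but "plan" and "research" survive in the checkpoint
}
runWithCheckpoints(steps, checkpoint); // resume: only "report" reruns
```

After the resume, "plan" and "research" have each executed exactly once; only the failed step ran twice.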

Memory

Without memory, every conversation starts from scratch. Memory lets your agent retain information across conversations (user preferences, learned instructions, past experiences) so it can personalize its behavior over time. For an overview of memory types, see the memory concepts guide.

Scoping

Memory always persists across conversations. The main question is how it’s scoped across user and assistant boundaries, and the right scope depends on who should see and modify the data. The examples below assume an organization running two assistants, an email writing assistant and a social media drafting assistant, serving multiple users.
| Scope | Namespace | Use case | Example |
| --- | --- | --- | --- |
| Private | (assistant_id, user_id) | Per-user preferences within one assistant | "Sign my emails 'Best, Alex'" for the email assistant only |
| Assistant | (assistant_id) | Shared instructions for one assistant | "Cap posts at 280 characters" for the social media assistant |
| User | (user_id) | Global user profile across all assistants | User's preferred language is Spanish |
| Organization | () | Read-only policies for all users and assistants | "Never disclose internal pricing" |
Shared memory (assistant, user, or organization scope) is a vector for prompt injection. If one user can write to memory that another user’s conversation reads, a malicious user could inject instructions into that shared state. Enforce read-only access where appropriate. For example, make organization-wide policies writable only through application code, not by the agent itself.
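As a concrete illustration, the namespace column in the table maps directly to a small factory. This is plain TypeScript to show the scoping logic, not the deepagents namespace-factory API:

```typescript
type Scope = "private" | "assistant" | "user" | "organization";

// Build a store namespace for each scope from the scoping table.
// An empty namespace means the data is visible to every user and assistant.
export function namespaceFor(
  scope: Scope,
  assistantId: string,
  userId: string,
): string[] {
  switch (scope) {
    case "private": return [assistantId, userId];
    case "assistant": return [assistantId];
    case "user": return [userId];
    case "organization": return [];
  }
}
```

The more components in the namespace, the narrower the audience: (assistant_id, user_id) isolates data per user per assistant, while () shares it with everyone.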

Configuration

In Deep Agents, memory is stored as files in a virtual filesystem. By default, files only last for a single conversation. To persist them, route a path like /memories/ to a StoreBackend that writes to the LangGraph Store. Use a CompositeBackend to give the agent both ephemeral scratch space and persistent long-term memory.
Namespace by (assistant_id, user_id). Each user gets private memory within a given assistant.
src/agent.ts
import { getConfig } from "@langchain/langgraph";
import { createDeepAgent, CompositeBackend, StateBackend, StoreBackend } from "deepagents";

const agent = createDeepAgent({
  backend: (rt) => new CompositeBackend(
    new StateBackend(rt),
    {
      "/memories/": new StoreBackend(rt, {
        namespace: (ctx) => {
          const config = getConfig();
          return [config.metadata.assistantId, ctx.runtime.context.userId];
        },
      }),
    },
  ),
  systemPrompt: `You have persistent memory at /memories/.

  Read /memories/instructions.txt at the start of each conversation for
  accumulated knowledge and preferences. When you learn something that
  should persist, update that file.`,
});

export { agent };
You can also read and write to the store from your application code using the Store API. See accessing memories from external code for examples. For the full namespace factory API, see namespace factories. For memory patterns like self-improving instructions and knowledge bases, see long-term memory.

Execution environment

Locally, agents can read and write files on disk and run shell commands directly. In production, you need to think about isolation and persistence. The right setup depends on whether your agent needs to execute code:
  • Filesystem backends are enough if your agent only reads and writes files. Choose a backend that matches your persistence needs: ephemeral scratch space, persistent storage, or a mix of both.
  • Sandboxes add an isolated container with an execute tool for running shell commands. Use a sandbox if your agent needs to run code, install packages, or do anything beyond file I/O.

Filesystem

Choose a backend based on what needs to persist:
  • StateBackend (default): ephemeral scratch space, scoped to a single conversation. Checkpointed at every step, so avoid writing large files.
  • StoreBackend: persistent storage that survives across conversations. Scope with a namespace factory.
  • CompositeBackend: mix both. Ephemeral scratch space by default with persistent routes for specific paths like /memories/.
For the full list of backends and how to build custom ones, see backends.
FilesystemBackend and LocalShellBackend access the host directly. Don’t use them in deployed agents.
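To make the CompositeBackend routing idea concrete, here is a toy, stdlib-only sketch (not the real implementation; the backend shape is invented): writes are dispatched by path prefix, and anything unmatched falls through to the ephemeral default.

```typescript
// Toy backend: just a named map of path -> contents.
type ToyBackend = { name: string; files: Map<string, string> };

export const scratch: ToyBackend = { name: "state", files: new Map() };   // ephemeral default
export const memories: ToyBackend = { name: "store", files: new Map() };  // persistent route

const routes: [string, ToyBackend][] = [["/memories/", memories]];

// Dispatch by the longest matching prefix, falling back to the default.
// Returns the name of the backend that received the write.
export function writeFile(path: string, content: string): string {
  const match = routes
    .filter(([prefix]) => path.startsWith(prefix))
    .sort((a, b) => b[0].length - a[0].length)[0];
  const backend = match ? match[1] : scratch;
  backend.files.set(path, content);
  return backend.name;
}
```

With this layout, a write to /memories/notes.txt lands in persistent storage while a write to /scratch/tmp.txt stays in ephemeral conversation state.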

Sandboxes

If your agent needs to run code (not just read and write files), use a sandbox. Sandboxes provide both a filesystem and an execute tool for running shell commands, all inside an isolated container. This isolation also protects your host: if the agent’s code exhausts memory or crashes, only the sandbox is affected. Your server keeps running.

Lifecycle

The key decision is how long a sandbox lives. Does each conversation get a fresh one, or do conversations share a persistent environment?
| Scope | Sandbox ID stored on | Lifecycle | Example use case |
| --- | --- | --- | --- |
| Thread-scoped | Thread metadata | Fresh per conversation, cleaned up on TTL | A data analysis bot where each conversation starts clean |
| Assistant-scoped | Assistant config | Shared across all conversations | A coding assistant that maintains a cloned repo across conversations |
The examples below use an async graph factory instead of a static graph. A graph factory receives the runtime config on each run, so it can read the thread_id or assistant_id (which is only available at invocation time) and resolve the correct sandbox before building the agent. The factory is async because sandbox creation is an I/O-bound network operation that must be awaited.
Each conversation gets its own sandbox. The graph factory reads thread_id from the config, so each thread automatically gets its own isolated environment. The provider’s label-based lookup handles deduplication across runs. Cleaned up when the sandbox TTL expires.
src/agent.ts
import { Daytona } from "@daytonaio/sdk";
import { DaytonaSandbox } from "@langchain/daytona";
import { createDeepAgent } from "deepagents";
import type { RunnableConfig } from "@langchain/core/runnables";

const client = new Daytona();

export async function agent(config: RunnableConfig) {
  const threadId = config.configurable!.thread_id;
  let sandbox;
  try {
    sandbox = await client.findOne({ labels: { thread_id: threadId } });
  } catch {
    sandbox = await client.create({
      labels: { thread_id: threadId },
      autoDeleteInterval: 3600, // TTL: clean up when idle
    });
  }
  return createDeepAgent({ backend: await DaytonaSandbox.fromId(sandbox.id) });
}
Because the agent variable is an async function (not a compiled graph), the server treats it as a graph factory and calls it on each run, injecting the config. The factory looks up or creates the sandbox via the provider’s label-based search and returns a fresh agent graph wired to that sandbox. Once deployed with langgraph deploy, invoke the agent from your application code using the SDK. The client-side code is the same regardless of scope. The scoping is handled entirely in the agent factory above, but the behavior differs:
Each thread gets its own sandbox. Follow-up messages within the same thread reuse the same sandbox, but a new thread always starts fresh with no leftover files or installed packages from previous conversations.
client.ts
import { Client } from "@langchain/langgraph-sdk";

const client = new Client({ apiUrl: "<DEPLOYMENT_URL>", apiKey: "<LANGSMITH_API_KEY>" });

// Conversation 1: install pandas and analyze data
const thread1 = await client.threads.create();
for await (const chunk of client.runs.stream(
  thread1.thread_id,
  "agent",
  { input: { messages: [{ role: "human", content: "Install pandas and analyze sales_data.csv" }] } },
)) {
  console.log(chunk.data);
}

// Follow-up in the same conversation — pandas is still installed
for await (const chunk of client.runs.stream(
  thread1.thread_id,
  "agent",
  { input: { messages: [{ role: "human", content: "Now plot the results" }] } },
)) {
  console.log(chunk.data);
}

// Conversation 2: fresh sandbox — pandas is NOT installed, no files from conversation 1
const thread2 = await client.threads.create();
for await (const chunk of client.runs.stream(
  thread2.thread_id,
  "agent",
  { input: { messages: [{ role: "human", content: "What packages are installed?" }] } },
)) {
  console.log(chunk.data);
}

File transfers

Sandboxes are isolated containers, so your application code can’t directly access files inside them. Use uploadFiles() and downloadFiles() to move data across the sandbox boundary:
  • Seed the sandbox before the agent runs: upload user files, skill scripts, configuration, or persistent memories so the agent has what it needs from the start
  • Retrieve results after the agent finishes: download generated artifacts (reports, plots, exports) and sync updated memories back for future conversations
For provider-specific file transfer examples, see working with files. For provider setup, security, and lifecycle patterns, see the full sandboxes guide.
Skill scripts that the agent needs to execute must be uploaded into the sandbox before the agent runs. You may also want to sync memories so the agent can read and update them inside the container. Use custom middleware with before_agent and after_agent hooks to move files across the sandbox boundary:
src/agent.ts
import { createMiddleware } from "langchain";
import { createDeepAgent, CompositeBackend, StoreBackend } from "deepagents";
import { Daytona } from "@daytonaio/sdk";
import { DaytonaSandbox } from "@langchain/daytona";
import type { RunnableConfig } from "@langchain/core/runnables";

const client = new Daytona();

function safeFilename(key: string): string {
  const name = key.split("/").pop()!;
  if (name.includes("..") || /[*?]/.test(name)) {
    throw new Error(`Invalid key: ${key}`);
  }
  return name;
}

// The sandbox backend exposes uploadFiles/downloadFiles, so the middleware
// takes it directly: uploads must land inside the container, not be routed
// back to the store by the CompositeBackend.
const createSandboxSyncMiddleware = (sandbox: DaytonaSandbox) => {
  return createMiddleware({
    name: "SandboxSyncMiddleware",
    beforeAgent: async (state, runtime) => {
      // Upload skill scripts and memories into the sandbox
      const userId = runtime.context.userId;
      const store = runtime.store;
      const encoder = new TextEncoder();
      const files: [string, Uint8Array][] = [];
      for (const item of await store.search(["skills", userId])) {
        const name = safeFilename(item.key);
        files.push([`/skills/${name}`, encoder.encode(item.value.content)]);
      }
      for (const item of await store.search(["memories", userId])) {
        const name = safeFilename(item.key);
        files.push([`/memories/${name}`, encoder.encode(item.value.content)]);
      }
      if (files.length > 0) {
        await sandbox.uploadFiles(files);
      }
    },
    afterAgent: async (state, runtime) => {
      // Sync updated memories back to the store
      const userId = runtime.context.userId;
      const store = runtime.store;
      const items = await store.search(["memories", userId]);
      const results = await sandbox.downloadFiles(
        items.map((item) => `/memories/${safeFilename(item.key)}`),
      );
      const decoder = new TextDecoder();
      for (const result of results) {
        if (result.content) {
          await store.put(
            ["memories", userId],
            result.path.split("/").pop()!,
            { content: decoder.decode(result.content) },
          );
        }
      }
    },
  });
};

// Graph factory: resolve the thread's sandbox first, then wire it into both
// the composite backend and the sync middleware.
export async function agent(config: RunnableConfig) {
  const threadId = config.configurable!.thread_id;
  let sandboxInfo;
  try {
    sandboxInfo = await client.findOne({ labels: { thread_id: threadId } });
  } catch {
    sandboxInfo = await client.create({ labels: { thread_id: threadId } });
  }
  const sandbox = await DaytonaSandbox.fromId(sandboxInfo.id);

  return createDeepAgent({
    backend: (rt) => new CompositeBackend(sandbox, {
      "/skills/": new StoreBackend(rt, {
        namespace: (ctx) => ["skills", ctx.runtime.context.userId],
      }),
      "/memories/": new StoreBackend(rt, {
        namespace: (ctx) => ["memories", ctx.runtime.context.userId],
      }),
    }),
    middleware: [createSandboxSyncMiddleware(sandbox)],
  });
}

Guardrails

Agents in production run autonomously, which means they can loop indefinitely, hit rate limits, or process user data that contains sensitive information. Deep Agents support middleware that wraps model and tool calls to handle these concerns.

Rate limiting

Rate limiting here refers to capping the agent’s own LLM and tool usage within a run, not API gateway rate limiting for incoming requests. Without limits, a confused agent can burn through your LLM API budget in minutes by looping on the same tool call or making hundreds of model calls. Set caps on both model calls and tool executions per run:
import { createAgent, modelCallLimitMiddleware, toolCallLimitMiddleware } from "langchain";

const agent = createAgent({
  model: "claude-sonnet-4-6",
  middleware: [
    modelCallLimitMiddleware({ runLimit: 50 }),
    toolCallLimitMiddleware({ runLimit: 200 }),
  ],
});
Use runLimit to cap calls within a single invocation (resets each turn). Use threadLimit to cap calls across an entire conversation (requires a checkpointer). See ModelCallLimitMiddleware and ToolCallLimitMiddleware for the full configuration.
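The difference between the two scopes can be sketched with a toy counter (plain TypeScript, not the middleware's internals): the run budget resets on every invocation, while the thread budget accumulates for the life of the conversation.

```typescript
// Toy call limiter: the run counter resets per invocation,
// the thread counter persists across the whole conversation.
export class CallLimiter {
  private runCount = 0;
  private threadCount = 0;

  constructor(private runLimit: number, private threadLimit: number) {}

  startRun(): void {
    this.runCount = 0; // each new invocation gets a fresh run budget
  }

  // Record one call; returns true if it is within both budgets.
  allow(): boolean {
    this.runCount += 1;
    this.threadCount += 1;
    return this.runCount <= this.runLimit && this.threadCount <= this.threadLimit;
  }
}
```

With a run limit of 2 and a thread limit of 3, the third call in one run is rejected by the run budget, and once three calls have accumulated, even a fresh run is rejected by the thread budget.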

Handling errors

Not all errors should be handled the same way. Transient failures (network timeouts, rate limits) should be retried automatically. Errors the LLM can recover from (bad tool output, parsing failures) should be fed back to the model. Errors that need human input should pause the agent. For the full breakdown with code examples, see Handle errors appropriately. Middleware handles the transient case. Model calls and tool calls each have their own retry middleware with exponential backoff. If your primary model provider goes down entirely, the fallback middleware switches to an alternative:
import {
  createAgent,
  modelFallbackMiddleware,
  modelRetryMiddleware,
  toolRetryMiddleware,
} from "langchain";

const agent = createAgent({
  model: "claude-sonnet-4-6",
  middleware: [
    // Retry model calls on rate limits, timeouts, and 5xx errors
    modelRetryMiddleware({ maxRetries: 3, backoffFactor: 2.0, initialDelayMs: 1000 }),
    // If the primary model is fully down, fall back to an alternative
    modelFallbackMiddleware("gpt-4.1"),
    // Retry specific tools that hit external APIs (not all tools)
    toolRetryMiddleware({
      maxRetries: 2,
      tools: ["search", "fetch_url"],
      retryOn: [TimeoutError, TypeError],
    }),
  ],
});
Scope ToolRetryMiddleware to specific tools rather than retrying everything. A filesystem read_file that fails won’t benefit from a retry, but a web search that times out probably will. See ModelRetryMiddleware and ModelFallbackMiddleware for the full configuration.
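Assuming the common exponential schedule delay = initialDelayMs * backoffFactor^attempt (the middleware's exact schedule and jitter may differ), the configuration above would wait roughly 1s, 2s, then 4s between attempts:

```typescript
// Compute the wait before each retry attempt under plain exponential backoff.
export function backoffDelays(
  maxRetries: number,
  initialDelayMs: number,
  backoffFactor: number,
): number[] {
  return Array.from(
    { length: maxRetries },
    (_, attempt) => initialDelayMs * backoffFactor ** attempt,
  );
}
```

Doubling the delay each attempt keeps total added latency bounded while giving a struggling upstream service progressively more room to recover.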

Data privacy

If your agent processes user input that might contain emails, credit card numbers, or other PII, you can detect and handle it before it reaches the model or gets stored in logs:
import { createAgent, piiMiddleware } from "langchain";

const agent = createAgent({
  model: "claude-sonnet-4-6",
  middleware: [
    piiMiddleware("email", { strategy: "redact", applyToInput: true }),
    piiMiddleware("credit_card", { strategy: "mask", applyToInput: true }),
  ],
});
Strategies include redact (replace with [REDACTED_EMAIL]), mask (partial masking like ****-****-****-1234), hash (deterministic hash), and block (raise an error). You can also write custom detectors for domain-specific patterns. See PIIMiddleware for the full configuration. For the complete list of available middleware, see prebuilt middleware.
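The redact and mask strategies can be illustrated with simple stdlib-only detectors. The real middleware ships its own detectors; these regexes are deliberately simplified and will miss edge cases:

```typescript
// Replace email addresses with a placeholder token (redact strategy).
export function redactEmails(text: string): string {
  return text.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[REDACTED_EMAIL]");
}

// Mask a 16-digit card number, keeping only the last four digits (mask strategy).
export function maskCard(text: string): string {
  return text.replace(
    /\b(?:\d{4}[ -]?){3}(\d{4})\b/g,
    (_match, last4) => `****-****-****-${last4}`,
  );
}
```

Redaction destroys the value entirely, while masking preserves enough (the last four digits) for a user to recognize which card was meant.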

Frontend

Deep Agents use useStream to connect your UI to the agent backend. useStream is a frontend hook (available for React, Vue, Svelte, and Angular) that streams messages, subagent progress, and custom state from your agent in real time. Locally, useStream points at http://localhost:2024. In production, point it at your LangSmith Deployment and configure reconnection so users don’t lose progress if their connection drops.
import { useStream } from "@langchain/langgraph-sdk/react";

function App() {
  const stream = useStream<typeof agent>({
    apiUrl: "https://your-deployment.langsmith.dev",
    assistantId: "agent",
    reconnectOnMount: true,    // Resume stream after page refresh or navigation
    fetchStateHistory: true,   // Load full thread history on mount
  });
}
reconnectOnMount picks up an in-progress run automatically. If a user refreshes while the agent is working, they’ll see it continue rather than a blank screen. fetchStateHistory loads the full conversation history for the thread, so returning users see previous messages. For deep agent workflows that spawn many subagents, set a high recursionLimit when submitting to avoid cutting off long-running executions:
stream.submit(
  { messages: [{ type: "human", content: text }] },
  {
    streamSubgraphs: true,
    config: { recursionLimit: 10000 },
  },
);
For UI patterns specific to deep agents, such as subagent cards, todo lists, and custom state rendering, see the frontend guide.