An agent harness combines several capabilities that make building long-running agents easier: planning, a virtual filesystem, task delegation to subagents, context management, code execution, and human-in-the-loop controls. Alongside these capabilities, deep agents use Skills and Memory for additional context and instructions.

Planning capabilities

The harness provides a write_todos tool that agents can use to maintain a structured task list. Features:
  • Track multiple tasks with statuses ('pending', 'in_progress', 'completed')
  • Persisted in agent state
  • Helps agent organize complex multi-step work
  • Useful for long-running tasks and planning
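The to-do list can be pictured as a small piece of structured state. A minimal sketch of that shape (the field names here are illustrative, not the harness's actual schema):

```python
from typing import Literal, TypedDict

Status = Literal["pending", "in_progress", "completed"]

class Todo(TypedDict):
    content: str
    status: Status

# What a write_todos call might persist into agent state:
todos: list[Todo] = [
    {"content": "Research existing API endpoints", "status": "completed"},
    {"content": "Draft the migration plan", "status": "in_progress"},
    {"content": "Write integration tests", "status": "pending"},
]

def in_progress(todos: list[Todo]) -> list[str]:
    """Return the content of tasks currently being worked on."""
    return [t["content"] for t in todos if t["status"] == "in_progress"]
```

Because the list lives in agent state, the agent can re-read it on every turn and update statuses as multi-step work progresses.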

Virtual filesystem access

The harness provides a configurable virtual filesystem which can be backed by different pluggable backends. The backends support the following file system operations:
| Tool | Description |
| --- | --- |
| ls | List files in a directory with metadata (size, modified time) |
| read_file | Read file contents with line numbers; supports offset/limit for large files. Also supports reading images (.png, .jpg, .jpeg, .gif, .webp), returning them as multimodal content blocks. |
| write_file | Create new files |
| edit_file | Perform exact string replacements in files (with global replace mode) |
| glob | Find files matching patterns (e.g., **/*.py) |
| grep | Search file contents with multiple output modes (files only, content with context, or counts) |
| execute | Run shell commands in the environment (available with sandbox backends only) |
The virtual filesystem is used by several other harness capabilities such as skills, memory, code execution, and context management. You can also use the file system when building custom tools and middleware for deep agents. For more information, see backends.

Task delegation (subagents)

The harness allows the main agent to create ephemeral “subagents” for isolated multi-step tasks. Why it’s useful:
  • Context isolation - Subagent’s work doesn’t clutter main agent’s context
  • Parallel execution - Multiple subagents can run concurrently
  • Specialization - Subagents can have different tools/configurations
  • Token efficiency - Large subtask context is compressed into a single result
How it works:
  • Main agent has a task tool
  • When invoked, it creates a fresh agent instance with its own context
  • Subagent executes autonomously until completion
  • Returns a single final report to the main agent
  • Subagents are stateless (can’t send multiple messages back)
Default subagent:
  • “general-purpose” subagent automatically available
  • Has filesystem tools by default
  • Can be customized with additional tools/middleware
Custom subagents:
  • Define specialized subagents with specific tools
  • Example: code-reviewer, web-researcher, test-runner
  • Configure via subagents parameter
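A custom subagent is typically described declaratively and passed via the subagents parameter. A hedged sketch of what such a configuration might look like (the exact field names are assumptions based on the description above; check the deepagents reference for the real schema):

```python
# Illustrative subagent definitions; the exact keys accepted by the
# `subagents` parameter may differ in your version of the library.
code_reviewer = {
    "name": "code-reviewer",
    "description": "Reviews code changes for bugs and style issues",
    "prompt": "You are a meticulous code reviewer. Report issues by severity.",
}

web_researcher = {
    "name": "web-researcher",
    "description": "Searches the web and summarizes findings",
    "prompt": "You are a research assistant. Cite the sources you used.",
}

subagents = [code_reviewer, web_researcher]
```

The main agent sees each subagent's name and description through its task tool and picks the one whose description matches the work to delegate.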

Context management

Deep agents handle long-running tasks through effective context management. Agents have access to several kinds of context: some sources are provided to the agent at startup, while others, such as user input, become available at runtime. This section provides an overview of the different kinds of context your deep agent has access to and manages.

Input context

Input context consists of sources of information provided to your deep agent at startup that are added to the prompt.

Prompts

Deep agents use system prompts to define the agent’s role, behavior, capabilities, and knowledge base. If you provide a custom system prompt, this gets prepended to the built-in system prompt which includes detailed guidance for using built-in tools like the planning tool, filesystem tools, and subagents. Middleware that adds tools (such as the filesystem middleware) automatically appends tool-specific instructions to the system prompt, creating tool prompts that explain how to use those tools effectively. The final deep agent prompt consists of the following parts:
  1. Custom system_prompt (if provided)
  2. Base agent prompt
  3. To-do list prompt: Instructions for how to plan with to-do lists
  4. Memory prompt: AGENTS.md + memory usage guidelines (only when memory provided)
  5. Skills prompt: Skills locations + list of skills with frontmatter information + usage (only when skills provided)
  6. Virtual filesystem prompt (filesystem + execute tool docs if applicable)
  7. Subagent prompt: Task tool usage
  8. User-provided middleware prompts (if custom middleware is provided)
  9. Human-in-the-loop prompt (when interrupt_on is set)
  10. Local context prompt: Current directory, project info,… (when using CLI locally)

Runtime context

Deep agents use a pattern called context compression, which reduces the size of the information in an agent’s working memory while preserving the details that are relevant to the task. The following techniques are built-in features that keep the context passed to the LLM within its context window limit. You can also configure deep agents to use long-term memory, allowing them to store information across different threads and conversations.

Offloading large tool inputs and results

Deep agents use the built-in filesystem tools to automatically offload content and to search and retrieve that offloaded content as needed. Content offloading happens in two cases:
  1. Tool call inputs exceed 20,000 tokens (configurable via tool_token_limit_before_evict): File write and edit operations leave behind tool calls containing the complete file content in the agent’s conversation history. Since this content is already persisted to the filesystem, it’s often redundant. As the session context crosses 85% of the model’s available window, deep agents truncate older tool calls, replacing them with a pointer to the file on disk and reducing the size of the active context.
  2. Tool call results exceed 20,000 tokens (configurable via tool_token_limit_before_evict): When this occurs, the deep agent offloads the response to the configured backend and substitutes it with a file path reference and a preview of the first 10 lines. Agents can then re-read or search the content as needed.
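The result-offloading step can be modeled as a simple threshold check. The 20,000-token limit and 10-line preview mirror the description above, but the token counter and function below are an illustrative model, not the library's implementation:

```python
from pathlib import Path

TOKEN_LIMIT = 20_000   # mirrors tool_token_limit_before_evict
PREVIEW_LINES = 10

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return len(text) // 4

def maybe_offload(result: str, path: Path) -> str:
    """Replace an oversized tool result with a file reference and a preview."""
    if count_tokens(result) <= TOKEN_LIMIT:
        return result
    path.write_text(result)  # persist the full result to the backend
    preview = "\n".join(result.splitlines()[:PREVIEW_LINES])
    return f"Result saved to {path} (too large for context). First lines:\n{preview}"
```

Small results pass through untouched; only oversized ones are swapped for a pointer the agent can later read_file or grep.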

Summarization

When the context size crosses the summarization threshold (by default 85% of the model’s max_input_tokens) and there is no more context eligible for offloading, the deep agent summarizes the message history. This process has two components:
  • In-context summary: An LLM generates a structured summary of the conversation—including session intent, artifacts created, and next steps—which replaces the full conversation history in the agent’s working memory.
  • Filesystem preservation: The complete, original conversation messages are written to the filesystem as a canonical record.
This dual approach ensures the agent maintains awareness of its goals and progress (via the summary) while preserving the ability to recover specific details when needed (via filesystem search). Configuration:
  • Triggers at 85% of the model’s max_input_tokens from its model profile
  • Keeps 10% of tokens as recent context
  • Falls back to 170,000-token trigger / 6 messages kept if model profile is unavailable
  • Older messages are summarized by the model
Why it’s useful:
  • Enables very long conversations without hitting context limits
  • Preserves recent context while compressing ancient history
  • Transparent to the agent (appears as a special system message)

Long-term memory

When using the default filesystem, your deep agent stores its working memory files in agent state, which only persists within a single thread. Long-term memory enables your deep agent to persist information across different threads and conversations. To use long-term memory, you must use a CompositeBackend that routes specific paths (typically /memories/) to a LangGraph Store, which provides durable cross-thread persistence. The CompositeBackend is a hybrid storage system where some files persist indefinitely while others remain scoped to a single thread. Files that the agent stores in the long-term memory path (for example, /memories/preferences.txt) survive agent restarts and can be accessed from any conversation thread. Deep agents can use these files for storing user preferences, accumulated knowledge, research progress, or any information that should persist beyond a single session. For more information, see Long-term memory.

Code execution

When you use a sandbox backend, the harness exposes an execute tool that lets the agent run shell commands in an isolated environment. This enables the agent to install dependencies, run scripts, and execute code as part of its task. How it works:
  • Sandbox backends implement the SandboxBackendProtocol — when detected, the harness adds the execute tool to the agent’s available tools
  • Without a sandbox backend, the agent only has filesystem tools (read_file, write_file, etc.) and cannot run commands
  • The execute tool returns combined stdout/stderr, exit code, and truncates large outputs (saving to a file for the agent to read incrementally)
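Outside of a sandbox, the execute behavior described above (combined stdout/stderr, exit code, truncation of large output to a file) might look roughly like this; a simplified sketch with an illustrative truncation threshold, not the harness implementation:

```python
import subprocess
from pathlib import Path

MAX_CHARS = 8_000  # illustrative truncation threshold

def execute(command: str, workdir: str = ".") -> dict:
    """Run a shell command, combining stdout and stderr into one string."""
    proc = subprocess.run(
        command, shell=True, cwd=workdir,
        capture_output=True, text=True,
    )
    output = proc.stdout + proc.stderr
    if len(output) > MAX_CHARS:
        spill = Path(workdir) / "execute_output.txt"
        spill.write_text(output)  # agent can read this file incrementally
        output = output[:MAX_CHARS] + f"\n... truncated; full output in {spill}"
    return {"output": output, "exit_code": proc.returncode}

result = execute("echo hello")
```

In the real harness the command runs inside the sandbox backend, so none of this touches your host system.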
Why it’s useful:
  • Security — Code runs in isolation, protecting your host system from the agent’s operations
  • Clean environments — Use specific dependencies or OS configurations without local setup
  • Reproducibility — Consistent execution environments across teams
For setup, providers, and file transfer APIs, see Sandboxes.

Human-in-the-loop

The harness can pause agent execution at specified tool calls to allow human approval or modification. This feature is opt-in via the interrupt_on parameter. Configuration:
  • Pass interrupt_on to create_deep_agent with a mapping of tool names to interrupt configurations
  • Example: interrupt_on={"edit_file": True} pauses before every edit
  • You can provide approval messages or modify tool inputs when prompted
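Conceptually, an interrupt gate wraps a tool call with an approval check before it runs. A minimal sketch of the idea (approve_fn stands in for the real human-in-the-loop prompt; this is not the deepagents implementation):

```python
from typing import Callable

class Rejected(Exception):
    """Raised when the human declines a tool call."""

def with_approval(tool: Callable[..., str],
                  approve_fn: Callable[[str, dict], bool]) -> Callable[..., str]:
    """Pause before each call and ask `approve_fn` whether to proceed."""
    def gated(**kwargs) -> str:
        if not approve_fn(tool.__name__, kwargs):
            raise Rejected(f"{tool.__name__} call rejected by user")
        return tool(**kwargs)
    return gated

def edit_file(path: str, old: str, new: str) -> str:
    return f"edited {path}"

# Auto-approve everything except edits to files under /etc
gated_edit = with_approval(
    edit_file, lambda name, args: not args["path"].startswith("/etc")
)
```

In the harness, the pause is a graph interrupt rather than an exception, which is what lets you modify the tool inputs before resuming.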
Why it’s useful:
  • Safety gates for destructive operations
  • User verification before expensive API calls
  • Interactive debugging and guidance

Skills

The harness supports skills that provide specialized workflows and domain knowledge to your deep agent. How it works:
  • Skills follow the Agent Skills standard
  • Each skill is a directory containing a SKILL.md file with instructions and metadata
  • Skills can include additional scripts, reference docs, templates, and other resources
  • Skills use progressive disclosure—they are only loaded when the agent determines they’re useful for the current task
  • Agent reads frontmatter from each SKILL.md file at startup, then reviews full skill content when needed
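For example, a skill directory such as pdf-processing/ would contain a SKILL.md like the one below (the name/description frontmatter pair is the common minimum; consult the Agent Skills standard for the full schema):

```markdown
---
name: pdf-processing
description: Extract text and tables from PDF files for analysis
---

# PDF processing

When the user asks about a PDF, run the bundled extraction script on the
file, then summarize the extracted text before answering.
```

Only the frontmatter is loaded at startup; the body and any bundled scripts are read when the agent decides the skill applies.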
Why it’s useful:
  • Reduces token usage by only loading relevant skills when needed
  • Bundles capabilities together into larger actions with additional context
  • Provides specialized expertise without cluttering the system prompt
  • Enables modular, reusable agent capabilities
For more information, see Skills.

Memory

The harness supports persistent memory files that provide extra context to your deep agent across conversations. These files often contain general coding style, preferences, conventions, and guidelines that help the agent understand how to work with your codebase and follow your preferences. How it works:
  • Uses AGENTS.md files to provide persistent context
  • Memory files are always loaded (unlike skills, which use progressive disclosure)
  • Pass one or more file paths to the memory parameter when creating your agent
  • Files are stored in the agent’s backend (StateBackend, StoreBackend, or FilesystemBackend)
  • The agent can update memory based on your interactions, feedback, and identified patterns
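An AGENTS.md memory file is ordinary markdown; a short illustrative example of the kind of conventions it might hold:

```markdown
# Project conventions

- Use Python 3.11+ with type hints on all public functions.
- Prefer pytest over unittest; tests live in tests/.
- Commit messages follow Conventional Commits.
```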
Why it’s useful:
  • Provides persistent context that doesn’t need to be re-specified each conversation
  • Useful for storing user preferences, project guidelines, or domain knowledge
  • Always available to the agent, ensuring consistent behavior
For configuration details and examples, see Memory.