- Planning capabilities
- Virtual filesystem
- Task delegation (subagents)
- Context and token management
- Code execution
- Human-in-the-loop
Planning capabilities
The harness provides a `write_todos` tool that agents can use to maintain a structured task list.
Features:
- Track multiple tasks with statuses (`'pending'`, `'in_progress'`, `'completed'`)
- Persisted in agent state
- Helps agent organize complex multi-step work
- Useful for long-running tasks and planning
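To make the task-list idea concrete, here is a minimal sketch of what such a list could look like in agent state. The `Todo` type, its `content` field, and this toy `write_todos` body are assumptions for illustration only; the harness's actual schema and implementation may differ. Only the three status values come from the text above.

```python
from typing import Literal, TypedDict

Status = Literal["pending", "in_progress", "completed"]
VALID_STATUSES = ("pending", "in_progress", "completed")


class Todo(TypedDict):
    """Hypothetical shape of one task-list entry (field names are assumed)."""
    content: str
    status: Status


def write_todos(state: dict, todos: list[Todo]) -> dict:
    """Toy stand-in for the tool: validate the list and store it in agent state."""
    for todo in todos:
        if todo["status"] not in VALID_STATUSES:
            raise ValueError(f"unknown status: {todo['status']!r}")
    # Return a new state dict rather than mutating the caller's copy.
    return {**state, "todos": list(todos)}


state = write_todos({}, [
    {"content": "Read the failing test", "status": "completed"},
    {"content": "Patch the parser", "status": "in_progress"},
    {"content": "Re-run the suite", "status": "pending"},
])
remaining = [t["content"] for t in state["todos"] if t["status"] != "completed"]
print(remaining)  # ['Patch the parser', 'Re-run the suite']
```

Because the whole list is rewritten on each call in this sketch, the agent always sees one consistent plan rather than a series of incremental patches.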
Virtual filesystem access
The harness provides a configurable virtual filesystem which can be backed by different pluggable backends. The backends support the following file system operations:

| Tool | Description |
|---|---|
| `ls` | List files in a directory with metadata (size, modified time) |
| `read_file` | Read file contents with line numbers, supports offset/limit for large files. Also supports reading images (.png, .jpg, .jpeg, .gif, .webp), returning them as multimodal content blocks. |
| `write_file` | Create new files |
| `edit_file` | Perform exact string replacements in files (with global replace mode) |
| `glob` | Find files matching patterns (e.g., `**/*.py`) |
| `grep` | Search file contents with multiple output modes (files only, content with context, or counts) |
| `execute` | Run shell commands in the environment (available with sandbox backends only) |
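The semantics of two of these operations can be sketched over a plain in-memory dictionary. This is not the harness's implementation: the parameter names (`offset`, `limit`, `replace_all`), the line-number format, and the choice to reject an ambiguous single replacement are all assumptions made for illustration.

```python
def read_file(files: dict[str, str], path: str, offset: int = 0, limit: int = 2000) -> str:
    """Return a window of lines, each prefixed with its 1-based line number."""
    lines = files[path].splitlines()
    window = lines[offset:offset + limit]
    return "\n".join(f"{offset + i + 1:>4}\t{line}" for i, line in enumerate(window))


def edit_file(files: dict[str, str], path: str, old: str, new: str,
              replace_all: bool = False) -> int:
    """Replace an exact string and return how many occurrences were changed."""
    text = files[path]
    count = text.count(old)
    if count == 0:
        raise ValueError("old string not found")
    if count > 1 and not replace_all:
        # Assumed policy: refuse to guess which occurrence was meant.
        raise ValueError(f"old string occurs {count} times; pass replace_all=True")
    files[path] = text.replace(old, new)
    return count


files = {"/notes.txt": "alpha\nbeta\ngamma\nbeta\n"}
window = read_file(files, "/notes.txt", offset=1, limit=2)
replaced = edit_file(files, "/notes.txt", "beta", "BETA", replace_all=True)
print(window)
print(replaced, repr(files["/notes.txt"]))
```

Numbered output plus `offset`/`limit` lets an agent page through a large file and then quote an exact string back to `edit_file`.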
Task delegation (subagents)
The harness allows the main agent to create ephemeral “subagents” for isolated multi-step tasks.
Why it’s useful:
- Context isolation - Subagent’s work doesn’t clutter main agent’s context
- Parallel execution - Multiple subagents can run concurrently
- Specialization - Subagents can have different tools/configurations
- Token efficiency - Large subtask context is compressed into a single result
How it works:
- Main agent has a `task` tool
- When invoked, it creates a fresh agent instance with its own context
- Subagent executes autonomously until completion
- Returns a single final report to the main agent
- Subagents are stateless (can’t send multiple messages back)
- “general-purpose” subagent automatically available
- Has filesystem tools by default
- Can be customized with additional tools/middleware
- Define specialized subagents with specific tools
- Example: code-reviewer, web-researcher, test-runner
- Configure via the `subagents` parameter
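The context-isolation property described above can be illustrated with a toy model. Everything here (`ToyAgent`, this `task` function, the `research` worker) is hypothetical and does not use the harness's API; it only shows that intermediate work stays in the subagent's context while the main agent receives a single report.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyAgent:
    """Hypothetical agent: a name plus a private message history."""
    name: str
    messages: list[str] = field(default_factory=list)


def task(main: ToyAgent, description: str,
         worker: Callable[[ToyAgent, str], str]) -> str:
    """Toy delegation: run `worker` in a fresh context and keep only its report."""
    sub = ToyAgent(name="subagent")           # fresh instance with an empty context
    report = worker(sub, description)          # runs to completion on its own
    main.messages.append(f"[task result] {report}")  # only the report comes back
    return report


def research(sub: ToyAgent, description: str) -> str:
    # Simulate many intermediate steps that never reach the main agent.
    for step in range(50):
        sub.messages.append(f"step {step}: notes for {description!r}")
    return f"Finished {description!r} after {len(sub.messages)} steps"


main = ToyAgent(name="main", messages=["user: compare the two libraries"])
report = task(main, "survey library A", research)
print(len(main.messages))  # 2
```

The fifty intermediate messages are discarded with the subagent, which is the token-efficiency argument in miniature.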
Context management
The harness manages context so deep agents can handle long-running tasks within token limits while retaining the information they need.
How it works:
- Input context - System prompt, memory, skills, and tool prompts shape what the agent knows at startup
- Compression - Built-in offloading and summarization keep context within window limits as tasks progress
- Isolation - Subagents quarantine heavy work and return only results (see Task delegation)
- Long-term memory - Persistent storage across threads via the virtual filesystem
Why it’s useful:
- Enables multi-step tasks that exceed a single context window
- Keeps the most relevant information in scope without manual trimming
- Reduces token usage through automatic summarization and offloading
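The two compression ideas named above, offloading and summarization, can be sketched as follows. This is an illustrative model rather than the harness's code: the thresholds, the `/large_tool_results/` path, and the `summarize` callback (which in practice would likely be a model call) are assumptions.

```python
from typing import Callable


def offload_large_result(files: dict[str, str], name: str, result: str,
                         max_chars: int = 200) -> str:
    """Store an oversized tool result in the filesystem and return a short pointer."""
    if len(result) <= max_chars:
        return result
    path = f"/large_tool_results/{name}.txt"  # hypothetical location
    files[path] = result
    return f"{result[:max_chars]}... [truncated; full output saved to {path}]"


def compact(messages: list[str], keep_last: int,
            summarize: Callable[[list[str]], str]) -> list[str]:
    """Replace all but the most recent messages with a single summary message."""
    if len(messages) <= keep_last:
        return messages
    split = len(messages) - keep_last
    older, recent = messages[:split], messages[split:]
    return [f"[summary] {summarize(older)}"] + recent


files: dict[str, str] = {}
pointer = offload_large_result(files, "grep_1", "x" * 1000)
history = [f"message {i}" for i in range(10)]
compacted = compact(history, keep_last=3,
                    summarize=lambda msgs: f"{len(msgs)} earlier messages")
print(compacted)
```

Offloading keeps the full data recoverable through the filesystem tools, while summarization is lossy; a design that applies offloading first loses less information.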
Code execution
When you use a sandbox backend, the harness exposes an `execute` tool that lets the agent run shell commands in an isolated environment. This enables the agent to install dependencies, run scripts, and execute code as part of its task.
How it works:
- Sandbox backends implement `SandboxBackendProtocol`; when detected, the harness adds the `execute` tool to the agent’s available tools
- Without a sandbox backend, the agent only has filesystem tools (`read_file`, `write_file`, etc.) and cannot run commands
- The `execute` tool returns combined stdout/stderr and the exit code, and truncates large outputs (saving them to a file for the agent to read incrementally)
Why it’s useful:
- Security - Code runs in isolation, protecting your host system from the agent’s operations
- Clean environments - Use specific dependencies or OS configurations without local setup
- Reproducibility - Consistent execution environments across teams
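One plausible way to implement the "when detected" step is a structural type check. The sketch below uses a hypothetical `SupportsExecute` protocol as a stand-in; the real `SandboxBackendProtocol` may define different methods and signatures. The fake backend returns canned output and runs nothing.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsExecute(Protocol):
    """Hypothetical stand-in for a sandbox protocol: anything with execute()."""
    def execute(self, command: str) -> tuple[str, int]: ...


class InMemoryBackend:
    """Filesystem-only backend: it has no execute method."""
    def __init__(self) -> None:
        self.files: dict[str, str] = {}


class FakeSandboxBackend(InMemoryBackend):
    """Pretend sandbox: returns canned output instead of running anything."""
    def execute(self, command: str) -> tuple[str, int]:
        return f"ran: {command}\n", 0  # (combined output, exit code)


def available_tools(backend: object) -> list[str]:
    """Expose `execute` only when the backend structurally supports it."""
    tools = ["ls", "read_file", "write_file", "edit_file", "glob", "grep"]
    if isinstance(backend, SupportsExecute):
        tools.append("execute")
    return tools


plain_tools = available_tools(InMemoryBackend())
sandbox_tools = available_tools(FakeSandboxBackend())
print("execute" in plain_tools, "execute" in sandbox_tools)  # False True
```

A `runtime_checkable` protocol only checks that the method exists, not its signature, so a real implementation would likely want stricter validation.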
Human-in-the-loop
The harness can pause agent execution at specified tool calls to allow human approval or modification. This feature is opt-in via the `interrupt_on` parameter.
Configuration:
- Pass `interrupt_on` to `create_deep_agent` with a mapping of tool names to interrupt configurations
- Example: `interrupt_on={"edit_file": True}` pauses before every edit
- You can provide approval messages or modify tool inputs when prompted
Why it’s useful:
- Safety gates for destructive operations
- User verification before expensive API calls
- Interactive debugging and guidance
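The gating behavior can be modeled in a few lines. This is a conceptual sketch, not the harness's interrupt mechanism: `call_tool`, the `ask_human` callback, and the decision dictionaries (`approve`, `edit`, `reject`) are hypothetical. Only the `interrupt_on` mapping of tool names comes from the text above.

```python
from typing import Any, Callable, Mapping

Decision = dict[str, Any]


def call_tool(name: str, args: dict[str, Any],
              tools: Mapping[str, Callable[..., Any]],
              interrupt_on: Mapping[str, Any],
              ask_human: Callable[[str, dict[str, Any]], Decision]) -> Any:
    """Run a tool, pausing for a human decision first if the tool is gated."""
    if interrupt_on.get(name):
        decision = ask_human(name, args)
        if decision["type"] == "reject":
            return f"{name} call rejected by reviewer"
        if decision["type"] == "edit":
            args = decision["args"]  # reviewer supplied modified inputs
    return tools[name](**args)


tools = {
    "edit_file": lambda path, text: f"wrote {len(text)} chars to {path}",
    "ls": lambda path: ["a.txt"],
}
interrupt_on = {"edit_file": True}


def reviewer(name: str, args: dict[str, Any]) -> Decision:
    # Hypothetical reviewer that redirects every edit to a scratch path.
    return {"type": "edit", "args": {**args, "path": "/scratch" + args["path"]}}


result = call_tool("edit_file", {"path": "/main.py", "text": "print(1)"},
                   tools, interrupt_on, reviewer)
print(result)  # wrote 8 chars to /scratch/main.py
```

Tools missing from the mapping, such as `ls` here, run without a pause, which matches the opt-in behavior described above.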
Skills
The harness supports skills that provide specialized workflows and domain knowledge to your deep agent.
How it works:
- Skills follow the Agent Skills standard
- Each skill is a directory containing a `SKILL.md` file with instructions and metadata
- Skills can include additional scripts, reference docs, templates, and other resources
- Skills use progressive disclosure: they are only loaded when the agent determines they’re useful for the current task
- Agent reads frontmatter from each `SKILL.md` file at startup, then reviews full skill content when needed
Why it’s useful:
- Reduces token usage by only loading relevant skills when needed
- Bundles capabilities together into larger actions with additional context
- Provides specialized expertise without cluttering the system prompt
- Enables modular, reusable agent capabilities
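Progressive disclosure can be sketched as a two-pass read: metadata at startup, full content on demand. The sketch assumes frontmatter delimited by `---` lines with simple `key: value` pairs such as `name` and `description`. Real frontmatter is typically YAML and would need a proper parser, and the function names here are hypothetical.

```python
import tempfile
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """Parse simple `key: value` pairs between the leading '---' fences."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    meta: dict[str, str] = {}
    if not lines or lines[0].strip() != "---":
        return meta
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def index_skills(root: Path) -> dict[str, dict[str, str]]:
    """Startup pass: collect only metadata, keyed by skill directory name."""
    return {p.parent.name: read_frontmatter(p) for p in sorted(root.glob("*/SKILL.md"))}


def load_skill(root: Path, name: str) -> str:
    """On-demand pass: read the full SKILL.md once the skill is judged relevant."""
    return (root / name / "SKILL.md").read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pdf-report").mkdir()
    (root / "pdf-report" / "SKILL.md").write_text(
        "---\nname: pdf-report\ndescription: Build a PDF report from CSV data\n---\n"
        "# Steps\n1. Load the CSV\n",
        encoding="utf-8",
    )
    index = index_skills(root)              # small: goes into the startup context
    body = load_skill(root, "pdf-report")   # large: read only when needed

print(index)
```

Only the short `index` needs to sit in the prompt at all times, which is where the token savings come from.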
Memory
The harness supports persistent memory files that provide extra context to your deep agent across conversations. These files often contain general coding style, preferences, conventions, and guidelines that help the agent understand how to work with your codebase and follow your preferences.
How it works:
- Uses `AGENTS.md` files to provide persistent context
- Memory files are always loaded (unlike skills, which use progressive disclosure)
- Pass one or more file paths to the `memory` parameter when creating your agent
- Files are stored in the agent’s backend (`StateBackend`, `StoreBackend`, or `FilesystemBackend`)
- The agent can update memory based on your interactions, feedback, and identified patterns
Why it’s useful:
- Provides persistent context that doesn’t need to be re-specified each conversation
- Useful for storing user preferences, project guidelines, or domain knowledge
- Always available to the agent, ensuring consistent behavior
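The contrast with skills, always loaded versus loaded on demand, can be shown with a toy prompt builder. The `build_system_prompt` function, the `<memory>` wrapper format, and the in-memory `files` dictionary standing in for a backend are assumptions for illustration; the harness may inject memory differently.

```python
def build_system_prompt(base_prompt: str, files: dict[str, str],
                        memory: list[str]) -> str:
    """Place every configured memory file that exists ahead of the base prompt."""
    sections = [
        f"<memory path={path!r}>\n{files[path].strip()}\n</memory>"
        for path in memory
        if path in files  # silently skip paths that have not been created yet
    ]
    return "\n\n".join(sections + [base_prompt])


files = {"/AGENTS.md": "- Use type hints\n- Prefer pytest\n"}
memory_paths = ["/AGENTS.md", "/missing/AGENTS.md"]

prompt = build_system_prompt("You are a coding agent.", files, memory_paths)

# Simulate the agent recording feedback; the next conversation picks it up.
files["/AGENTS.md"] += "- Keep functions under 40 lines\n"
next_prompt = build_system_prompt("You are a coding agent.", files, memory_paths)

print(next_prompt)
```

Because memory is injected on every run, it is best kept short; longer, situational material fits the skills mechanism better.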

