Memory - Docs by LangChain

Memory lets your agent learn and improve across conversations. Deep Agents makes memory first class with filesystem-backed memory: the agent reads and writes memory as files, and you control where those files are stored using backends.

This page covers long-term memory: memory that persists across conversations. For short-term memory (conversation history and scratch files within a single session), see the context engineering guide. Short-term memory is managed automatically as part of the agent’s state.

Short-term memory is scoped to a single thread via checkpoints; long-term memory persists across threads via the store

How memory works

Point the agent at memory files. Pass file paths to memory= when creating the agent. You can also pass skills via skills= for procedural memory (reusable instructions that tell the agent how to perform a task). A backend controls where files are stored and who can access them.
Agent reads memory. The agent can load memory files into the system prompt at startup, or read them on demand during the conversation. For example, skills use on-demand loading: the agent reads only skill descriptions at startup, then reads the full skill file only when it matches a task. This keeps context lean until a capability is needed.
Agent updates memory (optional). When the agent learns new information, it can use its built-in edit_file tool to update memory files. Updates can happen during the conversation (the default) or in the background between conversations via background consolidation. Changes are persisted and available in the next conversation. Not all memory is writable: developer-defined skills and organization policies are typically read-only. See read-only vs writable memory for details.

The two most common patterns are agent-scoped memory (shared across all users) and user-scoped memory (isolated per user).

Scoped memory

Memory can be scoped so that everyone using the agent shares the same memory files, or so that each user gets their own.
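As a rough mental model, scope comes down to which namespace tuple a file is stored under. The sketch below uses a plain dict as a toy store and a `SimpleNamespace` as a stand-in runtime. The function names and the `assistant_id` / `user_id` attributes are hypothetical, not the Deep Agents API.

```python
from types import SimpleNamespace
from typing import Optional

# Toy stand-in for a store: (namespace, key) -> file contents.
store = {}


def agent_namespace(rt) -> tuple:
    """Agent scope: one namespace per assistant, shared by all users."""
    return (rt.assistant_id,)


def user_namespace(rt) -> tuple:
    """User scope: one namespace per user."""
    return (rt.user_id,)


def write_file(namespace_fn, rt, key: str, content: str) -> None:
    store[(namespace_fn(rt), key)] = content


def read_file(namespace_fn, rt, key: str) -> Optional[str]:
    return store.get((namespace_fn(rt), key))


# Two users talking to the same agent (hypothetical runtime objects).
alice = SimpleNamespace(assistant_id="my-agent", user_id="user-alice")
bob = SimpleNamespace(assistant_id="my-agent", user_id="user-bob")

write_file(agent_namespace, alice, "/memories/AGENTS.md", "- Keep responses concise")
write_file(user_namespace, alice, "/memories/preferences.md", "- Prefers Python examples")

# Agent-scoped file written during Alice's conversation is visible to Bob.
shared = read_file(agent_namespace, bob, "/memories/AGENTS.md")
# User-scoped file written for Alice is not visible to Bob.
private = read_file(user_namespace, bob, "/memories/preferences.md")
```

The same key resolves to different files depending on the namespace function, which is why changing only the namespace is enough to switch between shared and isolated memory.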

Agent-scoped memory

Give the agent its own persistent identity that evolves over time. Agent-scoped memory is shared across all users, so the agent builds up its own persona, accumulated knowledge, and learned preferences through every conversation. As it interacts with users, it develops expertise, refines its approach, and remembers what works. It can also learn and update skills when it has write access. The key is the backend namespace: setting it to (assistant_id,) means every conversation for this agent reads and writes to the same memory file.

Accessing rt.server_info requires deepagents>=0.5.0. On older versions, read the assistant ID from get_config()["metadata"]["assistant_id"] instead.

from deepagents import create_deep_agent
from deepagents.backends import CompositeBackend, StateBackend, StoreBackend

agent = create_deep_agent(
    model="google_genai:gemini-3.5-flash",
    memory=["/memories/AGENTS.md"],
    skills=["/skills/"],
    backend=CompositeBackend(
        default=StateBackend(),
        routes={
            "/memories/": StoreBackend(
                namespace=lambda rt: (
                    rt.server_info.assistant_id,
                ),
            ),
            "/skills/": StoreBackend(
                namespace=lambda rt: (
                    rt.server_info.assistant_id,
                ),
            ),
        },
    ),
)
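The routes mapping above sends each file path to a backend by prefix. The following is a simplified model of that idea, assuming the longest matching prefix wins. That rule is an assumption for illustration and `pick_backend` is a hypothetical helper, not part of deepagents.

```python
def pick_backend(path: str, routes: dict, default: str) -> str:
    """Return the name of the backend that should handle `path`."""
    # Check longer prefixes first so a more specific route wins (assumed rule).
    for prefix in sorted(routes, key=len, reverse=True):
        if path.startswith(prefix):
            return routes[prefix]
    # Anything that matches no route falls through to the default backend.
    return default


# Backend names are plain labels here, standing in for backend instances.
routes = {"/memories/": "store", "/skills/": "store"}

memory_backend = pick_backend("/memories/AGENTS.md", routes, default="state")
scratch_backend = pick_backend("/scratch/notes.txt", routes, default="state")
```

In this model, files under `/memories/` and `/skills/` persist in the store, while everything else stays in per-thread state.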

Full example: seed memory and invoke

Populate the store with initial memories, then invoke the agent across two threads to see it remember and update what it learns.

from langchain_core.utils.uuid import uuid7

from deepagents import create_deep_agent
from deepagents.backends import CompositeBackend, StateBackend, StoreBackend
from deepagents.backends.utils import create_file_data
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()  # Use platform store when deploying to LangSmith

# Seed the memory file
store.put(
    ("my-agent",),
    "/memories/AGENTS.md",
    create_file_data("""## Response style
- Keep responses concise
- Use code examples where possible
"""),
)

# Seed a skill
store.put(
    ("my-agent",),
    "/skills/langgraph-docs/SKILL.md",
    create_file_data("""---
name: langgraph-docs
description: Fetch relevant LangGraph documentation to provide accurate guidance.
---

# langgraph-docs

Use the fetch_url tool to read https://docs.langchain.com/llms.txt, then fetch relevant pages.
"""),
)

agent = create_deep_agent(
    model="google_genai:gemini-3.5-flash",
    memory=["/memories/AGENTS.md"],
    skills=["/skills/"],
    backend=lambda rt: CompositeBackend(
        default=StateBackend(rt),
        routes={
            "/memories/": StoreBackend(
                rt, namespace=lambda rt: ("my-agent",)
            ),
            "/skills/": StoreBackend(
                rt, namespace=lambda rt: ("my-agent",)
            ),
        },
    ),
    store=store,
)

# Thread 1: the agent learns a new preference and saves it to memory
config1 = {"configurable": {"thread_id": str(uuid7())}}
agent.invoke(
    {"messages": [{"role": "user", "content": "I prefer detailed explanations. Remember that."}]},
    config=config1,
)

# Thread 2: the agent reads memory and applies the preference
config2 = {"configurable": {"thread_id": str(uuid7())}}
agent.invoke(
    {"messages": [{"role": "user", "content": "Explain how transformers work."}]},
    config=config2,
)

User-scoped memory

Give each user their own memory file. The agent remembers preferences, context, and history per user while core agent instructions stay fixed. Users can also have per-user skills if stored in a user-scoped backend. The namespace uses (user_id,) so each user gets an isolated copy of the memory file. User A’s preferences never leak into User B’s conversations.

from deepagents import create_deep_agent
from deepagents.backends import CompositeBackend, StateBackend, StoreBackend

agent = create_deep_agent(
    model="google_genai:gemini-3.5-flash",
    memory=["/memories/preferences.md"],
    skills=["/skills/"],
    backend=CompositeBackend(
        default=StateBackend(),
        routes={
            "/memories/": StoreBackend(
                namespace=lambda rt: (rt.server_info.user.identity,),
            ),
            "/skills/": StoreBackend(
                namespace=lambda rt: (rt.server_info.user.identity,),
            ),
        },
    ),
)

Full example: isolated memory across users

Seed per-user memories and invoke the agent as two different users. Each user sees only their own preferences.

from langchain_core.utils.uuid import uuid7

from deepagents import create_deep_agent
from deepagents.backends import CompositeBackend, StateBackend, StoreBackend
from deepagents.backends.utils import create_file_data
from langgraph.store.memory import InMemoryStore


store = InMemoryStore()  # Use platform store when deploying to LangSmith

# Seed preferences for two users
store.put(
    ("user-alice",),
    "/memories/preferences.md",
    create_file_data("""## Preferences
- Likes concise bullet points
- Prefers Python examples
"""),
)
store.put(
    ("user-bob",),
    "/memories/preferences.md",
    create_file_data("""## Preferences
- Likes detailed explanations
- Prefers TypeScript examples
"""),
)

# Seed a skill for Alice
store.put(
    ("user-alice",),
    "/skills/langgraph-docs/SKILL.md",
    create_file_data("""---
name: langgraph-docs
description: Fetch relevant LangGraph documentation to provide accurate guidance.
---

# langgraph-docs

Use the fetch_url tool to read https://docs.langchain.com/llms.txt, then fetch relevant pages.
"""),
)

agent = create_deep_agent(
    model="google_genai:gemini-3.5-flash",
    memory=["/memories/preferences.md"],
    skills=["/skills/"],
    backend=lambda rt: CompositeBackend(
        default=StateBackend(rt),
        routes={
            "/memories/": StoreBackend(
                rt,
                namespace=lambda rt: (rt.server_info.user.identity,),
            ),
            "/skills/": StoreBackend(
                rt,
                namespace=lambda rt: (rt.server_info.user.identity,),
            ),
        },
    ),
    store=store,
)

# When deployed, each authenticated request resolves
# `rt.server_info.user.identity` to the calling user, so Alice and Bob
# automatically see only their own preferences.
agent.invoke(
    {"messages": [{"role": "user", "content": "How do I read a CSV file?"}]},
    config={"configurable": {"thread_id": str(uuid7())}},
)

Advanced usage

On top of the basic configuration options for memory paths and scope, you can also configure more advanced parameters for memory:

Dimension	Question it answers	Options
Duration	How long does it last?	Short-term (single conversation) or long-term (across conversations)
Information type	What kind of information is it?	Episodic (past experiences), procedural (instructions and skills), or semantic (facts)
Scope	Who can see and modify it?	User, agent, or organization
Update strategy	When are memories written?	During conversation (default) or between conversations
Retrieval	How are memories read?	Loaded into prompt (default) or on demand (e.g., skills)
Agent permissions	Can the agent write to memory?	Read-write (default) or read-only (for shared policies)

Episodic memory

Episodic memory stores records of past experiences: what happened, in what order, and what the outcome was. Unlike semantic memory (facts and preferences stored in files like AGENTS.md), episodic memory preserves the full conversational context so the agent can recall how a problem was solved, not just what was learned from it. Deep Agents already uses checkpointers, the mechanism that supports episodic memory: every conversation is persisted as a checkpointed thread. To make past conversations searchable, wrap thread search in a tool. The user_id is pulled from the runtime context rather than passed as a parameter:

from langgraph_sdk import get_client
from langchain.tools import tool, ToolRuntime

client = get_client(url="<DEPLOYMENT_URL>")


@tool
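async def search_past_conversations(query: str, runtime: ToolRuntime) -> str:
    """Search past conversations for relevant context."""
    # Read the user ID from the runtime context instead of a tool argument.
    user_id = runtime.server_info.user.identity
    # Note: `query` is not used for filtering here; threads are matched on metadata only.
    threads = await client.threads.search(
        metadata={"user_id": user_id},
        limit=5,
    )
    results = []
    for thread in threads:
        # Fetch the checkpointed state history for each matching thread.
        history = await client.threads.get_history(thread_id=thread["thread_id"])
        results.append(history)
    return str(results)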

You can scope thread search by user or organization by adjusting the metadata filter:

# Search conversations for a specific user
threads = await client.threads.search(
    metadata={"user_id": user_id},
    limit=5,
)

# Search conversations across an organization
threads = await client.threads.search(
    metadata={"org_id": org_id},
    limit=5,
)

This is useful for agents that perform complex, multi-step tasks. For example, a coding agent can look back at a past debugging session and skip straight to the likely root cause.
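Returning `str(results)` from the search tool can produce a large, hard-to-read payload. One option is to condense each conversation into a short transcript before handing it back to the model. The sketch below assumes messages are dicts with a `role` (or `type`) and a `content` field, which may not match the exact shape your deployment returns. `format_transcript` is a hypothetical helper.

```python
def format_transcript(messages: list, max_chars: int = 200) -> str:
    """Render messages as `role: content` lines, truncating long content."""
    lines = []
    for message in messages:
        # Assumed shape: {"role": ..., "content": ...} or {"type": ..., "content": ...}.
        role = message.get("role") or message.get("type", "unknown")
        content = str(message.get("content", ""))
        if len(content) > max_chars:
            content = content[:max_chars] + "..."
        lines.append(f"{role}: {content}")
    return "\n".join(lines)


# Hypothetical messages from a past debugging session.
messages = [
    {"role": "user", "content": "Why does the build fail?"},
    {"type": "ai", "content": "x" * 300},
]
transcript = format_transcript(messages)
```

Truncation trades detail for context budget, so pick `max_chars` based on how much history the agent realistically needs.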

Organization-level memory

Organization-level memory follows the same pattern as user-scoped memory, but with an organization-wide namespace instead of a per-user one. Use it for policies or knowledge that should apply across all users and agents in an organization. Organization memory is typically read-only to prevent prompt injection via shared state. See read-only vs writable memory for details.

from deepagents import create_deep_agent
from deepagents.backends import CompositeBackend, StateBackend, StoreBackend

agent = create_deep_agent(
    model="google_genai:gemini-3.5-flash",
    memory=[
        "/memories/preferences.md",
        "/policies/compliance.md",
    ],
    backend=CompositeBackend(
        default=StateBackend(),
        routes={
            "/memories/": StoreBackend(
                namespace=lambda rt: (rt.server_info.user.identity,),
            ),
            "/policies/": StoreBackend(
                namespace=lambda rt: (rt.context.org_id,),
            ),
        },
    ),
)

Populate organization memory from your application code:

from langgraph_sdk import get_client
from deepagents.backends.utils import create_file_data

client = get_client(url="<DEPLOYMENT_URL>")

await client.store.put_item(
    (org_id,),
    "/compliance.md",
    create_file_data("""## Compliance policies
- Never disclose internal pricing
- Always include disclaimers on financial advice
"""),
)

Use permissions to enforce that org-level memory is read-only, or policy hooks for custom validation logic.

Background consolidation

By default, the agent writes memories during the conversation (hot path). An alternative is to process memories between conversations as a background task, sometimes called sleep time compute. A separate deep agent reviews recent conversations, extracts key facts, and merges them with existing memories.

Approach	Pros	Cons
Hot path (during conversation)	Memories available immediately, transparent to user	Adds latency, agent must multitask
Background (between conversations)	No user-facing latency, can synthesize across multiple conversations	Memories not available until next conversation, requires a second agent

For most applications, the hot path is sufficient. Add background consolidation when you need to reduce latency or improve memory quality across many conversations.

The recommended pattern is to deploy a consolidation agent alongside your main agent and trigger it on a cron schedule. The consolidation agent is a deep agent that reads recent conversation history, extracts key facts, and merges them into the memory store. Pick a cadence that reflects how often your users actually interact with the agent: a chat product with steady daily traffic might consolidate every few hours, while a tool used a handful of times per week only needs to run nightly or weekly. Consolidating much more often than users converse just burns tokens on no-op runs.

Consolidation agent

The consolidation agent reads recent conversation history and merges key facts into the memory store. Register it alongside your main agent in langgraph.json:

consolidation_agent.py

from datetime import datetime, timedelta, timezone

from deepagents import create_deep_agent
from langchain.tools import tool, ToolRuntime
from langgraph_sdk import get_client

sdk_client = get_client(url="<DEPLOYMENT_URL>")


@tool
async def search_recent_conversations(query: str, runtime: ToolRuntime) -> str:
    """Search this user's conversations updated in the last 6 hours."""
    user_id = runtime.server_info.user.identity

    # Lookback window: keep this in sync with the cron schedule.
    since = datetime.now(timezone.utc) - timedelta(hours=6)
    threads = await sdk_client.threads.search(
        metadata={"user_id": user_id},
        updated_after=since.isoformat(),
        limit=20,
    )
    conversations = []
    for thread in threads:
        # get_history returns a list of states, so read the latest state
        # with get_state to reach the thread's current messages.
        state = await sdk_client.threads.get_state(
            thread_id=thread["thread_id"]
        )
        conversations.append(state["values"].get("messages", []))
    return str(conversations)


agent = create_deep_agent(
    model="google_genai:gemini-3.5-flash",
    system_prompt="""Review recent conversations and update the user's memory file.
Merge new facts, remove outdated information, and keep it concise.""",
    tools=[search_recent_conversations],
)

langgraph.json

{
  "dependencies": ["."],
  "graphs": {
    "agent": "./agent.py:agent",
    "consolidation_agent": "./consolidation_agent.py:agent"
  },
  "env": ".env"
}

Cron

A cron job runs the consolidation agent on a fixed schedule. On each run, the agent searches recent conversations and synthesizes them into memory. Match the schedule to your usage patterns so consolidation runs roughly track real activity:

from langgraph_sdk import get_client

client = get_client(url="<DEPLOYMENT_URL>")

cron_job = await client.crons.create(
    assistant_id="consolidation_agent",
    schedule="0 */6 * * *",
    input={"messages": [{"role": "user", "content": "Consolidate recent memories."}]},
)

All cron schedules are interpreted in UTC. See cron jobs for details on managing and deleting cron jobs.

The cron interval must match the lookback window inside the consolidation agent. The example above runs every 6 hours (0 */6 * * *), and the agent’s search_recent_conversations tool looks back timedelta(hours=6). Keep these in sync. If the cron interval is shorter than the lookback, you’ll reprocess the same conversations; if it is longer, you’ll drop memories that fall outside the window.
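The relationship between the two values is simple arithmetic, sketched below with the standard library. `compare_windows` is a hypothetical helper, and it assumes the cron fires at a fixed interval.

```python
from datetime import timedelta


def compare_windows(cron_interval: timedelta, lookback: timedelta) -> tuple:
    """Classify how a lookback window relates to the cron interval."""
    if lookback > cron_interval:
        # Each run re-reads this much of what the previous run already covered.
        return "overlap", lookback - cron_interval
    if lookback < cron_interval:
        # This much activity between runs is never reviewed.
        return "gap", cron_interval - lookback
    return "aligned", timedelta(0)


# The configuration from the example: every 6 hours, 6-hour lookback.
aligned = compare_windows(timedelta(hours=6), timedelta(hours=6))
# Cron every 3 hours with a 6-hour lookback reprocesses 3 hours per run.
overlap = compare_windows(timedelta(hours=3), timedelta(hours=6))
# Cron every 12 hours with a 6-hour lookback misses 6 hours per cycle.
gap = compare_windows(timedelta(hours=12), timedelta(hours=6))
```

A check like this could run in a unit test so the schedule and the lookback cannot drift apart unnoticed.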

For more on deploying agents with background processes, see going to production.

Read-only vs writable memory

By default, the agent can both read and write memory files. For shared state like organization policies or compliance rules, you may want to make memory read-only so the agent can reference it but not modify it. This prevents prompt injection via shared memory and ensures that only your application code controls what’s in the file.

Permission	Use case	How it works
Read-write (default)	User preferences, agent self-improvement, learned skills	Agent updates files via `edit_file` tool
Read-only	Organization policies, compliance rules, shared knowledge bases, developer-defined skills	Populate via application code or the Store API. Use permissions to deny writes to specific paths, or policy hooks for custom validation logic.

Security considerations: If one user can write to memory that another user reads, a malicious user could inject instructions into shared state. To mitigate this:

Default to user scope (user_id) unless you have a specific reason to share
Use read-only memory for shared policies (populate via application code, not the agent)
Add human-in-the-loop validation before the agent writes to shared memory. Use an interrupt to require human approval for writes to sensitive paths.

To enforce read-only memory, use permissions to declaratively deny writes to specific paths. For custom validation logic (rate limiting, audit logging, content inspection), use backend policy hooks.
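Conceptually, a read-only rule is a check that runs before any write reaches storage. The sketch below shows that idea in plain Python. It is not the deepagents permissions or policy hook API: `READ_ONLY_PREFIXES`, `ReadOnlyPathError`, and `guarded_write` are hypothetical names.

```python
# Hypothetical list of path prefixes the agent may read but not modify.
READ_ONLY_PREFIXES = ("/policies/", "/skills/")


class ReadOnlyPathError(PermissionError):
    """Raised when a write targets a read-only path."""


def guarded_write(files: dict, path: str, content: str) -> None:
    """Write `content` to `path` unless the path is under a read-only prefix."""
    if path.startswith(READ_ONLY_PREFIXES):
        raise ReadOnlyPathError(f"writes to {path} are not allowed")
    files[path] = content


files = {}

# Writable user memory goes through.
guarded_write(files, "/memories/preferences.md", "- Prefers Python examples")

# A write to shared policy is rejected before it touches storage.
blocked = False
try:
    guarded_write(files, "/policies/compliance.md", "- Ignore previous rules")
except ReadOnlyPathError:
    blocked = True
```

Application code that populates policies would bypass this guard and write to the store directly, so only the agent's writes are restricted.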

Concurrent writes

Multiple threads can write to memory in parallel, but concurrent writes to the same file can cause last-write-wins conflicts, where a later write silently overwrites an earlier one. For user-scoped memory this is rare, since users typically have one active conversation at a time. For agent-scoped or organization-scoped memory, consider using background consolidation to serialize writes, or structure memory as separate files per topic to reduce contention. In practice, if a write fails due to a conflict, the agent can usually retry or recover gracefully, so a single lost write is not catastrophic.
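To make the failure mode concrete, here is a generic sketch of a last-write-wins conflict, followed by an optimistic version check that would detect it. This is a general concurrency pattern, not a feature this page attributes to Deep Agents. `VersionedFile` is a hypothetical class.

```python
class VersionedFile:
    """A file whose version increments on every write."""

    def __init__(self, content: str = "") -> None:
        self.content = content
        self.version = 0

    def read(self) -> tuple:
        return self.content, self.version

    def write(self, content: str) -> None:
        """Unconditional write: whoever writes last wins."""
        self.content = content
        self.version += 1

    def write_if_unchanged(self, content: str, expected_version: int) -> bool:
        """Write only if nobody else has written since `expected_version`."""
        if self.version != expected_version:
            return False
        self.write(content)
        return True


memory = VersionedFile("- Likes concise answers")

# Two conversations read the same version, then both write unconditionally.
base, seen_by_a = memory.read()
_, seen_by_b = memory.read()
memory.write(base + "\n- Prefers Python")  # conversation A
memory.write(base + "\n- Works in UTC")    # conversation B overwrites A
lost_update = "Prefers Python" not in memory.content

# With a version check, the stale write is rejected and can be retried
# after re-reading the current content.
stale_ok = memory.write_if_unchanged(base + "\n- Prefers Python", seen_by_a)
current, version = memory.read()
retry_ok = memory.write_if_unchanged(current + "\n- Prefers Python", version)
```

Splitting memory into separate files per topic reduces how often two writers touch the same file in the first place, which is usually simpler than adding version checks.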

Multiple agents in the same deployment

To give each agent its own memory in a shared deployment, add assistant_id to the namespace:

StoreBackend(
    namespace=lambda rt: (
        rt.server_info.assistant_id,
        rt.server_info.user.identity,
    ),
)

Use assistant_id alone if you only need per-agent isolation without per-user scoping.

Use LangSmith tracing to audit what your agent writes to memory. Every file write appears as a tool call in the trace.
