Test agent logic without API calls using fake chat models and in-memory persistence.
Unit tests exercise small, deterministic pieces of your agent in isolation. By replacing the real LLM with an in-memory fake (sometimes called a fixture), you can script exact responses (text, tool calls, and errors), so tests are fast, free, and repeatable, with no API keys required.
fakeModel is a builder-style fake chat model that lets you script exact responses (text, tool calls, errors) and assert what the model received. It extends BaseChatModel, so it works anywhere a real model is expected.
Create a model, queue responses with .respond(), and invoke. Each invoke() consumes the next queued response in order:
```ts
import { fakeModel } from "langchain";
import { AIMessage, HumanMessage } from "@langchain/core/messages";

const model = fakeModel()
  .respond(new AIMessage("I can help with that."))
  .respond(new AIMessage("Here's what I found."))
  .respond(new AIMessage("You're welcome!"));

const r1 = await model.invoke([new HumanMessage("Can you help?")]);
// r1.content === "I can help with that."
const r2 = await model.invoke([new HumanMessage("What did you find?")]);
// r2.content === "Here's what I found."
const r3 = await model.invoke([new HumanMessage("Thanks!")]);
// r3.content === "You're welcome!"
```
If the model is invoked more times than there are queued responses, it throws a descriptive error:
```ts
const model = fakeModel()
  .respond(new AIMessage("only one"));

await model.invoke([new HumanMessage("first")]);  // works
await model.invoke([new HumanMessage("second")]); // throws: "no response queued for invocation 1"
```
The id field of a queued tool call is optional. If omitted, a unique ID is auto-generated.
.respond() and .respondWithTools() can be mixed freely in any order. This is particularly useful for testing agentic loops where the model alternates between tool calls and text responses.
.respond() also accepts a function that computes the response based on the input messages. The function receives the full message array and returns either a BaseMessage or an Error:
```ts
import { fakeModel } from "langchain";
import { AIMessage, HumanMessage } from "@langchain/core/messages";

const model = fakeModel()
  .respond((messages) => {
    const last = messages[messages.length - 1].text;
    return new AIMessage(`You said: ${last}`);
  });

const result = await model.invoke([new HumanMessage("hello")]);
console.log(result.content); // "You said: hello"
```
Factory functions can also return errors:
```ts
import { fakeModel } from "langchain";
import { AIMessage, HumanMessage } from "@langchain/core/messages";

const model = fakeModel()
  .respond((messages) => {
    const content = messages[messages.length - 1].text;
    if (content.includes("forbidden")) {
      return new Error("Content policy violation");
    }
    return new AIMessage("OK");
  });

await model.invoke([new HumanMessage("forbidden topic")]); // throws "Content policy violation"
```
Each function is a single queue entry and is consumed once. To reuse the same dynamic logic across multiple turns, queue the function multiple times with repeated .respond() calls.
For code that uses .withStructuredOutput(), configure the fake return value with .structuredResponse():
```ts
import { fakeModel } from "langchain";
import { HumanMessage } from "@langchain/core/messages";
import { z } from "zod";

const model = fakeModel()
  .structuredResponse({ temperature: 72, unit: "fahrenheit" });

const structured = model.withStructuredOutput(
  z.object({
    temperature: z.number(),
    unit: z.string(),
  })
);

const result = await structured.invoke([new HumanMessage("Weather?")]);
console.log(result);
// { temperature: 72, unit: "fahrenheit" }
```
The schema passed to .withStructuredOutput() is ignored. The model always returns the value configured with .structuredResponse(). This keeps tests focused on application logic rather than parsing.
fakeModel records every invocation, including the messages and options passed to the model. This works like a spy or mock in traditional testing frameworks:
Agent frameworks like LangChain agents and LangGraph call model.bindTools(tools) internally. fakeModel handles this automatically. The bound model shares the same response queue and call recording as the original, so no special setup is needed:
```ts
import { fakeModel } from "langchain";
import { AIMessage, HumanMessage } from "@langchain/core/messages";
import { tool } from "@langchain/core/tools";
import { z } from "zod";

const searchTool = tool(async ({ query }) => `Results for: ${query}`, {
  name: "search",
  description: "Search the web",
  schema: z.object({ query: z.string() }),
});

const model = fakeModel()
  .respondWithTools([{ name: "search", args: { query: "weather" }, id: "1" }])
  .respond(new AIMessage("The weather is sunny."));

const bound = model.bindTools([searchTool]);

const r1 = await bound.invoke([new HumanMessage("weather?")]);
console.log(r1.tool_calls[0].name); // "search"

const r2 = await bound.invoke([new HumanMessage("thanks")]);
console.log(r2.content); // "The weather is sunny."

// Call recording is shared. Inspect via the original model.
console.log(model.callCount); // 2
```
Full example: test a tool-calling agent with vitest
```ts
import { describe, test, expect } from "vitest";
import { fakeModel } from "langchain";
import { AIMessage, HumanMessage, ToolMessage } from "@langchain/core/messages";
import { tool } from "@langchain/core/tools";
import { z } from "zod";

const getWeather = tool(
  async ({ city }) => `72°F and sunny in ${city}`,
  {
    name: "get_weather",
    description: "Get weather for a city",
    schema: z.object({ city: z.string() }),
  }
);

async function runAgent(
  model: ReturnType<typeof fakeModel>,
  input: string
) {
  const messages: any[] = [new HumanMessage(input)];
  const bound = model.bindTools([getWeather]);
  while (true) {
    const response = await bound.invoke(messages);
    messages.push(response);
    if (!response.tool_calls?.length) {
      return { messages, finalResponse: response };
    }
    for (const tc of response.tool_calls) {
      const result = await getWeather.invoke(tc.args);
      messages.push(new ToolMessage({
        content: result as string,
        tool_call_id: tc.id!,
      }));
    }
  }
}

describe("weather agent", () => {
  test("calls get_weather and returns a final answer", async () => {
    const model = fakeModel()
      .respondWithTools([
        { name: "get_weather", args: { city: "SF" }, id: "call_1" },
      ])
      .respond(new AIMessage("It's 72°F and sunny in SF!"));

    const { finalResponse } = await runAgent(model, "Weather in SF?");

    expect(finalResponse.content).toBe("It's 72°F and sunny in SF!");
    expect(model.callCount).toBe(2);

    // The second invocation should include the tool result.
    const secondCall = model.calls[1].messages;
    const toolMsg = secondCall.find((m: any) => m._getType() === "tool");
    expect(toolMsg?.content).toContain("72°F and sunny in SF");
  });

  test("handles model errors gracefully", async () => {
    const model = fakeModel()
      .respond(new Error("rate limit"));

    await expect(
      runAgent(model, "Weather?")
    ).rejects.toThrow("rate limit");
    expect(model.callCount).toBe(1);
  });
});
```