Parallel is a real-time web search and content extraction platform built for LLMs and AI applications.
ChatParallel is an OpenAI-compatible chat interface to Parallel’s models. The speed model is a low-latency conversational model with no citations; the research models (lite, base, core) browse the web and return per-field citations and structured output via JSON schema.
ChatParallel is the canonical class name. The earlier ChatParallelWeb continues to work as an alias for the same class.
Overview
Integration details
| Class | Package | Serializable | JS/TS Support | Downloads | Latest Version |
|---|---|---|---|---|---|
| ChatParallel | langchain-parallel | ✅ | ❌ |  |  |
Model features
| Tool calling | Structured output | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |
|---|---|---|---|---|---|---|---|---|
| ❌ | ✅ (research models) | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ |
Choosing a model
| Model | Latency | Web browsing | Citations | Structured output | Use when |
|---|---|---|---|---|---|
| speed | low | ❌ | ❌ | ❌ | Conversational answers from the model’s parametric knowledge. |
| lite | medium | ✅ | ✅ | ✅ | Fact lookups with citations. |
| base | medium-high | ✅ | ✅ | ✅ | Mid-depth research with citations. |
| core | higher | ✅ | ✅ | ✅ | Multi-source research with citations. |
speed does not honor response_format, so with_structured_output() raises a clear error there. Use a research model when you need parsed pydantic output or per-field citations.
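The table and note above reduce to a small decision rule, sketched here. `choose_model`, `needs_web`, and the `depth` labels are hypothetical names invented for illustration; they are not part of langchain-parallel, and the mapping only mirrors the "Use when" column.

```python
def choose_model(needs_web: bool = False, depth: str = "lookup") -> str:
    """Map requirements to a model name from the table above.

    Hypothetical helper, not part of langchain-parallel. `depth` is an
    illustrative label: "lookup", "mid", or "multi-source".
    """
    if not needs_web:
        # No browsing, citations, or structured output required.
        return "speed"
    by_depth = {"lookup": "lite", "mid": "base", "multi-source": "core"}
    if depth not in by_depth:
        raise ValueError(f"unknown depth {depth!r}; expected one of {sorted(by_depth)}")
    return by_depth[depth]


print(choose_model())                                      # speed
print(choose_model(needs_web=True))                        # lite
print(choose_model(needs_web=True, depth="multi-source"))  # core
```

Because citations and structured output are only available on the research models, needing either one is treated here as `needs_web=True`.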
Setup
To access Parallel models, install the langchain-parallel integration package and acquire a Parallel API key.
Installation
pip install -U langchain-parallel
Credentials
Head to Parallel to sign up and generate an API key. Set PARALLEL_API_KEY in your environment:
import getpass
import os
if not os.environ.get("PARALLEL_API_KEY"):
    os.environ["PARALLEL_API_KEY"] = getpass.getpass("Parallel API key:\n")
Instantiation
from langchain_parallel import ChatParallel
llm = ChatParallel(
    model="speed",
    # timeout=None,
    # max_retries=2,
    # api_key="...",  # optional if PARALLEL_API_KEY is set
    # base_url="https://api.parallel.ai",  # default
)
See the ChatParallel API reference for the full set of available parameters.
Invocation
messages = [
    ("system", "You are a helpful assistant with access to real-time web information."),
    ("human", "What are the latest developments in AI?"),
]
ai_msg = llm.invoke(messages)
print(ai_msg.content)
Chaining
Chain the model with a prompt template:
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate(
    [
        (
            "system",
            "You are a research assistant with access to real-time web information. "
            "Answer questions about {topic} using current sources.",
        ),
        ("human", "{question}"),
    ]
)
chain = prompt | llm
chain.invoke(
    {
        "topic": "artificial intelligence",
        "question": "What are the most significant AI breakthroughs in 2026?",
    }
)
Structured output
On the research models (lite, base, core), ChatParallel.with_structured_output(...) binds a JSON-schema response_format and returns a parsed pydantic object (or dict). Calling it on speed raises a ValueError, since speed silently ignores response_format.
from pydantic import BaseModel, Field
class Founder(BaseModel):
    name: str = Field(description="Full name of the founder")
    company: str = Field(description="Company they founded")
structured = ChatParallel(model="lite").with_structured_output(Founder)
parsed = structured.invoke([("human", "Who founded SpaceX?")])
print(parsed)
# name='Elon Musk' company='SpaceX'
method="json_schema" (the default), method="json_mode", and method="function_calling" are all accepted. Pass include_raw=True to receive the full {"raw", "parsed", "parsing_error"} envelope and capture parser failures:
structured = ChatParallel(model="lite").with_structured_output(Founder, include_raw=True)
res = structured.invoke([("human", "Who founded SpaceX?")])
res["parsed"] # Founder(...) or None
res["parsing_error"] # Exception or None
res["raw"] # original AIMessage
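The envelope can be reduced to a single value with a small guard, sketched here. `unwrap_parsed` is a hypothetical helper, not part of langchain-parallel, and the dictionaries below are offline stand-ins for what `structured.invoke(...)` returns with `include_raw=True`.

```python
from typing import Any, Mapping


def unwrap_parsed(envelope: Mapping[str, Any]) -> Any:
    """Return envelope["parsed"], or raise ValueError if parsing failed.

    Hypothetical helper; expects the {"raw", "parsed", "parsing_error"}
    mapping described above.
    """
    error = envelope.get("parsing_error")
    if error is not None:
        # Chain the original exception when there is one to chain.
        cause = error if isinstance(error, BaseException) else None
        raise ValueError(f"structured output could not be parsed: {error}") from cause
    parsed = envelope.get("parsed")
    if parsed is None:
        raise ValueError("no parsed output and no parsing error reported")
    return parsed


# Stand-in envelopes so the sketch runs without an API key.
ok = {"raw": None, "parsed": {"name": "Elon Musk", "company": "SpaceX"}, "parsing_error": None}
bad = {"raw": None, "parsed": None, "parsing_error": KeyError("company")}

print(unwrap_parsed(ok)["name"])  # Elon Musk
try:
    unwrap_parsed(bad)
except ValueError as exc:
    print("failed:", exc)
```

Raising on failure keeps downstream code from silently handling `None`; returning a default instead is an equally reasonable choice when partial results are acceptable.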
Citations
Research models populate AIMessage.response_metadata["basis"] with per-field citations, the model’s reasoning, and a confidence label. response_metadata["interaction_id"] is surfaced for multi-turn context chaining; system_fingerprint is forwarded when present.
cited = ChatParallel(model="lite").invoke([
    ("human", "Who is the current CEO of OpenAI? One sentence."),
])
print(cited.content)
print("\nbasis:", cited.response_metadata.get("basis"))
print("interaction_id:", cited.response_metadata.get("interaction_id"))
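For display or logging, the basis payload can be flattened into one row per field. This is only a sketch: the exact shape of `basis` is not documented on this page, so the keys used below (`field`, `confidence`, `citations`, `url`) are assumptions, `summarize_basis` is a hypothetical helper, and `sample_basis` is made-up stand-in data. Print a real `basis` once and adjust the keys to match.

```python
from typing import Any


def summarize_basis(basis: Any) -> list[dict[str, Any]]:
    """Flatten a basis payload into one row per field.

    ASSUMPTION: `basis` is a list of dicts with "field", "confidence", and
    "citations" keys, where each citation is a dict carrying a "url".
    """
    rows = []
    for entry in basis or []:
        if not isinstance(entry, dict):
            continue  # skip anything that does not match the assumed shape
        citations = entry.get("citations") or []
        urls = [c.get("url") for c in citations if isinstance(c, dict)]
        rows.append(
            {
                "field": entry.get("field"),
                "confidence": entry.get("confidence"),
                "urls": [u for u in urls if u],
            }
        )
    return rows


# Made-up stand-in data in the assumed shape, so the sketch runs offline.
sample_basis = [
    {
        "field": "answer",
        "confidence": "high",
        "reasoning": "stand-in text",
        "citations": [{"url": "https://example.com/a", "title": "Example"}],
    }
]
for row in summarize_basis(sample_basis):
    print(row["field"], row["confidence"], row["urls"])
```

In real use, pass `cited.response_metadata.get("basis")`; the helper returns an empty list when the key is missing, as it is on the speed model.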
Streaming
ChatParallel supports per-token streaming:
for chunk in llm.stream(messages):
    print(chunk.content, end="")
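When the full reply is needed after streaming finishes, the chunks can be accumulated while they are printed. A minimal sketch: `collect_stream` is a hypothetical helper, and the `SimpleNamespace` objects stand in for streamed chunks so the example runs without an API key.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Print each chunk's text as it arrives and return the full reply."""
    parts = []
    for chunk in chunks:
        text = chunk.content
        if isinstance(text, str) and text:
            print(text, end="", flush=True)
            parts.append(text)
    print()  # finish the line once the stream ends
    return "".join(parts)


# Stand-in chunks; in practice pass llm.stream(messages) instead.
fake_chunks = [SimpleNamespace(content=t) for t in ("Hel", "lo", "", "!")]
full_text = collect_stream(fake_chunks)
```

Joining a list once at the end avoids repeated string concatenation on long replies.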
Async
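# Top-level await works in a notebook; in a script, put these calls in an
# async function and run it with asyncio.run().
ai_msg = await llm.ainvoke(messages)
async for chunk in llm.astream(messages):
    print(chunk.content, end="")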
Token usage
Parallel does not currently return token usage metadata, so usage_metadata is None.
ai_msg = llm.invoke(messages)
print(ai_msg.usage_metadata)
# None
response_metadata is still populated with the model name, finish reason, and creation timestamp:
ai_msg = llm.invoke(messages)
print(ai_msg.response_metadata)
# {'model_name': 'speed', 'finish_reason': 'stop', 'created': 1764043410}
For research models, response_metadata additionally carries basis (per-field citations), interaction_id (for multi-turn chaining), and system_fingerprint when available.
Error handling
The integration raises ValueError with a descriptive message on common failure modes:
from langchain_parallel import ChatParallel
try:
    llm = ChatParallel(api_key="invalid-key")
    response = llm.invoke([("human", "Hello")])
except ValueError as e:
    if "Authentication failed" in str(e):
        print("Invalid API key provided")
    elif "Rate limit exceeded" in str(e):
        print("API rate limit exceeded, please try again later")
    else:
        raise  # surface any other ValueError instead of swallowing it
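Rate-limit errors are usually worth retrying after a pause. The sketch below wraps any zero-argument callable in exponential backoff, keyed on the "Rate limit exceeded" text shown above. `call_with_backoff` and its parameters are hypothetical, not part of langchain-parallel, and `flaky` is a stand-in for a real call such as `lambda: llm.invoke(messages)`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry fn() on rate-limit ValueErrors, doubling the wait each time."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ValueError as exc:
            is_rate_limit = "Rate limit exceeded" in str(exc)
            is_last_attempt = attempt == max_attempts - 1
            if not is_rate_limit or is_last_attempt:
                raise  # other errors, or retries exhausted
            sleep(base_delay * 2**attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("max_attempts must be at least 1")


# Stand-in for a model call: fails twice with a rate-limit error, then succeeds.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ValueError("Rate limit exceeded")
    return "ok"


delays = []  # record the waits instead of sleeping, so the demo is instant
result = call_with_backoff(flaky, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

The `sleep` parameter is injectable only to keep the demo and tests fast; the default is `time.sleep`. Authentication failures are re-raised immediately, since retrying them cannot succeed.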
OpenAI compatibility
ChatParallel accepts many OpenAI Chat Completions API parameters for drop-in OpenAI-client migration. Advanced parameters such as tools, tool_choice, top_p, and frequency_penalty are accepted but ignored by the Parallel API.
llm = ChatParallel(
    model="speed",
    # accepted but ignored by Parallel:
    tools=[{"type": "function", "function": {"name": "example"}}],
    tool_choice="auto",
    top_p=1.0,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    logit_bias={},
    seed=42,
    user="user-123",
)
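When migrating existing OpenAI settings, it can help to flag which ones will have no effect. The sketch below separates a settings dict into parameters that matter and parameters from the ignored list above. `IGNORED_PARAMS` and `split_ignored` are hypothetical names, and the set contains only the parameters this section names; it is not an exhaustive list from the API.

```python
# Parameters this section lists as accepted but ignored by Parallel.
IGNORED_PARAMS = frozenset(
    {
        "tools",
        "tool_choice",
        "top_p",
        "frequency_penalty",
        "presence_penalty",
        "logit_bias",
        "seed",
        "user",
    }
)


def split_ignored(kwargs: dict) -> tuple[dict, list[str]]:
    """Return (kwargs that take effect, sorted names of ignored kwargs)."""
    kept = {k: v for k, v in kwargs.items() if k not in IGNORED_PARAMS}
    ignored = sorted(k for k in kwargs if k in IGNORED_PARAMS)
    return kept, ignored


openai_kwargs = {"model": "speed", "top_p": 0.9, "seed": 42, "timeout": 30}
kept, ignored = split_ignored(openai_kwargs)
print(kept)     # {'model': 'speed', 'timeout': 30}
print(ignored)  # ['seed', 'top_p']
```

Logging the `ignored` list during migration makes it visible that, for example, `seed` will not make responses reproducible here.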
For structured output, prefer ChatParallel.with_structured_output(...) (see Structured output) over passing response_format directly. It works on the research models and returns a parsed object.
Message handling
The integration merges consecutive messages of the same type to satisfy API requirements:
from langchain.messages import HumanMessage, SystemMessage
# Consecutive system messages are automatically merged before the API call.
messages = [
    SystemMessage("You are a helpful assistant."),
    SystemMessage("Always be polite and concise."),
    HumanMessage("What is the weather like today?"),
]
response = llm.invoke(messages)
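The effect of merging can be pictured with plain (role, content) tuples. This is an illustration of the idea, not the integration's actual implementation: `merge_consecutive` is a hypothetical function, and joining with a newline is an assumption about the separator.

```python
from itertools import groupby


def merge_consecutive(messages: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Collapse each run of same-role messages into one message.

    ASSUMPTION: merged contents are joined with a newline.
    """
    merged = []
    # groupby yields one group per run of adjacent items sharing a role.
    for role, run in groupby(messages, key=lambda m: m[0]):
        merged.append((role, "\n".join(content for _, content in run)))
    return merged


example = [
    ("system", "You are a helpful assistant."),
    ("system", "Always be polite and concise."),
    ("human", "What is the weather like today?"),
]
for role, content in merge_consecutive(example):
    print(f"{role}: {content!r}")
```

Only adjacent messages are merged, so a system message that appears later in the conversation stays separate from the first one.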
API reference
For detailed documentation, head to the ChatParallel API reference or the Parallel chat API quickstart.