
Parallel is a real-time web search and content extraction platform built for LLMs and AI applications.
ChatParallel is an OpenAI-compatible chat interface to Parallel’s models. The speed model is a low-latency conversational model with no citations; the research models (lite, base, core) browse the web and return per-field citations and structured output via JSON schema.
ChatParallel is the canonical class name. The earlier ChatParallelWeb continues to work as an alias for the same class.

Overview

Integration details

| Class | Package | Serializable | JS/TS support | Downloads | Latest version |
| :--- | :--- | :--- | :--- | :--- | :--- |
| ChatParallel | langchain-parallel | – | – | [PyPI](https://pypi.org/project/langchain-parallel/) | [PyPI](https://pypi.org/project/langchain-parallel/) |

Model features

| Tool calling | Structured output | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ❌ | ✅ (research models) | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ |

Choosing a model

| Model | Latency | Web browsing | Citations | Structured output | Use when |
| :--- | :--- | :---: | :---: | :---: | :--- |
| speed | low | ❌ | ❌ | ❌ | Conversational answers from the model’s parametric knowledge. |
| lite | medium | ✅ | ✅ | ✅ | Fact lookups with citations. |
| base | medium-high | ✅ | ✅ | ✅ | Mid-depth research with citations. |
| core | higher | ✅ | ✅ | ✅ | Multi-source research with citations. |
speed does not honor response_format, so with_structured_output() raises a clear error there. Use a research model when you need parsed pydantic output or per-field citations.
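
For example, requesting structured output on speed fails fast (a minimal sketch; the exact error text may differ):
from langchain_parallel import ChatParallel
from pydantic import BaseModel

class Answer(BaseModel):
    text: str

try:
    ChatParallel(model="speed").with_structured_output(Answer)
except ValueError as err:
    print(err)  # speed ignores response_format, so this raises up front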

Setup

To access Parallel models, install the langchain-parallel integration package and acquire a Parallel API key.

Installation

pip install -U langchain-parallel

Credentials

Head to Parallel to sign up and generate an API key. Set PARALLEL_API_KEY in your environment:
import getpass
import os

if not os.environ.get("PARALLEL_API_KEY"):
    os.environ["PARALLEL_API_KEY"] = getpass.getpass("Parallel API key:\n")

Instantiation

from langchain_parallel import ChatParallel

llm = ChatParallel(
    model="speed",
    # timeout=None,
    # max_retries=2,
    # api_key="...",  # optional if PARALLEL_API_KEY is set
    # base_url="https://api.parallel.ai",  # default
)
See the ChatParallel API reference for the full set of available parameters.

Invocation

messages = [
    ("system", "You are a helpful assistant with access to real-time web information."),
    ("human", "What are the latest developments in AI?"),
]
ai_msg = llm.invoke(messages)
print(ai_msg.content)

Chaining

Chain the model with a prompt template:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate(
    [
        (
            "system",
            "You are a research assistant with access to real-time web information. "
            "Answer questions about {topic} using current sources.",
        ),
        ("human", "{question}"),
    ]
)

chain = prompt | llm
chain.invoke(
    {
        "topic": "artificial intelligence",
        "question": "What are the most significant AI breakthroughs in 2026?",
    }
)
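
You can extend the chain with any other runnable; for example, appending langchain-core’s StrOutputParser (standard LangChain, not specific to this integration) returns a plain string instead of an AIMessage:
from langchain_core.output_parsers import StrOutputParser

str_chain = prompt | llm | StrOutputParser()
print(
    str_chain.invoke(
        {
            "topic": "artificial intelligence",
            "question": "What are the most significant AI breakthroughs in 2026?",
        }
    )
)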

Structured output

On the research models (lite, base, core), ChatParallel.with_structured_output(...) binds a JSON-schema response_format and returns a parsed pydantic object (or dict). Calling it on speed raises a ValueError, since speed silently ignores response_format.
from pydantic import BaseModel, Field

class Founder(BaseModel):
    name: str = Field(description="Full name of the founder")
    company: str = Field(description="Company they founded")

structured = ChatParallel(model="lite").with_structured_output(Founder)
parsed = structured.invoke([("human", "Who founded SpaceX?")])
print(parsed)
# name='Elon Musk' company='SpaceX'
method="json_schema" (the default), method="json_mode", and method="function_calling" are all accepted. Pass include_raw=True to receive the full {"raw", "parsed", "parsing_error"} envelope and capture parser failures:
structured = ChatParallel(model="lite").with_structured_output(Founder, include_raw=True)
res = structured.invoke([("human", "Who founded SpaceX?")])
res["parsed"]          # Founder(...) or None
res["parsing_error"]   # Exception or None
res["raw"]             # original AIMessage

Citations

Research models populate AIMessage.response_metadata["basis"] with per-field citations, the model’s reasoning, and a confidence label. response_metadata["interaction_id"] is surfaced for multi-turn context chaining; system_fingerprint is forwarded when present.
cited = ChatParallel(model="lite").invoke([
    ("human", "Who is the current CEO of OpenAI? One sentence."),
])
print(cited.content)
print("\nbasis:", cited.response_metadata.get("basis"))
print("interaction_id:", cited.response_metadata.get("interaction_id"))

Streaming

ChatParallel supports per-token streaming:
for chunk in llm.stream(messages):
    print(chunk.content, end="")
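
Chunks are AIMessageChunk objects and support +, so you can accumulate the full message while printing tokens as they arrive:
full = None
for chunk in llm.stream(messages):
    print(chunk.content, end="")
    full = chunk if full is None else full + chunk  # chunks merge via +
# full now holds the complete response as a single message chunk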

Async

ai_msg = await llm.ainvoke(messages)

async for chunk in llm.astream(messages):
    print(chunk.content, end="")
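
The await / async for forms above assume a running event loop (for example, a notebook). In a plain script, wrap the calls with asyncio.run:
import asyncio

async def main() -> None:
    ai_msg = await llm.ainvoke(messages)
    print(ai_msg.content)

asyncio.run(main())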

Token usage

Parallel does not currently provide token usage metadata, so usage_metadata is None:
ai_msg = llm.invoke(messages)
print(ai_msg.usage_metadata)
# None

Response metadata

ai_msg = llm.invoke(messages)
print(ai_msg.response_metadata)
# {'model_name': 'speed', 'finish_reason': 'stop', 'created': 1764043410}
For research models, response_metadata additionally carries basis (per-field citations), interaction_id (for multi-turn chaining), and system_fingerprint when available.

Error handling

The integration raises ValueError with a descriptive message on common failure modes:
from langchain_parallel import ChatParallel

try:
    llm = ChatParallel(model="speed", api_key="invalid-key")
    response = llm.invoke([("human", "Hello")])
except ValueError as e:
    if "Authentication failed" in str(e):
        print("Invalid API key provided")
    elif "Rate limit exceeded" in str(e):
        print("API rate limit exceeded, please try again later")
    else:
        raise  # unrecognized failure: re-raise

OpenAI compatibility

ChatParallel accepts many OpenAI Chat Completions API parameters for drop-in OpenAI-client migration. Advanced parameters such as tools, tool_choice, top_p, and frequency_penalty are accepted but ignored by the Parallel API.
llm = ChatParallel(
    model="speed",
    # accepted but ignored by Parallel:
    tools=[{"type": "function", "function": {"name": "example"}}],
    tool_choice="auto",
    top_p=1.0,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    logit_bias={},
    seed=42,
    user="user-123",
)
For structured output, prefer ChatParallel.with_structured_output(...) (see Structured output) over passing response_format directly. It works on the research models and returns a parsed object.

Message handling

The integration merges consecutive messages of the same type to satisfy API requirements:
from langchain_core.messages import HumanMessage, SystemMessage

# Consecutive system messages are automatically merged before the API call.
messages = [
    SystemMessage("You are a helpful assistant."),
    SystemMessage("Always be polite and concise."),
    HumanMessage("What is the weather like today?"),
]

response = llm.invoke(messages)

API reference

For detailed documentation, head to the ChatParallel API reference or the Parallel chat API quickstart.