ChatParallel integration

Parallel is a real-time web search and content extraction platform built for LLMs and AI applications.

ChatParallel is an OpenAI-compatible chat interface to Parallel’s models. The speed model is a low-latency conversational model with no citations; the research models (lite, base, core) browse the web and return per-field citations and structured output via JSON schema.

ChatParallel is the canonical class name. The earlier ChatParallelWeb continues to work as an alias for the same class.

Overview

Integration details

Class	Package	Serializable	JS/TS Support
`ChatParallel`	`langchain-parallel`	✅	❌

Model features

Tool calling	Structured output	Image input	Audio input	Video input	Token-level streaming	Native async	Token usage	Logprobs
❌	✅ (research models)	❌	❌	❌	✅	✅	❌	❌

Choosing a model

Model	Latency	Web browsing	Citations	Structured output	Use when
`speed`	low	❌	❌	❌	Conversational answers from the model’s parametric knowledge.
`lite`	medium	✅	✅	✅	Fact lookups with citations.
`base`	medium-high	✅	✅	✅	Mid-depth research with citations.
`core`	higher	✅	✅	✅	Multi-source research with citations.

The speed model does not honor response_format, so calling with_structured_output() on it raises a ValueError. Use a research model when you need parsed pydantic output or per-field citations.
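The selection table can be condensed into a small helper. This is a minimal sketch: `pick_model`, `needs_research`, and the depth labels (`lookup`, `mid`, `multi_source`) are hypothetical names chosen for illustration and are not part of langchain-parallel.

```python
def pick_model(needs_research: bool = False, depth: str = "lookup") -> str:
    """Map requirements onto a model name from the table above.

    Hypothetical helper, not part of langchain-parallel.
    Set needs_research=True when web browsing, citations, or
    structured output is required.
    """
    if not needs_research:
        # speed: lowest latency, but no browsing, citations, or structured output.
        return "speed"
    # Research models, ordered from lowest to highest latency.
    by_depth = {"lookup": "lite", "mid": "base", "multi_source": "core"}
    if depth not in by_depth:
        raise ValueError(f"depth must be one of {sorted(by_depth)}, got {depth!r}")
    return by_depth[depth]
```

The returned string can then be passed as the model argument, for example ChatParallel(model=pick_model(needs_research=True)).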

Setup

To access Parallel models, install the langchain-parallel integration package and acquire a Parallel API key.

Installation

pip install -U langchain-parallel

Credentials

Head to Parallel to sign up and generate an API key. Set PARALLEL_API_KEY in your environment:

import getpass
import os

if not os.environ.get("PARALLEL_API_KEY"):
    os.environ["PARALLEL_API_KEY"] = getpass.getpass("Parallel API key:\n")

Instantiation

from langchain_parallel import ChatParallel

llm = ChatParallel(
    model="speed",
    # timeout=None,
    # max_retries=2,
    # api_key="...",  # optional if PARALLEL_API_KEY is set
    # base_url="https://api.parallel.ai",  # default
)

See the ChatParallel API reference for the full set of available parameters.

Invocation

messages = [
    ("system", "You are a helpful assistant with access to real-time web information."),
    ("human", "What are the latest developments in AI?"),
]
ai_msg = llm.invoke(messages)
print(ai_msg.content)

Chaining

Chain the model with a prompt template:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate(
    [
        (
            "system",
            "You are a research assistant with access to real-time web information. "
            "Answer questions about {topic} using current sources.",
        ),
        ("human", "{question}"),
    ]
)

chain = prompt | llm
chain.invoke(
    {
        "topic": "artificial intelligence",
        "question": "What are the most significant AI breakthroughs in 2026?",
    }
)

Structured output

On the research models (lite, base, core), ChatParallel.with_structured_output(...) binds a JSON-schema response_format and returns a parsed pydantic object (or dict). Calling it on speed raises a ValueError, since speed silently ignores response_format.

from pydantic import BaseModel, Field

class Founder(BaseModel):
    name: str = Field(description="Full name of the founder")
    company: str = Field(description="Company they founded")

structured = ChatParallel(model="lite").with_structured_output(Founder)
parsed = structured.invoke([("human", "Who founded SpaceX?")])
print(parsed)

name='Elon Musk' company='SpaceX'

method="json_schema" (the default), method="json_mode", and method="function_calling" are all accepted. Pass include_raw=True to receive the full {"raw", "parsed", "parsing_error"} envelope and capture parser failures:

structured = ChatParallel(model="lite").with_structured_output(Founder, include_raw=True)
res = structured.invoke([("human", "Who founded SpaceX?")])
res["parsed"]          # Founder(...) or None
res["parsing_error"]   # Exception or None
res["raw"]             # original AIMessage
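When include_raw=True is used, callers usually want either the parsed object or a single exception. The sketch below shows one way to collapse the envelope. `unwrap_structured` is a hypothetical helper, not part of langchain-parallel, and it relies only on the three keys named above.

```python
from typing import Any, Mapping


def unwrap_structured(result: Mapping[str, Any]) -> Any:
    """Return result["parsed"], or raise ValueError if parsing failed.

    Hypothetical helper for the {"raw", "parsed", "parsing_error"} envelope.
    """
    error = result.get("parsing_error")
    if error is not None:
        # Chain the original exception when there is one, for easier debugging.
        cause = error if isinstance(error, BaseException) else None
        raise ValueError(f"Structured output could not be parsed: {error}") from cause
    parsed = result.get("parsed")
    if parsed is None:
        raise ValueError("Structured output is empty: no parsed object returned")
    return parsed
```

With the example above, unwrap_structured(res) would return the Founder instance or raise, while res["raw"] stays available for logging.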

Citations

Research models populate AIMessage.response_metadata["basis"] with per-field citations, the model’s reasoning, and a confidence label. response_metadata["interaction_id"] is surfaced for multi-turn context chaining; system_fingerprint is forwarded when present.

cited = ChatParallel(model="lite").invoke([
    ("human", "Who is the current CEO of OpenAI? One sentence."),
])
print(cited.content)
print("\nbasis:", cited.response_metadata.get("basis"))
print("interaction_id:", cited.response_metadata.get("interaction_id"))

Streaming

ChatParallel supports per-token streaming:

for chunk in llm.stream(messages):
    print(chunk.content, end="")
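If the full text is needed after streaming finishes, the chunks can be collected while they are printed. The following is a sketch under the assumption that each chunk's content is a plain string. `collect_stream` is a hypothetical helper, and the SimpleNamespace objects are stand-ins so the example runs without an API key.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any], echo: bool = True) -> str:
    """Print each chunk as it arrives and return the concatenated text.

    Assumes chunk.content is a str (an assumption, not confirmed by the docs).
    """
    parts = []
    for chunk in chunks:
        text = chunk.content
        if echo:
            print(text, end="", flush=True)
        parts.append(text)
    return "".join(parts)


# Stand-in chunks; with a real model, pass llm.stream(messages) instead.
fake_chunks = [SimpleNamespace(content=t) for t in ("Hel", "lo", "!")]
full_text = collect_stream(fake_chunks, echo=False)
```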

Async

# Top-level await works in a notebook; in a script, run this inside an async function.
ai_msg = await llm.ainvoke(messages)

async for chunk in llm.astream(messages):
    print(chunk.content, end="")
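Native async support makes it straightforward to issue several independent requests concurrently. The sketch below uses asyncio.gather from the standard library. `ask_all` is a hypothetical helper, and `EchoModel` is a stand-in exposing the same ainvoke(messages) shape so the example runs offline.

```python
import asyncio
from typing import Any, List, Sequence


async def ask_all(model: Any, questions: Sequence[str]) -> List[Any]:
    """Send one single-turn request per question concurrently.

    Results are returned in the same order as the questions.
    """
    tasks = [model.ainvoke([("human", q)]) for q in questions]
    return await asyncio.gather(*tasks)


class EchoModel:
    """Stand-in for a chat model; replace with a ChatParallel instance."""

    async def ainvoke(self, messages):
        await asyncio.sleep(0)  # yield control, as a network call would
        return f"echo: {messages[-1][1]}"


answers = asyncio.run(ask_all(EchoModel(), ["first", "second"]))
```

With a real model, each result would be an AIMessage rather than a string. LangChain runnables also provide abatch, which may be a simpler fit for the same pattern.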

Token usage

Parallel does not currently return token usage metadata, so usage_metadata on the response is None.

ai_msg = llm.invoke(messages)
print(ai_msg.usage_metadata)
# None

Response metadata

ai_msg = llm.invoke(messages)
print(ai_msg.response_metadata)
# {'model_name': 'speed', 'finish_reason': 'stop', 'created': 1764043410}

For research models, response_metadata additionally carries basis (per-field citations), interaction_id (for multi-turn chaining), and system_fingerprint when available.
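Because the research-only fields are absent on speed responses, code that handles both kinds of model can read them defensively. A minimal sketch follows. `research_fields` is a hypothetical helper, and it makes no assumption about the internal shape of basis.

```python
from typing import Any, Dict, Mapping

# Keys the text above describes as specific to research models.
RESEARCH_KEYS = ("basis", "interaction_id", "system_fingerprint")


def research_fields(response_metadata: Mapping[str, Any]) -> Dict[str, Any]:
    """Return only the research-model fields that are present and not None."""
    return {
        key: response_metadata[key]
        for key in RESEARCH_KEYS
        if response_metadata.get(key) is not None
    }
```

An empty result indicates that the response carried none of these fields, which is what the speed example above would produce.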

Error handling

The integration raises ValueError with a descriptive message on common failure modes:

from langchain_parallel import ChatParallel

try:
    llm = ChatParallel(api_key="invalid-key")
    response = llm.invoke([("human", "Hello")])
except ValueError as e:
    if "Authentication failed" in str(e):
        print("Invalid API key provided")
    elif "Rate limit exceeded" in str(e):
        print("API rate limit exceeded, please try again later")
    else:
        # Re-raise anything unrecognized instead of silently swallowing it.
        raise
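The message checks above can be factored into a classifier and combined with a simple backoff for rate limits. This is a sketch, not the integration's own retry logic: `classify_parallel_error` and `call_with_retry` are hypothetical helpers, and the doubling delay is an assumption. The max_retries option shown under Instantiation may already cover some of this.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def classify_parallel_error(exc: ValueError) -> str:
    """Map a ValueError message onto a coarse category."""
    message = str(exc)
    if "Authentication failed" in message:
        return "auth"
    if "Rate limit exceeded" in message:
        return "rate_limit"
    return "other"


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying only on rate-limit errors with a doubling delay."""
    for attempt in range(attempts):
        try:
            return fn()
        except ValueError as exc:
            last_try = attempt == attempts - 1
            if classify_parallel_error(exc) != "rate_limit" or last_try:
                raise
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("attempts must be at least 1")
```

A call would look like call_with_retry(lambda: llm.invoke(messages)). Authentication errors are raised immediately, since retrying cannot fix an invalid key.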

OpenAI compatibility

ChatParallel accepts many OpenAI Chat Completions API parameters for drop-in OpenAI-client migration. Advanced parameters such as tools, tool_choice, top_p, and frequency_penalty are accepted but ignored by the Parallel API.

llm = ChatParallel(
    model="speed",
    # accepted but ignored by Parallel:
    tools=[{"type": "function", "function": {"name": "example"}}],
    tool_choice="auto",
    top_p=1.0,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    logit_bias={},
    seed=42,
    user="user-123",
)
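When migrating existing OpenAI-client code, it can help to see which settings will have no effect. The sketch below separates them out. `split_openai_kwargs` is a hypothetical helper, and the parameter set is copied from the example above rather than from an exhaustive list published by Parallel.

```python
from typing import Any, Dict, Tuple

# Taken from the example above; not guaranteed to be complete.
IGNORED_BY_PARALLEL = frozenset({
    "tools", "tool_choice", "top_p", "frequency_penalty",
    "presence_penalty", "logit_bias", "seed", "user",
})


def split_openai_kwargs(
    kwargs: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split kwargs into (remaining, ignored) without modifying the input."""
    remaining = {k: v for k, v in kwargs.items() if k not in IGNORED_BY_PARALLEL}
    ignored = {k: v for k, v in kwargs.items() if k in IGNORED_BY_PARALLEL}
    return remaining, ignored


remaining, ignored = split_openai_kwargs({"model": "speed", "top_p": 0.9, "seed": 42})
```

Logging the ignored dictionary during migration makes it visible when behavior that depended on, say, seed or tools will differ.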

For structured output, prefer ChatParallel.with_structured_output(...) (see Structured output) over passing response_format directly. It works on the research models and returns a parsed object.

Message handling

The integration merges consecutive messages of the same type to satisfy API requirements:

from langchain.messages import HumanMessage, SystemMessage

# Consecutive system messages are automatically merged before the API call.
messages = [
    SystemMessage("You are a helpful assistant."),
    SystemMessage("Always be polite and concise."),
    HumanMessage("What is the weather like today?"),
]

response = llm.invoke(messages)
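To make the merging behavior concrete, here is an illustrative sketch on plain (role, content) tuples. It is not the integration's actual implementation: `merge_consecutive` is a hypothetical function, and joining the contents with a newline is an assumption.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)


def merge_consecutive(messages: List[Message], separator: str = "\n") -> List[Message]:
    """Merge adjacent messages that share a role, preserving order."""
    merged: List[Message] = []
    for role, content in messages:
        if merged and merged[-1][0] == role:
            # Same role as the previous message: append to its content.
            prev_role, prev_content = merged[-1]
            merged[-1] = (prev_role, prev_content + separator + content)
        else:
            merged.append((role, content))
    return merged


example = merge_consecutive([
    ("system", "You are a helpful assistant."),
    ("system", "Always be polite and concise."),
    ("human", "What is the weather like today?"),
])
```

Here the two system messages collapse into one, and the human message is left unchanged.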

API reference

For detailed documentation, head to the ChatParallel API reference or the Parallel chat API quickstart.
