Parallel is a real-time web search and content extraction platform built for LLMs and AI applications.
ParallelExtractTool calls Parallel’s Extract API, which returns clean, markdown-formatted content from web pages, with optional focused excerpts driven by a search_objective. Pair it with ParallelSearchTool to build a search → extract pipeline.
## Overview

### Integration details

| Class | Package | Serializable | JS support | Package latest |
| --- | --- | --- | --- | --- |
| ParallelExtractTool | langchain-parallel | ❌ | ❌ |  |
## Setup

The integration lives in the langchain-parallel package.

```bash
pip install -U langchain-parallel
```
### Credentials

Head to Parallel to sign up and generate an API key. Set PARALLEL_API_KEY in your environment:

```python
import getpass
import os

if not os.environ.get("PARALLEL_API_KEY"):
    os.environ["PARALLEL_API_KEY"] = getpass.getpass("Parallel API key:\n")
```
## Instantiation

```python
from langchain_parallel import ParallelExtractTool

tool = ParallelExtractTool()

# Or pass an explicit key, override the base URL, or cap per-URL `full_content` size:
# tool = ParallelExtractTool(
#     api_key="your-api-key",
#     base_url="https://api.parallel.ai",
#     max_chars_per_extract=5000,
# )
```
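Like any LangChain tool, the instance exposes the name, description, and argument schema that a tool-calling model will see. A quick way to inspect them:

```python
# Inspect the metadata a tool-calling model receives for this tool.
print(tool.name)         # "parallel_extract"
print(tool.description)  # human-readable description sent to the model
print(tool.args)         # JSON schema of the accepted arguments
```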
## Invocation

### Invoke directly with args

```python
result = tool.invoke(
    {"urls": ["https://en.wikipedia.org/wiki/Artificial_intelligence"]}
)

print(result[0]["title"])
print(result[0]["url"])
print(result[0]["content"][:200], "...")
```
Multiple URLs in a single call:
```python
result = tool.invoke(
    {
        "urls": [
            "https://en.wikipedia.org/wiki/Machine_learning",
            "https://en.wikipedia.org/wiki/Deep_learning",
            "https://en.wikipedia.org/wiki/Natural_language_processing",
        ]
    }
)

for item in result:
    print(item["title"], "—", item["url"])
```
Invoking with a model-generated ToolCall returns a ToolMessage:
```python
model_generated_tool_call = {
    "args": {
        "urls": [
            "https://en.wikipedia.org/wiki/Climate_change",
            "https://en.wikipedia.org/wiki/Renewable_energy",
        ]
    },
    "id": "call_123",
    "name": tool.name,  # "parallel_extract"
    "type": "tool_call",
}

result = tool.invoke(model_generated_tool_call)
```
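The ToolMessage ties the extraction output back to the originating call id, so it can be appended directly to a chat history. A minimal check of the returned message:

```python
from langchain_core.messages import ToolMessage

# The output is wrapped in a ToolMessage linked to the originating tool call.
assert isinstance(result, ToolMessage)
print(result.tool_call_id)  # "call_123"
print(result.name)          # "parallel_extract"
```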
### Async usage

```python
async def extract_async():
    return await tool.ainvoke(
        {
            "urls": [
                "https://en.wikipedia.org/wiki/Python_(programming_language)",
                "https://en.wikipedia.org/wiki/JavaScript",
            ]
        }
    )

result = await extract_async()
```
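To keep several independent extract calls in flight at once, the standard asyncio.gather pattern works with ainvoke. A sketch (the grouping of URLs is purely illustrative):

```python
import asyncio

async def extract_many(url_groups):
    # Run one extract call per URL group concurrently; each call returns its own list.
    return await asyncio.gather(
        *(tool.ainvoke({"urls": urls}) for urls in url_groups)
    )

results = await extract_many(
    [
        ["https://en.wikipedia.org/wiki/Python_(programming_language)"],
        ["https://en.wikipedia.org/wiki/JavaScript"],
    ]
)
```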
## Focused excerpts
Drive excerpt selection with a search_objective (or search_queries). Setting full_content=False skips the full markdown body and returns only matched excerpts:
```python
result = tool.invoke(
    {
        "urls": ["https://en.wikipedia.org/wiki/Artificial_intelligence"],
        "search_objective": "What are the main applications and ethical concerns of AI?",
        "excerpts": {"max_chars_per_result": 2000},
        "full_content": False,
    }
)
```
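Each result item then carries the excerpts matched against the objective; one way to read them:

```python
for item in result:
    # With a search_objective set, each item includes the excerpts that matched it.
    for excerpt in item.get("excerpts", []):
        print(excerpt[:120], "...")
```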
## Fetch policy and full-content sizing
Control caching, timeouts, and the per-URL full_content cap independently:
```python
result = tool.invoke(
    {
        "urls": ["https://en.wikipedia.org/wiki/Quantum_computing"],
        "fetch_policy": {
            "max_age_seconds": 86400,
            "timeout_seconds": 60,
            "disable_cache_fallback": False,
        },
        "full_content": {"max_chars_per_result": 5000},
    }
)
```
**`full_content` precedence.** An explicit `FullContentSettings` (or dict) on the call always wins over the tool-level `max_chars_per_extract`; the tool-level cap only applies when you pass `full_content=True` as a plain bool.
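A minimal sketch of the rule in practice, using the dict form so no extra imports are needed (the URL and sizes are illustrative):

```python
sized_tool = ParallelExtractTool(max_chars_per_extract=5000)

# Plain bool: the tool-level max_chars_per_extract (5000 chars) sizes full_content.
capped = sized_tool.invoke(
    {"urls": ["https://en.wikipedia.org/wiki/Quantum_computing"], "full_content": True}
)

# Explicit per-call settings win over the tool-level cap.
overridden = sized_tool.invoke(
    {
        "urls": ["https://en.wikipedia.org/wiki/Quantum_computing"],
        "full_content": {"max_chars_per_result": 1000},
    }
)
```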
## Per-URL error handling

Failed URLs come back as regular items with error_type set, so partial success is the default behavior:
```python
result = tool.invoke(
    {
        "urls": [
            "https://en.wikipedia.org/wiki/Artificial_intelligence",
            "https://this-domain-does-not-exist-12345.com/",
        ]
    }
)

for item in result:
    if "error_type" in item:
        print("failed:", item["url"], "—", item["content"])
    else:
        print("ok:", item["url"], f"({len(item['content'])} chars)")
```
## Parameters

### Required

- `urls`: list of URLs to extract.

### Optional

- `search_objective`: natural-language description that drives excerpt selection.
- `search_queries`: list of keyword strings used together with (or in place of) `search_objective`.
- `excerpts`: per-result excerpt settings. Pass `ExcerptSettings(max_chars_per_result=…)` (or a dict) to control per-result excerpt size; omit for the API default.
- `full_content`: `True` to return full markdown content (sized by the tool-level `max_chars_per_extract`), `False` to skip it, or `FullContentSettings(max_chars_per_result=…)` for fine-grained control.
- `fetch_policy`: cache control, e.g. `{"max_age_seconds": 86400, "timeout_seconds": 60}`.
- `max_chars_total`: cap on combined output length across all URLs.
- `client_model` / `session_id`: forwarded to Parallel for downstream attribution.
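The optional parameters compose freely; a single call combining several of them might look like this (the specific values and the choice of parameters are purely illustrative):

```python
result = tool.invoke(
    {
        "urls": ["https://en.wikipedia.org/wiki/Large_language_model"],
        "search_queries": ["training data", "evaluation benchmarks"],
        "excerpts": {"max_chars_per_result": 1500},
        "full_content": False,
        "max_chars_total": 10000,
        "session_id": "docs-demo-session",
    }
)
```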
## Chaining
Bind the tool to any tool-calling chat model and drive an agent with create_agent:
```python
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model

llm = init_chat_model(model="claude-opus-4-7")

agent = create_agent(model=llm, tools=[tool])
agent.invoke(
    {"messages": [("human", "Summarize https://en.wikipedia.org/wiki/Quantum_computing")]}
)
```
Hand ParallelSearchTool and ParallelExtractTool to the same agent. The model uses search to find URLs and extract to drill into the ones it picks.
```python
from langchain_parallel import ParallelSearchTool

search = ParallelSearchTool()
extract = ParallelExtractTool()

agent = create_agent(model=llm, tools=[search, extract])
agent.invoke(
    {
        "messages": [
            ("human", "Find a recent peer-reviewed paper on net-energy-gain fusion and summarize it."),
        ]
    }
)
```
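agent.invoke returns the full agent state; assuming the default message-based state, the final answer is the last message in it. A minimal way to read it (the question is illustrative):

```python
state = agent.invoke(
    {"messages": [("human", "Summarize https://en.wikipedia.org/wiki/Machine_learning")]}
)
# The state carries the whole message history; the last entry is the agent's reply.
print(state["messages"][-1].content)
```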
## Output format

Each extract call returns a list of dicts, one per requested URL; failed URLs appear as items with error_type set:

```python
[
    {
        "url": "https://example.com/article",
        "title": "Article Title",
        "content": "# Article Title\n\nMain content formatted as markdown...",
        "publish_date": "2026-01-15",
        "excerpts": ["...", "..."],  # if excerpts/search_objective requested
    },
    # Failed extractions:
    {
        "url": "https://failed-site.com",
        "title": None,
        "content": "Error: 404 Not Found",
        "error_type": "http_error",
    },
]
```
## API reference
For detailed documentation, head to the ParallelExtractTool API reference or the Parallel Extract reference.