Parallel is a real-time web search and content extraction platform designed specifically for LLMs and AI applications. The
ParallelExtractTool provides access to Parallel's Extract API, which extracts clean, structured content from web pages.
Overview
Integration details
| Class | Package | Serializable | JS support | Package latest |
|---|---|---|---|---|
| ParallelExtractTool | langchain-parallel | | | |
Tool features
- Clean content extraction: Extracts main content from web pages, removing ads, navigation, and boilerplate
- Markdown formatting: Returns content formatted as clean markdown
- Batch processing: Extract from multiple URLs in a single API call
- Metadata extraction: Includes title, publish date, and other metadata
- Content length control: Configure maximum characters per extraction
- Error handling: Gracefully handles failed extractions with detailed error information
- Async support: Full async/await support for better performance
Setup
The integration lives in the langchain-parallel package.
Credentials
Head to Parallel to sign up and generate an API key. Once you've done this, set the PARALLEL_API_KEY environment variable:
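A minimal sketch of setting the key from Python, assuming you would rather be prompted than hard-code the value. The helper name ensure_parallel_api_key is hypothetical and not part of langchain-parallel. Only the PARALLEL_API_KEY variable name comes from the text above.

```python
import getpass
import os


def ensure_parallel_api_key(prompt=getpass.getpass):
    """Return PARALLEL_API_KEY, prompting once if it is not already set.

    Illustrative helper only. `prompt` is injectable so the function can
    be exercised without an interactive terminal.
    """
    key = os.environ.get("PARALLEL_API_KEY")
    if not key:
        # Ask for the key without echoing it, then store it for this process.
        key = prompt("Enter your Parallel API key: ")
        os.environ["PARALLEL_API_KEY"] = key
    return key
```

Calling ensure_parallel_api_key() once at startup leaves the variable set for the rest of the process, so it does not need to be passed around explicitly.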
Instantiation
Here we show how to instantiate the ParallelExtractTool. The tool can be configured with API key and content length parameters:
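The following is a hedged sketch of instantiation rather than the package's documented example. It assumes the class is importable from a module named langchain_parallel, that max_chars_per_extract is accepted as a constructor argument, and that the API key is read from PARALLEL_API_KEY when not passed explicitly. The make_extract_tool wrapper and its tool_cls hook are hypothetical.

```python
def make_extract_tool(max_chars_per_extract=5000, tool_cls=None):
    """Build an extract tool with a per-page content limit.

    Assumptions: `max_chars_per_extract` is a constructor argument and the
    API key is picked up from the PARALLEL_API_KEY environment variable.
    `tool_cls` is a hypothetical hook for substituting a stand-in class.
    """
    if tool_cls is None:
        # Imported lazily so the sketch can be read without the package installed.
        from langchain_parallel import ParallelExtractTool as tool_cls
    return tool_cls(max_chars_per_extract=max_chars_per_extract)
```

The default of 5000 characters is an arbitrary illustrative value. A smaller limit reduces token usage when results are passed to a model.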
Invocation
Invoke directly with args
You can invoke the tool with a list of URLs to extract content from:
Invoke with ToolCall
We can also invoke the tool with a model-generated ToolCall, in which case a ToolMessage will be returned:
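Both invocation styles can be sketched as follows. The urls argument name is an assumption about the tool's input schema, and the two helper functions are hypothetical. The ToolCall dictionary follows the usual LangChain shape (type, id, name, args).

```python
def invoke_directly(tool, urls):
    """Invoke with plain args; the `urls` key is an assumed argument name."""
    return tool.invoke({"urls": list(urls)})


def invoke_with_tool_call(tool, urls, call_id="call_1"):
    """Invoke with a ToolCall-shaped dict, as a chat model would emit it.

    LangChain tools return a ToolMessage when invoked with this shape.
    """
    tool_call = {
        "type": "tool_call",
        "id": call_id,          # normally generated by the model
        "name": tool.name,      # must match the tool being invoked
        "args": {"urls": list(urls)},
    }
    return tool.invoke(tool_call)
```

In practice the ToolCall comes from a model response rather than being built by hand. Constructing it manually is mainly useful for testing how the tool behaves inside an agent loop.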
Async usage
The tool supports full async/await operations for better performance in async applications:
Advanced features
The extract tool supports focused extraction with search objectives/queries, fetch policies, and fine-grained control over excerpts and full content:
Error handling
The tool gracefully handles URLs that fail to extract, including them in results with error information:
Best practices
- Batch URLs: Extract multiple URLs in a single call for better performance
- Set content limits: Use max_chars_per_extract to control response size and token usage
- Handle errors: Check for error_type in results to identify failed extractions
- Use async for performance: Use ainvoke() in async applications for better performance
- Metadata fields: Use publish_date and other metadata when available for context
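Several of these practices can be combined in one short sketch: a single batched ainvoke() call, a check on error_type to separate failures, and defensive use of publish_date. FakeExtractTool is a hypothetical offline stand-in so the example runs without an API key. The url, title, and content keys, the urls argument name, and the "fetch_error" value are assumptions for illustration.

```python
import asyncio


def split_results(results):
    """Separate successful extractions from failed ones via `error_type`."""
    ok = [r for r in results if not r.get("error_type")]
    failed = [r for r in results if r.get("error_type")]
    return ok, failed


async def extract_batch(tool, urls):
    """Send all URLs in one async call instead of one call per URL."""
    results = await tool.ainvoke({"urls": list(urls)})
    return split_results(results)


class FakeExtractTool:
    """Offline stand-in (hypothetical); replace with the real tool."""

    async def ainvoke(self, payload):
        # Any URL containing "bad" is reported as a failed extraction.
        return [
            {"url": u, "error_type": "fetch_error"} if "bad" in u
            else {"url": u, "title": "Example", "content": "# Example",
                  "publish_date": None}
            for u in payload["urls"]
        ]


if __name__ == "__main__":
    ok, failed = asyncio.run(
        extract_batch(FakeExtractTool(),
                      ["https://example.com", "https://bad.example"])
    )
    for item in ok:
        # publish_date may be absent, so read it defensively.
        print(item["url"], item.get("publish_date") or "no publish date")
    for item in failed:
        print("failed:", item["url"], item["error_type"])
```

Keeping the success and failure lists separate lets downstream code pass only usable content to a model while still logging or retrying the failed URLs.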
Response format
The tool returns a list of dictionaries with the following format:
API reference
For detailed documentation of all features and configuration options, head to the ParallelExtractTool API reference or the Parallel extract reference.