Scrapeless offers flexible and feature-rich data acquisition services with extensive parameter customization and multi-format export support. These capabilities empower LangChain to integrate and leverage external data more effectively. The core functional modules include:

DeepSerp
- Google Search: Enables comprehensive extraction of Google SERP data across all result types.
- Supports selection of localized Google domains (e.g., google.com, google.ad) to retrieve region-specific search results.
- Pagination supported for retrieving results beyond the first page.
- Supports a search result filtering toggle to control whether to exclude duplicate or similar content.
- Google Trends: Retrieves keyword trend data from Google, including popularity over time, regional interest, and related searches.
- Supports multi-keyword comparison.
- Supports multiple data types: interest_over_time, interest_by_region, related_queries, and related_topics.
- Allows filtering by specific Google properties (Web, YouTube, News, Shopping) for source-specific trend analysis.
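As a hedged sketch, the keyword-trend capabilities above could be exercised through the package's Google Trends tool. The class name `ScrapelessDeepSerpGoogleTrendsTool` and the parameter names `q` and `data_type` are assumptions based on the option names listed above, not verified API:

```python
import os

# Hypothetical request arguments for a Google Trends lookup; the key names
# mirror the data types described above (assumed, not verified).
trends_query = {
    "q": "langchain",                    # keyword(s) to analyze
    "data_type": "interest_over_time",   # one of the four types listed above
}

# Only call the live API when a Scrapeless key is configured.
if os.environ.get("SCRAPELESS_API_KEY"):
    from langchain_scrapeless import ScrapelessDeepSerpGoogleTrendsTool
    tool = ScrapelessDeepSerpGoogleTrendsTool()
    print(tool.invoke(trends_query))
```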
Universal Scraping
- Designed for modern, JavaScript-heavy websites, allowing dynamic content extraction.
- Global premium proxy support for bypassing geo-restrictions and improving reliability.
Crawler
- Crawl: Recursively crawl a website and its linked pages to extract site-wide content.
- Supports configurable crawl depth and scoped URL targeting.
- Scrape: Extract content from a single webpage with high precision.
- Supports “main content only” extraction to exclude ads, footers, and other non-essential elements.
- Allows batch scraping of multiple standalone URLs.
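A hedged sketch of the crawl capability above. The class name `ScrapelessCrawlerCrawlTool` and the `limit` parameter (a page budget illustrating the configurable crawl scope) are assumptions, not verified API:

```python
import os

# Hypothetical crawl arguments; "limit" caps how many linked pages are
# visited, illustrating the scoped crawling described above (assumed).
crawl_args = {
    "url": "https://example.com",
    "limit": 5,
}

# Only hit the live service when a Scrapeless key is configured.
if os.environ.get("SCRAPELESS_API_KEY"):
    from langchain_scrapeless import ScrapelessCrawlerCrawlTool
    tool = ScrapelessCrawlerCrawlTool()
    print(tool.invoke(crawl_args))
```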
Overview
Integration details
| Class | Package | Serializable | JS support | Version |
|---|---|---|---|---|
| ScrapelessUniversalScrapingTool | langchain-scrapeless | ✅ | ❌ |  |
Tool features
| Native async | Returns artifact | Return data |
|---|---|---|
| ✅ | ✅ | html, markdown, links, metadata, structured content |
Setup
The integration lives in the langchain-scrapeless package.
```python
!pip install langchain-scrapeless
```
Credentials
You’ll need a Scrapeless API key to use this tool. You can set it as an environment variable.

Instantiation
Here we show how to instantiate an instance of the Scrapeless Universal Scraping Tool. This tool allows you to scrape any website using a headless browser with JavaScript rendering capabilities, customizable output types, and geo-specific proxy support.

The tool accepts the following parameters during instantiation:
- url (required, str): The URL of the website to scrape.
- headless (optional, bool): Whether to use a headless browser. Default is True.
- js_render (optional, bool): Whether to enable JavaScript rendering. Default is True.
- js_wait_until (optional, str): Defines when to consider the JavaScript-rendered page ready. Default is 'domcontentloaded'. Options include:
  - load: Wait until the page is fully loaded.
  - domcontentloaded: Wait until the DOM is fully loaded.
  - networkidle0: Wait until the network is idle.
  - networkidle2: Wait until the network is idle for 2 seconds.
- outputs (optional, str): The specific type of data to extract from the page. Options include: phone_numbers, headings, images, audios, videos, links, menus, hashtags, emails, metadata, tables, favicon.
- response_type (optional, str): Defines the format of the response. Default is 'html'. Options include:
  - html: Return the raw HTML of the page.
  - plaintext: Return the plain text content.
  - markdown: Return a Markdown version of the page.
  - png: Return a PNG screenshot.
  - jpeg: Return a JPEG screenshot.
- response_image_full_page (optional, bool): Whether to capture and return a full-page image when using screenshot output (png or jpeg). Default is False.
- selector (optional, str): A specific CSS selector to scope scraping within a part of the page. Default is None.
- proxy_country (optional, str): Two-letter country code for geo-specific proxy access (e.g., 'us', 'gb', 'de', 'jp'). Default is 'ANY'.
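A minimal sketch putting the parameters above together. The environment-variable name `SCRAPELESS_API_KEY` is an assumption, and the invocation is guarded so it only runs when a key is configured:

```python
import os

# The tool reads the API key from the environment (variable name assumed):
# os.environ["SCRAPELESS_API_KEY"] = "your-api-key"

# Invocation arguments built from the parameters documented above.
args = {
    "url": "https://example.com",
    "response_type": "markdown",   # default would be 'html'
    "js_render": True,
    "proxy_country": "us",
}

# Only call the live service when a key is present.
if os.environ.get("SCRAPELESS_API_KEY"):
    from langchain_scrapeless import ScrapelessUniversalScrapingTool
    tool = ScrapelessUniversalScrapingTool()
    print(tool.invoke(args))
```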
Invocation
Basic usage
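As a sketch of the simplest call (guarded on the assumed `SCRAPELESS_API_KEY` variable), only the required url is supplied and every other parameter keeps its default, so the raw HTML of the page is returned:

```python
import os

# Minimal arguments: just the required URL; defaults apply elsewhere.
basic_args = {"url": "https://example.com"}

if os.environ.get("SCRAPELESS_API_KEY"):
    from langchain_scrapeless import ScrapelessUniversalScrapingTool
    tool = ScrapelessUniversalScrapingTool()
    html = tool.invoke(basic_args)   # raw HTML, since response_type defaults to 'html'
    print(html[:200])
```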
Advanced usage with parameters
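A hedged sketch combining several of the documented parameters: scoping extraction to a CSS selector, returning Markdown, waiting for the network to go idle, and routing through a US proxy (the guard on `SCRAPELESS_API_KEY` is an assumption):

```python
import os

# Several non-default options combined, per the parameter list above.
advanced_args = {
    "url": "https://example.com",
    "selector": "main",              # scope scraping to the <main> element
    "response_type": "markdown",     # Markdown instead of raw HTML
    "js_wait_until": "networkidle0", # wait for the network to be idle
    "proxy_country": "us",           # geo-specific proxy
}

if os.environ.get("SCRAPELESS_API_KEY"):
    from langchain_scrapeless import ScrapelessUniversalScrapingTool
    print(ScrapelessUniversalScrapingTool().invoke(advanced_args))
```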
Use within an agent
API reference

