`google.com`, `google.ad`) to retrieve region-specific search results.

`interest_over_time`, `interest_by_region`, `related_queries`, and `related_topics`.

Class | Package | Serializable | JS support | Package latest |
---|---|---|---|---|
ScrapelessUniversalScrapingTool | langchain-scrapeless | ✅ | ❌ |
Native async | Returns artifact | Return data |
---|---|---|
✅ | ✅ | html, markdown, links, metadata, structured content |
`langchain-scrapeless` package.
!pip install langchain-scrapeless
- `url` (required, str): The URL of the website to scrape.
- `headless` (optional, bool): Whether to use a headless browser. Default is `True`.
- `js_render` (optional, bool): Whether to enable JavaScript rendering. Default is `True`.
- `js_wait_until` (optional, str): Defines when to consider the JavaScript-rendered page ready. Default is `'domcontentloaded'`. Options include:
  - `load`: Wait until the page is fully loaded.
  - `domcontentloaded`: Wait until the DOM is fully loaded.
  - `networkidle0`: Wait until the network is idle.
  - `networkidle2`: Wait until the network is idle for 2 seconds.
- `outputs` (optional, str): The specific type of data to extract from the page. Options include:
  - `phone_numbers`
  - `headings`
  - `images`
  - `audios`
  - `videos`
  - `links`
  - `menus`
  - `hashtags`
  - `emails`
  - `metadata`
  - `tables`
  - `favicon`
- `response_type` (optional, str): Defines the format of the response. Default is `'html'`. Options include:
  - `html`: Return the raw HTML of the page.
  - `plaintext`: Return the plain text content.
  - `markdown`: Return a Markdown version of the page.
  - `png`: Return a PNG screenshot.
  - `jpeg`: Return a JPEG screenshot.
- `response_image_full_page` (optional, bool): Whether to capture and return a full-page image when using screenshot output (`png` or `jpeg`). Default is `False`.
- `selector` (optional, str): A CSS selector that scopes scraping to part of the page. Default is `None`.
- `proxy_country` (optional, str): Two-letter country code for geo-specific proxy access (e.g., `'us'`, `'gb'`, `'de'`, `'jp'`). Default is `'ANY'`.
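
The parameters above can be collected into a single input dictionary before being handed to the tool. The following is a minimal sketch in plain Python, not part of `langchain-scrapeless`: the helper name `build_scrape_params` and its validation rules are hypothetical. It assumes that `outputs` takes a single value from the listed options and that `response_image_full_page` only makes sense with a `png` or `jpeg` response; both assumptions are inferred from the descriptions above rather than confirmed by the package.

```python
from typing import Any, Dict, Optional

# Allowed values, copied from the parameter descriptions above.
JS_WAIT_UNTIL_OPTIONS = {"load", "domcontentloaded", "networkidle0", "networkidle2"}
RESPONSE_TYPES = {"html", "plaintext", "markdown", "png", "jpeg"}
OUTPUT_TYPES = {
    "phone_numbers", "headings", "images", "audios", "videos", "links",
    "menus", "hashtags", "emails", "metadata", "tables", "favicon",
}


def build_scrape_params(
    url: str,
    *,
    headless: bool = True,
    js_render: bool = True,
    js_wait_until: str = "domcontentloaded",
    outputs: Optional[str] = None,
    response_type: str = "html",
    response_image_full_page: bool = False,
    selector: Optional[str] = None,
    proxy_country: str = "ANY",
) -> Dict[str, Any]:
    """Hypothetical helper: validate options and return a parameter dict."""
    if not url:
        raise ValueError("url is required")
    if js_wait_until not in JS_WAIT_UNTIL_OPTIONS:
        raise ValueError(f"unsupported js_wait_until: {js_wait_until!r}")
    if response_type not in RESPONSE_TYPES:
        raise ValueError(f"unsupported response_type: {response_type!r}")
    # Assumption: `outputs` holds exactly one of the listed data types.
    if outputs is not None and outputs not in OUTPUT_TYPES:
        raise ValueError(f"unsupported outputs: {outputs!r}")
    # Assumption: full-page capture is only meaningful for screenshots.
    if response_image_full_page and response_type not in {"png", "jpeg"}:
        raise ValueError("response_image_full_page requires png or jpeg")

    params: Dict[str, Any] = {
        "url": url,
        "headless": headless,
        "js_render": js_render,
        "js_wait_until": js_wait_until,
        "response_type": response_type,
        "response_image_full_page": response_image_full_page,
        "proxy_country": proxy_country,
    }
    # Optional parameters are only included when explicitly set.
    if outputs is not None:
        params["outputs"] = outputs
    if selector is not None:
        params["selector"] = selector
    return params


# Example: request a Markdown rendering of one section of a page.
params = build_scrape_params(
    "https://example.com", response_type="markdown", selector="main"
)
```

Validating locally like this catches typos such as `response_type="pdf"` before any request is made. Whether the resulting dictionary can be passed unchanged as the tool's input should be checked against the `langchain-scrapeless` documentation.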