# BrightDataWebScraperAPI integration

> Extract structured data from 44 popular domains using Bright Data's Web Scraper API

[Bright Data](https://brightdata.com/) provides a Web Scraper API that extracts structured data from 44 popular domains, including e-commerce sites (Amazon, Walmart, eBay) and social media platforms (LinkedIn, Instagram, TikTok, Facebook). This makes it useful for AI agents that need reliable, structured web data.

## Overview

### Integration details

| Class                                                                       | Package                                                                  | Serializable | JS support |                                               Version                                              |
| :-------------------------------------------------------------------------- | :----------------------------------------------------------------------- | :----------: | :--------: | :------------------------------------------------------------------------------------------------: |
| [`BrightDataWebScraperAPI`](https://pypi.org/project/langchain-brightdata/) | [`langchain-brightdata`](https://pypi.org/project/langchain-brightdata/) |       ✅      |      ❌     | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-brightdata?style=flat-square\&label=%20) |

### Tool features

| Native async | Returns artifact | Return data                                                              |            Pricing           |
| :----------: | :--------------: | :----------------------------------------------------------------------- | :--------------------------: |
|       ❌      |         ❌        | Structured data from websites (Amazon products, LinkedIn profiles, etc.) | Requires Bright Data account |

## Setup

The integration lives in the `langchain-brightdata` package.

```bash theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
pip install langchain-brightdata
```

You'll need a Bright Data API key to use this tool. You can set it as an environment variable:

```python theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
import os

os.environ["BRIGHT_DATA_API_KEY"] = "your-api-key"
```

Or pass it directly when initializing the tool:

```python theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
from langchain_brightdata import BrightDataWebScraperAPI

scraper_tool = BrightDataWebScraperAPI(bright_data_api_key="your-api-key")
```
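
To avoid hard-coding the key in a script or notebook, one option is to prompt for it only when the environment variable is missing. The sketch below is a minimal illustration of that idea; `ensure_api_key` is a hypothetical helper and not part of `langchain-brightdata`.

```python
import getpass
import os


def ensure_api_key(env_var="BRIGHT_DATA_API_KEY", prompt=getpass.getpass):
    """Return the key from the environment, prompting once if it is missing.

    Hypothetical helper for illustration; not part of langchain-brightdata.
    """
    key = os.environ.get(env_var)
    if not key:
        # Ask interactively without echoing the key to the terminal
        key = prompt(f"Enter {env_var}: ")
        os.environ[env_var] = key  # cache it for later tool instantiation
    return key
```

Calling `ensure_api_key()` before creating the tool leaves the key in `BRIGHT_DATA_API_KEY`, so it does not have to appear in the source code.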

## Instantiation

This section shows how to instantiate the `BrightDataWebScraperAPI` tool, which extracts structured data such as Amazon product details and LinkedIn profiles through Bright Data's Dataset API.

The tool accepts the following parameter during instantiation:

* `bright_data_api_key` (str): Your Bright Data API key for authentication. Required unless the `BRIGHT_DATA_API_KEY` environment variable is set.

## Invocation

### Basic usage

```python theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
from langchain_brightdata import BrightDataWebScraperAPI

# Initialize the tool
scraper_tool = BrightDataWebScraperAPI(
    bright_data_api_key="your-api-key"  # Optional if set in environment variables
)

# Extract Amazon product data
results = scraper_tool.invoke(
    {"url": "https://www.amazon.com/dp/B08L5TNJHG", "dataset_type": "amazon_product"}
)

print(results)
```
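
This page does not specify the exact shape of `results`. As a hedged sketch, assuming the tool returns a JSON string, a single dict, or a list of dicts, the hypothetical helper `to_records` below normalizes any of those into a list of dicts for downstream processing.

```python
import json


def to_records(result):
    """Normalize a tool result into a list of dicts.

    Assumption: the result is a JSON string, a single dict, or a list.
    Hypothetical helper for illustration; not part of langchain-brightdata.
    """
    if isinstance(result, str):
        try:
            result = json.loads(result)
        except json.JSONDecodeError:
            # Not JSON: keep the raw text so nothing is silently lost
            return [{"raw": result}]
    if isinstance(result, dict):
        return [result]
    if isinstance(result, list):
        # Wrap non-dict items so every record has the same type
        return [item if isinstance(item, dict) else {"value": item} for item in result]
    return [{"value": result}]
```

With this in place, `to_records(results)` can be iterated the same way regardless of which of the assumed shapes came back.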

### Advanced usage with parameters

```python theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
from langchain_brightdata import BrightDataWebScraperAPI

# Initialize with default parameters
scraper_tool = BrightDataWebScraperAPI(bright_data_api_key="your-api-key")

# Extract Amazon product data with location-specific pricing
results = scraper_tool.invoke(
    {
        "url": "https://www.amazon.com/dp/B08L5TNJHG",
        "dataset_type": "amazon_product",
        "zipcode": "10001",  # Get pricing for New York City
    }
)

print(results)

# Extract LinkedIn profile data
linkedin_results = scraper_tool.invoke(
    {
        "url": "https://www.linkedin.com/in/satyanadella/",
        "dataset_type": "linkedin_person_profile",
    }
)

print(linkedin_results)
```
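
The two invocations above run one after the other. Since the tool features table lists no native async support, one way to overlap independent requests is to run the blocking `invoke` calls in worker threads. The sketch below uses only the standard library; `scrape_many` is a hypothetical helper, and it assumes the tool is safe to call from several threads, which this page does not confirm.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_many(tool, payloads, max_workers=4):
    """Invoke `tool` once per payload in worker threads, preserving input order.

    Returns a list of (payload, result, error) tuples; error is None on success.
    Hypothetical helper for illustration; not part of langchain-brightdata.
    """

    def _call(payload):
        try:
            return payload, tool.invoke(payload), None
        except Exception as exc:  # keep going if a single request fails
            return payload, None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs
        return list(pool.map(_call, payloads))
```

Passing `scraper_tool` and a list of payload dicts returns one tuple per request, so a failed scrape can be inspected or retried without losing the others.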

## Customization options

The `BrightDataWebScraperAPI` tool accepts the following parameters at invocation time:

| Parameter         | Type | Description                                                   |
| :---------------- | :--- | :------------------------------------------------------------ |
| `url`             | str  | The URL to extract data from                                  |
| `dataset_type`    | str  | Type of dataset to use (see available types below)            |
| `zipcode`         | str  | Optional zipcode for location-specific data                   |
| `keyword`         | str  | Search keyword (required for `amazon_product_search`)         |
| `first_name`      | str  | First name (required for `linkedin_people_search`)            |
| `last_name`       | str  | Last name (required for `linkedin_people_search`)             |
| `num_of_reviews`  | str  | Number of reviews (required for `facebook_company_reviews`)   |
| `num_of_comments` | str  | Number of comments (for `youtube_comments`, default: 10)      |
| `days_limit`      | str  | Days to limit results (for `google_maps_reviews`, default: 3) |
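
These parameters travel in the same dictionary as `url` and `dataset_type`. As a minimal sketch, the hypothetical helper `build_payload` below assembles that dictionary and leaves out any option that was not provided. Whether the tool tolerates explicit `None` values is not stated here, so dropping them sidesteps the question. The URLs and values are placeholders.

```python
def build_payload(url, dataset_type, **options):
    """Build an invoke() payload, leaving out options that were not provided.

    Hypothetical helper for illustration; not part of langchain-brightdata.
    """
    payload = {"url": url, "dataset_type": dataset_type}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


# Parameter names come from the table above; URLs and values are placeholders.
search = build_payload(
    "https://www.amazon.com",
    "amazon_product_search",
    keyword="wireless headphones",
    zipcode=None,  # unset, so it is dropped from the payload
)
comments = build_payload(
    "https://www.youtube.com/watch?v=VIDEO_ID",
    "youtube_comments",
    num_of_comments="25",  # the table lists this parameter as a str
)
```

Either dictionary can then be passed to `scraper_tool.invoke(...)`.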

## Available dataset types (44 datasets)

### E-commerce (10 datasets)

| Dataset Type             | Description                     | Required Inputs   |
| :----------------------- | :------------------------------ | :---------------- |
| `amazon_product`         | Product details, pricing, specs | `url` (with /dp/) |
| `amazon_product_reviews` | Customer reviews and ratings    | `url` (with /dp/) |
| `amazon_product_search`  | Search results from Amazon      | `keyword`, `url`  |
| `walmart_product`        | Walmart product data            | `url` (with /ip/) |
| `walmart_seller`         | Walmart seller information      | `url`             |
| `ebay_product`           | eBay product data               | `url`             |
| `homedepot_products`     | Home Depot product data         | `url`             |
| `zara_products`          | Zara product data               | `url`             |
| `etsy_products`          | Etsy product data               | `url`             |
| `bestbuy_products`       | Best Buy product data           | `url`             |

### LinkedIn (5 datasets)

| Dataset Type               | Description                 | Required Inputs                  |
| :------------------------- | :-------------------------- | :------------------------------- |
| `linkedin_person_profile`  | Professional profile data   | `url`                            |
| `linkedin_company_profile` | Company information         | `url`                            |
| `linkedin_job_listings`    | Job listing details         | `url`                            |
| `linkedin_posts`           | Post content and engagement | `url`                            |
| `linkedin_people_search`   | Search for people           | `url`, `first_name`, `last_name` |

### Business intelligence (2 datasets)

| Dataset Type               | Description                         | Required Inputs |
| :------------------------- | :---------------------------------- | :-------------- |
| `crunchbase_company`       | Company funding, investors, metrics | `url`           |
| `zoominfo_company_profile` | B2B company intelligence            | `url`           |

### Instagram (4 datasets)

| Dataset Type         | Description                 | Required Inputs |
| :------------------- | :-------------------------- | :-------------- |
| `instagram_profiles` | Profile data and stats      | `url`           |
| `instagram_posts`    | Post content and engagement | `url`           |
| `instagram_reels`    | Reel content and metrics    | `url`           |
| `instagram_comments` | Comments on posts           | `url`           |

### Facebook (4 datasets)

| Dataset Type                    | Description                 | Required Inputs         |
| :------------------------------ | :-------------------------- | :---------------------- |
| `facebook_posts`                | Post content and engagement | `url`                   |
| `facebook_marketplace_listings` | Marketplace listing data    | `url`                   |
| `facebook_company_reviews`      | Company reviews             | `url`, `num_of_reviews` |
| `facebook_events`               | Event details               | `url`                   |

### TikTok (4 datasets)

| Dataset Type      | Description               | Required Inputs |
| :---------------- | :------------------------ | :-------------- |
| `tiktok_profiles` | Profile data and stats    | `url`           |
| `tiktok_posts`    | Video content and metrics | `url`           |
| `tiktok_shop`     | Shop product data         | `url`           |
| `tiktok_comments` | Comments on videos        | `url`           |

### YouTube (3 datasets)

| Dataset Type       | Description               | Required Inputs                        |
| :----------------- | :------------------------ | :------------------------------------- |
| `youtube_profiles` | Channel profile data      | `url`                                  |
| `youtube_videos`   | Video content and metrics | `url`                                  |
| `youtube_comments` | Comments on videos        | `url`, `num_of_comments` (default: 10) |

### Google (3 datasets)

| Dataset Type          | Description                | Required Inputs                  |
| :-------------------- | :------------------------- | :------------------------------- |
| `google_maps_reviews` | Business reviews from Maps | `url`, `days_limit` (default: 3) |
| `google_shopping`     | Shopping product data      | `url`                            |
| `google_play_store`   | App store data             | `url`                            |

### Other platforms (9 datasets)

| Dataset Type                | Description              | Required Inputs |
| :-------------------------- | :----------------------- | :-------------- |
| `apple_app_store`           | iOS app data             | `url`           |
| `x_posts`                   | X (Twitter) post data    | `url`           |
| `reddit_posts`              | Reddit post data         | `url`           |
| `github_repository_file`    | GitHub file content      | `url`           |
| `yahoo_finance_business`    | Financial business data  | `url`           |
| `reuter_news`               | News article data        | `url`           |
| `zillow_properties_listing` | Real estate listing data | `url`           |
| `booking_hotel_listings`    | Hotel listing data       | `url`           |
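
The "Required Inputs" columns above can be checked locally before spending an API call. The sketch below covers only a handful of dataset types, transcribed from the tables; `REQUIRED_INPUTS`, `URL_MARKERS`, and `check_payload` are hypothetical names, not part of `langchain-brightdata`, and the tool may perform its own validation.

```python
# Transcribed from the tables above for a handful of dataset types only.
REQUIRED_INPUTS = {
    "amazon_product": ["url"],
    "amazon_product_search": ["keyword", "url"],
    "walmart_product": ["url"],
    "linkedin_people_search": ["url", "first_name", "last_name"],
    "facebook_company_reviews": ["url", "num_of_reviews"],
}
# URL path fragments the tables call out for specific dataset types.
URL_MARKERS = {
    "amazon_product": "/dp/",
    "amazon_product_reviews": "/dp/",
    "walmart_product": "/ip/",
}


def check_payload(payload):
    """Return a list of problems found; an empty list means none were found."""
    dataset_type = payload.get("dataset_type")
    problems = []
    # Dataset types not listed above fall back to requiring only `url`
    for name in REQUIRED_INPUTS.get(dataset_type, ["url"]):
        if not payload.get(name):
            problems.append(f"missing required input: {name}")
    marker = URL_MARKERS.get(dataset_type)
    if marker and marker not in (payload.get("url") or ""):
        problems.append(f"url should contain {marker}")
    return problems
```

Running `check_payload` on a payload before `invoke` surfaces a missing `keyword` or a product URL without `/dp/` early, without a network round trip.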

## Use within an agent

```python theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
from langchain_brightdata import BrightDataWebScraperAPI
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.agents import create_agent


# Initialize the LLM (uses a Google API key, not the Bright Data key)
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash", google_api_key="your-google-api-key"
)

# Initialize the Bright Data Web Scraper API tool
scraper_tool = BrightDataWebScraperAPI(bright_data_api_key="your-bright-data-api-key")

# Create the agent with the tool
agent = create_agent(llm, [scraper_tool])

# Provide a user query
user_input = "Scrape Amazon product data for https://www.amazon.com/dp/B0D2Q9397Y?th=1 in New York (zipcode 10001)."

# Stream the agent's step-by-step output
stream = agent.stream_events({"messages": user_input}, version="v3")
for snapshot in stream.values:
    snapshot["messages"][-1].pretty_print()
```

***

## API reference

* [Bright Data API Documentation](https://docs.brightdata.com/scraping-automation/web-scraper-api/overview)
