Class | Package | Local | Serializable | PY support |
---|---|---|---|---|
PuppeteerWebBaseLoader | @langchain/community | ✅ | beta | ❌ |

Source | Web Loader | Node Envs Only |
---|---|---|
PuppeteerWebBaseLoader | ✅ | ✅ |

To access the `PuppeteerWebBaseLoader` document loader, you'll need to install the `@langchain/community` integration package, along with the `puppeteer` peer dependency.
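Both can be installed from npm together with `@langchain/core`, which `@langchain/community` expects as a peer dependency: `npm install @langchain/community @langchain/core puppeteer`.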
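
A minimal sketch of loading a single page, assuming the packages above are installed in an ES module project. `loadPage` is a hypothetical helper name used only for illustration, and the URL in the comment is a placeholder. `load()` should resolve to an array of `Document` objects for the page.

```typescript
import { PuppeteerWebBaseLoader } from "@langchain/community/document_loaders/web/puppeteer";

// Hypothetical helper: opens `url` in a Puppeteer-controlled browser and
// returns the loaded documents. The loader launches and closes the browser itself.
export async function loadPage(url: string) {
  const loader = new PuppeteerWebBaseLoader(url);
  const docs = await loader.load();
  return docs;
}

// Example call (needs network access and a browser that Puppeteer can launch):
// const docs = await loadPage("https://example.com");
// console.log(docs[0].pageContent.slice(0, 100), docs[0].metadata);
```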
The `PuppeteerWebBaseLoader` constructor takes the page URL and an optional options object with the following fields:

- `launchOptions`: an optional object with additional options passed to the `puppeteer.launch()` method. This can include options such as `headless`, to launch the browser in headless mode, or `slowMo`, to slow down Puppeteer's actions so they are easier to follow.
- `gotoOptions`: an optional object with additional options passed to the `page.goto()` method. This can include options such as `timeout`, the maximum navigation time in milliseconds, or `waitUntil`, which specifies when the navigation is considered successful.
- `evaluate`: an optional function for running custom code against the page, typically through the `page.evaluate()` method. This is useful for extracting specific data from the page or interacting with page elements. The function receives the Puppeteer `page` and `browser` instances and should return a Promise that resolves to a string containing the result of the evaluation.

By passing these options to the `PuppeteerWebBaseLoader` constructor, you can customize the loader's behavior and use Puppeteer's features to scrape and interact with web pages.
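
A sketch combining all three options, with a placeholder URL. The specific values (headless mode, a 30-second timeout, returning `innerText` instead of HTML) are illustrative choices, not recommended defaults, and the comments assume the string returned by `evaluate` becomes the document's `pageContent`.

```typescript
import { PuppeteerWebBaseLoader } from "@langchain/community/document_loaders/web/puppeteer";
import type { Page } from "puppeteer";

const loader = new PuppeteerWebBaseLoader("https://example.com", {
  // Passed through to puppeteer.launch()
  launchOptions: {
    headless: true, // run the browser without a visible window
  },
  // Passed through to page.goto()
  gotoOptions: {
    waitUntil: "domcontentloaded", // treat navigation as done once the DOM is parsed
    timeout: 30_000, // give up after 30 seconds
  },
  // Custom extraction: the loader also passes the browser instance as a
  // second argument, unused here. Must resolve to a string.
  async evaluate(page: Page): Promise<string> {
    // Return only the visible text of the body rather than its HTML.
    return page.evaluate(() => document.body.innerText);
  },
});

// Nothing is launched until loader.load() is called.
```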

To take a screenshot of a page, initialize the loader in the same way and call its `.screenshot()` method.
This returns a `Document` instance whose `pageContent` is the base64-encoded image and whose `metadata` contains a `source` field with the URL of the page.
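
A sketch that decodes the base64 `pageContent` and writes the image to disk. `saveScreenshot` and `decodeBase64Image` are hypothetical helper names, and saving to a `.png` path assumes Puppeteer's default PNG screenshot format.

```typescript
import { writeFile } from "node:fs/promises";
import { PuppeteerWebBaseLoader } from "@langchain/community/document_loaders/web/puppeteer";

// Hypothetical helper: turns the base64 pageContent back into raw image bytes.
export function decodeBase64Image(base64: string): Buffer {
  return Buffer.from(base64, "base64");
}

// Hypothetical helper: screenshots `url`, writes the image to `outPath`,
// and returns the source URL recorded in the document metadata.
export async function saveScreenshot(url: string, outPath: string): Promise<string> {
  const loader = new PuppeteerWebBaseLoader(url);
  const screenshot = await loader.screenshot();
  await writeFile(outPath, decodeBase64Image(screenshot.pageContent));
  return screenshot.metadata.source as string;
}
```

Called as `await saveScreenshot("https://example.com", "page.png")`, it saves the image and returns the page URL from `metadata.source`. Keeping the decoding in its own function makes it easy to reuse when the image is sent elsewhere, for example to a multimodal model, instead of being written to a file.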