Overview
Integration details
Class | Package | Local | Serializable | PY support |
---|---|---|---|---|
WebPDFLoader | @langchain/community | ✅ | beta | ❌ |
Loader features
Source | Web Loader | Node Envs Only |
---|---|---|
WebPDFLoader | ✅ | ❌ |
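By default, the loader creates one document per page of the PDF. You can change this behavior, and load the whole file as a single document, by setting the splitPages option to false.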
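To make the splitPages behavior concrete, here is a small self-contained TypeScript model of it. It does not call LangChain. The PageText and SimpleDocument types and the pagesToDocuments function are hypothetical, and joining pages with a blank line ("\n\n") when splitPages is false is an assumption about the separator, not a documented guarantee.

```typescript
// Hypothetical, simplified model of the splitPages option (not LangChain's implementation).
interface PageText {
  pageNumber: number; // 1-based page index
  text: string; // text already extracted from that page
}

interface SimpleDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

function pagesToDocuments(pages: PageText[], splitPages = true): SimpleDocument[] {
  if (splitPages) {
    // Default behavior: one document per page, tagged with its page number.
    return pages.map((page) => ({
      pageContent: page.text,
      metadata: { pageNumber: page.pageNumber },
    }));
  }
  // splitPages: false -> a single document for the whole file.
  // Assumption: pages are joined with a blank line.
  return [
    {
      pageContent: pages.map((page) => page.text).join("\n\n"),
      metadata: { totalPages: pages.length },
    },
  ];
}

const pages: PageText[] = [
  { pageNumber: 1, text: "Introduction" },
  { pageNumber: 2, text: "Results" },
];

console.log(pagesToDocuments(pages).length); // 2
console.log(JSON.stringify(pagesToDocuments(pages, false)[0].pageContent)); // "Introduction\n\nResults"
```

Per-page documents keep page-level provenance, which is useful for citing sources after retrieval; a single document avoids splitting text that runs across page boundaries.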
Setup
To access the WebPDFLoader document loader, you’ll need to install the @langchain/community integration, along with the pdf-parse package:
Credentials
If you want automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:
Installation
The LangChain WebPDFLoader integration lives in the @langchain/community package:
Instantiation
Now we can instantiate our loader object and load documents:
Load
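WebPDFLoader reads the PDF from a Blob rather than from a file path, which fits its support for environments other than Node.js. The sketch below shows only the Blob-building step in plain TypeScript so it runs without any LangChain packages installed. The toPdfBlob helper is hypothetical, and the placeholder bytes are just the "%PDF-" header, not a real document; the commented lines show how the blob would then be handed to the loader.

```typescript
// Minimal sketch: wrap raw PDF bytes in a Blob, the input type WebPDFLoader takes.
// `Blob` is a global in Node.js 18+ and in browsers.

// Hypothetical helper, not part of LangChain.
function toPdfBlob(bytes: Uint8Array): Blob {
  // The MIME type records intent; the bytes themselves are passed through unchanged.
  return new Blob([bytes], { type: "application/pdf" });
}

// Placeholder bytes for illustration only (the 5-byte "%PDF-" header, not a full PDF).
const header = new TextEncoder().encode("%PDF-");
const pdfBlob = toPdfBlob(header);
console.log(pdfBlob.type, pdfBlob.size); // application/pdf 5

// With @langchain/community and pdf-parse installed, the blob is passed to the loader:
// import { WebPDFLoader } from "@langchain/community/document_loaders/web/pdf";
// const loader = new WebPDFLoader(pdfBlob);
// const docs = await loader.load(); // an array of Document objects
```

In Node.js you would typically obtain the bytes with readFile from fs/promises, whose Buffer result is a Uint8Array. In a browser, a File from an `<input type="file">` element is already a Blob and can be passed to the loader directly.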
Usage, custom pdfjs build
By default we use the pdfjs build bundled with pdf-parse, which is compatible with most environments, including Node.js and modern browsers. If you want to use a more recent version of pdfjs-dist, or a custom build of pdfjs-dist, you can do so by providing a custom pdfjs function that returns a promise that resolves to the PDFJS object.
For example, you can use the “legacy” build of pdfjs-dist (see the pdfjs docs), which includes several polyfills that are not included in the default build.
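The pdfjs option expects a function that returns a promise of the PDFJS object, rather than the object itself, so a custom build can be imported lazily when loading runs. The sketch below models that contract with fake modules so it runs without pdfjs-dist; PdfjsLike, PdfjsFactory, defaultPdfjs, legacyPdfjs, and describeBuild are hypothetical names, not LangChain types. The commented import path for the legacy build is an assumption to check against your installed pdfjs-dist version.

```typescript
// Hypothetical model of the pdfjs option's contract: a factory returning a promise.
interface PdfjsLike {
  version: string; // stand-in for the API of the real PDFJS object
}

type PdfjsFactory = () => Promise<PdfjsLike>;

// Stand-in for the default build bundled with pdf-parse.
const defaultPdfjs: PdfjsFactory = async () => ({ version: "bundled" });

async function describeBuild(pdfjs: PdfjsFactory = defaultPdfjs): Promise<string> {
  // The factory is awaited here, at load time, not when the caller configures it.
  const lib = await pdfjs();
  return `using pdfjs build: ${lib.version}`;
}

// A fake "legacy" build. With the real loader this would usually be a dynamic import, e.g.
//   new WebPDFLoader(blob, { pdfjs: () => import("pdfjs-dist/legacy/build/pdf.js") })
// (the module path differs between pdfjs-dist versions; check yours).
const legacyPdfjs: PdfjsFactory = async () => ({ version: "legacy" });

describeBuild().then(console.log); // using pdfjs build: bundled
describeBuild(legacyPdfjs).then(console.log); // using pdfjs build: legacy
```

Accepting a function instead of a module object keeps the bundled build as the fallback while letting callers swap in a different build without importing it up front.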