Nimble’s Extract API extracts rendered content from specific URLs by browsing them with headless browsers rather than relying on cached or API-limited data. This retriever handles JavaScript rendering, dynamic content, and complex navigation flows, which makes it suitable for RAG applications that need access to specific web pages, including content behind pagination, filters, and client-side rendering.
This page shows how to use it as a retriever and covers functionality specific to this integration. Afterward, you may want to explore the relevant use-case pages to learn how to use this retriever as part of a larger chain.
Installation
Usage
Now we can instantiate our retriever:
Use within a chain
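Whatever model you pair it with, a RAG chain over this retriever needs a step that turns the retrieved documents into prompt context. The sketch below shows that step in plain Python so it runs offline. `Document` only mirrors the page_content/metadata shape of LangChain documents; `StubExtractRetriever`, `format_docs`, and `build_prompt` are hypothetical names, and the assumption that the retriever is invoked with a URL string is not confirmed by this page.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in with the page_content/metadata shape of a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class StubExtractRetriever:
    """Hypothetical offline stand-in for NimbleExtractRetriever; returns canned pages."""

    def __init__(self, pages: dict[str, str]):
        self.pages = pages

    def invoke(self, url: str) -> list[Document]:
        # Assumption: the real retriever takes a URL string and returns Documents.
        text = self.pages.get(url, "")
        return [Document(page_content=text, metadata={"url": url})] if text else []


def format_docs(docs: list[Document]) -> str:
    """Join extracted pages into one context block, tagging each with its source URL."""
    return "\n\n".join(
        f"[source: {d.metadata.get('url', 'unknown')}]\n{d.page_content}" for d in docs
    )


def build_prompt(question: str, docs: list[Document]) -> str:
    """Assemble a grounded prompt; a real chain would send this to a chat model."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{format_docs(docs)}\n\n"
        f"Question: {question}"
    )


retriever = StubExtractRetriever({"https://example.com/pricing": "Pro plan: $20/month."})
docs = retriever.invoke("https://example.com/pricing")
prompt = build_prompt("How much is the Pro plan?", docs)
print(prompt)
```

In LangChain Expression Language, a plain function like `format_docs` can be piped directly after a retriever (`retriever | format_docs`), which turns it into a runnable step ahead of the prompt and model.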
We can easily combine this retriever into a RAG chain for extracting and analyzing specific web content:
Advanced configuration
The retriever accepts the following configuration parameters for URL extraction:

| Parameter | Type | Default | Description |
|---|---|---|---|
| parsing_type | str | "plain_text" | Output format: "plain_text", "markdown", or "simplified_html" |
| driver | str | "vx6" | Browser driver version: "vx6" (fast), "vx8" (balanced), or "vx10" (comprehensive) |
| wait | int | None | Milliseconds to wait for page load (0 to 60000) |
| render | bool | True | Enable JavaScript rendering |
| locale | str | "en" | Page locale preference (e.g., "en-US") |
| country | str | "US" | Country code for localized content (e.g., "US") |
| api_key | str | NIMBLE_API_KEY env var | Nimble API key (defaults to the NIMBLE_API_KEY environment variable) |
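One way to keep these settings consistent across a project is to build them in one place. The helper below is a hypothetical sketch that merges overrides with the defaults listed in the table and checks the allowed values, the 0 to 60000 ms `wait` range, and the NIMBLE_API_KEY fallback. Whether the real retriever performs these checks itself is not stated here, and passing the result as constructor keyword arguments is an assumption.

```python
import os
from typing import Any

# Defaults copied from the parameter table above.
DEFAULTS: dict[str, Any] = {
    "parsing_type": "plain_text",
    "driver": "vx6",
    "wait": None,
    "render": True,
    "locale": "en",
    "country": "US",
}
PARSING_TYPES = {"plain_text", "markdown", "simplified_html"}
DRIVERS = {"vx6", "vx8", "vx10"}
MAX_WAIT_MS = 60_000


def build_extract_kwargs(**overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides with the documented defaults and validate them."""
    unknown = set(overrides) - set(DEFAULTS) - {"api_key"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    kwargs = {**DEFAULTS, **overrides}
    if kwargs["parsing_type"] not in PARSING_TYPES:
        raise ValueError(f"parsing_type must be one of {sorted(PARSING_TYPES)}")
    if kwargs["driver"] not in DRIVERS:
        raise ValueError(f"driver must be one of {sorted(DRIVERS)}")
    wait = kwargs["wait"]
    if wait is not None and not 0 <= wait <= MAX_WAIT_MS:
        raise ValueError(f"wait must be between 0 and {MAX_WAIT_MS} ms")
    # Documented fallback: api_key defaults to the NIMBLE_API_KEY environment variable.
    kwargs["api_key"] = kwargs.get("api_key") or os.environ.get("NIMBLE_API_KEY")
    if not kwargs["api_key"]:
        raise ValueError("set NIMBLE_API_KEY or pass api_key explicitly")
    return kwargs


kwargs = build_extract_kwargs(parsing_type="markdown", driver="vx8", wait=2000, api_key="demo-key")
# Assumption: the retriever takes these names as constructor keyword arguments, e.g.
# retriever = NimbleExtractRetriever(**kwargs)
```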
Best Practices
Driver selection
- vx6 (default): Fast extraction for standard websites
- vx8: Balanced performance for moderately complex sites
- vx10: Comprehensive rendering for JavaScript-heavy SPAs and complex dynamic content
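The driver list suggests starting with the fast driver and reaching for heavier ones only when a site needs them. One way to encode that is a fallback ladder: try vx6 first and move to vx8, then vx10, only when the extracted text looks too short. This is a pattern suggested here, not something the retriever is documented to do on its own; `extract` stands in for whatever call you make with a given driver (for example, a retriever configured with that driver), and the 200-character threshold is an arbitrary assumption.

```python
from typing import Callable

# Ordered from fastest to most comprehensive, following the driver list above.
DRIVER_LADDER = ("vx6", "vx8", "vx10")


def extract_with_escalation(
    url: str,
    extract: Callable[[str, str], str],
    min_chars: int = 200,
) -> tuple[str, str]:
    """Try drivers from fastest to most comprehensive until the page yields enough text.

    `extract(url, driver)` is a hypothetical callable wrapping one extraction request.
    Returns (driver_used, text) for the first driver whose output passes the check.
    """
    for driver in DRIVER_LADDER:
        text = extract(url, driver)
        # Near-empty output can mean client-side content never rendered; escalate.
        if len(text.strip()) >= min_chars:
            return driver, text
    raise RuntimeError(f"no driver produced {min_chars}+ characters for {url}")


calls = []


def fake_extract(url: str, driver: str) -> str:
    # Offline stub: pretend only the most comprehensive driver renders this SPA.
    calls.append(driver)
    return "x" * 500 if driver == "vx10" else "Loading..."


driver, text = extract_with_escalation("https://example.com/app", fake_extract)
```

Escalating only on demand keeps the fast vx6 path for standard sites, at the cost of extra requests on pages that turn out to need a heavier driver.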
Page load configuration
- No wait (wait=None): Default for most modern websites
- Short wait (wait=1000-2000): For pages with lazy loading or deferred content
- Longer wait (wait=5000+): For slow-loading SPAs or JavaScript-heavy pages that need time to fully render
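The wait guidance reduces to a small decision rule. The helper below is hypothetical, and its exact values (1500 and 5000) are picks from within the ranges listed above rather than documented recommendations; the result would be passed as the retriever's `wait` parameter.

```python
from typing import Optional


def pick_wait_ms(lazy_loading: bool = False, slow_spa: bool = False) -> Optional[int]:
    """Hypothetical helper: map page traits to a `wait` value in milliseconds."""
    if slow_spa:
        return 5000  # longer wait for slow-loading SPAs or JavaScript-heavy pages
    if lazy_loading:
        return 1500  # short wait, inside the suggested 1000-2000 ms band
    return None  # no wait: the default for most modern websites


config = {"wait": pick_wait_ms(lazy_loading=True)}
```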
Output format selection
- Plain text (plain_text, default): Fast extraction of raw text content
- Markdown (markdown): Best for RAG; preserves structure such as headers, lists, and code blocks
- Simplified HTML (simplified_html): When you need to preserve detailed styling or structure information
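To illustrate why structure-preserving Markdown output helps RAG, here is a minimal header-based chunker: each chunk keeps the chain of headers it sits under, which can be stored as metadata for retrieval. This is a generic technique, not part of the Nimble integration; `split_markdown_by_headers` is a hypothetical stdlib sketch, and LangChain's `MarkdownHeaderTextSplitter` (in `langchain_text_splitters`) covers the same idea with more options.

```python
import re

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown_by_headers(markdown: str) -> list[dict]:
    """Split Markdown into chunks at ATX headers, keeping the header path as metadata.

    Lines inside fenced code blocks are never treated as headers.
    """
    chunks: list[dict] = []
    path: list[tuple[int, str]] = []  # stack of (level, title) for the current section
    lines: list[str] = []
    in_fence = False

    def flush() -> None:
        text = "\n".join(lines).strip()
        if text:
            chunks.append({"headers": [title for _, title in path], "text": text})
        lines.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Pop headers at the same or a deeper level, then push this one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2)))
        else:
            lines.append(line)
    flush()
    return chunks


md = (
    "# Guide\nIntro text.\n## Install\nRun the installer.\n"
    "```python\n# not a header\n```\n## Usage\nCall it.\n"
)
chunks = split_markdown_by_headers(md)
```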
Performance optimization
- Tune wait times: Set wait only when necessary; fast sites don’t need it
- Batch related URLs: Extract multiple pages from the same domain in parallel
- Choose the right format: markdown for RAG, plain_text for simpler processing
- Use async: Leverage ainvoke() for concurrent URL extraction
- Validate content: Check that pages load successfully before processing
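The async and validation advice combine naturally: extract many URLs concurrently with a cap on in-flight requests, then separate pages that came back empty. This is a minimal sketch assuming the retriever's `ainvoke()` takes a URL and returns a list of documents; `StubAsyncRetriever` and `extract_many` are hypothetical, and the default cap of 5 concurrent requests is an arbitrary choice, since rate limits are not documented here.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in with the page_content/metadata shape of a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class StubAsyncRetriever:
    """Hypothetical offline stand-in exposing an async ainvoke(url) method."""

    def __init__(self, pages: dict[str, str]):
        self.pages = pages
        self.in_flight = 0
        self.peak = 0  # highest number of simultaneous requests observed

    async def ainvoke(self, url: str) -> list[Document]:
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        try:
            await asyncio.sleep(0.01)  # simulate network latency
        finally:
            self.in_flight -= 1
        text = self.pages.get(url, "")
        return [Document(text, {"url": url})] if text else []


async def extract_many(retriever, urls: list[str], max_concurrency: int = 5, min_chars: int = 1):
    """Extract URLs concurrently, capping in-flight requests and separating failures."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch(url: str):
        async with semaphore:
            try:
                docs = await retriever.ainvoke(url)
            except Exception:
                docs = []  # treat request errors like empty pages: report, don't crash
        # Validate before processing: a page counts only if it has real text.
        ok = any(len(d.page_content.strip()) >= min_chars for d in docs)
        return url, docs, ok

    results = await asyncio.gather(*(fetch(url) for url in urls))
    extracted = {url: docs for url, docs, ok in results if ok}
    failed = [url for url, _, ok in results if not ok]
    return extracted, failed


retriever = StubAsyncRetriever({
    "https://example.com/a": "Page A",
    "https://example.com/b": "Page B",
})
urls = ["https://example.com/a", "https://example.com/b", "https://example.com/missing"]
extracted, failed = asyncio.run(extract_many(retriever, urls, max_concurrency=2))
```

Because `asyncio.gather` preserves input order, the `failed` list comes back in the same order as `urls`, which makes retrying those pages (for example, with a heavier driver or a longer wait) straightforward.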
API reference
For detailed documentation of all NimbleExtractRetriever features and configurations, visit the Nimble API documentation.