AirbyteCDKLoader is deprecated. Please use AirbyteLoader instead.
Airbyte is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases. Many source connectors are implemented using the Airbyte CDK. This loader allows you to run any of these connectors and return the data as documents.
Installation
First, you need to install the airbyte-cdk Python package (for example, with pip install airbyte-cdk).
Example
Now you can create an AirbyteCDKLoader based on the imported source class (for the GitHub connector, SourceGithub from source_github.source). It takes a config object that is passed to the connector. You also have to pick the stream you want to retrieve records from by name (stream_name). Check the connector's documentation page and spec definition for more information on the config object and available streams. For the GitHub connector, these are:
- https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-github/source_github/spec.json
- https://docs.airbyte.com/integrations/sources/github/
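A minimal sketch of creating a loader for the GitHub issues stream. The config key names (credentials.personal_access_token, repository, start_date) are assumptions modeled on the GitHub connector's spec and may differ between connector versions, so check spec.json before relying on them. make_github_config and make_loader are hypothetical helpers; the loader and source classes are passed in as arguments so the snippet runs without the Airbyte packages installed, and the real imports are shown as comments.

```python
from typing import Any, Dict


def make_github_config(token: str, repository: str, start_date: str) -> Dict[str, Any]:
    """Build a config dict for the GitHub source connector.

    Assumption: these key names follow the connector's spec.json; they can
    change between connector versions, so verify them before use.
    """
    return {
        "credentials": {"personal_access_token": token},
        "repository": repository,  # e.g. "owner/repo"
        "start_date": start_date,  # ISO 8601, e.g. "2024-01-01T00:00:00Z"
    }


def make_loader(loader_cls: Any, source_cls: Any, config: Dict[str, Any], stream_name: str) -> Any:
    """Create a loader for a single stream.

    The classes are injected so this sketch can run without the packages installed.
    """
    return loader_cls(source_class=source_cls, config=config, stream_name=stream_name)


# Usage with the real packages installed (not executed here):
# from langchain_community.document_loaders import AirbyteCDKLoader
# from source_github.source import SourceGithub
# config = make_github_config("<token>", "<owner>/<repo>", "2024-01-01T00:00:00Z")
# issues_loader = make_loader(AirbyteCDKLoader, SourceGithub, config, "issues")
# docs = issues_loader.load()
```

Injecting the classes is only a convenience for the sketch; in an application you would normally call AirbyteCDKLoader(source_class=..., config=..., stream_name=...) directly.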
load returns a list and blocks until all documents are loaded. For finer control over this process, you can use the lazy_load method instead, which returns an iterator.
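As an illustration of why the iterator form helps, the sketch below consumes lazy_load in fixed-size batches, so this function only holds one batch of documents at a time (for example, to embed each batch and write it to a store before fetching more). process_in_batches and handle_batch are hypothetical names; the only loader method the sketch relies on is lazy_load.

```python
from itertools import islice
from typing import Any, Callable, List


def process_in_batches(
    loader: Any,
    handle_batch: Callable[[List[Any]], None],
    batch_size: int = 100,
) -> int:
    """Consume loader.lazy_load() in batches and return the total number of documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # iter() guards against a loader that returns a list instead of an iterator;
    # without it, islice would restart from the first item on every pass.
    documents = iter(loader.lazy_load())
    total = 0
    while True:
        batch = list(islice(documents, batch_size))
        if not batch:
            break
        handle_batch(batch)  # e.g. embed the batch and write it to a vector store
        total += len(batch)
    return total


# Usage (assuming issues_loader is an AirbyteCDKLoader):
# count = process_in_batches(issues_loader, lambda docs: print(len(docs)), batch_size=50)
```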
Incremental loads
Some streams support incremental loading: the source keeps track of synced records and won't load them again. This is useful for sources with a high volume of data that are updated frequently. To take advantage of this, store the last_state property of the loader and pass it in (as the state argument) when creating the loader again. This ensures that only new records are loaded.
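A minimal sketch of persisting last_state between runs. It assumes the state object is picklable and stores it in a local file with pickle; incremental_load and the state file path are hypothetical, and any durable store (a database row, an object store) would work the same way. Only unpickle files you wrote yourself, since pickle can execute arbitrary code on load.

```python
import os
import pickle
from typing import Any, Dict, List


def incremental_load(
    loader_cls: Any,
    source_cls: Any,
    config: Dict[str, Any],
    stream_name: str,
    state_path: str,
) -> List[Any]:
    """Load records that are new since the previous run, persisting state in state_path."""
    state = None
    if os.path.exists(state_path):
        with open(state_path, "rb") as f:
            state = pickle.load(f)  # assumption: last_state is picklable
    loader = loader_cls(
        source_class=source_cls, config=config, stream_name=stream_name, state=state
    )
    docs = loader.load()  # read last_state only after the load has finished
    with open(state_path, "wb") as f:
        pickle.dump(loader.last_state, f)
    return docs


# Usage with the real packages installed (not executed here):
# from langchain_community.document_loaders import AirbyteCDKLoader
# from source_github.source import SourceGithub
# new_docs = incremental_load(AirbyteCDKLoader, SourceGithub, config, "issues", "issues_state.pkl")
```

Saving the state only after load() returns means an interrupted run leaves the previous state in place, so the next run repeats that work rather than skipping records.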