LangSmith’s bulk data export lets you export trace data from a specific project and date range to an S3-compatible bucket in Parquet format, matching the fields in the Run data format. This is useful for offline analysis in tools like BigQuery, Snowflake, Redshift, or Jupyter Notebooks.
This page covers how to:
- Create an export destination
- Create and configure an export job, including scheduled exports and field filtering
- Monitor export progress
Before you start: exports may take some time depending on data volume, and LangSmith limits how many exports can run concurrently. Bulk exports have a 72-hour runtime timeout—refer to Automatic retry behavior for details. Once launched, LangSmith handles orchestration and resilience of the export process automatically.
1. Create a destination
The destination tells LangSmith where to write your exported data. Before making this request, you will need:
- Your LangSmith API key and workspace ID.
- An S3 or S3-compatible bucket with write access granted to LangSmith (refer to Permissions required).
- The bucket name, prefix, and either the AWS region (for AWS S3) or the endpoint URL (for GCS, MinIO, or other S3-compatible providers).
- An access key and secret key for the bucket.
```bash
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "destination_type": "s3",
    "display_name": "My S3 Destination",
    "config": {
      "bucket_name": "your-s3-bucket-name",
      "prefix": "root_folder_prefix",
      "region": "your aws s3 region",
      "endpoint_url": "your endpoint url for s3 compatible buckets"
    },
    "credentials": {
      "access_key_id": "YOUR_S3_ACCESS_KEY_ID",
      "secret_access_key": "YOUR_S3_SECRET_ACCESS_KEY"
    }
  }'
```
Credentials are stored securely in encrypted form. The API will validate that the destination and credentials are valid before saving. If the request fails, refer to Debug destination errors.
Save the `id` from the response; you will need it when creating an export job.
Refer to Manage bulk export destinations for permissions setup, provider-specific configuration (AWS S3, GCS, MinIO), and credential options.
2. Create an export job
An export job targets a specific project and date range. You will need:
- The destination `id` from the previous step.
- The project ID (`session_id`); copy this from the individual project view in the Tracing Projects list.
- A `start_time` and `end_time` in UTC ISO 8601 format.
```bash
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "bulk_export_destination_id": "your_destination_id",
    "session_id": "project_uuid",
    "start_time": "2024-01-01T00:00:00Z",
    "end_time": "2024-01-03T00:00:00Z",
    "format_version": "v2_beta"
  }'
```
The `start_time` is inclusive and the `end_time` is exclusive: the export includes all runs where `run.start_time >= start_time` and `run.start_time < end_time`.
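The window semantics can be sketched locally. The helper below is illustrative (it is not part of the LangSmith SDK); it mirrors the inclusive/exclusive rule the export applies:

```python
from datetime import datetime, timezone

def in_export_window(run_start: datetime, start_time: datetime, end_time: datetime) -> bool:
    """Mirror the export filter: start_time is inclusive, end_time is exclusive."""
    return start_time <= run_start < end_time

start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 3, tzinfo=timezone.utc)

# A run starting exactly at start_time is included...
print(in_export_window(start, start, end))  # True
# ...but a run starting exactly at end_time is not.
print(in_export_window(end, start, end))  # False
```

Because the upper bound is exclusive, back-to-back exports (e.g. Jan 1 to Jan 3, then Jan 3 to Jan 5) neither overlap nor leave gaps.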
Save the `id` from the response to monitor the export's progress.
You can optionally add a filter expression to narrow the set of runs exported; refer to our filter query language and examples for syntax. If no filter is set, all runs are exported.
Schedule recurring exports
Requires LangSmith Helm version >= 0.10.42 (application version >= 0.10.109)
Scheduled exports collect runs periodically and export to the configured destination.
To create a scheduled export, include `interval_hours` and omit `end_time`:
```bash
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "bulk_export_destination_id": "your_destination_id",
    "session_id": "project_uuid",
    "start_time": "2024-01-01T00:00:00Z",
    "interval_hours": 1,
    "format_version": "v2_beta"
  }'
```
`interval_hours` must be between 1 and 168 (1 week) inclusive. `end_time` must be omitted for scheduled exports; it is still required for one-time exports.
- Each spawned export covers `start_time` to `start_time + interval_hours`, then advances by `interval_hours` for each subsequent run. Since `end_time` is exclusive, consecutive exports do not overlap.
- Spawned exports run at `end_time + 10 minutes` to account for runs submitted with `end_time` in the recent past.
- Spawned exports have the `source_bulk_export_id` attribute filled. They must be cancelled separately if needed; cancelling the source export does not cancel already-spawned exports.
- To stop a scheduled export, cancel it.
Example
If a scheduled bulk export is created with `start_time=2025-07-16T00:00:00Z` and `interval_hours=6`:
| Export | Start Time | End Time | Runs At |
|---|---|---|---|
| 1 | 2025-07-16T00:00:00Z | 2025-07-16T06:00:00Z | 2025-07-16T06:10:00Z |
| 2 | 2025-07-16T06:00:00Z | 2025-07-16T12:00:00Z | 2025-07-16T12:10:00Z |
| 3 | 2025-07-16T12:00:00Z | 2025-07-16T18:00:00Z | 2025-07-16T18:10:00Z |
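The same schedule can be reproduced with a short sketch. The function below is illustrative (not an API); the window math and the 10-minute delay come from the rules above:

```python
from datetime import datetime, timedelta, timezone

def export_windows(start_time, interval_hours, count):
    """Yield (window_start, window_end, runs_at) for a scheduled export."""
    if not 1 <= interval_hours <= 168:
        raise ValueError("interval_hours must be between 1 and 168 inclusive")
    step = timedelta(hours=interval_hours)
    delay = timedelta(minutes=10)  # spawned exports run at end_time + 10 minutes
    for i in range(count):
        window_start = start_time + i * step
        window_end = window_start + step
        yield window_start, window_end, window_end + delay

start = datetime(2025, 7, 16, tzinfo=timezone.utc)
for window_start, window_end, runs_at in export_windows(start, 6, 3):
    print(window_start.isoformat(), window_end.isoformat(), runs_at.isoformat())
```

Running this prints the three windows shown in the table above.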
Limit exported fields
Requires LangSmith Helm version >= 0.12.11 (application version >= 0.12.42). Supported in both one-time and scheduled exports.
You can improve export speed and reduce file size by limiting which fields are included using the `export_fields` parameter. When omitted, all fields are included.
```bash
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "bulk_export_destination_id": "your_destination_id",
    "session_id": "project_uuid",
    "start_time": "2024-01-01T00:00:00Z",
    "end_time": "2024-01-03T00:00:00Z",
    "export_fields": ["id", "name", "run_type", "start_time", "end_time", "status", "total_tokens", "total_cost"],
    "format_version": "v2_beta"
  }'
```
Excluding `inputs` and `outputs` can significantly improve export performance and reduce file sizes, especially for large runs. Only include these fields if you need them for your analysis.
Exportable fields
By default, bulk exports include the following fields for each run:
Identifiers & hierarchy:
| Field | Description |
|---|---|
| `id` | Run ID |
| `tenant_id` | Workspace/tenant ID |
| `session_id` | Project/session ID |
| `trace_id` | Trace ID |
| `parent_run_id` | Parent run ID |
| `parent_run_ids` | List of all parent run IDs |
| `reference_example_id` | Reference to example if part of a dataset |
Basic metadata:
| Field | Description |
|---|---|
| `name` | Run name |
| `run_type` | Type of run (e.g., "chain", "llm", "tool") |
| `start_time` | Start timestamp (UTC) |
| `end_time` | End timestamp (UTC) |
| `status` | Run status (e.g., "success", "error") |
| `is_root` | Whether this is a root-level run |
| `dotted_order` | Hierarchical ordering string |
| `trace_tier` | Trace tier/retention level |
Run data:
| Field | Description |
|---|---|
| `inputs` | Run inputs (JSON) |
| `outputs` | Run outputs (JSON) |
| `error` | Error message if failed |
| `extra` | Extra metadata (JSON) |
| `events` | Run events (JSON) |
Tags & feedback:
| Field | Description |
|---|---|
| `tags` | List of tags |
| `feedback_stats` | Feedback statistics (JSON) |
Token usage & costs:
| Field | Description |
|---|---|
| `total_tokens` | Total token count |
| `prompt_tokens` | Prompt token count |
| `completion_tokens` | Completion token count |
| `total_cost` | Total cost |
| `prompt_cost` | Prompt cost |
| `completion_cost` | Completion cost |
| `first_token_time` | Time to first token |
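For convenience, the tables above can be collapsed into a set and used to sanity-check an `export_fields` list before submitting a job. The helper is a local convenience sketch; the API performs its own validation:

```python
# All exportable field names, taken from the tables above.
EXPORTABLE_FIELDS = {
    # Identifiers & hierarchy
    "id", "tenant_id", "session_id", "trace_id", "parent_run_id",
    "parent_run_ids", "reference_example_id",
    # Basic metadata
    "name", "run_type", "start_time", "end_time", "status",
    "is_root", "dotted_order", "trace_tier",
    # Run data
    "inputs", "outputs", "error", "extra", "events",
    # Tags & feedback
    "tags", "feedback_stats",
    # Token usage & costs
    "total_tokens", "prompt_tokens", "completion_tokens",
    "total_cost", "prompt_cost", "completion_cost", "first_token_time",
}

def check_export_fields(fields):
    """Raise if any requested field is not an exportable field."""
    unknown = set(fields) - EXPORTABLE_FIELDS
    if unknown:
        raise ValueError(f"Unknown export fields: {sorted(unknown)}")
    return list(fields)

check_export_fields(["id", "name", "run_type", "total_tokens"])  # passes
```

Catching a typo locally is cheaper than waiting for a failed export job.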
Partitioning scheme
Data is exported into your bucket using the following Hive partitioned structure:

```
<bucket>/<prefix>/export_id=<export_id>/tenant_id=<tenant_id>/session_id=<session_id>/runs/year=<year>/month=<month>/day=<day>
```
3. Monitor your export
Poll the export status using the `id` from the previous step:
```bash
curl --request GET \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/{export_id}' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID'
```
The `status` field in the response will be one of `CREATED`, `RUNNING`, `COMPLETED`, `FAILED`, `CANCELLED`, or `TIMEDOUT`. Exports may take some time depending on the volume of data. Once the status is `COMPLETED`, the Parquet files are available in your bucket.
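A polling loop can be sketched by injecting a status-fetching callable. The stub below stands in for a real GET to the endpoint above (issued with your API key); the loop itself only encodes the terminal statuses listed above:

```python
import time

# Statuses after which the export will not change state again.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "TIMEDOUT"}

def wait_for_export(fetch_status, poll_interval=30.0, max_polls=1000):
    """Call fetch_status() until the export reaches a terminal status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_interval)
    raise TimeoutError("export did not reach a terminal status")

# Stub fetcher for illustration; replace with a real request to the endpoint above.
statuses = iter(["CREATED", "RUNNING", "RUNNING", "COMPLETED"])
print(wait_for_export(lambda: next(statuses), poll_interval=0.0))  # COMPLETED
```

Given the 72-hour runtime timeout, a generous `poll_interval` (minutes, not seconds) is usually appropriate for large exports.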
Refer to Monitor and troubleshoot bulk exports for how to list runs, stop an export, and diagnose failures.