
# Bulk export trace data

> Export LangSmith trace data to an S3-compatible bucket in Parquet format.

<Info>
  **Plan restrictions apply**

  Data export is only supported on the [LangSmith Plus or Enterprise tiers](https://www.langchain.com/pricing-langsmith).
</Info>

LangSmith's bulk data export lets you export trace data from a specific project and date range to an S3-compatible bucket in [Parquet](https://parquet.apache.org/docs/overview/) format, matching the fields in the [Run data format](/langsmith/run-data-format). This is useful for offline analysis in tools like BigQuery, Snowflake, Redshift, or Jupyter Notebooks.

This page covers how to:

* Create an export destination
* Create and configure an export job, including scheduled exports and field filtering
* Monitor export progress

**Before you start:** exports may take some time depending on data volume, and LangSmith limits how many exports can run concurrently. Bulk exports have a 72-hour runtime timeout; refer to [Automatic retry behavior](/langsmith/data-export-monitor#automatic-retry-behavior) for details. Once launched, LangSmith handles orchestration and [resilience of the export process](/langsmith/data-export-monitor#failure-modes-and-retry-policy) automatically.

## 1. Create a destination

The destination tells LangSmith where to write your exported data. Before making this request, you will need:

* Your [LangSmith API key](/langsmith/create-account-api-key) and [workspace ID](/langsmith/set-up-hierarchy#set-up-a-workspace).
* An S3 or S3-compatible bucket with **write access** granted to LangSmith (refer to [Permissions required](/langsmith/data-export-destinations#permissions-required)).
* The bucket name, prefix, and either the AWS region (for AWS S3) or the endpoint URL (for GCS, MinIO, or other S3-compatible providers).
* An access key and secret key for the bucket.

```bash theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/destinations' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "destination_type": "s3",
    "display_name": "My S3 Destination",
    "config": {
      "bucket_name": "your-s3-bucket-name",
      "prefix": "root_folder_prefix",
      "region": "your aws s3 region",
      "endpoint_url": "your endpoint url for s3 compatible buckets"
    },
    "credentials": {
      "access_key_id": "YOUR_S3_ACCESS_KEY_ID",
      "secret_access_key": "YOUR_S3_SECRET_ACCESS_KEY"
    }
  }'
```

Credentials are stored in encrypted form. The API validates the destination and credentials before saving them. If the request fails, refer to [Debug destination errors](/langsmith/data-export-destinations#debug-destination-errors).

Save the `id` from the response; you will need it when creating an export job.

Refer to [Manage bulk export destinations](/langsmith/data-export-destinations) for permissions setup, provider-specific configuration (AWS S3, GCS, MinIO), and credential options.

## 2. Create an export job

An export job targets a specific project and date range. You will need:

* The destination `id` from the [previous step](#1-create-a-destination).
* The project ID (`session_id`), which you can copy from the individual project view in the [**Tracing Projects** list](https://smith.langchain.com?utm_source=docs\&utm_medium=cta\&utm_campaign=langsmith-signup\&utm_content=langsmith-data-export).
* A `start_time` and `end_time` in UTC ISO 8601 format.

```bash theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "bulk_export_destination_id": "your_destination_id",
    "session_id": "project_uuid",
    "start_time": "2024-01-01T00:00:00Z",
    "end_time": "2024-01-03T00:00:00Z",
    "format_version": "v2_beta"
  }'
```

The `start_time` is inclusive and `end_time` is exclusive. The export will include all runs where `run.start_time >= start_time` and `run.start_time < end_time`.
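The half-open window semantics can be sketched as a small predicate (illustrative only; the actual filtering happens server-side):

```python
from datetime import datetime, timezone

def in_export_window(run_start_time: datetime, start_time: datetime, end_time: datetime) -> bool:
    """A run is exported when its start_time falls in [start_time, end_time)."""
    return start_time <= run_start_time < end_time

start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 3, tzinfo=timezone.utc)

assert in_export_window(start, start, end)      # start_time is inclusive
assert not in_export_window(end, start, end)    # end_time is exclusive
```

Because the window is half-open, back-to-back exports with adjacent ranges never include the same run twice.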

Save the `id` from the response to monitor the export's progress.

You can optionally add a `filter` expression to narrow the set of runs exported. Refer to our [filter query language](/langsmith/trace-query-syntax#filter-query-language) and [examples](/langsmith/export-traces#use-filter-query-language) for syntax. Not setting the `filter` field will export all runs.

### Schedule recurring exports

<Note>
  Requires LangSmith Helm version >= `0.10.42` (application version >= `0.10.109`)
</Note>

Scheduled exports collect runs periodically and export to the configured destination.
To create a scheduled export, include `interval_hours` and omit `end_time`:

```bash theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "bulk_export_destination_id": "your_destination_id",
    "session_id": "project_uuid",
    "start_time": "2024-01-01T00:00:00Z",
    "interval_hours": 1,
    "format_version": "v2_beta"
  }'
```

* `interval_hours` must be between 1 and 168 (1 week) inclusive.
* `end_time` must be omitted for scheduled exports; it is still required for one-time exports.
* Each spawned export covers `start_time` to `start_time + interval_hours`, then advances by `interval_hours` for each subsequent run. Since `end_time` is exclusive, consecutive exports do not overlap.
* Each spawned export runs at its window's `end_time + 10 minutes`, allowing time for runs with an `end_time` near the window boundary to be submitted.
* Spawned exports have the `source_bulk_export_id` attribute set. Cancelling the source export **does not** cancel already-spawned exports; cancel them individually if needed.
* To stop a scheduled export, [cancel it](/langsmith/data-export-monitor#stop-an-export).

**Example**

If a scheduled bulk export is created with `start_time=2025-07-16T00:00:00Z` and `interval_hours=6`:

| Export | Start Time           | End Time             | Runs At              |
| ------ | -------------------- | -------------------- | -------------------- |
| 1      | 2025-07-16T00:00:00Z | 2025-07-16T06:00:00Z | 2025-07-16T06:10:00Z |
| 2      | 2025-07-16T06:00:00Z | 2025-07-16T12:00:00Z | 2025-07-16T12:10:00Z |
| 3      | 2025-07-16T12:00:00Z | 2025-07-16T18:00:00Z | 2025-07-16T18:10:00Z |
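The windowing arithmetic in the table above can be reproduced with a short helper (an illustration only; the scheduler runs server-side):

```python
from datetime import datetime, timedelta, timezone

def export_windows(start_time: datetime, interval_hours: int, count: int):
    """Yield (window_start, window_end, runs_at) for the first `count` spawned exports.

    Each window spans interval_hours; the export itself runs 10 minutes
    after its window's end_time.
    """
    interval = timedelta(hours=interval_hours)
    for i in range(count):
        window_start = start_time + i * interval
        window_end = window_start + interval
        yield window_start, window_end, window_end + timedelta(minutes=10)

start = datetime(2025, 7, 16, tzinfo=timezone.utc)
for window_start, window_end, runs_at in export_windows(start, interval_hours=6, count=3):
    print(window_start.isoformat(), window_end.isoformat(), runs_at.isoformat())
```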

### Limit exported fields

<Note>
  Requires LangSmith Helm version >= `0.12.11` (application version >= `0.12.42`). Supported in both one-time and scheduled exports.
</Note>

You can improve export speed and reduce file size by limiting which fields are included using the `export_fields` parameter. When omitted, all fields are included.

```bash theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
curl --request POST \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID' \
  --data '{
    "bulk_export_destination_id": "your_destination_id",
    "session_id": "project_uuid",
    "start_time": "2024-01-01T00:00:00Z",
    "end_time": "2024-01-03T00:00:00Z",
    "export_fields": ["id", "name", "run_type", "start_time", "end_time", "status", "total_tokens", "total_cost"],
    "format_version": "v2_beta"
  }'
```

<Tip>
  Excluding `inputs` and `outputs` can significantly improve export performance and reduce file sizes, especially for large runs. Only include these fields if you need them for your analysis.
</Tip>
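Conceptually, `export_fields` acts as a projection over each run record. A minimal sketch (the function name is hypothetical; field names come from the tables below):

```python
def project_run(run: dict, export_fields: list[str]) -> dict:
    """Keep only the requested fields from a run record."""
    return {k: v for k, v in run.items() if k in export_fields}

run = {"id": "r1", "name": "my_chain", "inputs": {"q": "..."}, "total_tokens": 42}
print(project_run(run, ["id", "name", "total_tokens"]))
# → {'id': 'r1', 'name': 'my_chain', 'total_tokens': 42}
```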

### Exportable fields

By default, bulk exports include the following fields for each run:

**Identifiers & hierarchy:**

| Field                  | Description                               |
| ---------------------- | ----------------------------------------- |
| `id`                   | Run ID                                    |
| `tenant_id`            | Workspace/tenant ID                       |
| `session_id`           | Project/session ID                        |
| `trace_id`             | Trace ID                                  |
| `parent_run_id`        | Parent run ID                             |
| `parent_run_ids`       | List of all parent run IDs                |
| `reference_example_id` | Reference to example if part of a dataset |

**Basic metadata:**

| Field          | Description                                |
| -------------- | ------------------------------------------ |
| `name`         | Run name                                   |
| `run_type`     | Type of run (e.g., "chain", "llm", "tool") |
| `start_time`   | Start timestamp (UTC)                      |
| `end_time`     | End timestamp (UTC)                        |
| `status`       | Run status (e.g., "success", "error")      |
| `is_root`      | Whether this is a root-level run           |
| `dotted_order` | Hierarchical ordering string               |
| `trace_tier`   | Trace tier/retention level                 |

**Run data:**

| Field     | Description             |
| --------- | ----------------------- |
| `inputs`  | Run inputs (JSON)       |
| `outputs` | Run outputs (JSON)      |
| `error`   | Error message if failed |
| `extra`   | Extra metadata (JSON)   |
| `events`  | Run events (JSON)       |

**Tags & feedback:**

| Field            | Description                                                                          |
| ---------------- | ------------------------------------------------------------------------------------ |
| `tags`           | List of tags                                                                         |
| `feedback_stats` | Feedback statistics (JSON). Refer to the following note for aggregation limitations. |

<Note>
  **`feedback_stats` aggregation limitation**

  The `feedback_stats` field only includes value breakdowns for string-type feedback. Feedback with non-string values (numeric, boolean, complex types) is excluded from these breakdowns. To analyze non-string feedback values, export the raw feedback data separately.
</Note>

**Token usage & costs:**

| Field               | Description            |
| ------------------- | ---------------------- |
| `total_tokens`      | Total token count      |
| `prompt_tokens`     | Prompt token count     |
| `completion_tokens` | Completion token count |
| `total_cost`        | Total cost             |
| `prompt_cost`       | Prompt cost            |
| `completion_cost`   | Completion cost        |
| `first_token_time`  | Time to first token    |

### Partitioning scheme

Data is exported into your bucket using the following Hive-partitioned structure:

```
<bucket>/<prefix>/export_id=<export_id>/tenant_id=<tenant_id>/session_id=<session_id>/runs/year=<year>/month=<month>/day=<day>
```
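A small helper shows how the pieces combine into a partition path (illustrative; the exact value formatting, such as zero-padding of month and day, may differ from what the export actually writes):

```python
def partition_prefix(bucket, prefix, export_id, tenant_id, session_id, year, month, day):
    """Build the Hive-style partition path where a day's Parquet files land."""
    return (
        f"{bucket}/{prefix}/export_id={export_id}/tenant_id={tenant_id}"
        f"/session_id={session_id}/runs/year={year}/month={month}/day={day}"
    )

print(partition_prefix("my-bucket", "exports", "exp-1", "ten-1", "proj-1", 2024, 1, 2))
```

Because the layout is Hive-partitioned, query engines such as BigQuery, Snowflake, or DuckDB can prune partitions on `year`, `month`, and `day` when reading the exported files.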

## 3. Monitor your export

Poll the export status using the `id` from the [previous step](#2-create-an-export-job):

```bash theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
curl --request GET \
  --url 'https://api.smith.langchain.com/api/v1/bulk-exports/{export_id}' \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: YOUR_API_KEY' \
  --header 'X-Tenant-Id: YOUR_WORKSPACE_ID'
```

The `status` field in the response will be one of `CREATED`, `RUNNING`, `COMPLETED`, `FAILED`, `CANCELLED`, or `TIMEDOUT`. Exports may take some time depending on the volume of data. Once the status is `COMPLETED`, the Parquet files are available in your bucket.
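A minimal client-side polling loop might look like the following; `fetch_status` is a hypothetical callable wrapping the GET request above (e.g. with `requests`), stubbed here for illustration:

```python
import time

TERMINAL = {"COMPLETED", "FAILED", "CANCELLED", "TIMEDOUT"}

def wait_for_export(fetch_status, poll_seconds=30, max_polls=1000):
    """Poll fetch_status() until the export reaches a terminal state.

    fetch_status is any zero-argument callable returning the export's
    current status string.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("export did not reach a terminal state")

# Example with a stubbed status sequence:
statuses = iter(["CREATED", "RUNNING", "RUNNING", "COMPLETED"])
print(wait_for_export(lambda: next(statuses), poll_seconds=0))  # → COMPLETED
```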

Refer to [Monitor and troubleshoot bulk exports](/langsmith/data-export-monitor) for how to list runs, stop an export, and diagnose failures.

