When you call an LLM directly, outside of LangChain or a supported integration, you need to provide specific metadata so that LangSmith can display token counts, calculate costs, and let you open the run in the Playground with the correct provider and model. There are four requirements for a fully functional LLM trace:
  1. Set run_type="llm": pass run_type="llm" to @traceable. Enables LLM-specific rendering and token/cost display.
  2. Format inputs/outputs: use the OpenAI, Anthropic, or LangChain message format. Enables structured message rendering and Playground support.
  3. Set ls_provider and ls_model_name: pass both in metadata. Enables cost tracking and Playground model selection.
  4. Provide token counts: set usage_metadata on the run. Enables token counts and cost calculation.
If you are using LangChain OSS, the OpenAI wrapper, or the Anthropic wrapper, these details are handled automatically. The examples on this page use the traceable decorator/wrapper (the recommended approach for Python and JS/TS); the same requirements apply if you use the RunTree or the API directly.
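Putting the four requirements together, a traced custom LLM call looks roughly like the following sketch (the provider name, model name, and token counts are placeholders; each requirement is covered in detail below):

from langsmith import traceable

@traceable(
    run_type="llm",  # 1. LLM run type
    metadata={"ls_provider": "my_provider", "ls_model_name": "my_model"},  # 3. provider and model
)
def chat_model(messages: list):
    # ... call your custom model here ...
    return {
        # 2. LangChain-style message output (OpenAI or Anthropic formats also work)
        "messages": [
            {"role": "assistant", "content": [{"type": "text", "text": "The capital of France is Paris."}]}
        ],
        # 4. token counts for cost calculation
        "usage_metadata": {"input_tokens": 12, "output_tokens": 7, "total_tokens": 19},
    }

chat_model([{"role": "user", "content": "What is the capital of France?"}])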

Messages format

When tracing a custom model or a custom input/output format, the inputs and outputs must follow the LangChain message format, the OpenAI Chat Completions format, or the Anthropic Messages format. For more details, refer to the OpenAI Chat Completions or Anthropic Messages documentation. The LangChain format is:
inputs = {
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Hi, can you tell me the capital of France?"
        }
      ]
    }
  ]
}

outputs = {
  "messages": [
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "The capital of France is Paris."
        },
        {
          "type": "reasoning",
          "text": "The user is asking about..."
        }
      ]
    }
  ]
}
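
For comparison, the same exchange expressed in the OpenAI Chat Completions style looks roughly like this (a sketch, not the full schema; see the OpenAI documentation for all fields):

inputs = {
  "messages": [
    {"role": "user", "content": "Hi, can you tell me the capital of France?"}
  ]
}

outputs = {
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      }
    }
  ]
}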

Convert custom I/O formats into LangSmith-compatible formats

If you’re using a custom input or output format, you can convert it to a LangSmith-compatible format using the process_inputs/processInputs and process_outputs/processOutputs options on the @traceable decorator (Python) or traceable function (TS). These options accept functions that transform a trace’s inputs and outputs before they are logged to LangSmith: each function receives the raw inputs or outputs and returns a new dictionary with the processed data. Here’s a boilerplate example of how to use process_inputs and process_outputs to convert a custom I/O format into a LangSmith-compatible format:
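The sketch below assumes a custom model that takes a plain prompt string and returns a plain string; the converter functions and their names are illustrative, not part of the SDK:

from langsmith import traceable

def convert_inputs(inputs: dict) -> dict:
    # `inputs` holds the traced function's arguments, e.g. {"prompt": "..."}.
    # Return a LangSmith-compatible messages payload instead.
    return {"messages": [{"role": "user", "content": inputs["prompt"]}]}

def convert_outputs(output) -> dict:
    # `output` is whatever the traced function returned, here a plain string.
    # Return an OpenAI-style output so LangSmith can render it as a chat message.
    return {"choices": [{"message": {"role": "assistant", "content": output}}]}

@traceable(
    run_type="llm",
    process_inputs=convert_inputs,
    process_outputs=convert_outputs,
    metadata={"ls_provider": "my_provider", "ls_model_name": "my_model"},
)
def my_custom_model(prompt: str) -> str:
    # Call your custom model here; this stub returns a fixed string.
    return "The capital of France is Paris."

my_custom_model("Hi, can you tell me the capital of France?")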

Identify a custom model in traces

When using a custom model, it is recommended to also provide the following metadata fields so that you can identify the model when viewing and filtering traces.
  • ls_provider: The provider of the model, e.g. “openai”, “anthropic”, etc.
  • ls_model_name: The name of the model, e.g. “gpt-4.1-mini”, “claude-3-opus-20240229”, etc.
from langsmith import traceable

inputs = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "I'd like to book a table for two."},
]
output = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "Sure, what time would you like to book the table for?"
            }
        }
    ]
}

@traceable(
    run_type="llm",
    metadata={"ls_provider": "my_provider", "ls_model_name": "my_model"}
)
def chat_model(messages: list):
    return output

chat_model(inputs)
This code will log the following trace:
(Screenshot: LangSmith UI showing the logged LLM call trace with a system and human input followed by an AI output.)
If you implement a custom streaming chat_model, you can “reduce” the outputs into the same format as the non-streaming version. This is currently only supported in Python.
def _reduce_chunks(chunks: list):
    all_text = "".join([chunk["choices"][0]["message"]["content"] for chunk in chunks])
    return {"choices": [{"message": {"content": all_text, "role": "assistant"}}]}

@traceable(
    run_type="llm",
    reduce_fn=_reduce_chunks,
    metadata={"ls_provider": "my_provider", "ls_model_name": "my_model"}
)
def my_streaming_chat_model(messages: list):
    for chunk in ["Hello, " + messages[1]["content"]]:
        yield {
            "choices": [
                {
                    "message": {
                        "content": chunk,
                        "role": "assistant",
                    }
                }
            ]
        }

list(
    my_streaming_chat_model(
        [
            {"role": "system", "content": "You are a helpful assistant. Please greet the user."},
            {"role": "user", "content": "polly the parrot"},
        ],
    )
)
If ls_model_name is not present in extra.metadata, LangSmith falls back to other fields in extra.metadata when estimating token counts. The fields are checked in the following order of precedence:
  1. metadata.ls_model_name
  2. inputs.model
  3. inputs.model_name
To learn more about how to use the metadata fields, refer to the Add metadata and tags guide.

Provide token and cost information

Token counts enable cost calculation and are displayed in the trace UI. There are two ways to provide them:
  • Set usage_metadata on the run tree: call get_current_run_tree() / getCurrentRunTree() inside your @traceable function and set the usage_metadata field. This does not change your function’s return value.
  • Return usage_metadata in the output: include usage_metadata as a top-level key in the dictionary your function returns, as shown in the sketch below.
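For example, the second approach only requires adding a top-level key to the return value (the token counts here are placeholders):

from langsmith import traceable

@traceable(
    run_type="llm",
    metadata={"ls_provider": "my_provider", "ls_model_name": "my_model"},
)
def chat_model(messages: list):
    # ... call your model here ...
    return {
        "choices": [
            {"message": {"role": "assistant", "content": "Sure, what time works for you?"}}
        ],
        # LangSmith reads this top-level key for token counts and cost calculation.
        "usage_metadata": {
            "input_tokens": 27,
            "output_tokens": 13,
            "total_tokens": 40,
        },
    }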

Supported usage_metadata fields

  • input_tokens (int): Total input/prompt tokens.
  • output_tokens (int): Total output/completion tokens.
  • total_tokens (int): Sum of input and output tokens (optional; can be inferred).
  • input_token_details (object): Breakdown by cache_read, cache_creation, audio, text, and image tokens.
  • output_token_details (object): Breakdown by reasoning, audio, text, and image tokens.
To send costs directly (for non-linear pricing), you can also include input_cost, output_cost, and total_cost fields. See Cost tracking for details on configuring model pricing and viewing costs in the UI.
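Assuming the cost fields are included alongside the token fields in usage_metadata, a fuller payload might look like this (all numbers are illustrative):

usage_metadata = {
    "input_tokens": 1100,
    "output_tokens": 250,
    "total_tokens": 1350,
    "input_token_details": {"cache_read": 800},   # tokens served from the prompt cache
    "output_token_details": {"reasoning": 200},   # reasoning/thinking tokens
    "input_cost": 0.0011,    # USD; overrides per-token pricing
    "output_cost": 0.0025,
    "total_cost": 0.0036,
}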

Time-to-first-token

If you are using traceable or one of our SDK wrappers, LangSmith will automatically populate time-to-first-token for streaming LLM runs. However, if you are using the RunTree API directly, you will need to add a new_token event to the run tree in order to properly populate time-to-first-token. Here’s an example:
from langsmith.run_trees import RunTree

run_tree = RunTree(
    name="CustomChatModel",
    run_type="llm",
    inputs={ ... }
)
run_tree.post()

llm_stream = ...

first_token = None
for token in llm_stream:
    if first_token is None:
        first_token = token
        run_tree.add_event({"name": "new_token"})

run_tree.end(outputs={ ... })
run_tree.patch()