Get started using the Soniox audio transcription loader in LangChain.

Setup

Install the package:
npm install @soniox/langchain

Credentials

Get your Soniox API key from the Soniox Console and set it as an environment variable:
export SONIOX_API_KEY=your_api_key

Usage

Basic transcription

An example of transcribing an audio file with the SonioxAudioTranscriptLoader and summarizing the transcript with an LLM:
import { SonioxAudioTranscriptLoader } from "@soniox/langchain";
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

const audioFileUrl = "https://soniox.com/media/examples/coffee_shop.mp3";
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    language_hints: ["en"],
    // You can pass any other transcription parameters documented at:
    // https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription
  }
);

console.log(`Transcribing ${audioFileUrl}...`);
const docs = await loader.load();

const transcriptText = docs[0].pageContent;
console.log(`Transcript: ${transcriptText}`);

// Create a chain to summarize the transcript
const prompt = ChatPromptTemplate.fromTemplate(
  "Write a concise summary of the following speech:\n\n{transcript}"
);

const chain = prompt
  .pipe(new ChatOpenAI({ model: "gpt-5-mini" }))
  .pipe(new StringOutputParser());

const summary = await chain.invoke({ transcript: transcriptText });
console.log(summary);
You can also transcribe audio from binary data:
// Fetch the file
const response = await fetch("https://github.com/soniox/soniox_examples/raw/refs/heads/master/speech_to_text/assets/coffee_shop.mp3");
const audioBuffer = await response.bytes(); // Uint8Array

const loader = new SonioxAudioTranscriptLoader({
  audio: audioBuffer,
});

const docs = await loader.load();
console.log(docs[0].pageContent); // Transcribed text

Translation

Translate from any detected language to a target language:
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    translation: {
      type: "one_way",
      target_language: "fr",
    },
    language_hints: ["en"],
  }
);

const docs = await loader.load();

let originalText = "";
let translatedText = "";

for (const token of docs[0].metadata.tokens) {
  if (token.translation_status === "translation") {
    translatedText += token.text;
  } else {
    originalText += token.text;
  }
}

console.log(originalText);
console.log(translatedText);
You can also transcribe and translate between two languages simultaneously using the two_way translation type. Learn more about translation in the Soniox documentation.

Language hints

Soniox automatically detects and transcribes speech in 60+ languages. When you know which languages are likely to appear in your audio, provide language_hints to improve accuracy by biasing recognition toward those languages. Language hints do not restrict recognition — they only bias the model toward the specified languages, while still allowing other languages to be detected if present.
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    language_hints: ["en", "es"],
  }
);

const docs = await loader.load();
For more details, see the Soniox language hints documentation.

Speaker diarization

Enable speaker identification to distinguish between different speakers:
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    enable_speaker_diarization: true,
  }
);

const docs = await loader.load();

// Access speaker information in the metadata
let currentSpeaker = null;
let output = "";
for (const token of docs[0].metadata.tokens) {
  if (currentSpeaker !== token.speaker) {
    currentSpeaker = token.speaker;
    output += `\nSpeaker ${currentSpeaker}: ${token.text.trimStart()}`;
  } else {
    output += token.text;
  }
}
console.log(output);

// Analyze the conversation
const prompt = ChatPromptTemplate.fromTemplate(
  `Analyze the following conversation between speakers.
Identify the intent of each speaker.

Conversation:
{conversation}`
);

const chain = prompt
  .pipe(new ChatOpenAI({ model: "gpt-5-mini" }))
  .pipe(new StringOutputParser());

const analysis = await chain.invoke({ conversation: output });
console.log(analysis);

Language identification

Enable automatic language detection and identification:
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    enable_language_identification: true,
  }
);
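With language identification enabled, each token in docs[0].metadata.tokens may carry a language code (see the SonioxTranscriptToken type in the API reference section). A minimal sketch, independent of the package, that groups consecutive tokens by detected language:

```typescript
type Token = { text: string; language?: string | null };

// Collapse a token stream into contiguous runs of the same detected language.
function groupByLanguage(tokens: Token[]): { language: string; text: string }[] {
  const groups: { language: string; text: string }[] = [];
  for (const token of tokens) {
    const lang = token.language ?? "unknown";
    const last = groups[groups.length - 1];
    if (last && last.language === lang) {
      last.text += token.text; // extend the current run
    } else {
      groups.push({ language: lang, text: token.text }); // start a new run
    }
  }
  return groups;
}
```

Calling groupByLanguage(docs[0].metadata.tokens ?? []) yields one entry per contiguous language run, which is handy for multilingual audio.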

Context for improved accuracy

Provide domain-specific context to improve transcription accuracy:
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioBuffer,
  },
  {
    context: {
      general: [
        { key: "industry", value: "healthcare" },
        { key: "meeting_type", value: "consultation" }
      ],
      terms: ["hypertension", "cardiology", "metformin"],
      translation_terms: [
        { source: "blood pressure", target: "presión arterial" },
        { source: "medication", target: "medicamento" }
      ]
    }
  }
);
For more details, see the Soniox context documentation.

API reference

Constructor parameters

SonioxLoaderParams (required)

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| audio | Uint8Array \| string | Yes | Audio file as a buffer or URL |
| audioFormat | SonioxAudioFormat | No | Audio file format |
| apiKey | string | No | Soniox API key (defaults to the SONIOX_API_KEY env var) |
| apiBaseUrl | string | No | API base URL (defaults to https://api.soniox.com/v1) |
| pollingIntervalMs | number | No | Polling interval in ms (min: 1000, default: 1000) |
| pollingTimeoutMs | number | No | Polling timeout in ms (default: 180000) |
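pollingIntervalMs and pollingTimeoutMs govern how the loader waits for an asynchronous transcription to finish. Their interaction can be sketched roughly like this — an illustrative loop, not the library's actual implementation, where checkStatus is a hypothetical stand-in for the loader's status request:

```typescript
// Illustrative sketch of how pollingIntervalMs / pollingTimeoutMs interact.
// `checkStatus` is a hypothetical stand-in for the transcription status request;
// a real loader would also surface the error details on an "error" status.
async function pollUntilDone(
  checkStatus: () => Promise<"completed" | "error" | "processing">,
  pollingIntervalMs = 1000,
  pollingTimeoutMs = 180_000
): Promise<string> {
  const deadline = Date.now() + pollingTimeoutMs;
  while (Date.now() < deadline) {
    const status = await checkStatus();
    if (status === "completed" || status === "error") return status;
    // Wait one interval before asking again.
    await new Promise((resolve) => setTimeout(resolve, pollingIntervalMs));
  }
  throw new Error(`Transcription did not finish within ${pollingTimeoutMs} ms`);
}
```

A longer pollingIntervalMs reduces API calls at the cost of latency; pollingTimeoutMs bounds the total wait for long recordings.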

SonioxLoaderOptions (optional)

| Parameter | Type | Description |
| --- | --- | --- |
| model | SonioxTranscriptionModelId | Model to use (default: "stt-async-v3") |
| translation | object | Translation configuration |
| language_hints | string[] | Language hints for transcription |
| language_hints_strict | boolean | Enforce strict language hints |
| enable_speaker_diarization | boolean | Enable speaker identification |
| enable_language_identification | boolean | Enable language detection |
| context | object | Context for improved accuracy |
Browse the documentation for a full list of supported options.

Supported audio formats

  • aac - Advanced Audio Coding
  • aiff - Audio Interchange File Format
  • amr - Adaptive Multi-Rate
  • asf - Advanced Systems Format
  • flac - Free Lossless Audio Codec
  • mp3 - MPEG Audio Layer III
  • ogg - Ogg Vorbis
  • wav - Waveform Audio File Format
  • webm - WebM Audio
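The audioFormat parameter matters mainly when you pass raw bytes, since there is no URL or filename for the service to inspect. A hypothetical helper — not part of the package — that derives an audioFormat value from a filename, using the identifiers listed above:

```typescript
// Format identifiers from the supported-formats list above.
const SUPPORTED_FORMATS = ["aac", "aiff", "amr", "asf", "flac", "mp3", "ogg", "wav", "webm"];

// Hypothetical helper: guess an audioFormat value from a filename extension.
// Returns undefined when the extension is missing or unsupported.
function guessAudioFormat(filename: string): string | undefined {
  const ext = filename.split(".").pop()?.toLowerCase();
  return ext && SUPPORTED_FORMATS.includes(ext) ? ext : undefined;
}
```

For example, guessAudioFormat("meeting.mp3") returns "mp3", which you could pass as audioFormat alongside a Uint8Array buffer.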

Return value

The load() method returns an array containing a single Document object:
type Document = {
  pageContent: string; // The transcribed text
  metadata: SonioxTranscriptResponse; // Full transcript with metadata
};
The metadata includes transcribed text, speaker information (if diarization enabled), language information (if identification enabled), translation data (if translation enabled), and timing information.
type SonioxTranscriptResponse = {
  id: string;
  text?: string | null;
  tokens?: SonioxTranscriptToken[] | null;
};
Token type:
type SonioxTranscriptToken = {
  text: string;
  start_ms?: number | null;
  end_ms?: number | null;
  confidence?: number | null;
  speaker?: number | string | null;
  language?: string | null;
  translation_status?: string | null;
};
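Because each token carries start_ms and end_ms offsets, you can derive human-readable timestamps from the metadata. A small sketch, independent of the package, that formats a millisecond offset as HH:MM:SS.mmm:

```typescript
// Format a millisecond offset (e.g. a token's start_ms) as HH:MM:SS.mmm.
function formatMs(ms: number): string {
  const hours = Math.floor(ms / 3_600_000);
  const minutes = Math.floor((ms % 3_600_000) / 60_000);
  const seconds = Math.floor((ms % 60_000) / 1000);
  const millis = ms % 1000;
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return `${pad(hours)}:${pad(minutes)}:${pad(seconds)}.${pad(millis, 3)}`;
}
```

Combined with token.text, this is enough to build caption-style output such as `[${formatMs(token.start_ms ?? 0)}] ${token.text}`.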
You can learn more about the SonioxTranscriptResponse type in the Soniox REST API Reference.