Get started using the Soniox audio transcription loader in LangChain.
Setup
Install the package:
```bash
npm install @soniox/langchain
```
Credentials
Get your Soniox API key from the Soniox Console and set it as an environment variable:
```bash
export SONIOX_API_KEY=your_api_key
```
Usage
Basic transcription
The following example transcribes an audio file with the SonioxAudioTranscriptLoader and generates a summary with an LLM.
```typescript
import { SonioxAudioTranscriptLoader } from "@soniox/langchain";
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

const audioFileUrl = "https://soniox.com/media/examples/coffee_shop.mp3";

const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    language_hints: ["en"],
    // Any other transcription parameters documented here:
    // https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription
  }
);

console.log(`Transcribing ${audioFileUrl}...`);
const docs = await loader.load();
const transcriptText = docs[0].pageContent;
console.log(`Transcript: ${transcriptText}`);

// Create a chain to summarize the transcript
const prompt = ChatPromptTemplate.fromTemplate(
  "Write a concise summary of the following speech:\n\n{transcript}"
);
const chain = prompt
  .pipe(new ChatOpenAI({ model: "gpt-5-mini" }))
  .pipe(new StringOutputParser());

const summary = await chain.invoke({ transcript: transcriptText });
console.log(summary);
```
You can also transcribe audio from binary data:
```typescript
// Fetch the file
const response = await fetch(
  "https://github.com/soniox/soniox_examples/raw/refs/heads/master/speech_to_text/assets/coffee_shop.mp3"
);
const audioBuffer = await response.bytes(); // Uint8Array

const loader = new SonioxAudioTranscriptLoader({
  audio: audioBuffer,
});

const docs = await loader.load();
console.log(docs[0].pageContent); // Transcribed text
```
Translation
Translate from any detected language to a target language:
```typescript
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    translation: {
      type: "one_way",
      target_language: "fr",
    },
    language_hints: ["en"],
  }
);

const docs = await loader.load();

let originalText = "";
let translatedText = "";
for (const token of docs[0].metadata.tokens) {
  if (token.translation_status === "translation") {
    translatedText += token.text;
  } else {
    originalText += token.text;
  }
}

console.log(originalText);
console.log(translatedText);
```
You can also transcribe and translate between two languages simultaneously using the two_way translation type. Learn more about Soniox translation.
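As a sketch, a two-way configuration could look like the following. The language_a and language_b field names are taken from the Soniox create-transcription API and should be verified against the current API reference before use:

```typescript
// Sketch of the options argument for two-way translation.
// language_a / language_b are assumptions based on the Soniox
// create-transcription API; check the current reference.
const twoWayOptions = {
  translation: {
    type: "two_way",
    language_a: "en",
    language_b: "es",
  },
};

// Pass it as the second constructor argument:
// const loader = new SonioxAudioTranscriptLoader({ audio: audioFileUrl }, twoWayOptions);
console.log(twoWayOptions.translation.type);
```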
Language hints
Soniox automatically detects and transcribes speech in 60+ languages. When you know which languages are likely to appear in your audio, provide language_hints to improve accuracy by biasing recognition toward those languages.
Language hints do not restrict recognition—they only bias the model toward the specified languages, while still allowing other languages to be detected if present.
```typescript
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    language_hints: ["en", "es"],
  }
);

const docs = await loader.load();
```
For more details, see the Soniox language hints documentation.
Speaker diarization
Enable speaker identification to distinguish between different speakers:
```typescript
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    enable_speaker_diarization: true,
  }
);

const docs = await loader.load();

// Access speaker information in the metadata
let currentSpeaker = null;
let output = "";
for (const token of docs[0].metadata.tokens) {
  if (currentSpeaker !== token.speaker) {
    currentSpeaker = token.speaker;
    output += `\nSpeaker ${currentSpeaker}: ${token.text.trimStart()}`;
  } else {
    output += token.text;
  }
}
console.log(output);

// Analyze the conversation
const prompt = ChatPromptTemplate.fromTemplate(
  `Analyze the following conversation between speakers.
Identify the intent of each speaker.

Conversation:
{conversation}`
);
const chain = prompt
  .pipe(new ChatOpenAI({ model: "gpt-5-mini" }))
  .pipe(new StringOutputParser());

const analysis = await chain.invoke({ conversation: output });
console.log(analysis);
```
Language identification
Enable automatic language detection and identification:
```typescript
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    enable_language_identification: true,
  }
);
```
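With identification enabled, each token in docs[0].metadata.tokens carries a language code. One way to split a transcript by language is sketched below; the tokens are hand-made here so the snippet is self-contained, and only the text and language fields of the token type are assumed:

```typescript
// Split transcript text by detected language. The tokens here are
// illustrative; in practice read them from docs[0].metadata.tokens.
type Token = { text: string; language?: string | null };

const tokens: Token[] = [
  { text: "Hello", language: "en" },
  { text: " world.", language: "en" },
  { text: " Hola", language: "es" },
  { text: " mundo.", language: "es" },
];

const byLanguage = new Map<string, string>();
for (const token of tokens) {
  const lang = token.language ?? "unknown";
  byLanguage.set(lang, (byLanguage.get(lang) ?? "") + token.text);
}

for (const [lang, text] of byLanguage) {
  console.log(`${lang}: ${text.trim()}`);
}
```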
Context for improved accuracy
Provide domain-specific context to improve transcription accuracy:
```typescript
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioBuffer,
  },
  {
    context: {
      general: [
        { key: "industry", value: "healthcare" },
        { key: "meeting_type", value: "consultation" },
      ],
      terms: ["hypertension", "cardiology", "metformin"],
      translation_terms: [
        { source: "blood pressure", target: "presión arterial" },
        { source: "medication", target: "medicamento" },
      ],
    },
  }
);
```
For more details, see the Soniox context documentation.
API reference
Constructor parameters
SonioxLoaderParams (required)
| Parameter | Type | Required | Description |
|---|---|---|---|
| audio | Uint8Array \| string | Yes | Audio file as buffer or URL |
| audioFormat | SonioxAudioFormat | No | Audio file format |
| apiKey | string | No | Soniox API key (defaults to SONIOX_API_KEY env var) |
| apiBaseUrl | string | No | API base URL (defaults to https://api.soniox.com/v1) |
| pollingIntervalMs | number | No | Polling interval in ms (min: 1000, default: 1000) |
| pollingTimeoutMs | number | No | Polling timeout in ms (default: 180000) |
SonioxLoaderOptions (optional)
| Parameter | Type | Description |
|---|---|---|
| model | SonioxTranscriptionModelId | Model to use (default: "stt-async-v4") |
| translation | object | Translation configuration |
| language_hints | string[] | Language hints for transcription |
| language_hints_strict | boolean | Enforce strict language hints |
| enable_speaker_diarization | boolean | Enable speaker identification |
| enable_language_identification | boolean | Enable language detection |
| context | object | Context for improved accuracy |
Browse the documentation for a full list of supported options.
Supported audio formats
The audioFormat parameter accepts the following values:
- aac - Advanced Audio Coding
- aiff - Audio Interchange File Format
- amr - Adaptive Multi-Rate
- asf - Advanced Systems Format
- flac - Free Lossless Audio Codec
- mp3 - MPEG Audio Layer III
- ogg - Ogg Vorbis
- wav - Waveform Audio File Format
- webm - WebM Audio
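When audio is passed as raw bytes rather than a URL, the container may not be inferable, so audioFormat can be set explicitly. The sketch below assumes the SonioxAudioFormat values are the identifier strings listed above; the byte values are placeholders:

```typescript
// Sketch of the first constructor argument (SonioxLoaderParams) with an
// explicit format. "mp3" assumes SonioxAudioFormat uses the identifiers
// listed above; the buffer contents here are placeholders.
const audioBuffer = new Uint8Array([0xff, 0xfb]);

const params = {
  audio: audioBuffer,
  audioFormat: "mp3",
};

// new SonioxAudioTranscriptLoader(params, { language_hints: ["en"] })
console.log(params.audioFormat);
```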
Return value
The load() method returns an array containing a single Document object:
```typescript
type Document = {
  pageContent: string; // The transcribed text
  metadata: SonioxTranscriptResponse; // Full transcript with metadata
};
```
The metadata includes transcribed text, speaker information (if diarization enabled), language information (if identification enabled), translation data (if translation enabled), and timing information.
```typescript
type SonioxTranscriptResponse = {
  id: string;
  text?: string | null;
  tokens?: SonioxTranscriptToken[] | null;
};
```
Token type:
```typescript
type SonioxTranscriptToken = {
  text: string;
  start_ms?: number | null;
  end_ms?: number | null;
  confidence?: number | null;
  speaker?: number | string | null;
  language?: string | null;
  translation_status?: string | null;
};
```
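The timing fields make it straightforward to compute, for example, how much timed speech the tokens cover. A small sketch over hand-made tokens (the token type is repeated from above so the snippet is self-contained):

```typescript
// Token type as shown in the API reference above.
type SonioxTranscriptToken = {
  text: string;
  start_ms?: number | null;
  end_ms?: number | null;
  confidence?: number | null;
  speaker?: number | string | null;
  language?: string | null;
  translation_status?: string | null;
};

// Illustrative tokens; in practice read docs[0].metadata.tokens.
const tokens: SonioxTranscriptToken[] = [
  { text: "Hello", start_ms: 0, end_ms: 400, speaker: 1 },
  { text: " there.", start_ms: 400, end_ms: 900, speaker: 1 },
];

// Sum the duration of every token that carries timing information.
const durationMs = tokens.reduce(
  (sum, t) =>
    t.start_ms != null && t.end_ms != null ? sum + (t.end_ms - t.start_ms) : sum,
  0
);

console.log(`${durationMs} ms of timed speech`);
```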
You can learn more about the SonioxTranscriptResponse type in the Soniox REST API Reference.