langchain-azure-ai package, are exported from langchain_azure_ai.tools, and can be instantiated individually or loaded together with AzureAIServicesToolkit.
Use these tools when you want LangChain agents to analyze documents, images, or healthcare text with Azure-managed services.
Overview
| Tool | Description |
|---|---|
AzureAIContentUnderstandingTool | Extract structured content from documents, images, audio, and video. |
AzureAIDocumentIntelligenceTool | Parse documents into OCR text, tables, and key-value pairs. |
AzureAIImageAnalysisTool | Run OCR, captions, tagging, object detection, and related image analysis. |
AzureAISpeechToTextTool | Transcribe audio files to text with language support. |
AzureAITextToSpeechTool | Convert text to synthesized speech audio with multi-language support. |
AzureAITextAnalyticsHealthTool | Extract medical entities from healthcare text. |
Features
- Shared authentication and endpoint handling across all service tools.
- Support for Azure AI Foundry project endpoints and direct service endpoints.
- Individual tools for multimodal extraction, document parsing, image analysis, speech-to-text transcription, text-to-speech synthesis, and healthcare text analysis.
AzureAIServicesToolkitfor loading all service tools at once.- Automatic audio source detection (local files and remote URLs) for transcription.
- Multi-language speech recognition and synthesis with BCP-47 language codes.
- WAV file generation for synthesized speech output.
Setup
Install the integration package, configure either an Azure AI Foundry project endpoint or a direct Azure AI Services endpoint, and provide a credential.Installation
Install the package with thetools extra:
azure-ai-documentintelligence, azure-ai-vision-imageanalysis, azure-cognitiveservices-speech, and azure-ai-textanalytics. The base package includes azure-ai-contentunderstanding.
Credentials
Pass eitherDefaultAzureCredential() or an API-key string through the credential argument. If you use a Foundry project endpoint, use a Microsoft Entra ID credential such as DefaultAzureCredential().
Initialize credential
Configure endpoints
The service tools support two endpoint styles:- An Azure AI Foundry project endpoint via
project_endpointorAZURE_AI_PROJECT_ENDPOINT - A direct Azure AI Services endpoint via
endpointorAZURE_AI_INFERENCE_ENDPOINT
project_endpoint because it resolves the backing service endpoint automatically for Foundry-based workflows.
Configure endpoint
Instantiate a tool
IfAZURE_AI_PROJECT_ENDPOINT is already set, you can usually omit project_endpoint during instantiation.
Initialize tool
Use with an agent
Pass one or more tools tocreate_agent.
Agent with tools
Use the toolkit
UseAzureAIServicesToolkit to get all services tools with a shared credential and endpoint configuration.
Tools
AzureAIContentUnderstandingTool
AzureAIContentUnderstandingTool extracts structured content from documents, images, audio, and video. It returns markdown-like extracted content and can also surface structured fields from the selected analyzer.
The tool defaults to analyzer_id="prebuilt-documentSearch". You can switch analyzers for other modalities, such as prebuilt-audioSearch or prebuilt-videoSearch, and you can provide model_deployments when your analyzer depends on custom model deployment names.
Configuration options
Configuration options
The input to analyze. Pass a public URL, local file path, or base64-encoded payload.
Controls how the tool interprets
source.The Content Understanding analyzer to run.
Optional mapping from model names to deployment names when a custom analyzer needs them.
AzureAIDocumentIntelligenceTool
AzureAIDocumentIntelligenceTool extracts OCR text, tables, and key-value pairs from documents. It is a good fit for invoices, forms, receipts, contracts, and other document-heavy workflows where the agent needs structured output instead of raw text only.
The tool defaults to model_id="prebuilt-layout". Its public input schema accepts url, path, and base64 sources.
Configuration options
Configuration options
AzureAIImageAnalysisTool
AzureAIImageAnalysisTool analyzes images and returns a JSON-formatted summary with captions, OCR text, tags, objects, people, and smart crops when those features are enabled.
By default, the tool enables a broad set of visual features, including TAGS, OBJECTS, CAPTION, DENSE_CAPTIONS, READ, SMART_CROPS, and PEOPLE.
Configuration options
Configuration options
The image input. Pass a public URL, local file path, or base64-encoded payload.
Controls how the tool interprets
source.Optional list of image-analysis features to request. If omitted, the tool uses its default feature set.
AzureAITextAnalyticsHealthTool
AzureAITextAnalyticsHealthTool extracts healthcare entities from medical text. It is useful for clinical notes, patient summaries, intake forms, and research workflows where the agent needs medical entities rather than free-form summarization.
The tool accepts a plain-text query and can be configured with optional language and country_hint defaults.
Configuration options
Configuration options
AzureAISpeechToTextTool
AzureAISpeechToTextTool transcribes audio files to text using the Azure AI Speech service. It supports a wide range of audio formats and can handle both local files and remote audio URLs.
The tool automatically detects whether the input is a local file path or a remote URL, and handles downloading remote files as needed. It is useful for workflows where the agent needs to convert spoken audio into written text.
Configuration options
Configuration options
Path to a local audio file or a URL pointing to an audio file. Supports WAV, MP3, OGG, FLAC, and other common audio formats.
The language of the speech in BCP-47 format (e.g.,
"en-US", "es-ES", "fr-FR"). Defaults to "en-US".The Azure AI Speech service endpoint. For example,
https://eastus.api.cognitive.microsoft.com/. Can be set via AZURE_AI_INFERENCE_ENDPOINT environment variable or resolved from AZURE_AI_PROJECT_ENDPOINT.The credentials to use. Either a subscription key string or any
TokenCredential such as DefaultAzureCredential.AzureAITextToSpeechTool
AzureAITextToSpeechTool converts text to spoken audio using the Azure AI Speech service. It synthesizes the provided text and returns a local WAV audio file path containing the synthesized speech.
The tool supports multi-language synthesis with BCP-47 language codes and is useful for workflows where the agent needs to generate audio narration, voice-over content, or audio notifications from text.
Configuration options
Configuration options
The text to convert to speech.
The language of the synthesized speech in BCP-47 format (e.g.,
"en-US", "es-ES", "fr-FR"). Defaults to "en-US".The Azure AI Speech service endpoint. For example,
https://eastus.api.cognitive.microsoft.com/. Can be set via AZURE_AI_INFERENCE_ENDPOINT environment variable or resolved from AZURE_AI_PROJECT_ENDPOINT.The credentials to use. Either a subscription key string or any
TokenCredential such as DefaultAzureCredential.API reference
Connect these docs to Claude, VSCode, and more via MCP for real-time answers.

