This guide helps you get started with AI/ML API chat models. For detailed documentation of all ChatAimlapi features and configurations, head to the API reference. AI/ML API provides unified access to hundreds of hosted foundation models with high availability and throughput.

Overview

Integration details

Class | Package | Local | Serializable | JS support | Downloads | Version
ChatAimlapi | langchain-aimlapi | ❌ | beta | ❌ | PyPI - Downloads | PyPI - Version

Model features

Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs

Setup

To access AI/ML API models you’ll need to create an account, get an API key, and install the langchain-aimlapi integration package.

Credentials

Head to aimlapi.com to sign up and generate an API key. Once you've done this, set the AIMLAPI_API_KEY environment variable:
import getpass
import os

if not os.getenv("AIMLAPI_API_KEY"):
    os.environ["AIMLAPI_API_KEY"] = getpass.getpass("Enter your AI/ML API key: ")
To enable automated tracing of your model calls, set your LangSmith API key and turn tracing on:
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Installation

The LangChain AI/ML API integration lives in the langchain-aimlapi package:
%pip install -qU langchain-aimlapi

Instantiation

Now we can instantiate our model object and generate chat completions:
from langchain_aimlapi import ChatAimlapi

llm = ChatAimlapi(
    model="meta-llama/Llama-3-70b-chat-hf",
    temperature=0.7,  # sampling temperature; higher values increase randomness
    max_tokens=512,  # maximum number of tokens to generate
    timeout=30,  # request timeout in seconds
    max_retries=3,  # number of retries on failed requests
)

Invocation

messages = [
    ("system", "You are a helpful assistant that translates English to French."),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
ai_msg
AIMessage(content="J'adore la programmation.", response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 23, 'total_tokens': 32}, 'model_name': 'meta-llama/Llama-3-70b-chat-hf'}, id='run-...')
print(ai_msg.content)
J'adore la programmation.
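
Because ChatAimlapi implements LangChain's standard Runnable interface, you can also send several inputs in a single call with batch(), which runs invoke() over each input in parallel. A minimal sketch, reusing the llm and message format from above:
results = llm.batch(
    [
        [("system", "You are a helpful assistant that translates English to French."), ("human", "Good morning.")],
        [("system", "You are a helpful assistant that translates English to French."), ("human", "See you tomorrow.")],
    ]
)
for msg in results:
    # each element is an AIMessage, just like the result of invoke()
    print(msg.content)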

Streaming invocation

You can also stream responses token by token:
for chunk in llm.stream("List top 5 programming languages in 2025 with reasons."):
    print(chunk.content, end="", flush=True)
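
The model features above list native async support, so the standard async methods such as astream() should work as well. A minimal sketch, assuming the llm object defined earlier:
# In a notebook you can use top-level await; in a script, wrap this in an
# async function and run it with asyncio.run()
async for chunk in llm.astream("List top 5 programming languages in 2025 with reasons."):
    print(chunk.content, end="", flush=True)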

Chaining

We can chain our model with a prompt template like so:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that translates {input_language} to {output_language}.",
        ),
        ("human", "{input}"),
    ]
)

chain = prompt | llm
response = chain.invoke(
    {
        "input_language": "English",
        "output_language": "German",
        "input": "I love programming.",
    }
)
response
AIMessage(content='Ich liebe das Programmieren.', response_metadata={'token_usage': {'completion_tokens': 12, 'prompt_tokens': 32, 'total_tokens': 44}, 'model_name': 'meta-llama/Llama-3-70b-chat-hf'}, id='run-...')
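
Because the chain is a standard LCEL pipeline, you can append an output parser to get a plain string back instead of an AIMessage. A minimal sketch using StrOutputParser from langchain_core:
from langchain_core.output_parsers import StrOutputParser

# StrOutputParser extracts message.content as a plain string
string_chain = prompt | llm | StrOutputParser()
string_chain.invoke(
    {
        "input_language": "English",
        "output_language": "German",
        "input": "I love programming.",
    }
)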

API reference

For detailed documentation of all ChatAimlapi features and configurations, head to the API reference.