You are currently on a page documenting the use of AI/ML API models as text completion models. Many of the latest and most popular AI/ML API models are chat completion models. You may be looking for this page instead.
This page helps you get started with AI/ML API text completion models. For detailed documentation of all AimlapiLLM features and configurations, head to the API reference.

Overview

Integration details

| Class | Package | Local | Serializable | JS support | Downloads | Version |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: |
| AimlapiLLM | langchain-aimlapi | | beta | | PyPI - Downloads | PyPI - Version |

Model features

| Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |

Setup

To access AI/ML API models you’ll need to create an account, get an API key, and install the langchain-aimlapi integration package.

Credentials

Head to aimlapi.com to sign up and generate an API key. Once you've done this, set the AIMLAPI_API_KEY environment variable:
import getpass
import os

if not os.getenv("AIMLAPI_API_KEY"):
    os.environ["AIMLAPI_API_KEY"] = getpass.getpass("Enter your AI/ML API key: ")
To enable automated tracing of your model calls, set your LangSmith API key:
# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")

Installation

The LangChain AI/ML API integration lives in the langchain-aimlapi package:
%pip install -qU langchain-aimlapi

Instantiation

Now we can instantiate our model object and generate text completions:
from langchain_aimlapi import AimlapiLLM

llm = AimlapiLLM(
    model="gpt-3.5-turbo-instruct",
    temperature=0.5,
    max_tokens=256,
)

Invocation

response = llm.invoke("Explain the bubble sort algorithm in Python.")
print(response)
Bubble sort is a simple sorting algorithm that repeatedly steps through a list, compares adjacent items, and swaps them when they are out of order. The process repeats until the entire list is sorted. While easy to understand and implement, bubble sort is inefficient on large datasets because it has quadratic time complexity.
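The response above describes the algorithm in prose. For reference, a minimal Python sketch of the bubble sort the model describes (not part of the model output) could look like this:

def bubble_sort(items):
    """Sort a list in place by repeatedly swapping adjacent out-of-order pairs."""
    n = len(items)
    for i in range(n):
        swapped = False
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:  # no swaps in this pass: the list is already sorted
            break
    return items

print(bubble_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]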

Streaming invocation

You can also stream responses token-by-token:
llm = AimlapiLLM(
    model="gpt-3.5-turbo-instruct",
)

for chunk in llm.stream("List top 5 programming languages in 2025 with reasons."):
    print(chunk, end="", flush=True)
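Because AimlapiLLM implements LangChain's standard Runnable interface, you can also stream asynchronously with astream. The snippet below is a minimal sketch that reuses the llm instance defined above:

import asyncio

async def main():
    # astream yields string chunks for text completion models
    async for chunk in llm.astream("List top 5 programming languages in 2025 with reasons."):
        print(chunk, end="", flush=True)

asyncio.run(main())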

Chaining

You can easily combine the LLM with a prompt template for structured inputs using LCEL:
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template("Tell me a joke about {topic}")
chain = prompt | llm

chain.invoke({"topic": "bears"})
"Why do bears have fur coats? Because they'd look silly in sweaters!"

API reference

For detailed documentation of all AimlapiLLM features and configurations, head to the API reference.