Fireworks integration

You are currently on a page documenting the use of Fireworks models as text completion models. Many popular Fireworks models are chat completion models; you may be looking for the ChatFireworks chat model page instead.

Fireworks accelerates product development on generative AI by creating an innovative AI experiment and production platform.

This example goes over how to use LangChain to interact with Fireworks models.

Overview

Integration details

Class	Package	Local	Serializable	JS support
`Fireworks`	`langchain-fireworks`	❌	❌	✅

Setup

Credentials

Sign in to Fireworks AI to get an API key for accessing Fireworks models, and make sure it is set as the FIREWORKS_API_KEY environment variable. Then set up your model using a model id. If the model is not set, the default model is fireworks-llama-v2-7b-chat. See the full, up-to-date model list on fireworks.ai.

import getpass
import os

if "FIREWORKS_API_KEY" not in os.environ:
    os.environ["FIREWORKS_API_KEY"] = getpass.getpass("Fireworks API Key:")
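The model ids used on this page are fully qualified, for example accounts/fireworks/models/llama-v3p1-8b-instruct. Below is a minimal sketch of two small guards you might keep next to the credential setup: one expands a short model name into that form, and one fails early when the key is missing. Both `resolve_model_id` and `require_api_key` are hypothetical helpers and are not part of langchain-fireworks. The "accounts/fireworks/models/" prefix is inferred from the ids shown on this page.

```python
import os
from typing import Mapping, Optional

# Prefix inferred from the fully qualified model ids shown on this page.
MODEL_PREFIX = "accounts/fireworks/models/"


def resolve_model_id(name: str) -> str:
    """Hypothetical helper: return a fully qualified Fireworks model id.

    Short names get the prefix; ids that already start with "accounts/"
    are returned unchanged.
    """
    if name.startswith("accounts/"):
        return name
    return MODEL_PREFIX + name


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: read FIREWORKS_API_KEY or fail with a clear error."""
    env = os.environ if env is None else env
    key = env.get("FIREWORKS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("FIREWORKS_API_KEY is not set")
    return key


print(resolve_model_id("llama-v3p1-8b-instruct"))
```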

Installation

You need to install the langchain-fireworks Python package for the rest of the notebook to work.

pip install -qU langchain-fireworks
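The pip package name (langchain-fireworks) differs from the module you import (langchain_fireworks). Below is a small standard-library check that confirms the module is importable before you run the rest of the notebook. `has_package` is a hypothetical helper name; it relies only on `importlib.util.find_spec`.

```python
import importlib.util


def has_package(module_name: str) -> bool:
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


# The pip package "langchain-fireworks" provides the module "langchain_fireworks".
if not has_package("langchain_fireworks"):
    print("langchain_fireworks not found; run: pip install -qU langchain-fireworks")
```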

Instantiation

from langchain_fireworks import Fireworks

# Initialize a Fireworks model
llm = Fireworks(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct", # Model library in: https://app.fireworks.ai/models
    base_url="https://api.fireworks.ai/inference/v1/completions",
)
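The `base_url` above points at Fireworks' completions endpoint. To show roughly what a completion call sends there, the sketch below builds an HTTP request with the standard library but does not send it. It assumes the OpenAI-style completions schema: a JSON body with "model", "prompt", and optional sampling fields, authenticated with a Bearer token. It is an illustration, not a description of how `langchain_fireworks` implements its requests. `build_completion_request` is a hypothetical name.

```python
import json
import os
import urllib.request

# Same endpoint as the base_url passed to Fireworks(...) above.
BASE_URL = "https://api.fireworks.ai/inference/v1/completions"


def build_completion_request(prompt: str, model: str, api_key: str, **params) -> urllib.request.Request:
    """Build (but do not send) a POST request to the completions endpoint.

    Assumes an OpenAI-style body: {"model": ..., "prompt": ..., **params}.
    """
    body = {"model": model, "prompt": prompt, **params}
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_completion_request(
    "Who's the best quarterback in the NFL?",
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",
    api_key=os.environ.get("FIREWORKS_API_KEY", "placeholder-key"),
    max_tokens=15,
)
# Sending it would be urllib.request.urlopen(req), which needs a valid key.
```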

Invocation

You can call the model directly with string prompts to get completions.

output = llm.invoke("Who's the best quarterback in the NFL?")
print(output)

  That's an easy one. It's Aaron Rodgers. Rodgers has consistently been one

Invoking with multiple prompts

# Calling multiple prompts
output = llm.generate(
    [
        "Who's the best cricket player in 2016?",
        "Who's the best basketball player in the league?",
    ]
)
print(output.generations)

[[Generation(text=' You could choose one of the top performers in 2016, such as Vir')], [Generation(text=' -- Keith Jackson\nA: LeBron James, Chris Paul and Kobe Bryant are the')]]
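`output.generations` is nested. It holds one inner list of `Generation` objects per prompt, in the same order as the input prompts. The sketch below reproduces that shape without network access: it runs a hypothetical `fake_complete` stub once per prompt on a thread pool, then picks the first candidate for each prompt, the same way you would read `generations[i][0].text`. Treat it as a model of the result structure, not of LangChain's internals.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def fake_complete(prompt: str) -> str:
    """Hypothetical stand-in for one completion call (no network)."""
    return f"<completion for: {prompt}>"


def generate_all(prompts: List[str], complete: Callable[[str], str], max_workers: int = 4) -> List[List[str]]:
    """Run one completion per prompt concurrently.

    Returns one inner list per prompt, in input order, mirroring the
    nested shape of `output.generations`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of the input prompts.
        texts = list(pool.map(complete, prompts))
    return [[text] for text in texts]


results = generate_all(
    ["Who's the best cricket player in 2016?", "Who's the best basketball player in the league?"],
    fake_complete,
)
# First candidate per prompt, analogous to generations[i][0].text.
first = [candidates[0] for candidates in results]
print(first)
```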

Invoking with additional parameters

# Setting additional parameters: temperature, max_tokens, top_p
llm = Fireworks(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",
    temperature=0.7,
    max_tokens=15,
    top_p=1.0,
)
print(llm.invoke("What's the weather like in Kansas City in December?"))

December is a cold month in Kansas City, with temperatures of
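`max_tokens=15` caps the length of the completion, which is why the answer above stops mid-sentence. `temperature` and `top_p` control how the next token is sampled. The sketch below is a generic, standard-library illustration of temperature scaling followed by nucleus (top-p) filtering over a toy set of logits. The token names and logit values are made up, and this is not a description of Fireworks' server-side sampler.

```python
import math
from typing import Dict


def apply_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Softmax over logits / temperature; lower temperature sharpens the distribution."""
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Nucleus filtering: keep the most likely tokens until their cumulative
    probability reaches top_p, then renormalize the kept tokens."""
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


# Toy logits for the next token (made-up values).
logits = {"cold": 2.0, "snowy": 1.0, "mild": 0.5, "hot": -1.0}
probs = apply_temperature(logits, temperature=0.7)
print(top_p_filter(probs, top_p=1.0))  # top_p=1.0 keeps every token
print(top_p_filter(probs, top_p=0.5))  # a smaller top_p keeps only the most likely token(s)
```

With `top_p=1.0`, as in the example above, nucleus filtering removes nothing, so only `temperature` reshapes the distribution.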

Chaining

You can use the LangChain Expression Language to create a simple chain with non-chat models.

from langchain_core.prompts import PromptTemplate
from langchain_fireworks import Fireworks

llm = Fireworks(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",
    temperature=0.7,
    max_tokens=15,
    top_p=1.0,
)
prompt = PromptTemplate.from_template("Tell me a joke about {topic}?")
chain = prompt | llm

print(chain.invoke({"topic": "bears"}))

 What do you call a bear with no teeth? A gummy bear!
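`prompt | llm` composes two steps: the prompt template fills `{topic}` from the input dict, and its output string is passed to the model. The dependency-free sketch below models that composition with a hypothetical `Step` class that overloads `|`, plus a hypothetical LLM stub. LangChain's real runnables do much more (batching, streaming, async), so this only illustrates the data flow.

```python
from typing import Any, Callable


class Step:
    """Hypothetical minimal stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # a | b  ->  a new step that feeds a's output into b
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)


# Plays the role of PromptTemplate.from_template(...): fills {topic} from a dict.
prompt = Step(lambda variables: "Tell me a joke about {topic}?".format(**variables))

# Hypothetical LLM stub; a real chain would call the Fireworks model here.
fake_llm = Step(lambda text: f"[completion of {text!r}]")

chain = prompt | fake_llm
print(chain.invoke({"topic": "bears"}))  # prints [completion of 'Tell me a joke about bears?']
```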

Streaming

You can stream the output as it is generated by iterating over the chain's `stream` method.

for token in chain.stream({"topic": "bears"}):
    print(token, end="", flush=True)

 Why do bears hate shoes so much? They like to run around in their
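Over HTTP, streamed completions typically arrive as server-sent events. The sketch below parses such a stream with the standard library. It assumes an OpenAI-style format: each event line looks like `data: {...}` with the text in `choices[0]["text"]`, and the stream ends with `data: [DONE]`. The sample events are hand-written for illustration and are not real API output. `iter_completion_text` is a hypothetical helper, not part of langchain-fireworks.

```python
import json
from typing import Iterable, Iterator


def iter_completion_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text chunks from OpenAI-style server-sent event lines.

    Assumes 'data: {json}' lines with text in choices[0]["text"],
    terminated by 'data: [DONE]'.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        yield chunk["choices"][0].get("text", "")


# Hand-written sample events (not real API output).
sample = [
    'data: {"choices": [{"text": " Why do"}]}',
    "",
    'data: {"choices": [{"text": " bears"}]}',
    "data: [DONE]",
]
for token in iter_completion_text(sample):
    print(token, end="", flush=True)
print()
```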

API reference

For detailed documentation of all Fireworks LLM features and configurations, head to the API reference.
