Replicate runs machine learning models in the cloud. We have a library of open-source models that you can run with a few lines of code. If you’re building your own machine learning models, Replicate makes it easy to deploy them at scale.

This example goes over how to use LangChain to interact with Replicate models.
Setup
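To run the examples below, you’ll need a Replicate account and the client libraries installed (`pip install replicate langchain-community`). A minimal sketch for setting the API token, which the integration reads from the `REPLICATE_API_TOKEN` environment variable:

```python
import os
from getpass import getpass

# Paste your token from https://replicate.com/account when prompted;
# the LangChain Replicate integration reads REPLICATE_API_TOKEN.
os.environ["REPLICATE_API_TOKEN"] = getpass("Replicate API token: ")
```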
Calling a model
Find a model on the Replicate explore page, and then paste in the model name and version in this format: owner/model_name:version. For example, here is dolly-v2-12b (the same format applies to newer models like Meta Llama 3):

`replicate/dolly-v2-12b:ef0e1aefc61f8e096ebe4db6b2bacc297daf2ef6899f0f7e001ec445893500e5`
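A minimal sketch of calling that model through LangChain (the prompt is illustrative):

```python
from langchain_community.llms import Replicate

llm = Replicate(
    model="replicate/dolly-v2-12b:ef0e1aefc61f8e096ebe4db6b2bacc297daf2ef6899f0f7e001ec445893500e5"
)

# Run a single completion
print(llm.invoke("When was the Apollo 11 moon landing?"))
```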
Only the `model` param is required, but we can pass any other model params when initializing. For example, if we were running Stable Diffusion and wanted to change the image dimensions:
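A sketch using `model_kwargs`; the version hash is a placeholder (`<version>`), so look up the current one on the model’s Replicate page, and the `image_dimensions` param name is an assumption based on that model’s inputs:

```python
from langchain_community.llms import Replicate

# <version> is a placeholder: copy the current version hash
# from https://replicate.com/stability-ai/stable-diffusion
text2image = Replicate(
    model="stability-ai/stable-diffusion:<version>",
    model_kwargs={"image_dimensions": "512x512"},
)

# For image models, Replicate returns a URL to the generated image
image_url = text2image.invoke("A cat riding a motorcycle by Picasso")
print(image_url)
```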
Streaming response
You can optionally stream the response as it is produced, which is helpful to show interactivity to users for time-consuming generations. See the detailed docs on Streaming for more information.
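A minimal sketch using the standard `.stream()` interface, reusing the `llm` instance from above (the prompt is illustrative):

```python
# Print tokens as they arrive instead of waiting for the full completion
for chunk in llm.stream("Write a haiku about machine learning."):
    print(chunk, end="", flush=True)
```

Stop sequences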
You can also specify stop sequences. If you have a definite stop sequence for the generation that you are going to parse out anyway, it is better (cheaper and faster!) to cancel the generation once one or more stop sequences are reached, rather than letting the model ramble on until the specified max_length. Stop sequences work whether or not you are in streaming mode, and Replicate only charges you for the generation up until the stop sequence.
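A sketch of passing stop sequences at invoke time (the prompt and stop token are illustrative):

```python
# Generation is cancelled as soon as a blank line is produced
output = llm.invoke(
    "What is the best way to learn Python? Answer in one paragraph.",
    stop=["\n\n"],
)
print(output)
```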
Chaining calls
The whole point of LangChain is to… chain! Here’s an example of how to do that.
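A minimal sketch that pipes a prompt template into the model from earlier (the prompt is illustrative):

```python
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template(
    "What is a good name for a company that makes {product}?"
)

# Compose prompt -> llm into a single runnable chain
chain = prompt | llm
print(chain.invoke({"product": "colorful socks"}))
```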