Bodo DataFrames is a high-performance DataFrame library
for large-scale Python data processing and a drop-in replacement for Pandas. Simply replace:
import pandas as pd
with:
import bodo.pandas as pd
to automatically scale and accelerate Pandas workloads.
Since Bodo DataFrames is compatible with Pandas, it is an ideal target for LLM code generation
that’s easy to verify, efficient, and scalable beyond the typical limitations of Pandas.
Our integration package provides a toolkit for asking agents questions about large datasets
using Bodo DataFrames for efficiency and scalability.
Under the hood, Bodo DataFrames uses lazy evaluation to optimize sequences of Pandas operations,
streams data through operators so that it can process larger-than-memory datasets, and
uses MPI-based high-performance computing technology for efficient parallel execution that
scales from a laptop to a large cluster.
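To make the ideas of lazy evaluation and streaming more concrete, here is a minimal pure-Python sketch. It is a toy model only and does not reflect how Bodo DataFrames is implemented: the `LazyFrame` class, its methods, and the `read_rows` source are all hypothetical names invented for this illustration. The sketch records operations without running them, then streams rows through the recorded operations one at a time when a result is requested.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]


class LazyFrame:
    """Toy lazy pipeline (hypothetical, not Bodo's API).

    Operations are recorded when called and only executed when a
    result is requested via count() or collect().
    """

    def __init__(self, source: Callable[[], Iterable[Row]],
                 ops: List[Tuple[str, Any]] = None):
        self._source = source
        self._ops = list(ops or [])

    def filter(self, pred: Callable[[Row], bool]) -> "LazyFrame":
        # Record the filter; no data is read here.
        return LazyFrame(self._source, self._ops + [("filter", pred)])

    def assign(self, name: str, fn: Callable[[Row], Any]) -> "LazyFrame":
        # Record a derived column; no data is read here.
        return LazyFrame(self._source, self._ops + [("assign", (name, fn))])

    def _stream(self) -> Iterator[Row]:
        # Pull one row at a time through every recorded operation, so
        # only a single row needs to be held in memory at once.
        for row in self._source():
            keep = True
            for kind, arg in self._ops:
                if kind == "filter":
                    if not arg(row):
                        keep = False
                        break
                else:
                    name, fn = arg
                    row = {**row, name: fn(row)}
            if keep:
                yield row

    def count(self) -> int:
        return sum(1 for _ in self._stream())

    def collect(self) -> List[Row]:
        return list(self._stream())


stats = {"rows_read": 0}


def read_rows() -> Iterator[Row]:
    # Stand-in for reading a large file one row at a time.
    for i in range(1000):
        stats["rows_read"] += 1
        yield {"id": i, "fare": float(i % 50)}


pipeline = (
    LazyFrame(read_rows)
    .filter(lambda r: r["fare"] > 40.0)
    .assign("fare_x2", lambda r: r["fare"] * 2)
)
rows_before = stats["rows_read"]  # still 0: building the pipeline reads nothing
matching = pipeline.count()       # execution happens here
```

In this toy model, deferring execution is what would give an optimizer the chance to reorder or fuse the recorded operations before any data is touched, and the generator-based `_stream` is what keeps memory use independent of the input size.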
Installation and setup
pip install -U langchain_bodo
The langchain-bodo package provides functionality for creating agents that can answer questions about large datasets using Bodo DataFrames.
See the Bodo DataFrames tools page for more detailed usage examples.
NOTE: This feature uses the Python agent under the hood, which executes LLM-generated Python code. That code can be harmful, for example by deleting files or leaking data, so use this feature cautiously and only in an environment where running untrusted code is acceptable.
from langchain_bodo import create_bodo_dataframes_agent
Usage example
Before running the code below, download the Titanic dataset
and save it locally as titanic.csv.
import bodo.pandas as pd
from langchain_openai import OpenAI
df = pd.read_csv("titanic.csv")
agent = create_bodo_dataframes_agent(
    OpenAI(temperature=0), df, verbose=True, allow_dangerous_code=True
)
agent.invoke("how many rows are there?")
> Entering new AgentExecutor chain...
Thought: I can use the len() function to get the number of rows in the dataframe.
Action: python_repl_ast
Action Input: len(df)
891
891 is the number of rows in the dataframe.
Final Answer: 891
> Finished chain.
{'input': 'how many rows are there?', 'output': '891'}
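The result is a plain dictionary whose `output` value is free text produced by the LLM, so downstream code may want to validate it before use. The following is a minimal sketch of one way to do that, assuming the answer is expected to contain an integer. `parse_row_count` is a hypothetical helper written for this example, not part of langchain-bodo.

```python
import re


def parse_row_count(result: dict) -> int:
    """Extract the first integer from an agent result's 'output' text.

    Hypothetical helper: the agent returns free-form text, so this
    raises ValueError when no integer can be found.
    """
    text = str(result["output"]).strip()
    match = re.search(r"-?\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no integer found in agent output: {text!r}")
    # Drop thousands separators such as "1,234" before converting.
    return int(match.group().replace(",", ""))


# The dictionary returned by agent.invoke(...) in the example above.
result = {"input": "how many rows are there?", "output": "891"}
row_count = parse_row_count(result)
```

Raising on unparseable output, rather than returning a default, keeps an unexpected LLM response from silently turning into a wrong number.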