Documentation Index
Fetch the complete documentation index at: https://docs.langchain.com/llms.txt
Use this file to discover all available pages before exploring further.
Kinetica is a database with integrated support for text-to-SQL generation.
This notebook demonstrates how to use Kinetica to transform natural language into SQL
and simplify the process of data retrieval. This demo is intended to show the text generation workflow as opposed to the capabilities of the LLM.
Overview
With the Kinetica LLM workflow you create an LLM context in the database that provides
information needed for infefencing that includes tables, annotations, rules, and
samples. Invoking ChatKinetica.load_messages_from_context() will retrieve the
context information from the database so that it can be used to create a chat prompt.
The chat prompt consists of a SystemMessage and pairs of
HumanMessage/AIMessage that contain the samples which are question/SQL
pairs. You can append pairs samples to this list but it is not intended to
facilitate a typical natural language conversation.
When you create a chain from the chat prompt and execute it, the Kinetica LLM will
generate SQL from the input. Optionally you can use KineticaSqlOutputParser to
execute the SQL and return the result as a dataframe.
Currently, 2 LLM’s are supported for SQL generation:
- Kinetica SQL-GPT: This LLM is based on OpenAI ChatGPT API.
- Kinetica SqlAssist: This LLM is purpose built to integrate with the Kinetica
database and it can run in a secure customer premise.
For this demo we will be using SqlAssist. See the Kinetica Documentation
site for more information.
Prerequisites
To get started you will need a Kinetica DB instance. If you don’t have one you can
obtain a free development instance.
You will need to install the following packages…
pip install -qU langchain-kinetica faker
Database connection
You must set the database connection in the following environment variables. If you are using a virtual environment you can set them in the .env file of the project:
KINETICA_URL: Database connection URL (e.g. http://localhost:9191)
KINETICA_USER: Database user
KINETICA_PASSWD: Secure password.
If you can create an instance of KineticaChatLLM then you are successfully connected.
from langchain_kinetica import ChatKinetica
kinetica_llm = ChatKinetica()
# Test table we will create
table_name = "demo.user_profiles"
# LLM Context we will create
kinetica_ctx = "demo.test_llm_ctx"
2026-02-02 19:39:09.975 INFO [GPUdb] Connected to Kinetica! (host=http://localhost:19191 api=7.2.3.3 server=7.2.3.5)
Create test data
Before we can generate SQL we will need to create a Kinetica table and an LLM context that can inference the table.
Create some fake user profiles
We will use the faker package to create a dataframe with 100 fake profiles.
from collections.abc import Generator
import pandas as pd
from faker import Faker
Faker.seed(5467)
faker = Faker(locale="en-US")
def profile_gen(count: int) -> Generator:
for p_id in range(count):
rec = dict(id=p_id, **faker.simple_profile())
rec["birthdate"] = pd.Timestamp(rec["birthdate"])
yield rec
load_df = pd.DataFrame.from_records(data=profile_gen(100), index="id")
print(load_df.head())
username name sex \
id
0 eduardo69 Haley Beck F
1 lbarrera Joshua Stephens M
2 bburton Paula Kaiser F
3 melissa49 Wendy Reese F
4 melissacarter Manuel Rios M
address mail \
id
0 59836 Carla Causeway Suite 939\nPort Eugene, I... meltondenise@yahoo.com
1 3108 Christina Forges\nPort Timothychester, KY... erica80@hotmail.com
2 Unit 7405 Box 3052\nDPO AE 09858 timothypotts@gmail.com
3 6408 Christopher Hill Apt. 459\nNew Benjamin, ... dadams@gmail.com
4 2241 Bell Gardens Suite 723\nScottside, CA 38463 williamayala@gmail.com
birthdate
id
0 1999-08-22
1 1926-04-17
2 1935-08-19
3 1990-07-10
4 1932-11-30
Create a kinetica table from the dataframe
from gpudb import GPUdbTable
gpudb_table = GPUdbTable.from_df(
load_df,
db=kinetica_llm.kdbc,
table_name=table_name,
clear_table=True,
load_data=True,
)
# See the Kinetica column types
print(gpudb_table.type_as_df())
name type properties
0 username string [char32]
1 name string [char32]
2 sex string [char2]
3 address string [char64]
4 mail string [char32]
5 birthdate long [timestamp]
Create the LLM context
You can create an LLM Context using the Kinetica Workbench UI or you can manually create it with the CREATE OR REPLACE CONTEXT syntax.
Here we create a context from the SQL syntax referencing the table we created.
from gpudb import GPUdbSamplesClause, GPUdbSqlContext, GPUdbTableClause
table_ctx = GPUdbTableClause(table=table_name, comment="Contains user profiles.")
samples_ctx = GPUdbSamplesClause(
samples=[
(
"How many users born after 1970 are there?",
f"""
select count(1) as num_users
from {table_name}
where birthdate > '1970-01-01';
""",
)
]
)
context_sql = GPUdbSqlContext(
name=kinetica_ctx, tables=[table_ctx], samples=samples_ctx
).build_sql()
print(context_sql)
count_affected = kinetica_llm.kdbc.execute(context_sql)
count_affected
CREATE OR REPLACE CONTEXT "demo"."test_llm_ctx" (
TABLE = "demo"."user_profiles",
COMMENT = 'Contains user profiles.'
),
(
SAMPLES = (
'How many users born after 1970 are there?' = 'select count(1) as num_users
from demo.user_profiles
where birthdate > ''1970-01-01'';' )
)
1
Use LangChain for inferencing
In the example below we will create a chain from the previously created table and LLM context. This chain will generate SQL and return the resulting data as a dataframe.
Load the chat prompt from the kinetica DB
The load_messages_from_context() function will retrieve a context from the DB and convert it into a list of chat messages that we use to create a ChatPromptTemplate.
from langchain_core.prompts import ChatPromptTemplate
# load the context from the database
ctx_messages = kinetica_llm.load_messages_from_context(kinetica_ctx)
# Add the input prompt. This is where input question will be substituted.
ctx_messages.append(("human", "{input}"))
# Create the prompt template.
prompt_template = ChatPromptTemplate.from_messages(ctx_messages)
print(prompt_template.pretty_repr())
================================ System Message ================================
CREATE TABLE demo.user_profiles AS
(
username VARCHAR (32) NOT NULL,
name VARCHAR (32) NOT NULL,
sex VARCHAR (2) NOT NULL,
address VARCHAR (64) NOT NULL,
mail VARCHAR (32) NOT NULL,
birthdate TIMESTAMP NOT NULL
);
COMMENT ON TABLE demo.user_profiles IS 'Contains user profiles.';
================================ Human Message =================================
How many users born after 1970 are there?
================================== Ai Message ==================================
select count(1) as num_users
from demo.user_profiles
where birthdate > '1970-01-01';
================================ Human Message =================================
{input}
Create the chain
The last element of this chain is KineticaSqlOutputParser that will execute the SQL and return a dataframe. This is optional and if we left it out then only SQL would be returned.
from langchain_kinetica import (
KineticaSqlOutputParser,
KineticaSqlResponse,
)
chain = prompt_template | kinetica_llm | KineticaSqlOutputParser(kdbc=kinetica_llm.kdbc)
Generate the SQL
The chain we created will take a question as input and return a KineticaSqlResponse containing the generated SQL and data. The question must be relevant to the to LLM context we used to create the prompt.
# Here you must ask a question relevant to the LLM context provided in the
# prompt template.
response: KineticaSqlResponse = chain.invoke(
{"input": "What users were born after 1990?"}
)
print(f"SQL: {response.sql}")
print(response.dataframe.head())
SQL: SELECT *
FROM demo.user_profiles
WHERE birthdate > '1990-01-01';
username name sex \
0 eduardo69 Haley Beck F
1 melissa49 Wendy Reese F
2 james26 Patricia Potter F
3 mooreandrew Wendy Ramirez F
4 melissabutler Alexa Kelly F
address mail \
0 59836 Carla Causeway Suite 939\nPort Eugene, I... meltondenise@yahoo.com
1 6408 Christopher Hill Apt. 459\nNew Benjamin, ... dadams@gmail.com
2 7977 Jonathan Meadow\nJerryside, OH 55205 jpatrick@gmail.com
3 8089 Gonzalez Fields\nJordanville, KS 22824 mathew05@hotmail.com
4 1904 Burke Roads\nPort Anne, DE 81252 douglas38@yahoo.com
birthdate
0 1999-08-25
1 1990-07-13
2 2010-03-21
3 2000-03-25
4 2023-02-01