PGVectorStore is an implementation of a LangChain vector store that uses Postgres as the backend.
This notebook goes over how to use the PGVectorStore API.
The code lives in an integration package called langchain-postgres.
This integration requires a Postgres database with the pgvector extension.
You can run the following command to spin up a container for a pgvector-enabled Postgres instance:
Install the integration library, langchain-postgres.
Establishing Postgres as a vector store requires a PGEngine object. The PGEngine configures a shared connection pool to your Postgres database. This is an industry best practice for managing the number of connections and for reducing latency through cached database connections.
PGVectorStore can be used with the asyncpg and psycopg3 drivers.
To create a PGEngine using PGEngine.from_connection_string(), you need to provide:
- url: A connection string that uses the postgresql+asyncpg driver.

To create a PGEngine using PGEngine.from_engine(), you need to provide:
- engine: An AsyncEngine object.
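Both construction paths can be sketched as follows. This is a minimal sketch, not code from the package: build_connection_string, engine_from_url, and engine_from_async_engine are hypothetical helper names, and any credentials you pass are your own. The langchain_postgres and SQLAlchemy imports are deferred into the functions so the URL helper can be used without a database.

```python
from urllib.parse import quote


def build_connection_string(
    user: str,
    password: str,
    host: str,
    port: int,
    database: str,
    driver: str = "asyncpg",
) -> str:
    """Return a SQLAlchemy-style URL such as postgresql+asyncpg://user:pw@host:port/db.

    User name and password are percent-encoded so characters like '@' or '/'
    cannot break URL parsing.
    """
    return (
        f"postgresql+{driver}://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


def engine_from_url(url: str):
    """Create a PGEngine (and with it a shared connection pool) from a URL."""
    # Deferred import: the module loads even if langchain-postgres is absent.
    from langchain_postgres import PGEngine

    return PGEngine.from_connection_string(url=url)


def engine_from_async_engine(url: str):
    """Build a SQLAlchemy AsyncEngine yourself, then wrap it in a PGEngine."""
    from langchain_postgres import PGEngine
    from sqlalchemy.ext.asyncio import create_async_engine

    async_engine = create_async_engine(url)
    return PGEngine.from_engine(engine=async_engine)
```

For example, engine_from_url(build_connection_string("langchain", "langchain", "localhost", 6024, "langchain")) would return a PGEngine for a local database with those placeholder credentials. The driver parameter exists because SQLAlchemy names the psycopg3 dialect postgresql+psycopg; whether your setup should use it instead of asyncpg is an assumption to verify.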
The PGVectorStore class requires a database table. PGEngine has a helper method, ainit_vectorstore_table(), that creates a table with the proper schema for you.
See Create a custom Vector Store or Create a Vector Store using existing table for customizing the schema.
You can also specify a schema by passing schema_name wherever you pass table_name.
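A minimal sketch of table creation with ainit_vectorstore_table, optionally in a non-default schema. It assumes pg_engine is a PGEngine, that the target schema already exists, and that 768 is your embedding model's output size; create_store_table, the table name, and my_schema are hypothetical.

```python
VECTOR_SIZE = 768  # assumption: must equal the embedding model's output dimension


async def create_store_table(
    pg_engine,
    table_name: str,
    schema_name: str = "public",
    vector_size: int = VECTOR_SIZE,
) -> None:
    """Create a table with the schema PGVectorStore expects.

    pg_engine is a PGEngine; schema_name selects the Postgres schema
    (assumed to exist already) that will hold the table.
    """
    await pg_engine.ainit_vectorstore_table(
        table_name=table_name,
        schema_name=schema_name,
        vector_size=vector_size,
    )
```

In a notebook you would call, for instance, await create_store_table(pg_engine, "vectorstore", schema_name="my_schema"). Remember to pass the same schema_name later when you open the store on that table.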
When creating a table with ainit_vectorstore_table, you can use:
- content_column, embedding_column, metadata_json_column, and id_column to rename the columns, and metadata_columns to add custom metadata columns.
- The Column class to create custom id or metadata columns. A Column is defined by a name and a data type; any Postgres data type can be used.
- store_metadata to create a JSON column that stores extra metadata.
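Putting these options together, a custom table might be created as in the following sketch. It assumes pg_engine is a PGEngine and that 768 matches your embedding model; the table name, the length column, and create_custom_table are illustrative, not required names.

```python
TABLE_NAME = "vectorstore_custom"  # hypothetical table name
VECTOR_SIZE = 768  # assumption: matches the embedding model's output dimension


async def create_custom_table(pg_engine) -> None:
    """Create a table with an extra INTEGER metadata column plus a JSON metadata column."""
    from langchain_postgres import Column  # deferred import

    await pg_engine.ainit_vectorstore_table(
        table_name=TABLE_NAME,
        vector_size=VECTOR_SIZE,
        # Each Column is a (name, Postgres data type) pair; "length" is illustrative.
        metadata_columns=[Column("length", "INTEGER")],
        # Keep a JSON column for metadata keys that have no dedicated column.
        store_metadata=True,
    )
```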
When initializing a PGVectorStore, you can use:
- content_column, embedding_column, metadata_json_column, and id_column to match the table's column names, and metadata_columns to list the columns read as Document metadata.
- ignore_metadata_columns to exclude columns that should not be used as Document metadata. This is helpful with a preexisting table where not every data column is needed.
- distance_strategy to choose the similarity calculation used during vector search.
- index_query_options to tune local index parameters during vector search.

PGVectorStore currently supports the following metadata filter operators and all Postgres data types.
Operator | Meaning/Category |
---|---|
$eq | Equality (==) |
$ne | Inequality (!=) |
$lt | Less than (<) |
$lte | Less than or equal (<=) |
$gt | Greater than (>) |
$gte | Greater than or equal (>=) |
$in | Special Cased (in) |
$nin | Special Cased (not in) |
$between | Special Cased (between) |
$exists | Special Cased (is null) |
$like | Text (like) |
$ilike | Text (case-insensitive like) |
$and | Logical (and) |
$or | Logical (or) |
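As an illustration, operators from this table can be nested into a filter dictionary and passed to a search. This sketch assumes store is a PGVectorStore whose metadata_columns include price_usd and category (such as the products table below); price_and_category_filter and search_cheap_products are hypothetical helpers.

```python
from typing import Any


def price_and_category_filter(max_price: float, categories: list[str]) -> dict[str, Any]:
    """Combine a comparison operator and a set-membership operator with $and."""
    return {
        "$and": [
            {"price_usd": {"$lt": max_price}},
            {"category": {"$in": categories}},
        ]
    }
    # Other shapes follow the same pattern, for example:
    #   {"category": {"$eq": "Electronics"}}
    #   {"price_usd": {"$between": [10, 100]}}


async def search_cheap_products(store, query: str, max_price: float, categories: list[str], k: int = 4):
    """Run a similarity search restricted by a metadata filter."""
    return await store.asimilarity_search(
        query, k=k, filter=price_and_category_filter(max_price, categories)
    )
```

The filter is applied in the database alongside the vector search, so only rows that satisfy it are candidates for the k results.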
Suppose there is a pre-existing table, products, that stores product details for an e-commerce venture.
Here is how this table maps to PGVectorStore:
- id_column="product_id": The ID column uniquely identifies each row in the products table.
- content_column="description": The description column contains a text description of each product. The embedding_service uses this text to create the vectors stored in embedding_column, which represent the semantic meaning of each description.
- embedding_column="embed": The embed column stores the vectors created from the product descriptions. These vectors are used to find products with similar descriptions.
- metadata_columns=["name", "category", "price_usd", "quantity", "sku", "image_url"]: These columns are treated as metadata for each product. Metadata provides additional information about a product, such as its name, category, price, available quantity, SKU (Stock Keeping Unit), and image URL. This information is useful for displaying product details in search results or for filtering and categorization.
- metadata_json_column="metadata": The metadata column can store any additional product information in a flexible JSON format, which accommodates varied and complex data that doesn’t fit into the standard columns.
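The mapping above translates directly into PGVectorStore.create keyword arguments. A minimal sketch, assuming pg_engine is a PGEngine, embedding_service is any LangChain Embeddings object, and the products table already contains every listed column; open_products_store and PRODUCTS_STORE_KWARGS are hypothetical names.

```python
from typing import Any

# Column mapping for the pre-existing "products" table.
PRODUCTS_STORE_KWARGS: dict[str, Any] = {
    "table_name": "products",
    "id_column": "product_id",
    "content_column": "description",
    "embedding_column": "embed",
    "metadata_columns": ["name", "category", "price_usd", "quantity", "sku", "image_url"],
    "metadata_json_column": "metadata",
}


async def open_products_store(pg_engine, embedding_service):
    """Create a PGVectorStore over the existing products table."""
    from langchain_postgres import PGVectorStore  # deferred import

    return await PGVectorStore.create(
        engine=pg_engine,
        embedding_service=embedding_service,
        **PRODUCTS_STORE_KWARGS,
    )
```

Keeping the mapping in one dictionary makes it easy to reuse when opening the same table elsewhere.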
Note: If the embed column is newly created, or its vector dimension does not match the embedding model's output, you must add embeddings for the existing records once. For example, the column itself can be created with:
ALTER TABLE products ADD COLUMN embed vector(768) DEFAULT NULL;
Embeddings for new records added through the VectorStore are generated automatically.
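The note above does not show the backfill step itself. One way to do it is sketched below; this is not part of langchain-postgres. It assumes the products table and column names from this section, a SQLAlchemy AsyncEngine using the asyncpg driver, and that pgvector accepts its text input format ('[x,y,...]') through CAST(... AS vector). backfill_embeddings and to_pgvector_literal are hypothetical helpers.

```python
from typing import Sequence


def to_pgvector_literal(vector: Sequence[float]) -> str:
    """Format floats in pgvector's text input form, e.g. [0.5, 1] -> '[0.5,1.0]'."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"


async def backfill_embeddings(async_engine, embedding_service, batch_size: int = 100) -> int:
    """Embed every product description whose embed value is still NULL.

    async_engine: a SQLAlchemy AsyncEngine for the products database.
    embedding_service: a LangChain Embeddings object whose output size matches
    the vector(768) column. Returns the number of rows updated.
    """
    from sqlalchemy import text  # deferred import

    select_stmt = text(
        "SELECT product_id, description FROM products "
        "WHERE embed IS NULL AND description IS NOT NULL "
        "LIMIT :limit"
    )
    update_stmt = text(
        "UPDATE products SET embed = CAST(:embed AS vector) "
        "WHERE product_id = :product_id"
    )
    updated = 0
    while True:
        # One transaction per batch, so a failure only rolls back that batch.
        async with async_engine.begin() as conn:
            rows = (await conn.execute(select_stmt, {"limit": batch_size})).fetchall()
            if not rows:
                return updated
            vectors = await embedding_service.aembed_documents(
                [row.description for row in rows]
            )
            # A list of parameter dicts makes SQLAlchemy run an executemany.
            await conn.execute(
                update_stmt,
                [
                    {"embed": to_pgvector_literal(vec), "product_id": row.product_id}
                    for row, vec in zip(rows, vectors)
                ],
            )
            updated += len(rows)
```

Run it once after adding the column, for example await backfill_embeddings(async_engine, embedding). Rows with a NULL description are skipped so the loop always terminates.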