Vector support
Azure Database for PostgreSQL - Flexible Server enables you to efficiently store and query millions of vector embeddings in PostgreSQL and to scale your AI use cases from POC to production:
- Provides a familiar SQL interface for querying vector embeddings and relational data.
- Boosts pgvector with a faster and more precise similarity search across 100M+ vectors using the DiskANN indexing algorithm.
- Simplifies operations by integrating relational metadata, vector embeddings, and time-series data into a single database.
- Leverages the power of the robust PostgreSQL ecosystem and Azure Cloud for enterprise-grade features, including replication and high availability.
Authentication
Azure Database for PostgreSQL - Flexible Server supports password-based as well as Microsoft Entra (formerly Azure Active Directory) authentication. Entra authentication allows you to use an Entra identity to authenticate to your PostgreSQL server. This eliminates the need to manage separate usernames and passwords for your database users, and allows you to leverage the same security mechanisms that you use for other Azure services. This guide is set up to use either authentication method. You can configure whether or not to use Entra authentication later in the notebook.
Setup
Azure Database for PostgreSQL is based on open-source Postgres. This integration uses the dedicated langchain-azure-postgresql package, which provides optimized support including DiskANN indexing and Microsoft Entra authentication.
First download the partner packages:
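As a minimal sketch, the check below reports which packages still need to be installed and prints a matching pip command. The package langchain-azure-postgresql is named above; langchain-openai is included because AzureOpenAIEmbeddings is used later. The import names with underscores are assumptions, as are the helper names missing_packages and pip_command.

```python
import importlib.util

# Maps an importable module name (assumed) to its pip distribution name.
REQUIRED = {
    "langchain_azure_postgresql": "langchain-azure-postgresql",
    "langchain_openai": "langchain-openai",
}


def missing_packages(required):
    """Return the pip names of packages whose module cannot be found."""
    missing = []
    for module_name, pip_name in required.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except ModuleNotFoundError:
            # Raised when the parent of a dotted module name is absent.
            found = False
        if not found:
            missing.append(pip_name)
    return missing


def pip_command(pip_names):
    """Build a quiet, upgrading pip install command, or '' if nothing is missing."""
    if not pip_names:
        return ""
    return "pip install -qU " + " ".join(pip_names)


print(pip_command(missing_packages(REQUIRED)) or "All packages found.")
```

Run the printed command in your terminal, or prefix it with % inside a notebook cell.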
Enable pgvector
See enablement instructions for Azure Database for PostgreSQL.
Set up credentials
To run this notebook, you need your Azure Database for PostgreSQL connection details, added as environment variables. Set the USE_ENTRA_AUTH flag to True if you want to use Microsoft Entra authentication. If using Entra authentication, you only need to supply the host and database name. If using password authentication, you'll also need to set the username and password.
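The rule above can be sketched as a small helper that reads the connection details and only requires a username and password when Entra authentication is off. The environment variable names DBHOST, DBNAME, DBUSER, and DBPASSWORD and the function load_connection_settings are assumptions made for this illustration; only USE_ENTRA_AUTH comes from this guide.

```python
import os

USE_ENTRA_AUTH = True  # flag described above


def load_connection_settings(use_entra_auth, env=None):
    """Collect connection details; variable names are hypothetical."""
    env = os.environ if env is None else env
    required = ["DBHOST", "DBNAME"]
    if not use_entra_auth:
        # Password authentication also needs a username and password.
        required += ["DBUSER", "DBPASSWORD"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    settings = {"host": env["DBHOST"], "dbname": env["DBNAME"]}
    if not use_entra_auth:
        settings["username"] = env["DBUSER"]
        settings["password"] = env["DBPASSWORD"]
    return settings


# Demonstration with an explicit mapping instead of the real environment.
example_env = {"DBHOST": "<server-name>", "DBNAME": "<database-name>"}
print(load_connection_settings(USE_ENTRA_AUTH, example_env))
```

In a notebook, getpass.getpass from the standard library is a reasonable way to set the password without echoing it.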
Setup AzureOpenAIEmbeddings
Initialization
Use Microsoft Entra authentication
The following sections demonstrate how to set up LangChain to use Microsoft Entra authentication. The class AzurePGConnectionPool in the LangChain Azure Postgres package retrieves tokens for the Azure Database for PostgreSQL service by using DefaultAzureCredential from the azure.identity library.
The connection can be passed into the connection parameter of the AzurePGVectorStore LangChain vector store.
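The connection pool handles token retrieval for you, so the sketch below is only meant to illustrate what that step looks like. It assumes the pool requests the Microsoft Entra scope that Azure documents for PostgreSQL; fetch_postgres_token and FakeCredential are hypothetical names. The stand-in credential keeps the example runnable without an Azure sign-in.

```python
from collections import namedtuple

# Entra scope documented by Azure for Azure Database for PostgreSQL.
POSTGRES_SCOPE = "https://ossrdbms-aad.database.windows.net/.default"


def fetch_postgres_token(credential):
    """Return an access token string; it is used in place of a password."""
    return credential.get_token(POSTGRES_SCOPE).token


# Stand-in for azure.identity.DefaultAzureCredential, for illustration only.
FakeToken = namedtuple("FakeToken", ["token", "expires_on"])


class FakeCredential:
    def get_token(self, *scopes):
        return FakeToken(token="token-for-" + scopes[0], expires_on=0)


print(fetch_postgres_token(FakeCredential()))

# With azure-identity installed and an active Azure sign-in:
#   from azure.identity import DefaultAzureCredential
#   token = fetch_postgres_token(DefaultAzureCredential())
```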
Sign in to Azure
To sign in to Azure, ensure you have the Azure CLI installed. Then run the az login command in your terminal.
Password authentication
If you're not using Microsoft Entra authentication, the BasicAuth class allows the use of a username and password.
Creating the Vector Store
Configuring Vector Store parameters
You can override the default parameters for metadata type, embedding dimension, index type, and more when initializing AzurePGVectorStore. This allows you to tailor the vector store to your specific use case and data.
Key configuration options:
- metadata_column_type: The type of the metadata column (default: 'jsonb'). Set to 'jsonb', 'text', etc.
- embedding_column_type: The type of the embedding column (default: 'vector').
- embedding_dimension: The dimension of your embedding vectors (default: 1536).
- embedding_index_type: The index type for vector search (default: 'DiskANN'). Other options may include 'ivfflat', 'hnsw', etc.
- embedding_index_opclass: The operator class for the index (default: 'vector_cosine_ops').
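One way to keep overrides consistent is to start from the defaults listed above and check that your embeddings match the configured dimension before writing to the store. This is a sketch only: build_store_options and check_embedding are hypothetical helpers, and how the resulting values are passed to AzurePGVectorStore (for example as plain strings or as dedicated types) is not shown here.

```python
# Defaults as listed in the key configuration options above.
DEFAULTS = {
    "metadata_column_type": "jsonb",
    "embedding_column_type": "vector",
    "embedding_dimension": 1536,
    "embedding_index_type": "DiskANN",
    "embedding_index_opclass": "vector_cosine_ops",
}


def build_store_options(**overrides):
    """Merge overrides into the defaults, rejecting unknown option names."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise ValueError("Unknown options: " + ", ".join(unknown))
    return {**DEFAULTS, **overrides}


def check_embedding(vector, options):
    """Raise if the vector length differs from embedding_dimension."""
    expected = options["embedding_dimension"]
    if len(vector) != expected:
        raise ValueError(f"Expected {expected} dimensions, got {len(vector)}")


options = build_store_options(embedding_dimension=3072)
check_embedding([0.0] * 3072, options)
print(options)
```

Catching a dimension mismatch early is cheaper than discovering it when an insert into the vector column fails.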
Initialize the DiskANN Vector index for more efficient vector search
DiskANN is a scalable approximate nearest neighbor search algorithm for efficient vector search at any scale. It offers high recall, high queries per second, and low query latency, even for billion-point datasets. Those characteristics make it a powerful tool for handling large volumes of data.
Manage vector store
Add items
Note that adding documents by ID will overwrite any existing documents that match that ID.
Update items
Retrieve items
Delete items
Query vector store
After you create your vector store and add the relevant documents, you can query the vector store in your chain or agent.
Filtering
The vector store supports a set of filters that can be applied against the metadata fields of the documents via the FilterCondition, OrFilter, and AndFilter classes in the LangChain Azure PostgreSQL package:
| Operator | Meaning/Category |
|---|---|
| = | Equality |
| != | Inequality |
| < | Less than |
| <= | Less than or equal |
| > | Greater than |
| >= | Greater than or equal |
| in | Special cased (in) |
| not in | Special cased (not in) |
| is null | Special cased (is null) |
| is not null | Special cased (is not null) |
| between | Special cased (between) |
| not between | Special cased (not between) |
| like | Text (like) |
| ilike | Text (case-insensitive like) |
| AND | Logical (and) |
| OR | Logical (or) |
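To make the operator meanings concrete, the sketch below evaluates the same operators against an in-memory metadata dictionary. It is a plain-Python illustration of the semantics, assuming SQL-style behavior (inclusive between, comparisons with a missing value never match); it does not use or reproduce the package's FilterCondition, AndFilter, or OrFilter classes. The names matches, matches_all, and matches_any are hypothetical.

```python
import re


def like_to_regex(pattern, ignore_case=False):
    """Translate a SQL LIKE pattern: % is any run of characters, _ is one."""
    translated = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.compile(translated, flags)


def matches(metadata, column, operator, value=None):
    """Evaluate one condition against a metadata dict."""
    actual = metadata.get(column)
    if operator == "is null":
        return actual is None
    if operator == "is not null":
        return actual is not None
    if actual is None:
        return False  # as in SQL, comparing with NULL is never true
    checks = {
        "=": lambda: actual == value,
        "!=": lambda: actual != value,
        "<": lambda: actual < value,
        "<=": lambda: actual <= value,
        ">": lambda: actual > value,
        ">=": lambda: actual >= value,
        "in": lambda: actual in value,
        "not in": lambda: actual not in value,
        "between": lambda: value[0] <= actual <= value[1],
        "not between": lambda: not (value[0] <= actual <= value[1]),
        "like": lambda: like_to_regex(value).fullmatch(actual) is not None,
        "ilike": lambda: like_to_regex(value, True).fullmatch(actual) is not None,
    }
    if operator not in checks:
        raise ValueError(f"Unsupported operator: {operator}")
    return checks[operator]()


def matches_all(metadata, conditions):
    """Logical AND over (column, operator[, value]) tuples."""
    return all(matches(metadata, *cond) for cond in conditions)


def matches_any(metadata, conditions):
    """Logical OR over (column, operator[, value]) tuples."""
    return any(matches(metadata, *cond) for cond in conditions)


doc = {"doc_id": 5, "topic": "Cats", "author": None}
print(matches_all(doc, [("doc_id", "in", [1, 5, 2, 9]), ("topic", "ilike", "c%")]))
```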
Query directly
A simple similarity search takes a query string and returns the closest documents. Filters can be combined with logical AND to narrow the results.
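Conceptually, a filtered similarity search keeps only the documents whose metadata satisfies every condition and then ranks the remainder by similarity to the query. The sketch below shows that idea with an exact, brute-force cosine ranking over a tiny in-memory list. It is not how the vector store or a DiskANN index executes the query, and cosine_similarity and filtered_search are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(query_vec, docs, predicates, k=4):
    """Keep docs passing every predicate (logical AND), then rank by similarity."""
    candidates = [d for d in docs if all(p(d["metadata"]) for p in predicates)]
    ranked = sorted(
        candidates,
        key=lambda d: cosine_similarity(query_vec, d["embedding"]),
        reverse=True,
    )
    return ranked[:k]


docs = [
    {"id": "a", "embedding": [1.0, 0.0], "metadata": {"doc_id": 1, "topic": "cats"}},
    {"id": "b", "embedding": [0.9, 0.1], "metadata": {"doc_id": 2, "topic": "dogs"}},
    {"id": "c", "embedding": [0.0, 1.0], "metadata": {"doc_id": 3, "topic": "cats"}},
]

# Both conditions must hold: topic is "cats" AND doc_id is at most 3.
predicates = [lambda m: m["topic"] == "cats", lambda m: m["doc_id"] <= 3]
for hit in filtered_search([1.0, 0.0], docs, predicates):
    print(hit["id"], hit["metadata"])
```

An approximate index trades a small amount of recall for speed, so on large tables its results can differ slightly from this exact ranking.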
Query by turning into retriever
You can also transform the vector store into a retriever for easier usage in your chains.
For more information about the AzurePGVectorStore vector store, please refer to the documentation.
Usage for retrieval-augmented generation
For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:
- Tutorials: working with external knowledge
- How-to: Question and answer with RAG
- Retrieval conceptual docs