BigtableVectorStore
This guide covers the BigtableVectorStore integration for using Google Cloud Bigtable as a vector store.
Bigtable is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data.
Overview
The BigtableVectorStore uses Google Cloud Bigtable to store documents and their vector embeddings for similarity search and retrieval. It supports metadata filtering to refine search results.
Integration details
| Class | Package | Local | JS support | Package downloads | Package latest |
|---|---|---|---|---|---|
| BigtableVectorStore | langchain-google-bigtable | ❌ | ❌ | | |
Setup
Prerequisites
To get started, you will need a Google Cloud project with an active Bigtable instance.
Installation
The integration is in the langchain-google-bigtable package. The command below also installs langchain-google-vertexai to use for an embedding service.
Set Your Google Cloud Project
Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook. If you don’t know your project ID, try the following:
- Run gcloud config list.
- Run gcloud projects list.
- See the support page: Locate the project ID.
🔐 Authentication
Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.
- If you are using Colab to run this notebook, use the cell below and continue.
- If you are using Vertex AI Workbench, check out the setup instructions here.
Initialization
Initializing the BigtableVectorStore involves three steps: setting up the embedding service, ensuring the Bigtable table is created, and configuring the store’s parameters.
1. Set up Embedding Service
First, we need a model to create the vector embeddings for our documents. We’ll use a Vertex AI model for this example.
2. Initialize a Table
Before creating a BigtableVectorStore, a table with the correct column families must exist. The init_vector_store_table helper function is the recommended way to create and configure a table. If the table already exists, the function does nothing.
3. Configure the Vector Store
Now we define the parameters that control how the vector store connects to Bigtable and how it handles data.
The BigtableEngine
A BigtableEngine object manages clients and async operations. It is highly recommended to initialize a single engine and reuse it across multiple stores for better performance and resource management.
Collections
A collection provides a logical namespace for your documents within a single Bigtable table. It is used as a prefix for the row keys, allowing multiple vector stores to coexist in the same table without interfering with each other.
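As a rough illustration of the namespacing idea, the sketch below prefixes each document ID with its collection name to build a row key. The make_row_key and rows_in_collection helpers and the ":" separator are hypothetical, chosen only for this example; the library's actual key layout may differ.

```python
def make_row_key(collection: str, doc_id: str, sep: str = ":") -> str:
    """Build a namespaced row key (hypothetical layout: <collection><sep><doc_id>)."""
    return f"{collection}{sep}{doc_id}"


def rows_in_collection(table: dict, collection: str, sep: str = ":") -> dict:
    """Return only the rows whose key starts with the collection prefix."""
    prefix = f"{collection}{sep}"
    return {key: value for key, value in table.items() if key.startswith(prefix)}


# A plain dict stands in for a single Bigtable table shared by two stores.
table = {}
table[make_row_key("movies", "doc-1")] = "Blade Runner"
table[make_row_key("books", "doc-1")] = "Dune"

# The same document ID lives in both collections without colliding.
movie_rows = rows_in_collection(table, "movies")
```

Because the prefix is part of the key, two stores can reuse the same document IDs and still read back only their own rows.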
Metadata Configuration
When creating a BigtableVectorStore, you have two optional parameters for handling metadata:
- metadata_mappings: This is a list of VectorMetadataMapping objects. You must define a mapping for any metadata key you wish to use for filtering in your search queries. Each mapping specifies the data type (encoding) for the metadata field, which is crucial for correct filtering.
- metadata_as_json_column: This is an optional ColumnConfig that tells the store to save the entire metadata dictionary as a single JSON string in a specific column. This is useful for efficiently retrieving all of a document’s metadata at once, including fields not defined in metadata_mappings. Note: Fields stored only in this JSON column cannot be used for filtering.
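The division of labor between the two parameters can be pictured with a small plain-Python sketch. It does not use the library: mapped_keys and split_metadata are hypothetical stand-ins, where mapped_keys plays the role of the keys declared in metadata_mappings.

```python
import json

# Hypothetical stand-in for the keys declared in metadata_mappings.
mapped_keys = {"year", "rating"}


def split_metadata(metadata: dict, mapped_keys: set) -> tuple:
    """Return (filterable fields, JSON string holding the full metadata)."""
    # Only mapped keys get their own typed column and can be filtered on.
    filterable = {k: v for k, v in metadata.items() if k in mapped_keys}
    # The JSON column keeps everything, including unmapped keys.
    as_json = json.dumps(metadata, sort_keys=True)
    return filterable, as_json


filterable, as_json = split_metadata(
    {"year": 1984, "rating": 4.7, "notes": "first edition"}, mapped_keys
)
```

Here "notes" survives only inside the JSON string, so it can be retrieved with the document but not used in a filter.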
4. Create the BigtableVectorStore Instance
Manage vector store
Add Documents
You can add documents with pre-defined IDs. If a Document is added without an id attribute, the vector store will automatically generate a uuid4 string for it.
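The ID rule can be sketched in a few lines of standard-library Python. The Doc class and ensure_id helper below are hypothetical stand-ins for illustration, not the library's Document class or its internal logic.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    """Minimal stand-in for a document; not the library's Document class."""
    page_content: str
    id: Optional[str] = None


def ensure_id(doc: Doc) -> str:
    """Keep a pre-defined ID, or fall back to a fresh uuid4 string."""
    if doc.id is None:
        doc.id = str(uuid.uuid4())
    return doc.id


with_id = ensure_id(Doc("hello", id="doc-42"))  # pre-defined ID is kept
generated = ensure_id(Doc("world"))             # random uuid4 string
```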
Update Documents
BigtableVectorStore handles updates by overwriting. To update a document, simply add it again with the same ID but with new content or metadata.
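A dictionary keyed by document ID is a reasonable mental model for this overwrite behavior. The store dict and add_document helper below are hypothetical and only mimic the semantics described above.

```python
store = {}  # maps document ID -> (content, metadata); stand-in for the table


def add_document(store: dict, doc_id: str, content: str, metadata: dict) -> None:
    """Write the document; an existing entry with the same ID is replaced."""
    store[doc_id] = (content, dict(metadata))


add_document(store, "doc-1", "draft text", {"version": 1})
add_document(store, "doc-1", "final text", {"version": 2})  # same ID: overwrite
```

After the second call there is still one entry for "doc-1", now holding the new content and metadata.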
Delete Documents
Query vector store
Search
Search with Filters
Apply filters before the vector search runs.
The kNN Search Algorithm and Filtering
By default, BigtableVectorStore uses a k-Nearest Neighbors (kNN) search algorithm to find the k vectors in the database that are most similar to your query vector. The vector store offers filtering to reduce the search space before the kNN search is performed, which can make queries faster and more relevant.
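To make the filter-then-search order concrete, here is a minimal brute-force sketch in plain Python. It is not how the library executes queries; knn_search, the row layout, and the predicate argument are assumptions made for this illustration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_search(rows, query, k, predicate=None):
    """Filter rows first, then return the k closest as (id, distance) pairs."""
    candidates = [r for r in rows if predicate is None or predicate(r["metadata"])]
    scored = [(r["id"], cosine_distance(r["vector"], query)) for r in candidates]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


rows = [
    {"id": "a", "vector": [1.0, 0.0], "metadata": {"year": 1999}},
    {"id": "b", "vector": [0.9, 0.1], "metadata": {"year": 2010}},
    {"id": "c", "vector": [0.0, 1.0], "metadata": {"year": 2015}},
]

# Row "a" is the closest vector overall, but the filter removes it
# before any distance is computed.
hits = knn_search(rows, query=[1.0, 0.0], k=2, predicate=lambda m: m["year"] > 2000)
```

Filtering first means distances are computed only for rows that can appear in the result, which is where the speedup comes from.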
Configuring Queries with QueryParameters
All search settings are controlled via the QueryParameters object. This object allows you to specify not only filters but also other important search aspects:
- algorithm: The search algorithm to use. Defaults to "kNN".
- distance_strategy: The metric used for comparison, such as COSINE (default) or EUCLIDEAN.
- vector_data_type: The data type of the stored vectors, like FLOAT32 or DOUBLE64. This should match the precision of your embeddings.
- filters: A dictionary defining the filtering logic to apply.
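The choice of distance_strategy can change which document ranks first. The sketch below computes both metrics by hand on two toy vectors; the function and variable names are illustrative and do not come from the library.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; ignores vector length, compares direction only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance; sensitive to vector length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 1.0]
vectors = {
    "same_direction": [10.0, 10.0],  # parallel to the query but far away
    "nearby_point": [1.0, 0.0],      # close to the query but at a 45 degree angle
}

closest_cosine = min(vectors, key=lambda name: cosine_distance(vectors[name], query))
closest_euclidean = min(vectors, key=lambda name: euclidean_distance(vectors[name], query))
```

Cosine distance prefers "same_direction" while Euclidean distance prefers "nearby_point", so the metric should match how your embedding model is meant to be compared.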
Understanding Encodings
To filter on metadata fields, you must define them in metadata_mappings with the correct encoding so Bigtable can properly interpret the data. Supported encodings include:
- String: UTF8, UTF16, ASCII for text-based metadata.
- Numeric: INT_BIG_ENDIAN or INT_LITTLE_ENDIAN for integers, and FLOAT or DOUBLE for decimal numbers.
- Boolean: BOOL for true/false values.
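Bigtable stores cell values as raw bytes, which is why the declared encoding matters. The following sketch uses Python's struct module to show what goes wrong when bytes are read with the wrong assumption. The 8-byte integer width and the IEEE 754 4-byte and 8-byte float formats are assumptions for this example; the library's exact byte layouts may differ.

```python
import struct

# Assumption: 8-byte signed integers; the library's width may differ.
year = 1999
big_endian = struct.pack(">q", year)     # most significant byte first
little_endian = struct.pack("<q", year)  # least significant byte first

# The same bytes decode to a different number under the wrong byte order.
misread = struct.unpack("<q", big_endian)[0]

# A 4-byte float keeps less precision than an 8-byte double.
as_float = struct.unpack(">f", struct.pack(">f", 4.7))[0]
as_double = struct.unpack(">d", struct.pack(">d", 4.7))[0]

# Text metadata becomes bytes through a character encoding such as UTF-8.
title_utf8 = "Solaris".encode("utf-8")
```

A comparison such as year > 1990 only gives sensible results if the stored bytes are decoded with the same encoding that was used to write them.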
Filtering Support Table
| Filter Category | Key / Operator | Meaning |
|---|---|---|
| Row Key | RowKeyFilter | Narrows search to document IDs with a specific prefix. |
| Metadata Key | ColumnQualifiers | Checks for the presence of one or more exact metadata keys. |
| Metadata Key | ColumnQualifierPrefix | Checks if a metadata key starts with a given prefix. |
| Metadata Key | ColumnQualifierRegex | Checks if a metadata key matches a regular expression. |
| Metadata Value | ColumnValueFilter | Container for all value-based conditions. |
| Metadata Value | == | Equality |
| Metadata Value | != | Inequality |
| Metadata Value | > | Greater than |
| Metadata Value | < | Less than |
| Metadata Value | >= | Greater than or equal |
| Metadata Value | <= | Less than or equal |
| Metadata Value | in | Value is in a list. |
| Metadata Value | nin | Value is not in a list. |
| Metadata Value | contains | Checks for substring presence. |
| Metadata Value | like | Performs a regex match on a string. |
| Logical | ColumnValueChainFilter | Logical AND for combining value conditions. |
| Logical | ColumnValueUnionFilter | Logical OR for combining value conditions. |
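The value operators in the table can be read as ordinary Python comparisons. The sketch below is one plain-Python interpretation of their meaning, not the library's implementation; OPERATORS and matches are hypothetical names, and treating like as an unanchored regex search is an assumption.

```python
import re

# Plain-Python reading of each value operator from the table above.
OPERATORS = {
    "==": lambda value, target: value == target,
    "!=": lambda value, target: value != target,
    ">": lambda value, target: value > target,
    "<": lambda value, target: value < target,
    ">=": lambda value, target: value >= target,
    "<=": lambda value, target: value <= target,
    "in": lambda value, target: value in target,
    "nin": lambda value, target: value not in target,
    "contains": lambda value, target: target in value,
    # Assumption: "like" matches anywhere in the string unless anchored.
    "like": lambda value, target: re.search(target, value) is not None,
}


def matches(value, conditions: dict) -> bool:
    """True when the value satisfies every {operator: target} condition."""
    return all(OPERATORS[op](value, target) for op, target in conditions.items())


in_range = matches(1985, {">=": 1970, "<=": 2000})
```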
Complex Filter Example
This example uses multiple nested logical filters. It searches for documents that are either (category is ‘sci-fi’ AND year between 1970-2000) OR (author is ‘J.R.R. Tolkien’) OR (rating > 4.5).
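The logic of this example can be written as a plain Python predicate, which is a handy way to check the intended result before translating it into a filter dictionary. Here all plays the role of the chain (AND) filter and any plays the role of the union (OR) filter. The matches_example function is hypothetical, and treating the year range as inclusive is an assumption.

```python
def matches_example(meta: dict) -> bool:
    """(sci-fi AND 1970 <= year <= 2000) OR author is Tolkien OR rating > 4.5."""
    # AND group: both conditions must hold (inclusive bounds assumed).
    sci_fi_in_range = all([
        meta.get("category") == "sci-fi",
        1970 <= meta.get("year", 0) <= 2000,
    ])
    # OR group: any one branch is enough.
    return any([
        sci_fi_in_range,
        meta.get("author") == "J.R.R. Tolkien",
        meta.get("rating", 0.0) > 4.5,
    ])


keeps_sci_fi = matches_example({"category": "sci-fi", "year": 1984, "rating": 3.0})
drops_late_sci_fi = matches_example({"category": "sci-fi", "year": 2010, "rating": 3.0})
```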
Search with score
You can also retrieve the distance score along with the documents.
Use as Retriever
The vector store can be used as a retriever in RAG applications. You can specify the search type (e.g., similarity or mmr) and pass search-time arguments like k and query_parameters.
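For readers unfamiliar with the mmr search type, the sketch below shows the standard maximal marginal relevance selection in plain Python: each pick balances relevance to the query against similarity to results already chosen. It is a generic illustration of the technique, not the library's implementation, and the mmr function and lambda_mult parameter name are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr(query, vectors, k, lambda_mult=0.5):
    """Return indices of up to k vectors chosen by maximal marginal relevance."""
    selected = []
    remaining = list(range(len(vectors)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine_similarity(query, vectors[i])
            # Penalize candidates that resemble something already picked.
            redundancy = max(
                (cosine_similarity(vectors[i], vectors[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


vectors = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
diverse = mmr([1.0, 0.0], vectors, k=2, lambda_mult=0.3)   # favors diversity
relevant = mmr([1.0, 0.0], vectors, k=2, lambda_mult=1.0)  # pure relevance
```

With a low lambda_mult the near-duplicate second vector is skipped in favor of the third; with lambda_mult set to 1.0 the selection reduces to a plain similarity ranking.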
Usage for retrieval-augmented generation
For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:
API reference
For full details on the BigtableVectorStore class, see the source code on GitHub.