SAP HANA Cloud Vector Engine is a vector store fully integrated into the SAP HANA Cloud database.
Setup
Install the langchain-hana external integration package, as well as the other packages used throughout this notebook.
Credentials
Ensure your SAP HANA instance is running. Load your credentials from environment variables and create a connection.
Initialization
To initialize a HanaDB vector store, you need a database connection and an embedding instance. SAP HANA Cloud Vector Engine supports both external and internal embeddings.
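The database connection mentioned above could be created along these lines. This is a minimal sketch: the environment variable names (HANA_DB_ADDRESS, HANA_DB_PORT, HANA_DB_USER, HANA_DB_PASSWORD) and the helper names are assumptions made for illustration, and the hdbcli driver is imported lazily so the settings helper works without it.

```python
import os


def load_hana_settings(env=None):
    """Read connection settings from environment variables.

    The variable names are assumptions for this sketch; adapt them to your setup.
    """
    env = os.environ if env is None else env
    return {
        "address": env["HANA_DB_ADDRESS"],
        "port": int(env["HANA_DB_PORT"]),
        "user": env["HANA_DB_USER"],
        "password": env["HANA_DB_PASSWORD"],
    }


def connect_to_hana(settings):
    """Open a connection with the SAP HANA Python client (hdbcli)."""
    # Imported here so that load_hana_settings can be used without the driver.
    from hdbcli import dbapi

    return dbapi.connect(
        address=settings["address"],
        port=settings["port"],
        user=settings["user"],
        password=settings["password"],
        autocommit=True,
    )
```

With the variables set, `connection = connect_to_hana(load_hana_settings())` would yield the connection object that is passed to HanaDB later on.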
- Using external embeddings
- Using internal embeddings
Internal embeddings are computed directly in SAP HANA using its native VECTOR_EMBEDDING() function. To enable this, create an instance of HanaInternalEmbeddings with your internal model ID and pass it to HanaDB. Note that the HanaInternalEmbeddings instance is designed specifically for use with HanaDB and is not intended for use with other vector store implementations. For more information about internal embeddings, see the SAP HANA VECTOR_EMBEDDING Function.
Caution: Ensure NLP is enabled in your SAP HANA Cloud instance.
Create the vector store by passing the embedding instance and the database connection to HanaDB, along with a table name for storing vectors.
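A sketch of this step is shown below. The helper names and the default table name MY_VECTOR_TABLE are placeholders, and the internal model ID is left as a parameter because the available IDs depend on your instance. The keyword arguments follow the HanaDB and HanaInternalEmbeddings constructors as we understand them; check them against the langchain-hana version you have installed.

```python
def build_vector_store(connection, embeddings, table_name="MY_VECTOR_TABLE"):
    """Create a HanaDB vector store backed by `table_name` (placeholder name)."""
    # Lazy import: requires the langchain-hana package.
    from langchain_hana import HanaDB

    return HanaDB(embedding=embeddings, connection=connection, table_name=table_name)


def build_internal_embeddings(model_id):
    """Wrap an internal embedding model ID for use with HanaDB only."""
    from langchain_hana import HanaInternalEmbeddings

    return HanaInternalEmbeddings(internal_embedding_model_id=model_id)
```

For external embeddings, pass any LangChain embeddings object as `embeddings`; for internal embeddings, pass the result of `build_internal_embeddings(...)` instead.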
Manage vector store
Once you have created your vector store, you can interact with it by adding and deleting items.
Add items to vector store
We can add items to our vector store by using the add_documents function.
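As a rough illustration, the sketch below builds two small documents and adds them. The document contents, the metadata keys (source, id) and the helper names are invented for this example; `vector_store` is assumed to be an initialized HanaDB instance.

```python
def make_sample_documents():
    """Build two illustrative documents (contents and metadata are made up)."""
    # Lazy import: requires langchain-core.
    from langchain_core.documents import Document

    return [
        Document(page_content="foo", metadata={"source": "demo", "id": 1}),
        Document(page_content="bar", metadata={"source": "demo", "id": 2}),
    ]


def add_items(vector_store, documents):
    """Add the documents to the store and report how many were passed in."""
    vector_store.add_documents(documents)
    return len(documents)
```

A typical call would be `add_items(vector_store, make_sample_documents())`.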
Delete items from vector store
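A minimal sketch of a deletion is shown below, assuming that the store's `delete` method accepts a metadata `filter` argument and that the entries carry a hypothetical `source` metadata key. The helper name is made up for this example.

```python
def delete_by_source(vector_store, source):
    """Delete every entry whose metadata key 'source' equals `source`."""
    # Assumption: delete() takes a metadata filter rather than document IDs.
    return vector_store.delete(filter={"source": source})
```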
Query vector store
Query directly
Similarity search
A simple similarity search can be combined with filtering on metadata.
MMR search
A Maximal Marginal Relevance (MMR) search can also be combined with filtering on metadata.
Query by turning into retriever
You can also transform the vector store into a retriever for easier usage in your chains.
Distance similarity algorithm
HanaDB supports the following distance similarity algorithms:
- Cosine Similarity (default)
- Euclidean Distance (L2)
The strategy can be specified when creating the HanaDB instance by using the distance_strategy parameter.
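To make the difference between the two measures concrete, the sketch below computes both in plain Python and then shows how a store using Euclidean distance might be created. The two math helpers are illustrative only and are not how the database computes distances. The import path `langchain_hana.utils.DistanceStrategy` and the member name `EUCLIDEAN_DISTANCE` are assumptions to verify against your installed version.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two vectors (0.0 means identical)."""
    return math.dist(a, b)


def build_l2_store(connection, embeddings, table_name):
    """Create a HanaDB store that ranks results by Euclidean distance."""
    # Lazy imports: require langchain-hana; the utils path is an assumption.
    from langchain_hana import HanaDB
    from langchain_hana.utils import DistanceStrategy

    return HanaDB(
        embedding=embeddings,
        connection=connection,
        distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
        table_name=table_name,
    )
```

Note that the two measures point in opposite directions: a higher cosine similarity means a closer match, while a lower Euclidean distance does.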
Creating an HNSW index
A vector index can significantly speed up top-k nearest neighbor queries for vectors. Users can create a Hierarchical Navigable Small World (HNSW) vector index using the create_hnsw_index function.
For more information about creating an index at the database level, please refer to the official documentation.
Advanced filtering
In addition to the basic value-based filtering capabilities, it is possible to use more advanced filtering. The table below shows the available filter operators.
| Operator | Semantic |
|---|---|
| $eq | Equality (==) |
| $ne | Inequality (!=) |
| $lt | Less than (<) |
| $lte | Less than or equal (<=) |
| $gt | Greater than (>) |
| $gte | Greater than or equal (>=) |
| $in | Contained in a set of given values (in) |
| $nin | Not contained in a set of given values (not in) |
| $between | Between the range of two boundary values |
| $like | Text equality based on the “LIKE” semantics in SQL (using “%” as wildcard) |
| $contains | Filters documents containing a specific keyword |
| $and | Logical “and”, supporting two or more operands |
| $or | Logical “or”, supporting two or more operands |
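The operators fall into the following groups:
- Comparison: $ne, $gt, $gte, $lt, $lte
- Range and set membership: $between, $in, $nin
- Pattern matching: $like
- Keyword search: $contains
- Logical combination: $and, $or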
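The sketch below shows how such filters might be written as Python dictionaries and passed to a similarity search. The metadata keys (id, name, is_active), the values, and the helper name are hypothetical; `vector_store` is assumed to be an initialized HanaDB instance whose `similarity_search` accepts a `filter` argument.

```python
# Illustrative filters; keys and values are made up for this example.
EXAMPLE_FILTERS = {
    "comparison": {"id": {"$gte": 2}},
    "range": {"id": {"$between": [1, 3]}},
    "set": {"name": {"$in": ["adam", "bob"]}},
    "like": {"name": {"$like": "a%"}},
    "contains": {"name": {"$contains": "bob"}},
    "and": {"$and": [{"id": {"$gt": 1}}, {"is_active": True}]},
    "or": {"$or": [{"id": 1}, {"name": "bob"}]},
}


def run_filters(vector_store, query, filters=EXAMPLE_FILTERS, k=5):
    """Run one similarity search per filter and collect results by label."""
    return {
        label: vector_store.similarity_search(query, k=k, filter=metadata_filter)
        for label, metadata_filter in filters.items()
    }
```

Each operator is keyed by the metadata field it applies to, except $and and $or, which sit at the top level and take a list of sub-filters.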
Usage for retrieval-augmented generation
For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:
Standard tables vs. “custom” tables with vector data
By default, the table for the embeddings is created with three columns:
- A column VEC_TEXT, which contains the text of the Document
- A column VEC_META, which contains the metadata of the Document
- A column VEC_VECTOR, which contains the embedding vector of the Document’s text
A custom table must have at least three columns that match the semantics of the standard table:
- A column with type NCLOB or NVARCHAR for the text/context of the embeddings
- A column with type NCLOB or NVARCHAR for the metadata
- A column with type REAL_VECTOR or HALF_VECTOR for the embedding vector
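A custom table along these lines could be created and wired up as sketched below. The table and column names (MY_OWN_TABLE, MY_TEXT, MY_METADATA, MY_VECTOR) and the helper names are hypothetical. The content_column, metadata_column and vector_column keyword arguments follow the HanaDB constructor as we understand it, so treat them as assumptions to verify.

```python
def custom_table_ddl(table_name="MY_OWN_TABLE"):
    """Return a CREATE TABLE statement with the three required column kinds."""
    return (
        f'CREATE TABLE "{table_name}" ('
        '"MY_TEXT" NCLOB, '
        '"MY_METADATA" NCLOB, '
        '"MY_VECTOR" REAL_VECTOR)'
    )


def build_custom_store(connection, embeddings, table_name="MY_OWN_TABLE"):
    """Create the custom table, then point HanaDB at its columns."""
    # Lazy import: requires langchain-hana.
    from langchain_hana import HanaDB

    cursor = connection.cursor()
    try:
        cursor.execute(custom_table_ddl(table_name))
    finally:
        cursor.close()

    return HanaDB(
        connection=connection,
        embedding=embeddings,
        table_name=table_name,
        content_column="MY_TEXT",
        metadata_column="MY_METADATA",
        vector_column="MY_VECTOR",
    )
```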
Filter performance optimization with custom columns
To allow flexible metadata values, all metadata is stored as JSON in the metadata column by default. If some of the metadata keys and their value types are known in advance, they can be stored in additional columns instead: create the target table with the key names as column names and pass them to the HanaDB constructor via the specific_metadata_columns list. Metadata keys that match those names are copied into the dedicated columns during insert. For keys in the specific_metadata_columns list, filters use the dedicated columns instead of the metadata JSON column.
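The sketch below illustrates the idea under a few assumptions: the table keeps the default column names, the extra metadata columns are typed NVARCHAR(500) purely for illustration, and the helper names are made up. The table must exist with these columns before the store is created.

```python
def ddl_with_metadata_columns(table_name, metadata_columns):
    """Return a CREATE TABLE statement with one extra column per metadata key."""
    # NVARCHAR(500) is an illustrative choice; pick types matching your values.
    extra = "".join(f'"{name}" NVARCHAR(500), ' for name in metadata_columns)
    return (
        f'CREATE TABLE "{table_name}" ({extra}'
        '"VEC_TEXT" NCLOB, "VEC_META" NCLOB, "VEC_VECTOR" REAL_VECTOR)'
    )


def build_store_with_metadata_columns(connection, embeddings, table_name, metadata_columns):
    """Create a HanaDB store that filters on dedicated metadata columns."""
    # Lazy import: requires langchain-hana.
    from langchain_hana import HanaDB

    return HanaDB(
        connection=connection,
        embedding=embeddings,
        table_name=table_name,
        specific_metadata_columns=list(metadata_columns),
    )
```

With a hypothetical key CUSTOMTEXT, a filter such as `{"CUSTOMTEXT": "value"}` would then be evaluated against the CUSTOMTEXT column rather than the JSON metadata.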
A simple example
Load the sample document “state_of_the_union.txt” and create chunks from it.
Maximal marginal relevance search (MMR)
Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents. The first 20 (fetch_k) items will be retrieved from the DB. The MMR algorithm will then find the best 2 (k) matches.
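To show what the selection step does, here is a small pure-Python sketch of MMR over already-fetched candidate vectors, followed by a thin wrapper around the store call. The `mmr_select` helper is an illustration of the general technique, not the library's implementation, and all helper names are made up. The wrapper assumes `max_marginal_relevance_search` accepts `k` and `fetch_k`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr_select(query_vec, candidates, k=2, lambda_mult=0.5):
    """Pick k candidate indices, trading relevance against redundancy.

    lambda_mult=1.0 ranks purely by relevance; lower values favor diversity.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_index, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query_vec, candidates[i])
            # Similarity to the closest already-selected item (0.0 if none yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
        remaining.remove(best_index)
    return selected


def mmr_search(vector_store, query):
    """Fetch 20 candidates from the store and keep the best 2 by MMR."""
    return vector_store.max_marginal_relevance_search(query, k=2, fetch_k=20)
```

In the pure-Python helper, a near-duplicate of an already selected item is penalized, so a less similar but more diverse candidate can win the second slot.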
Creating an HNSW Vector Index
A vector index can significantly speed up top-k nearest neighbor queries for vectors. Users can create a Hierarchical Navigable Small World (HNSW) vector index using the create_hnsw_index function.
- Similarity Function: The similarity function for the index is cosine similarity by default. If you want to use a different similarity function (e.g., L2 distance), you need to specify it when initializing the HanaDB instance.
- Default Parameters: In the create_hnsw_index function, if the user does not provide custom values for parameters like m, ef_construction, or ef_search, the default values (e.g., m=64, ef_construction=128, ef_search=200) will be used automatically. These values ensure the index is created with reasonable performance without requiring user intervention.
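The two ways of calling the function might look as follows. This is a sketch: the helper names, the index name and the tuned values (m=100, ef_construction=200, ef_search=500) are illustrative choices rather than recommendations, and `vector_store` is assumed to be an initialized HanaDB instance.

```python
def create_default_index(vector_store):
    """Create an HNSW index, leaving all parameters at their defaults."""
    vector_store.create_hnsw_index()


def create_tuned_index(vector_store, index_name="my_hnsw_index"):
    """Create an HNSW index with explicit (illustrative) parameters."""
    vector_store.create_hnsw_index(
        index_name=index_name,
        m=100,                # maximum number of neighbors per graph node
        ef_construction=200,  # candidate list size while building the index
        ef_search=500,        # candidate list size while searching
    )
```

Larger values generally trade build time and memory for recall, so it is worth measuring on your own data before moving away from the defaults.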