OracleVS (the LangChain vector store integration for Oracle AI Vector Search) to:
- Ingest documents and embeddings into Oracle
- Run similarity search
- Create HNSW and IVF indexes
- Apply metadata filters for advanced retrieval
- Enable hybrid search (keyword + semantic) in Oracle Database 26ai
- Run full-text search using Oracle Text
Prerequisites
Installlangchain-oracledb. The python-oracledb driver will be installed automatically as a dependency.
Connect to Oracle Database
The following sample code will show how to connect to Oracle Database. By default, python-oracledb runs in a ‘Thin’ mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in ‘Thick’ mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following guide that talks about features supported in each mode. You might want to switch to thick-mode if you are unable to use thin-mode.Import the required dependencies
Load documents
Create vector stores with different distance metrics
First we will create three vector stores each with different distance functions. Since we have not created indices in them yet, they will just create tables for now. Later we will use these vector stores to create HNSW indicies. To understand more about the different types of indices Oracle AI Vector Search supports, refer to the following guide. You can manually connect to the Oracle Database and will see three tables :Documents_DOT, Documents_COSINE and Documents_EUCLIDEAN.
We will then create three additional tables Documents_DOT_IVF, Documents_COSINE_IVF and Documents_EUCLIDEAN_IVF which will be used
to create IVF indicies on the tables instead of HNSW indices.
Add and delete operations for texts, along with basic similarity search
Index creation with specific parameters
Advanced search
Oracle Database 23ai supports pre-filtering, in-filtering, and post-filtering to enhance AI Vector Search capabilities. These filtering mechanisms allow users to apply constraints before, during, and after performing vector similarity searches, improving search performance and accuracy. Key Points about Filtering in Oracle 23ai:- Pre-filtering Applies traditional SQL filters to reduce the dataset before performing the vector similarity search. Helps improve efficiency by limiting the amount of data processed by AI algorithms.
- In-filtering Utilizes AI Vector Search to perform similarity searches directly on vector embeddings, using optimized indexes and algorithms. Efficiently filters results based on vector similarity without requiring full dataset scans.
- Post-filtering Applies additional SQL filtering to refine the results after the vector similarity search. Allows further refinement based on business logic or additional metadata conditions.
- Performance Optimization: Pre-filtering significantly reduces query execution time, making searches on massive datasets more efficient.
- Accuracy Enhancement: In-filtering ensures that vector searches are semantically meaningful, improving the quality of search results.
Filter details
OracleVS supports a set of filters that can be applied to metadata fields using filter parameter. These filters allow you to select and refine data based on various criteria.
Available Filter Operators:
| Operator | Description |
|---|---|
\$exists | Field exists. |
\$eq | Field value equals the operand value (=). |
\$ne | Field exists and value does not equal the operand value (!=). |
\$gt | Field value is greater than the operand value (>). |
\$lt | Field value is less than the operand value (<). |
\$gte | Field value is greater than or equal to the operand value (>=). |
\$lte | Field value is less than or equal to the operand value (<=). |
\$between | Field value is between (or equal to) two values in the operand array. |
\$startsWith | Field value starts with the operand value. |
\$hasSubstring | Field value contains the operand as a substring. |
\$instr | Field value contains the operand as a substring. |
\$regex | Field value matches the given regular expression pattern. |
\$like | Field value matches the operand pattern (using SQL-like syntax). |
\$in | Field value equals at least one value in the operand array. |
\$nin | Field exists, but its value is not equal to any in the operand array, or the field does not exist. |
\$all | Field value is an array containing all items from the operand array, or a scalar matching a single operand. |
- You can combine these filters using logical operators:
| Logical Operator | Description |
|---|---|
\$and | Logical AND |
\$or | Logical OR |
\$nor | Logical NOR |
- You can omit
$andwhen all filters in an object must be satisfied. These two are equivalent:
- The
$notclause can negate a comparison operator:
- Using
field: scalaris equivalent tofield: { "$eq": scalar }:
Hybrid search
Oracle Database 26ai supports hybrid search, combining keyword (full-text) and semantic (vector) search into a single retrieval capability. Thelangchain-oracledb integration exposes:
OracleVectorizerPreference: creates a DB-side vectorizer preference used by hybrid indexes.create_hybrid_index/acreate_hybrid_index: creates a HYBRID VECTOR INDEX.OracleHybridSearchRetriever: executes keyword, semantic, or hybrid retrieval over anOracleVStable.
Prerequisites and model configuration
When using hybrid search, configure yourOracleVS with OracleEmbeddings so the vectorizer preference exactly matches the embedding configuration. You can further tune the hybrid vector index by supplying additional parameters via OracleVectorizerPreference. For details, see the documentation.
- If you don’t want to manage a named preference, pass the
vector_storeinstead. The function will create a temporary preference, build the index, then drop the preference automatically.
- search_mode decides which signals are used:
- “keyword”: keyword-only
- “semantic”: vector-only
- “hybrid” (default): both combined
- Pass
DBMS_HYBRID_VECTORparameters via the retriever’s params argument. return_scores=Trueadds overall score and component text_score and vector_score to Document.metadata.- Async usage is supported via
acreate_hybrid_indexandOracleHybridSearchRetriever.ainvoke.
Full-text search (Oracle Text)
You can run high-quality keyword search directly against Oracle Database using Oracle Text. Thelangchain-oracledb integration exposes:
create_text_index/acreate_text_index: creates an Oracle TextSEARCH INDEXon a column.OracleTextSearchRetriever: runsCONTAINSqueries and returns LangChainDocumentobjects.
- If you have an
OracleVSvector store, you can index its built-in “text” column. - You can also index any other table/column by providing
table_name+column_namedirectly.
- Default behavior (
operator_search=False):- Input is treated as literal text, tokenized on non-word characters, and rewritten as an ACCUM expression.
- With
fuzzy=True, each token is wrapped asFUZZY("token")to match misspellings.
- Operator mode (
operator_search=True):- Pass an Oracle Text expression unchanged (
NEAR,ABOUT,AND,OR,NOT,WITHIN, etc.). In this mode, fuzzy is ignored.
- Pass an Oracle Text expression unchanged (
- When targeting a raw table, include extra columns in results via
returned_columns; they are attached toDocument.metadata. - With
OracleVS,returned_columnsdefaults to [“metadata”].
- When using
operator_search=True, fuzzy is ignored (by design). - Async usage is supported via
acreate_text_indexandOracleTextSearchRetriever.ainvoke.