Cassandra is a NoSQL, row-oriented, highly scalable and highly available database. Starting with version 5.0, the database ships with vector search capabilities.

Note: in addition to access to the database, an OpenAI API Key is required to run the full example.
The datasets, openai, pypdf and tiktoken packages are required, along with langchain-community.
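Before running the rest of the example, it can help to confirm that these packages are importable. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical, and the import name langchain_community (with an underscore) is the module name under which the langchain-community distribution is imported.

```python
from importlib.util import find_spec

# Import names for the packages listed above. The langchain-community
# distribution is imported as langchain_community.
REQUIRED_MODULES = ["datasets", "openai", "pypdf", "tiktoken", "langchain_community"]


def missing_modules(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

This only checks that the modules can be found; it does not check their versions.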
DataStax Astra DB is a managed serverless database built on Cassandra, offering the same interface and strengths.

Depending on whether you connect to a Cassandra cluster or to Astra DB through CQL, you will provide different parameters when creating the vector store object.
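As a rough illustration of how the two connection styles differ, the sketch below builds the keyword arguments for cassio.init in each case. The helper cassio_init_kwargs is hypothetical (it is not part of CassIO), and it assumes that the keyspace may be omitted on the Astra DB path.

```python
def cassio_init_kwargs(*, session=None, keyspace=None, database_id=None, token=None):
    """Build keyword arguments for cassio.init (hypothetical helper).

    Cassandra cluster: pass a driver Session and a keyspace.
    Astra DB through CQL: pass a database ID and a token; the keyspace
    is treated as optional here, which is an assumption of this sketch.
    """
    if session is not None:
        return {"session": session, "keyspace": keyspace}
    if database_id and token:
        kwargs = {"database_id": database_id, "token": token}
        if keyspace:
            kwargs["keyspace"] = keyspace
        return kwargs
    raise ValueError("provide either a session, or both database_id and token")
```

With CassIO installed, the result would be passed on as cassio.init(**cassio_init_kwargs(...)).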
To connect to a Cassandra cluster, you first need a cassandra.cluster.Session object, as described in the Cassandra driver documentation. The details vary (e.g. with network settings and authentication).
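A minimal sketch of creating such a session with the Python driver is shown here. The function names, the comma-separated host format, and the choice of plain-text authentication are assumptions made for illustration; your cluster may need different settings.

```python
def parse_contact_points(raw):
    """Split a comma-separated host list into clean host names."""
    return [host.strip() for host in raw.split(",") if host.strip()]


def create_session(contact_points, username=None, password=None, port=9042):
    """Connect to a Cassandra cluster and return a driver Session.

    The driver is imported lazily so that this module can be loaded
    even where cassandra-driver is not installed.
    """
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    auth_provider = None
    if username and password:
        # Assumption: the cluster uses username/password authentication.
        auth_provider = PlainTextAuthProvider(username=username, password=password)

    cluster = Cluster(
        contact_points=contact_points,
        port=port,
        auth_provider=auth_provider,
    )
    return cluster.connect()
```

For a local single-node cluster, a call could look like create_session(parse_contact_points("127.0.0.1")).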
The global cassio.init setting comes in handy if your application uses Cassandra in several ways (for instance, for vector store, chat memory and LLM response caching), as it allows you to centralize credential and DB connection management in one place.
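The sketch below shows one way the global setting and the vector store could fit together: the session and keyspace are registered once with cassio.init, and the Cassandra vector store is then created without passing them again. The function name build_vector_store and the table name cassandra_vector_demo are illustrative assumptions; embedding stands for any LangChain embeddings object (for example an OpenAI one).

```python
def build_vector_store(session, keyspace, embedding, table_name="cassandra_vector_demo"):
    """Register the connection globally, then create the vector store.

    Imports are done lazily so the module loads without the packages.
    """
    import cassio
    from langchain_community.vectorstores import Cassandra

    # Make the session and keyspace available to every CassIO-based
    # component in the application.
    cassio.init(session=session, keyspace=keyspace)

    # No session or keyspace passed here: the store falls back to the
    # globally registered ones.
    return Cassandra(embedding=embedding, table_name=table_name)
```

Other components that rely on CassIO can then reuse the same connection without receiving credentials of their own.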
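To connect to Astra DB through CQL, you need the Database ID, for example 01234567-89ab-cdef-0123-456789abcdef, and the Token, for example AstraCS:6gBhNmsk135.... (it must be a “Database Administrator” token).

Build Document objects from the source data, then write them into the vector store.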
The metadata dictionaries are created from the source data and are part of the Document.
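To make the role of the metadata dictionaries concrete, here is a small sketch that turns source records into Document objects. The record fields (author, quote, and a semicolon-separated tags string) and the function names are assumptions about the shape of the source data, not something this page specifies.

```python
def entry_to_text_and_metadata(entry):
    """Split one source record into (text, metadata).

    Assumed record shape: {"author": ..., "quote": ..., "tags": "a;b"}.
    Each tag becomes its own metadata key with the value "y".
    """
    metadata = {"author": entry["author"]}
    for tag in (entry.get("tags") or "").split(";"):
        if tag:
            metadata[tag] = "y"
    return entry["quote"], metadata


def to_documents(entries):
    """Wrap each record in a LangChain Document (lazy import)."""
    from langchain_core.documents import Document

    return [
        Document(page_content=text, metadata=metadata)
        for text, metadata in map(entry_to_text_and_metadata, entries)
    ]
```

Given a vector store object vstore, the documents would then be written with vstore.add_documents(to_documents(entries)).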
Add some more entries, this time with add_texts.
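As a sketch of this step, the function below passes parallel lists of texts, metadata dictionaries and IDs to add_texts. The function name, the record fields and the extra_ ID prefix are illustrative assumptions; vstore is any vector store object exposing add_texts with these keyword arguments.

```python
def add_more_entries(vstore, entries):
    """Insert extra records through add_texts and return their IDs.

    Assumed record shape: {"author": ..., "quote": ...}.
    """
    texts = [entry["quote"] for entry in entries]
    metadatas = [{"author": entry["author"]} for entry in entries]
    # Explicit IDs make re-running the insertion overwrite the same rows
    # rather than add duplicates (the prefix is arbitrary).
    ids = [f"extra_{i}" for i in range(len(entries))]
    return vstore.add_texts(texts=texts, metadatas=metadatas, ids=ids)
```

Unlike add_documents, this path does not require wrapping each text in a Document first.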
You can speed up add_texts and add_documents by increasing the concurrency level for these bulk operations; see the methods’ batch_size parameter for more details. Depending on the network and the client machine specifications, your best-performing choice of parameters may vary.
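Since the best value depends on your environment, one simple approach is to time an insertion at a few batch sizes. The helper below is a hypothetical sketch: timed_add_texts is not a library function, and the default of 16 is only an illustrative starting value.

```python
import time


def timed_add_texts(vstore, texts, metadatas=None, batch_size=16):
    """Run add_texts with the given batch_size; return (ids, seconds)."""
    start = time.perf_counter()
    ids = vstore.add_texts(texts=texts, metadatas=metadatas, batch_size=batch_size)
    elapsed = time.perf_counter() - start
    return ids, elapsed
```

Calling it with, say, batch_size=16 and then batch_size=64 on the same texts gives a rough comparison for your network and client machine.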
Cleaning up amounts to retrieving the Session object from CassIO and running a CQL DROP TABLE statement with it. (You will lose the data you stored in the table.)
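A hedged sketch of that cleanup is given here. It assumes cassio.init has already been called, so that CassIO can resolve the session and keyspace; the function names are hypothetical, and the identifier check is an extra safety measure added for this sketch.

```python
import re

# Unquoted CQL identifiers: a letter followed by letters, digits or underscores.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def drop_table_statement(keyspace, table_name):
    """Build the DROP TABLE statement, rejecting unexpected identifiers."""
    for name in (keyspace, table_name):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return f"DROP TABLE {keyspace}.{table_name};"


def drop_vector_table(table_name):
    """Drop the table using the session and keyspace registered with CassIO.

    This permanently deletes the table and all data stored in it.
    """
    from cassio.config import resolve_keyspace, resolve_session

    session = resolve_session()
    keyspace = resolve_keyspace()
    session.execute(drop_table_statement(keyspace, table_name))
```

The table name passed to drop_vector_table should be the one used when the vector store was created.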
For more information on using the LangChain Cassandra vector store, see the CassIO documentation.
Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.