Teradata Vector Store is designed to store, index, and search high-dimensional vector embeddings efficiently within your enterprise data platform. This guide shows you how to get up and running quickly with TeradataVectorStore for your semantic search and RAG applications. Whether you’re new to Teradata or looking to add AI capabilities to your existing data workflows, this guide walks you through everything you need to know.
What makes TeradataVectorStore special?
- Built on enterprise-grade Teradata Vantage platform.
- Seamlessly integrates with your existing data warehouse.
- Supports multiple vector search algorithms for different use cases.
- Scales from prototype to production workloads.
Setup
Before we dive in, you’ll need to install the necessary packages. TeradataVectorStore is part of the langchain-teradata package, which also includes other Teradata integrations for LangChain.
New to Teradata? Refer to:
- Teradata VantageCloud Lake
- Get started with VantageCloud Lake
Installation
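The integration ships in the langchain-teradata package named above, installable with pip:

```shell
pip install -U langchain-teradata
```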
Credentials
Connecting to Teradata: The create_context() function establishes your connection to the Teradata Vantage system. This is how teradataml (and, by extension, TeradataVectorStore) knows which database to connect to and how to authenticate.
What you’ll need:
- hostname: Your Teradata system’s address
- username/password: Your database credentials
- base_url: API endpoint for your Teradata system
- pat_token: Personal Access Token for API authentication
- pem_file: SSL certificate file for secure connections
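A minimal connection sketch using create_context() with the host/username/password credentials above. The prompt-based password handling is illustrative; base_url, pat_token, and pem_file are configured separately for API authentication in your environment:

```python
import getpass


def connect(host: str, username: str):
    """Open a teradataml connection context to a Vantage system.

    The password is prompted at runtime rather than hard-coded.
    """
    # Imported lazily so this sketch can be read without teradataml installed.
    from teradataml import create_context

    return create_context(
        host=host,
        username=username,
        password=getpass.getpass("Password: "),
    )
```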
Instantiation
Initialize your embeddings. TeradataVectorStore supports three types of embedding objects:
- String identifiers (e.g., “amazon.titan-embed-text-v1”)
- TeradataAI objects
- LangChain-compatible embedding model objects
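Two of the three styles can be sketched as follows (the model names are illustrative, not recommendations; the TeradataAI-object style from teradatagenai is omitted here for brevity):

```python
def choose_embeddings(style: str):
    """Return an embeddings argument in one of the supported styles."""
    if style == "string":
        # 1) A plain model identifier string understood by the platform
        return "amazon.titan-embed-text-v1"
    if style == "langchain":
        # 2) Any LangChain-compatible embeddings object; the package and model
        #    below are assumptions about your environment
        from langchain_openai import OpenAIEmbeddings

        return OpenAIEmbeddings(model="text-embedding-3-small")
    raise ValueError(f"Unsupported style: {style}")
```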
The from_documents() method is one of the most straightforward ways to get started: just pass in your documents and TeradataVectorStore handles the rest.
What happens under the hood:
- Your documents are converted to a teradataml DataFrame and passed to the vector store
- The embeddings are generated and stored for each Document object
- Indexes are automatically created for fast similarity search and chat operations
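The flow above can be sketched as a single call. This assumes an active create_context() session; the import path follows the package name, and the keyword names mirror the standard LangChain from_documents() signature, so treat them as assumptions:

```python
def build_store(documents, embeddings):
    """Create a vector store from a list of LangChain Document objects.

    Embedding generation and index creation happen inside the call.
    """
    # Imported lazily; requires langchain-teradata and a live connection.
    from langchain_teradata import TeradataVectorStore

    return TeradataVectorStore.from_documents(documents=documents, embedding=embeddings)
```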
Checking the build status along the way gives you:
- Operation tracking: See exactly which stage your vector store creation is at.
- Troubleshooting: Quickly identify if something went wrong during setup.
- Progress monitoring: For large datasets, track embedding generation progress.
- Validation: Confirm your vector store is ready for queries.
The get_details() method gives you a comprehensive overview of your setup; think of it as your vector store’s “dashboard.”
What you’ll see:
- Object inventory: Number of tables or documents you have added.
- Search parameters: Current algorithm settings (HNSW, K-means, etc.)
- Configuration details: Embedding dimensions, distance metrics, and indexing options.
- Performance settings: Top-k values, similarity thresholds, and other query parameters.
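A small helper around get_details() (the method is named in this guide; the shape of the returned summary may vary by version):

```python
def show_dashboard(vector_store):
    """Fetch and display the store's configuration summary via get_details()."""
    details = vector_store.get_details()
    print(details)
    return details
```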
Manage vector store
Add items to vector store
One of the best features of TeradataVectorStore is how easy it is to expand your knowledge base. As your business grows and you accumulate more documents, you can add them continuously without rebuilding everything from scratch. Real-world scenarios:
- Add new product documentation as it’s created.
- Include fresh research papers or industry reports.
- Incorporate customer feedback and support documents.
- Update with latest policy or procedure changes.
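Appending to an existing store can be sketched with add_documents(), the standard LangChain VectorStore method for incremental additions (it typically returns the IDs of the inserted records):

```python
def expand_knowledge_base(vector_store, new_documents):
    """Append new documents to an existing store without rebuilding it."""
    return vector_store.add_documents(new_documents)
```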
Query vector store
Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.
Query directly
Now let’s search for information in our vector store. Unlike traditional keyword search, vector search understands the meaning behind your questions. Ask about “AI applications” and it might return results about “machine learning models,” because it understands these concepts are related. How similarity search works:
- Your question is converted to a vector embedding (just like your documents).
- TeradataVectorStore calculates similarity scores between your question and stored documents.
- The most relevant results are returned, ranked by similarity.
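The ranking idea can be illustrated with a toy example in plain Python. The three-dimensional “embeddings” and cosine similarity below are stand-ins; the real store uses its own embedding models and configurable distance metrics:

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))


# Toy "embeddings" standing in for real model output
docs = {
    "machine learning models": [0.9, 0.1, 0.0],
    "quarterly sales report":  [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # embedding of "AI applications"

# Rank documents by similarity to the query, best first
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # the semantically closer document wins
```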
Query by turning into retriever
You can also transform the vector store into a retriever for easier usage in your chains.
Usage for retrieval-augmented generation
The ask() method combines the power of vector search with language model generation. Instead of just returning raw document chunks, you get coherent, contextual answers.
The two-step process:
- Retrieval: Find the most relevant documents from your vector store.
- Generation: Use those documents as context to generate a natural language response.
Why this works well:
- Relevant retrieval: Your vector store finds the right information.
- Contextual generation: The language model uses that information effectively.
- Source transparency: Users can see where answers come from.
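A one-call sketch of the two-step process. ask() is the method named in this guide, but its exact parameters may differ in your version, so treat the call as an assumption:

```python
def answer(vector_store, question: str):
    """One-call RAG: retrieve relevant chunks, then generate a grounded answer."""
    return vector_store.ask(question=question)
```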
- You can use your vector store as a retriever to get the most relevant documents, then pass those documents to a RAG chain within LangChain workflows.
- This gives you the flexibility to build custom pipelines while leveraging Teradata’s powerful vector search capabilities.
What happens in the chain:
- Retrieval: Your vector store finds the most relevant documents for the question.
- Context preparation: Those documents become context for the language model.
- Generation: The language model generates an answer based on your actual data.
- Output parsing: Clean, formatted response, ready for your application.
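The four chain steps can be sketched with stand-in components. The retriever and LLM below are mocks; in a real chain they would come from the vector store’s retriever and your model of choice:

```python
def run_rag(retrieve, generate, question):
    """Retrieval -> context preparation -> generation -> output parsing."""
    docs = retrieve(question)                                     # 1. retrieval
    context = "\n".join(docs)                                     # 2. context preparation
    raw = generate(f"Context:\n{context}\n\nQ: {question}\nA:")   # 3. generation
    return raw.strip()                                            # 4. output parsing


# Mock components (illustrative only)
fake_retriever = lambda q: ["TeradataVectorStore indexes embeddings in Vantage."]
fake_llm = lambda prompt: " It stores and indexes embeddings in Vantage. "

print(run_rag(fake_retriever, fake_llm, "What does the vector store do?"))
```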
Common applications:
- Customer support: Answer questions using your product documentation.
- Research assistance: Query your organization’s knowledge repositories.
- Compliance: Ensure responses are based on approved company information.
Working with Different Data Types
TeradataVectorStore’s flexibility really shines when working with different types of data sources. Depending on what you’re starting with, you can choose the most appropriate method. Choose your starting point:
- Have PDF documents? Use from_documents() with file paths
- Working with database tables? Use from_datasets() with DataFrames
- Already have embeddings? Use from_embeddings() to import them directly
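The three entry points can be summarized in one dispatcher. The method names come from this guide; the source-specific keyword arguments (file paths, DataFrames, embedding matrices) are passed through and are environment-specific assumptions:

```python
def build_from_source(source_kind: str, embeddings, **kwargs):
    """Pick the TeradataVectorStore constructor that matches your starting point."""
    if source_kind == "pdf":
        from langchain_teradata import TeradataVectorStore
        return TeradataVectorStore.from_documents(embedding=embeddings, **kwargs)
    if source_kind == "table":
        from langchain_teradata import TeradataVectorStore
        return TeradataVectorStore.from_datasets(embedding=embeddings, **kwargs)
    if source_kind == "embeddings":
        from langchain_teradata import TeradataVectorStore
        return TeradataVectorStore.from_embeddings(**kwargs)
    raise ValueError(f"Unknown source kind: {source_kind}")
```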
From PDF Files
From Database Tables
From Pre-computed Embeddings
When working with tables (and embedded tables), the data_columns parameter is mandatory. It tells TeradataVectorStore exactly which columns contain the text content you want to convert into embeddings. Think of it as pointing the service to the right information.
For example, if your table has columns like id, title, description, and category, you’d specify data_columns=["description"] to embed only the description text, or data_columns=["title", "description"] to combine both fields.
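The column-selection idea can be illustrated in plain Python. The concatenation below only mimics which text reaches the embedding model; how the service actually joins multiple columns internally is not specified here:

```python
rows = [
    {"id": 1, "title": "Gift card", "description": "Great gift for the holidays", "category": "review"},
    {"id": 2, "title": "Battery life", "description": "Lasts a full week on one charge", "category": "review"},
]


def text_to_embed(row, data_columns):
    """Select (and combine) the columns whose text will be embedded."""
    return " ".join(str(row[c]) for c in data_columns)


print(text_to_embed(rows[0], ["description"]))           # only the description text
print(text_to_embed(rows[0], ["title", "description"]))  # both fields combined
```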
Below is a small example of loading a sample table with teradatagenai and creating a content-based store from it. For data_columns we pass the “rev_text” column, which will be used to generate the embeddings.
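A sketch of that workflow, assuming a reviews table with a “rev_text” column already exists in your environment (for example, one loaded from teradatagenai’s sample data). The keyword names follow this guide but may differ in your version:

```python
def build_review_store(embeddings, table_name: str):
    """Wrap an existing reviews table and embed its 'rev_text' column."""
    # Imported lazily; requires teradataml, langchain-teradata, and a live connection.
    from teradataml import DataFrame
    from langchain_teradata import TeradataVectorStore

    reviews = DataFrame(table_name)  # lazy teradataml DataFrame over the table
    return TeradataVectorStore.from_datasets(
        dataset=reviews,
        embedding=embeddings,
        data_columns=["rev_text"],
    )
```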
Next Steps
Congratulations! You’ve just built your first AI-powered search and RAG system with TeradataVectorStore. You’re now ready to scale this up to handle real enterprise workloads. Ready to go deeper?
- Advanced search algorithms: Try HNSW or K-means clustering for large-scale deployments
- Custom embedding models: Experiment with domain-specific embeddings for your industry
- Real-time updates: Set up pipelines to automatically update your vector store as new data arrives
- Security: Leverage Teradata’s enterprise security features
- Monitoring: Use Teradata’s built-in performance monitoring
- LangChain RAG Tutorials - Deep dive into RAG patterns
- TeradataVectorStore Workflows - Complete examples and use cases
- VantageCloud Lake - Cloud-native analytics platform