This notebook shows how to use the LangChain Vectorize retriever.
Vectorize helps you build AI apps faster and with less hassle.
It automates data extraction, finds the best vectorization strategy using RAG evaluation,
and lets you quickly deploy real-time RAG pipelines for your unstructured data.
Your vector search indexes stay up-to-date, and it integrates with your existing vector database,
so you maintain full control of your data.
Vectorize handles the heavy lifting, freeing you to focus on building robust AI solutions without getting bogged down by data management.
Setup
In the following steps, we'll set up the Vectorize environment and create a RAG pipeline.
Create a Vectorize account & get your access token
- Sign up for a free Vectorize account.
- Generate an access token in the Access Token section.
- Copy your organization ID: it is the UUID that appears in the browser URL immediately after
/organization/.
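For example, given a (hypothetical) console URL like the one below, the organization ID is the path segment right after /organization/:

```python
# Hypothetical console URL -- substitute the one from your own browser.
url = "https://platform.vectorize.io/organization/123e4567-e89b-12d3-a456-426614174000/pipelines"

# The organization ID is the path segment immediately after /organization/.
org_id = url.split("/organization/")[1].split("/")[0]
print(org_id)  # → 123e4567-e89b-12d3-a456-426614174000
```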
import getpass
VECTORIZE_ORG_ID = getpass.getpass("Enter Vectorize organization ID: ")
VECTORIZE_API_TOKEN = getpass.getpass("Enter Vectorize API Token: ")
Installation
This retriever lives in the langchain-vectorize package:
!pip install -qU langchain-vectorize
Download a PDF file
!wget "https://raw.githubusercontent.com/vectorize-io/vectorize-clients/refs/tags/python-0.1.3/tests/python/tests/research.pdf"
Initialize the Vectorize client
import vectorize_client as v
api = v.ApiClient(v.Configuration(access_token=VECTORIZE_API_TOKEN))
Create a file upload source connector
import json
import os
import urllib3
connectors_api = v.ConnectorsApi(api)
response = connectors_api.create_source_connector(
    VECTORIZE_ORG_ID, [{"type": "FILE_UPLOAD", "name": "From API"}]
)
source_connector_id = response.connectors[0].id
Upload the PDF file
file_path = "research.pdf"
http = urllib3.PoolManager()
uploads_api = v.UploadsApi(api)
metadata = {"created-from-api": True}
upload_response = uploads_api.start_file_upload_to_connector(
    VECTORIZE_ORG_ID,
    source_connector_id,
    v.StartFileUploadToConnectorRequest(
        name=file_path.split("/")[-1],
        content_type="application/pdf",
        # Additional metadata that will be stored along with each chunk in the vector database
        metadata=json.dumps(metadata),
    ),
)
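Note that the metadata is passed as a JSON string rather than a Python dict; json.dumps handles the conversion (Python booleans become lowercase true/false in JSON):

```python
import json

metadata = {"created-from-api": True}
payload = json.dumps(metadata)
print(payload)  # → {"created-from-api": true}
```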
with open(file_path, "rb") as f:
    response = http.request(
        "PUT",
        upload_response.upload_url,
        body=f,
        headers={
            "Content-Type": "application/pdf",
            "Content-Length": str(os.path.getsize(file_path)),
        },
    )

if response.status != 200:
    print("Upload failed:", response.data)
else:
    print("Upload successful")
Select the built-in AI platform and vector database
ai_platforms = connectors_api.get_ai_platform_connectors(VECTORIZE_ORG_ID)
builtin_ai_platform = [
    c.id for c in ai_platforms.ai_platform_connectors if c.type == "VECTORIZE"
][0]
vector_databases = connectors_api.get_destination_connectors(VECTORIZE_ORG_ID)
builtin_vector_db = [
    c.id for c in vector_databases.destination_connectors if c.type == "VECTORIZE"
][0]
Create a pipeline
pipelines = v.PipelinesApi(api)
response = pipelines.create_pipeline(
    VECTORIZE_ORG_ID,
    v.PipelineConfigurationSchema(
        source_connectors=[
            v.SourceConnectorSchema(
                id=source_connector_id, type="FILE_UPLOAD", config={}
            )
        ],
        destination_connector=v.DestinationConnectorSchema(
            id=builtin_vector_db, type="VECTORIZE", config={}
        ),
        ai_platform=v.AIPlatformSchema(
            id=builtin_ai_platform, type="VECTORIZE", config={}
        ),
        pipeline_name="My Pipeline From API",
        schedule=v.ScheduleSchema(type="manual"),
    ),
)
pipeline_id = response.data.id
If you want automated tracing of individual queries, you can also set your LangSmith API key by uncommenting the lines below:
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
Instantiation
from langchain_vectorize.retrievers import VectorizeRetriever
retriever = VectorizeRetriever(
api_token=VECTORIZE_API_TOKEN,
organization=VECTORIZE_ORG_ID,
pipeline_id=pipeline_id,
)
Usage
query = "Apple Shareholders equity"
retriever.invoke(query, num_results=2)
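invoke returns a list of langchain_core Document objects. A common next step is to join their page_content into a context string for a prompt; here is a minimal sketch using a stand-in class and hypothetical chunk contents, purely for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class Document:  # stand-in for langchain_core.documents.Document
    page_content: str
    metadata: dict = field(default_factory=dict)


# Hypothetical results, shaped like what retriever.invoke(...) returns.
docs = [
    Document("Chunk about shareholders' equity...", {"created-from-api": True}),
    Document("Another relevant chunk...", {"created-from-api": True}),
]

# Join retrieved chunks into a single context string for a prompt.
context = "\n\n".join(d.page_content for d in docs)
print(context)
```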