Needle integration

Needle makes it easy to create your RAG pipelines with minimal effort. For more details, refer to our API documentation

Overview

The Needle Document Loader is a utility for integrating Needle collections with LangChain. It enables seamless storage, retrieval, and utilization of documents for Retrieval-Augmented Generation (RAG) workflows. This example demonstrates:

Storing documents into a Needle collection.
Setting up a retriever to fetch documents.
Building a Retrieval-Augmented Generation (RAG) pipeline.

Setup

Before starting, ensure you have the following environment variables set:

NEEDLE_API_KEY: Your API key for authenticating with Needle.
OPENAI_API_KEY: Your OpenAI API key for language model operations.

import os

os.environ["NEEDLE_API_KEY"] = ""

os.environ["OPENAI_API_KEY"] = ""

Initialization

To initialize the NeedleLoader, you need the following parameters:

needle_api_key: Your Needle API key (or set it as an environment variable).
collection_id: The ID of the Needle collection to work with.

Instantiation

from langchain_community.document_loaders.needle import NeedleLoader

collection_id = "clt_01J87M9T6B71DHZTHNXYZQRG5H"

# Initialize NeedleLoader to store documents to the collection
document_loader = NeedleLoader(
    needle_api_key=os.getenv("NEEDLE_API_KEY"),
    collection_id=collection_id,
)

Load

To add files to the Needle collection:

files = {
    "tech-radar-30.pdf": "https://www.thoughtworks.com/content/dam/thoughtworks/documents/radar/2024/04/tr_technology_radar_vol_30_en.pdf"
}

document_loader.add_files(files=files)

# Show the documents in the collection
# collections_documents = document_loader.load()

Lazy load

The lazy_load method allows you to iteratively load documents from the Needle collection, yielding each document as it is fetched:

# Show the documents in the collection
# collections_documents = document_loader.lazy_load()

Usage

Use within a chain

Below is a complete example of setting up a RAG pipeline with Needle within a chain:

import os

from langchain_classic.chains import create_retrieval_chain
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_community.retrievers.needle import NeedleRetriever
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)

# Initialize the Needle retriever (make sure your Needle API key is set as an environment variable)
retriever = NeedleRetriever(
    needle_api_key=os.getenv("NEEDLE_API_KEY"),
    collection_id="clt_01J87M9T6B71DHZTHNXYZQRG5H",
)

# Define system prompt for the assistant
system_prompt = """
    You are an assistant for question-answering tasks.
    Use the following pieces of retrieved context to answer the question.
    If you don't know, say so concisely.\n\n{context}
    """

prompt = ChatPromptTemplate.from_messages(
    [("system", system_prompt), ("human", "{input}")]
)

# Define the question-answering chain using a document chain (stuff chain) and the retriever
question_answer_chain = create_stuff_documents_chain(llm, prompt)

# Create the RAG (Retrieval-Augmented Generation) chain by combining the retriever and the question-answering chain
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

# Define the input query
query = {"input": "Did RAG move to accepted?"}

response = rag_chain.invoke(query)

response

{'input': 'Did RAG move to accepted?',
 'context': [Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.'),
  Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.'),
  Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.'),
  Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.'),
  Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.')],
 'answer': 'Yes, RAG has been adopted as the preferred pattern for improving the quality of responses generated by a large language model.'}

API reference

For detailed documentation of all Needle features and configurations head to the API reference: docs.needle-ai.com

Connect these docs to Claude, VSCode, and more via MCP for real-time answers.

Edit this page on GitHub or file an issue.

Popular Providers

Integrations by component

Overview

Setup

Initialization

Instantiation

Load

Lazy load

Usage

Use within a chain

API reference

Popular Providers

Integrations by component

Documentation Index

​Overview

​Setup

​Initialization

​Instantiation

​Load

​Lazy load

​Usage

​Use within a chain

​API reference

Overview

Setup

Initialization

Instantiation

Load

Lazy load

Usage

Use within a chain

API reference