Pebblo enables developers to safely load data and promote their Gen AI app to deployment without worrying about the organization’s compliance and security requirements. The project identifies semantic topics and entities found in the loaded data and summarizes them in the UI or a PDF report.
Pebblo has two components: the Pebblo Safe DocumentLoader for Langchain and the Pebblo Server. For details on the Pebblo Server, see the Pebblo server document.
Pebblo Safeloader enables safe data ingestion for the Langchain DocumentLoader. This is done by wrapping the document loader call with the Pebblo Safe DocumentLoader.
Note: To use a Pebblo server at a URL other than Pebblo’s default (localhost:8000), put the correct URL in the PEBBLO_CLASSIFIER_URL environment variable. The URL can also be set with the classifier_url keyword argument. Ref: server-configurations
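Both options from the note can be sketched as follows. The server URL, CSV path, and app name are placeholder assumptions, and the name argument is assumed to be required by the loader; this is an illustration, not a reference configuration.

```python
import os

# Placeholder address of a Pebblo server that is not on localhost:8000.
CUSTOM_URL = "http://pebblo.example.internal:9000"

# Option 1: export the URL through the environment before creating the loader.
os.environ["PEBBLO_CLASSIFIER_URL"] = CUSTOM_URL


def make_loader(csv_path: str):
    """Option 2: pass the URL with the classifier_url keyword argument."""
    # Imported inside the function so langchain_community is only needed
    # when a loader is actually built.
    from langchain_community.document_loaders import CSVLoader, PebbloSafeLoader

    return PebbloSafeLoader(
        CSVLoader(csv_path),
        name="acme-corp-rag-1",  # placeholder app name
        classifier_url=CUSTOM_URL,
    )
```

Either option alone should be enough; setting both to the same value, as here, only keeps the sketch compact.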
Consider a Langchain application that uses CSVLoader to read a CSV document for inference.
Here is the snippet of document loading using CSVLoader.
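A minimal sketch of this loading step, together with the same call wrapped in the Pebblo Safe DocumentLoader, might look as follows. The sample data, file path, app name, and the helper write_sample_csv are hypothetical and exist only to make the example self-contained; running the wrapped loader also assumes a reachable Pebblo server.

```python
import csv
from pathlib import Path


def write_sample_csv(path: str) -> str:
    """Write a tiny placeholder CSV so the sketch has something to load."""
    rows = [["name", "email"], ["Joe Smith", "joe@example.com"]]
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path


def load_plain(csv_path: str):
    """Load the CSV with CSVLoader alone."""
    from langchain_community.document_loaders import CSVLoader

    return CSVLoader(csv_path).load()


def load_with_pebblo(csv_path: str):
    """Load the same CSV with the loader wrapped in PebbloSafeLoader."""
    from langchain_community.document_loaders import CSVLoader, PebbloSafeLoader

    loader = PebbloSafeLoader(
        CSVLoader(csv_path),
        name="acme-corp-rag-1",  # placeholder app name
    )
    return loader.load()
```

The only change between the two functions is the wrapper around CSVLoader; the rest of the application can keep consuming the returned documents as before.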
The Pebblo API key can be supplied through the PEBBLO_API_KEY environment variable.
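As a sketch of how an application might pick up the key, the helper below prefers an explicitly supplied value and otherwise falls back to the environment variable. The helper names are hypothetical, and the api_key keyword argument on the loader is an assumption that should be checked against the installed version.

```python
import os


def pebblo_api_key(explicit_key=None):
    """Return the explicit key if given, otherwise the PEBBLO_API_KEY value (or None)."""
    return explicit_key or os.environ.get("PEBBLO_API_KEY")


def make_cloud_loader(csv_path: str, explicit_key=None):
    """Build a wrapped loader that carries the resolved API key."""
    from langchain_community.document_loaders import CSVLoader, PebbloSafeLoader

    return PebbloSafeLoader(
        CSVLoader(csv_path),
        name="acme-corp-rag-1",  # placeholder app name
        api_key=pebblo_api_key(explicit_key),  # assumed keyword argument
    )
```

Keeping the key in the environment avoids hard-coding a secret in source files.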
To load semantic topics and entities along with the documents, define the PEBBLO_LOAD_SEMANTIC environment variable and set it to True.
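A minimal sketch of setting this variable from Python and then inspecting what comes back is shown below. The helper collect_metadata is hypothetical, and the exact metadata keys Pebblo adds are not assumed here; the sketch only gathers each document's metadata for inspection.

```python
import os

# Environment variables are strings, so the value is the text "True".
# Set it before the wrapped loader is created.
os.environ["PEBBLO_LOAD_SEMANTIC"] = "True"


def collect_metadata(documents):
    """Return the metadata dict of each loaded document, in order."""
    return [doc.metadata for doc in documents]
```

Calling collect_metadata on the result of the wrapped loader's load() gives a quick way to see which fields were attached.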
Set anonymize_snippets to True to anonymize all personally identifiable information (PII) in the snippets going into the VectorDB and the generated reports.
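One way to pass this setting is sketched below, keeping the loader options in a single dictionary. The CSV path is supplied by the caller, the app name is a placeholder, and PEBBLO_OPTIONS and make_anonymizing_loader are hypothetical names used only for illustration.

```python
# Options passed to the wrapper; the values here are illustrative.
PEBBLO_OPTIONS = {
    "name": "acme-corp-rag-1",   # placeholder app name
    "anonymize_snippets": True,  # request anonymized snippets
}


def make_anonymizing_loader(csv_path: str):
    """Wrap CSVLoader in PebbloSafeLoader with snippet anonymization requested."""
    from langchain_community.document_loaders import CSVLoader, PebbloSafeLoader

    return PebbloSafeLoader(CSVLoader(csv_path), **PEBBLO_OPTIONS)
```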
Note: The Pebblo Entity Classifier effectively identifies personally identifiable information (PII) and is continuously evolving. While its recall is not yet 100%, it is steadily improving. For more details, please refer to the Pebblo Entity Classifier docs.