PDF parser based on DocAI from Google Cloud.
You need to install two libraries to use this parser:
GCS_OUTPUT_PATH should be a path to a folder on GCS (starting with gs://), and PROCESSOR_NAME should look like projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR_ID or projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR_ID/processorVersions/PROCESSOR_VERSION_ID. You can either get it programmatically or copy it from the Prediction endpoint section of the Processor details tab in the Google Cloud Console.
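The two formats described above can be checked before any value is handed to the parser. The following is a minimal sketch, not part of the library: the helper names is_valid_gcs_output_path and is_valid_processor_name are hypothetical, and the checks only encode the shapes stated in the text (a gs:// prefix, and the two processor name layouts). They do not confirm that the bucket or processor actually exists.

```python
import re

# Hypothetical pattern: encodes only the two PROCESSOR_NAME layouts described
# in the text, with an optional processorVersions suffix.
_PROCESSOR_NAME_RE = re.compile(
    r"projects/[^/]+/locations/[^/]+/processors/[^/]+"
    r"(/processorVersions/[^/]+)?"
)


def is_valid_gcs_output_path(path: str) -> bool:
    """Return True if the path starts with gs:// and has something after it."""
    prefix = "gs://"
    return path.startswith(prefix) and len(path) > len(prefix)


def is_valid_processor_name(name: str) -> bool:
    """Return True if the name matches one of the two documented layouts."""
    return _PROCESSOR_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(is_valid_gcs_output_path("gs://my-bucket/docai-output"))
    print(is_valid_processor_name("projects/123/locations/us/processors/abc"))
```

A check like this catches a local path or a bare processor ID early, before a request is sent to Google Cloud.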
DocAIParser.lazy_parse() method to