Google Drive is a file storage and synchronization service developed by Google. This notebook covers how to load documents from Google Drive. Currently, only Google Docs are supported.
pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to an empty string ("").
By default, the GoogleDriveLoader expects the credentials.json file to be located at ~/.credentials/credentials.json, but this is configurable using the credentials_path keyword argument. The same applies to token.json: its default path is ~/.credentials/token.json, and the constructor parameter is token_path.
The first time you use GoogleDriveLoader, your browser will display the consent screen for user authentication. After authentication, token.json is created automatically at the provided path or the default path. If a token.json already exists at that path, you will not be prompted for authentication.
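As a rough illustration of the paths described above, the following sketch resolves the default credential locations and checks whether a consent prompt is likely. The helper names default_credential_paths and consent_prompt_expected are hypothetical and not part of GoogleDriveLoader. The check only tests whether the file exists and assumes that an existing token is still valid.

```python
from pathlib import Path
from typing import Dict, Optional


def default_credential_paths(home: Optional[Path] = None) -> Dict[str, Path]:
    """Return the default file locations described above (hypothetical helper)."""
    base = (home or Path.home()) / ".credentials"
    return {
        "credentials_path": base / "credentials.json",
        "token_path": base / "token.json",
    }


def consent_prompt_expected(token_path: Path) -> bool:
    """Assumption: the consent screen appears only when token.json is missing."""
    return not token_path.exists()
```

The dictionary keys mirror the credentials_path and token_path keyword arguments, so non-default locations could be substituted in the same shape.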
GoogleDriveLoader can load from a list of Google Docs document IDs or a folder ID. You can obtain your folder and document ID from the URL:
Folder ID: "1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5"
Document ID: "1bfaMQ18_i56204VaQDVeAFpqEijJTgvurupdEDiaUQw"
When you pass a folder_id, by default all files of type document, sheet and pdf are loaded. You can modify this behaviour by passing a file_types argument.
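A folder or document ID can be pulled out of a URL before it is passed as folder_id or in the list of document IDs. The sketch below assumes the two common URL shapes, where the ID follows "/folders/" for a folder and "/d/" for a document. The helper extract_drive_id and the regular expression are illustrative and not part of any loader.

```python
import re
from typing import Optional

# Assumption: the ID is the path segment after "/folders/" (folders)
# or after "/d/" (documents); other URL shapes are not handled.
_ID_PATTERN = re.compile(r"/(?:folders|d)/([A-Za-z0-9_-]+)")


def extract_drive_id(url: str) -> Optional[str]:
    """Return the folder or document ID found in a Drive URL, or None."""
    match = _ID_PATTERN.search(url)
    return match.group(1) if match else None


folder_url = "https://drive.google.com/drive/u/0/folders/1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5"
doc_url = "https://docs.google.com/document/d/1bfaMQ18_i56204VaQDVeAFpqEijJTgvurupdEDiaUQw/edit"

print(extract_drive_id(folder_url))
print(extract_drive_id(doc_url))
```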
GoogleDriveLoader also accepts an optional file loader. If you pass in a file loader, that file loader will be used on documents that do not have a Google Docs or Google Sheets MIME type. Here is an example of how to load an Excel document from Google Drive using a file loader.
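A minimal sketch of such an example, assuming the langchain-community and unstructured packages are installed and that the loader accepts the file_ids, file_loader_cls and file_loader_kwargs keyword arguments. The helper names and the "elements" mode are illustrative choices. The imports are deferred into the function so that the sketch can be read and parsed without the packages.

```python
from typing import Any, Dict


def excel_loader_kwargs(file_id: str) -> Dict[str, Any]:
    """Build keyword arguments for loading one non-Google file by its ID."""
    return {
        "file_ids": [file_id],
        # Forwarded to the file loader; "elements" is an Unstructured mode.
        "file_loader_kwargs": {"mode": "elements"},
    }


def make_excel_loader(file_id: str):
    """Construct the loader; calling .load() on it starts the auth flow."""
    # Assumption: langchain-community and unstructured are installed.
    from langchain_community.document_loaders import (
        GoogleDriveLoader,
        UnstructuredFileIOLoader,
    )

    return GoogleDriveLoader(
        file_loader_cls=UnstructuredFileIOLoader,
        **excel_loader_kwargs(file_id),
    )
```

Calling make_excel_loader with the ID of an Excel file and then load() on the result would return the parsed documents, provided the credentials described earlier are in place.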
The langchain-googledrive package is compatible with langchain_community.document_loaders.GoogleDriveLoader and can be used in its place.
To be compatible with containers, authentication uses the environment variable GOOGLE_ACCOUNT_FILE, which points to the credentials file (for a user or a service account).
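By default, files with a supported MIME type are converted to Document. This can be customized; see the documentation of GDriveLoader. The corresponding packages must be installed, however.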
Parameters compatible with the Google list() API can be set.
To specify a new pattern for the Google request, you can use a PromptTemplate(). The variables for the prompt can be set with kwargs in the constructor.
Some pre-formatted requests are provided (use {query}, {folder_id} and/or {mime_type}):
You can customize the criteria used to select files. A set of predefined filters is provided:
template | description |
---|---|
gdrive-all-in-folder | Return all compatible files from a folder_id |
gdrive-query | Search query in all drives |
gdrive-by-name | Search file with name query |
gdrive-query-in-folder | Search query in folder_id (and sub-folders if recursive=True) |
gdrive-mime-type | Search a specific mime_type |
gdrive-mime-type-in-folder | Search a specific mime_type in folder_id |
gdrive-query-with-mime-type | Search query with a specific mime_type |
gdrive-query-with-mime-type-and-folder | Search query with a specific mime_type and in folder_id |
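To make the role of the {query}, {folder_id} and {mime_type} variables concrete, the sketch below fills two illustrative templates written in the Google Drive search query syntax. The template strings and the render_query helper are assumptions for illustration. They are not copied from langchain-googledrive, and plain str.format stands in for PromptTemplate() to keep the example dependency-free.

```python
# Illustrative templates only: the package's real query strings may differ.
TEMPLATES = {
    "query-in-folder": (
        "fullText contains '{query}' and '{folder_id}' in parents and trashed = false"
    ),
    "mime-type-in-folder": (
        "mimeType = '{mime_type}' and '{folder_id}' in parents and trashed = false"
    ),
}


def render_query(name: str, **variables: str) -> str:
    """Fill a template, escaping backslashes and single quotes in the values."""
    escaped = {
        key: value.replace("\\", "\\\\").replace("'", "\\'")
        for key, value in variables.items()
    }
    return TEMPLATES[name].format(**escaped)


print(render_query("query-in-folder", query="machine learning", folder_id="root"))
```

Escaping the values matters because a single quote in a search term would otherwise end the quoted string in the query early.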
Set return_link to True to export links.
gslide_mode accepts different values:
gsheet_mode accepts different values:
"single": generate one document per line
"elements": one document with a markdown array and <PAGE BREAK> tags
The file description can be updated with a summary (see the method lazy_update_description_with_summary()).
If you use mode="snippet", only the description is used for the body. Otherwise, the description is stored in metadata['summary'].
Sometimes a specific filter is needed, for example to extract information from the filename or to select files that match specific criteria. You can use a filter for this.
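A filter of this kind can be sketched as a predicate over a file's metadata. The example assumes the filter is a callable that receives the search object and a metadata dictionary and returns True to keep the file. That signature, the function name keep_reports, and the "report-" naming rule are assumptions for illustration. The keys name and description are standard fields of a Drive file resource.

```python
from typing import Any, Dict


def keep_reports(search: Any, file: Dict[str, Any]) -> bool:
    """Keep files named 'report-...' whose description is not tagged '#test'."""
    name = file.get("name", "")
    description = file.get("description", "")
    return name.startswith("report-") and "#test" not in description
```

Under the assumed signature, the function would be passed to the loader's constructor as the filter argument.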
Sometimes many documents are returned, and it is not necessary to hold all of them in memory at the same time. You can use the lazy versions of the methods to get one document at a time. Prefer a complex query over a recursive search: with recursive=True, a query must be run for each folder.
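The effect of lazy iteration can be shown without touching Google Drive. In the sketch below, FakeLoader is a stand-in that counts how many documents it has produced, and take is a hypothetical helper. Only the lazy_load() method name corresponds to the loader interface.

```python
from itertools import islice
from typing import Any, Iterator, List


class FakeLoader:
    """Stand-in for a document loader; it counts how many items it produced."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.produced = 0

    def lazy_load(self) -> Iterator[str]:
        for index in range(self.total):
            self.produced += 1
            yield f"document-{index}"


def take(loader: Any, limit: int) -> List[Any]:
    """Consume at most `limit` documents from the loader's lazy iterator."""
    return list(islice(loader.lazy_load(), limit))


loader = FakeLoader(total=1000)
first_three = take(loader, 3)
print(first_three, loader.produced)
```

Only three of the thousand documents are ever created here, which is the saving the lazy methods offer when a search returns many results.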