| Class | Package | Local | Serializable | JS support |
|---|---|---|---|---|
| JSONLoader | langchain-community | ✅ | ❌ | ✅ |

| Source | Document Lazy Loading | Native Async Support |
|---|---|---|
| JSONLoader | ✅ | ❌ |
Using the loader requires the `langchain-community` integration package as well as the `jq` Python package.
`JSONLoader` class.
To enable automated tracing of your model calls, set your LangSmith API key.
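A minimal sketch of setting the key from Python. The helper name `enable_langsmith_tracing` is hypothetical; `LANGSMITH_API_KEY` and `LANGSMITH_TRACING` are the environment variables read by LangSmith.

```python
import os


def enable_langsmith_tracing(api_key: str) -> None:
    """Export the LangSmith settings for the current process."""
    os.environ["LANGSMITH_API_KEY"] = api_key
    os.environ["LANGSMITH_TRACING"] = "true"


# Interactive use (prompts for the key without echoing it):
# import getpass
# enable_langsmith_tracing(getpass.getpass("Enter your LangSmith API key: "))
```

Tracing is optional; the loader works without it.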
To load documents from a JSON Lines file, pass `json_lines=True` and specify `jq_schema` to extract `page_content` from a single JSON object.
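As a sketch of the JSON Lines case, the block below writes a small, made-up `.jsonl` file and builds a loader for it. The file contents, the `.content` field, and the helper names are illustrative assumptions. `JSONLoader` is imported lazily, so `langchain-community` and `jq` are needed only when the loader is actually constructed.

```python
import json
import tempfile
from pathlib import Path


def write_sample_jsonl(directory: str) -> Path:
    """Write a hypothetical JSON Lines file: one JSON object per line."""
    records = [
        {"sender_name": "User 1", "content": "Hi!"},
        {"sender_name": "User 2", "content": "Hello there."},
    ]
    path = Path(directory) / "chat.jsonl"
    text = "\n".join(json.dumps(record) for record in records) + "\n"
    path.write_text(text, encoding="utf-8")
    return path


def build_jsonl_loader(path: Path):
    """Build a loader that applies jq_schema to each line separately."""
    from langchain_community.document_loaders import JSONLoader

    return JSONLoader(
        file_path=path,
        jq_schema=".content",  # evaluated against each line's JSON object
        json_lines=True,
    )


sample_path = write_sample_jsonl(tempfile.mkdtemp())
# With the packages installed, this should yield one Document per line:
# docs = build_jsonl_loader(sample_path).load()
```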
Alternatively, set `jq_schema='.'` and provide a `content_key` to load only specific content.
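A hedged sketch of this variant for a JSON Lines file. The `sender_name` field and both helper names are assumptions made for illustration; `preview_content` is a stdlib-only approximation of the values the loader is expected to use as `page_content`, not part of the library.

```python
import json


def build_content_key_loader(path: str):
    """Keep each whole record (jq_schema='.') but use one field as content."""
    from langchain_community.document_loaders import JSONLoader

    return JSONLoader(
        file_path=path,
        jq_schema=".",              # pass each JSON object through unchanged
        content_key="sender_name",  # hypothetical field used as page_content
        json_lines=True,
    )


def preview_content(path: str, content_key: str) -> list[str]:
    """Stdlib-only preview of the field value taken from each line."""
    with open(path, encoding="utf-8") as handle:
        return [
            str(json.loads(line)[content_key])
            for line in handle
            if line.strip()
        ]
```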
To load documents using a `content_key` within the jq schema, set `is_content_key_jq_parsable=True`. Ensure that `content_key` is compatible with the jq schema and can be parsed using it.
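The following sketch shows one way this might look when the text is nested inside each record. The sample structure (`data`, `attributes`, `message`) and the helper name are assumptions for illustration only.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical nested data: the text sits under data[].attributes.message.
SAMPLE = {
    "data": [
        {"id": "1", "attributes": {"message": "message1", "tags": ["tag1"]}},
        {"id": "2", "attributes": {"message": "message2", "tags": ["tag2"]}},
    ]
}


def build_nested_loader(path: Path):
    """Use a jq expression, rather than a plain key, to locate the content."""
    from langchain_community.document_loaders import JSONLoader

    return JSONLoader(
        file_path=path,
        jq_schema=".data[]",                # iterate over the records
        content_key=".attributes.message",  # jq expression, not a plain key
        is_content_key_jq_parsable=True,
    )


nested_path = Path(tempfile.mkdtemp()) / "nested.json"
nested_path.write_text(json.dumps(SAMPLE), encoding="utf-8")
# With the packages installed:
# docs = build_nested_loader(nested_path).load()
```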
Metadata can also be extracted using the `JSONLoader`.
Some key changes are worth noting. In the previous example, where we did not collect metadata, we could specify directly in the schema where the value for `page_content` is extracted from.

In this example, we have to tell the loader to iterate over the records in the `messages` field, so the `jq_schema` has to be `.messages[]`.

This allows us to pass each record (a dict) into the `metadata_func`, which has to be implemented. The `metadata_func` is responsible for identifying which pieces of information in the record should be included in the metadata stored in the final `Document` object.

Additionally, we now have to specify explicitly in the loader, via the `content_key` argument, the record key from which the value for `page_content` is extracted.
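The points above can be sketched as follows. The record fields `sender_name`, `timestamp_ms`, and `content`, as well as the helper name `build_metadata_loader`, are assumptions about the input file rather than anything the loader requires.

```python
def metadata_func(record: dict, metadata: dict) -> dict:
    """Copy selected fields from a record into the Document metadata."""
    metadata["sender_name"] = record.get("sender_name")
    metadata["timestamp_ms"] = record.get("timestamp_ms")
    return metadata


def build_metadata_loader(path: str):
    """Iterate over `messages` and attach metadata to each Document."""
    from langchain_community.document_loaders import JSONLoader

    return JSONLoader(
        file_path=path,
        jq_schema=".messages[]",  # iterate over the records in `messages`
        content_key="content",    # record key that holds page_content
        metadata_func=metadata_func,
    )
```

The loader calls `metadata_func` with each record and the default metadata it has already prepared, so the function only needs to add the extra fields and return the dict.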