TensorFlow Datasets is a collection of datasets ready to use with TensorFlow or other Python ML frameworks, such as Jax. All datasets are exposed as `tf.data.Datasets`, enabling easy-to-use and high-performance input pipelines. To get started, see the guide and the list of datasets.

This notebook shows how to load TensorFlow Datasets into a Document format that we can use downstream.
You need to install the `tensorflow` and `tensorflow-datasets` Python packages.
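Both packages are available from PyPI and can be installed in one step:

```shell
pip install tensorflow tensorflow-datasets
```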
We will use the `mlqa/en` dataset as an example.

MLQA (Multilingual Question Answering Dataset) is a benchmark dataset for evaluating multilingual question answering performance. The dataset consists of 7 languages: Arabic, German, Spanish, English, Hindi, Vietnamese, and Chinese.

- Homepage: https://github.com/facebookresearch/MLQA
- Source code: `tfds.datasets.mlqa.Builder`
- Download size: 72.21 MiB
We will use the `context` field as the `Document.page_content` and place the other fields in `Document.metadata`.
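The mapping can be sketched as a small conversion function. This is an illustration only: the `Document` class below is a minimal stand-in for LangChain's `Document` (so the snippet is self-contained), and the sample dict is made up; real MLQA samples come from `tensorflow-datasets` and carry tensor values that must be decoded first.

```python
from dataclasses import dataclass, field

# Minimal stand-in for langchain's Document class, for illustration only.
@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)

def mlqa_sample_to_document(sample: dict) -> Document:
    # Use the "context" field as page_content; keep every other field as metadata.
    return Document(
        page_content=sample["context"],
        metadata={k: v for k, v in sample.items() if k != "context"},
    )

# Hypothetical sample shaped like an MLQA record:
sample = {
    "context": "Paris is the capital of France.",
    "question": "What is the capital of France?",
    "id": "q1",
}
doc = mlqa_sample_to_document(sample)
```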
`TensorflowDatasetLoader` has these parameters:

- `dataset_name`: the name of the dataset to load
- `split_name`: the name of the split to load. Defaults to "train".
- `load_max_docs`: a limit on the number of loaded documents. Defaults to 100.
- `sample_to_document_function`: a function that converts a dataset sample to a `Document`.
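How the four parameters interact can be illustrated with a simplified pure-Python sketch. This is not the real loader: the actual `TensorflowDatasetLoader` streams samples from `tensorflow-datasets`, whereas the toy class below reads from an in-memory dict so it can run anywhere.

```python
from itertools import islice

# Toy in-memory "dataset" standing in for tensorflow-datasets
# (the real loader fetches samples via tfds instead).
FAKE_DATASET = {
    "train": [{"context": f"context {i}", "id": str(i)} for i in range(5)],
}

class SimpleDatasetLoader:
    """Sketch of how the loader parameters interact; not the real class."""

    def __init__(self, dataset_name, split_name="train", load_max_docs=100,
                 sample_to_document_function=None):
        self.dataset_name = dataset_name
        self.split_name = split_name
        self.load_max_docs = load_max_docs
        # Fall back to the identity function if no converter is given.
        self.sample_to_document_function = sample_to_document_function or (lambda s: s)

    def load(self):
        samples = FAKE_DATASET[self.split_name]
        # Apply the per-sample conversion, honoring the document limit.
        return [self.sample_to_document_function(s)
                for s in islice(samples, self.load_max_docs)]

loader = SimpleDatasetLoader(
    "mlqa/en",
    load_max_docs=3,
    sample_to_document_function=lambda s: s["context"],
)
docs = loader.load()  # ['context 0', 'context 1', 'context 2']
```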