Task: It would be nice for our dataset catalog to display a snippet of the dataset content, similar to https://www.tensorflow.org/datasets/overview#visualization.
Instructions:
ds.take() or the subsplit-API (DO NOT generate all datasets yourself, but only test on a few ones)
tfds.show_examples(ds_info, ds) in a try/catch block, as not all datasets support it
To test the documentation generation, you can use the following code snippet:
import os
from absl import app
import tensorflow_datasets as tfds
from tensorflow_datasets.scripts import generate_visualization  # Your new script
from tensorflow_datasets.scripts import document_datasets

DATASET_TO_TESTS = ['mnist', 'cifar10',...]  # Datasets you want to test the script on.
dst_dir = ...  # Destination directory


def main(_):
  for ds_name in DATASET_TO_TESTS:
    # 1) Generate the dataset (the script assumes the datasets are already generated)
    tfds.load(ds_name)
    # 2) Generate the figure for the dataset
    generate_visualization.generate_visualization([ds_name])
    # 3) Generate the documentation page, which uses the figure generated in 2)
    # Open in write mode; the default read mode would make f.write() fail.
    with open(os.path.join(dst_dir, f'{ds_name}.md'), 'w') as f:
      doc_content = document_datasets.dataset_docs_str([ds_name])
      f.write(doc_content)


if __name__ == '__main__':
  app.run(main)
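For reference, here is a minimal sketch of what the new script could look like, following the two instructions above (show the examples, and guard the call because not all datasets support it). The function names, the `make_figure` parameter, the `'train'` split, and the `.png` output layout are assumptions for illustration, not the script that was eventually merged. The argument order of `tfds.show_examples` is copied from this issue; check it against the TFDS version you have installed.

```python
import os


def make_figure_with_tfds(ds_name):
    """Hypothetical default figure factory; needs tensorflow_datasets."""
    # Lazy import so the rest of the sketch can be exercised without TFDS.
    import tensorflow_datasets as tfds
    # Assumption: the dataset has a 'train' split.
    ds, ds_info = tfds.load(ds_name, split='train', with_info=True)
    # Argument order as written in this issue; verify for your TFDS version.
    return tfds.show_examples(ds_info, ds)


def generate_visualization(ds_names, dst_dir, make_figure=make_figure_with_tfds):
    """Saves one figure per dataset and returns {ds_name: path or None}."""
    os.makedirs(dst_dir, exist_ok=True)
    results = {}
    for ds_name in ds_names:
        try:
            fig = make_figure(ds_name)
            path = os.path.join(dst_dir, f'{ds_name}.png')
            fig.savefig(path)
            results[ds_name] = path
        except Exception as e:  # Broad on purpose: unsupported datasets fail in different ways.
            print(f'Skipping {ds_name}: {e}')
            results[ds_name] = None
    return results
```

Passing the figure factory as a parameter keeps the skip-on-failure logic testable on a couple of datasets (or with a stub) without generating everything.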
Difficulty: Intermediate
I've started working on this. I will make a PR at the beginning of next week.
I would like to contribute to this issue.
Hey @vvkio, share your status; if possible, we could work on this in parallel.
Hey @Nikhilnama18, sure we can work on this together. What's left is writing the logic for updating the documentation and testing.
@Conchylicultor why are we passing the dataset name in a list to the generate_visualization function?
@ManishAradwad To indicate which datasets to generate images for. This is similar to dataset_docs_str() in https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/scripts/document_datasets.py
@Conchylicultor should I update https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/core/dataset_info.py with the visualization location, so that the image can be pulled into the mako template like the other sections? In this instance, with builder.info.visualization?
@vvkio You shouldn't have to modify dataset_info. To get the path, you can use tfds_dir() or get_tfds_path() joined with info.full_name, for instance.
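To illustrate the suggestion above, a small sketch of deriving the figure location from info.full_name instead of storing it in dataset_info. The helper name, the flat `name-version.png` file naming, and the `root_dir` argument (which would come from tfds_dir() or get_tfds_path()) are assumptions for the example, not the layout TFDS necessarily uses.

```python
import os


def visualization_path(root_dir, full_name, ext='.png'):
    """Hypothetical helper: maps info.full_name to a figure path under root_dir.

    full_name looks like 'mnist/3.0.0' or 'name/config/version'; the slashes
    are flattened so that each dataset version maps to a single file.
    """
    file_name = full_name.replace('/', '-') + ext
    return os.path.join(root_dir, file_name)


# Usage with a made-up root directory:
print(visualization_path('docs_root', 'mnist/3.0.0'))
```

Both the script that writes the figure and the template that embeds it can call the same helper, so the path never needs to be stored on the dataset info object.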
I think this issue can be closed, as PRs #1632 and #1909 were merged.
Yes, thank you all for fixing this.
Created https://github.com/tensorflow/datasets/issues/1949 to track the visualisation issue.