Datasets: tfds.Split.ALL does not work

Created on 7 Feb 2020 · 7 Comments · Source: tensorflow/datasets

Short description

splits.py refers to a special split, tfds.Split.ALL, that's supposed to contain all splits merged together. Currently, this does not work, raising the error:

Requested split "all" does not exist.

Environment information

tensorflow-datasets version: 2.0.0

Reproduction instructions

import tensorflow_datasets as tfds

tfds.load(name='omniglot', split='all')

bug

Source

mrahtz

All 7 comments

From my understanding of tfds.Split.ALL, other parts of the code say that it's a special keyword that cannot be used as a key, so split='all' as in your reproduction instructions will not work. If you call tfds.load(name='omniglot') without a split argument, split defaults to None and all splits are returned (as a dict keyed by split name). Hope that helps.

manda-creator on 16 Feb 2020

Hey @mrahtz, here is an example with the mnist dataset that loads all of its splits together. If you want a single dataset containing every split, this may help.

import tensorflow_datasets as tfds

# Merge the train and test splits by naming both in the split string.
ds = tfds.load("mnist", split="train+test")
print(len(ds))

output = 70000

abhinavsp0730 on 25 Feb 2020

For context, this was a regression when we switched to our new reading pipeline.

The question is, should we restore this feature to allow tfds.load(..., split='all') ?
From our statistics internally, it seems that this feature has almost never been used so this hasn't been prioritised.
The alternative is to delete ALL entirely.

Please +1 this issue if you're interested, so we can evaluate the demand.

Conchylicultor on 25 Feb 2020

(Personally, I don't need this feature - I only reported it because I thought it might be a regression. I'd be happy with this issue being closed.)

mrahtz on 1 Mar 2020

Thanks for the update. I removed Split.ALL from the API and the doc. So hopefully users will stop being confused about this.

Conchylicultor on 1 Mar 2020

If anyone is looking for this in the future, you can instead concatenate each split in one line of code:

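# Assumes `datasets` is a dict of splits keyed by name,
# e.g. as returned by tfds.load(...) when no split is given.
ds_train = datasets['train']
ds_test = datasets['test']
ds_valid = datasets['validation']

ds = ds_train.concatenate(ds_test).concatenate(ds_valid)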

SOURCE: https://stackoverflow.com/questions/56546672/how-can-i-merge-two-or-more-tensorflow-datasets

ibarrond on 18 Sep 2020
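
For datasets whose split names are not known in advance, the concatenation above can be generalised. The following is a minimal sketch, not part of the original comment: `concatenate_splits` is a hypothetical helper name, and it assumes only that each value in the dict exposes a `concatenate` method, as `tf.data.Dataset` does.

```python
from functools import reduce


def concatenate_splits(datasets):
    """Chain every split in a dict of datasets, in the dict's iteration order.

    `datasets` is assumed to map split names to objects that have a
    `concatenate(other)` method (for example tf.data.Dataset instances).
    This helper is illustrative and is not part of the TFDS API.
    """
    parts = list(datasets.values())
    if not parts:
        raise ValueError("expected at least one split")
    # Fold left: ((first + second) + third) + ...
    return reduce(lambda merged, ds: merged.concatenate(ds), parts)
```

With a dict such as the one returned by tfds.load(...) when no split argument is passed, `ds = concatenate_splits(datasets)` would read the splits one after another, in the same sequential order as the explicit version.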

@ibarrond
Note that

ds = tfds.load(..., split='train+test+validation')

might be more performant. The generic form would be:

builder = tfds.builder('my_dataset')
ds = builder.as_dataset('+'.join(builder.info.splits.keys()))

Also, in your code above, each split is read sequentially (all of train, then all of test, ...), while users might want to shuffle between splits (possible with shuffle_files=True).

Conchylicultor on 18 Sep 2020
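
The generic form above can be wrapped into a small helper. This is a sketch under stated assumptions rather than an official TFDS utility: `merged_split_spec` and `load_all_splits` are hypothetical names, and tensorflow_datasets is imported lazily so the string-building part can be used on its own.

```python
def merged_split_spec(split_names):
    """Join split names into a split string such as 'train+test+validation'."""
    names = list(split_names)
    if not names:
        raise ValueError("expected at least one split name")
    return "+".join(names)


def load_all_splits(name, shuffle_files=False):
    """Load every split of a dataset as one tf.data.Dataset (illustrative helper)."""
    # Imported here so merged_split_spec works without TFDS installed.
    import tensorflow_datasets as tfds

    builder = tfds.builder(name)
    builder.download_and_prepare()
    spec = merged_split_spec(builder.info.splits.keys())
    # shuffle_files=True lets files from different splits be interleaved.
    return builder.as_dataset(split=spec, shuffle_files=shuffle_files)
```

For example, `load_all_splits('mnist', shuffle_files=True)` would request the split string built from whatever splits the builder reports, instead of hard-coding 'train+test'.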

