Short description
I want to split the test portion of a TensorFlow dataset in half so I can use one half as test data and the other half as validation data. I was trying to follow the examples in the documentation, and it seems that ReadInstruction is no longer exposed in tensorflow_datasets.
Environment information
Reproduction instructions
Just try to create a ReadInstruction object.
Expected behavior
I expected the documentation above to be up-to-date and accurate.
Thanks for reporting. Indeed, we forgot to expose the object in the public API.
I think you should try to use the string version instead: train_ds, test_ds = tfds.load('mnist:3.*.*', split=['train[50%:]', 'train[:50%]'])
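Applied to the original question (halving the test split rather than the train split), the string form might look like the sketch below. It assumes a dataset version that supports the slicing syntax. `half_splits` and `load_halves` are hypothetical helpers, not part of tensorflow_datasets, and the dataset name `mnist` is only illustrative.

```python
def half_splits(split_name="test"):
    """Return slicing strings for the first and second half of a split."""
    return [f"{split_name}[:50%]", f"{split_name}[50%:]"]


def load_halves(dataset="mnist", split_name="test"):
    """Load both halves of a split as two separate datasets.

    Requires tensorflow_datasets and access to the dataset files.
    """
    import tensorflow_datasets as tfds  # imported lazily: only needed here

    # Assumption: the chosen dataset supports the string slicing API.
    return tfds.load(dataset, split=half_splits(split_name))


# Usage (downloads the dataset on first call):
# validation_ds, test_ds = load_halves()
```

The split strings are built separately from the load call so they can be checked without downloading anything.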
Ah. Thanks for the confirmation.
I think you should try to use the string version instead...
I tried that and an error was reported saying that only "train", "test", and some other category I can't remember off the top of my head were available.
In other words, I believe that the string version is also... not available on the public API.
It means that you're probably using a legacy dataset. Have you tried the legacy API? https://www.tensorflow.org/datasets/splits#legacy_slicing_api
Ah yes. I see that the documentation is using a data source (imdb_reviews/subwords8k) that is flagged for "retirement" and does not support the S3 slicing API.
For anyone else stumbling on this thread, you can test this by running:
builder = tfds.builder('imdb_reviews/subwords8k')
builder.version.implements(tfds.core.Experiment.S3)
which outputs False.
Thanks.
Should be fixed with https://github.com/tensorflow/datasets/pull/1064
tfds.core.ReadInstruction is now exposed.
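With the object exposed, halving the test split could be sketched as follows. This assumes a tensorflow_datasets release that includes the fix and a dataset that supports the slicing API. `half_bounds` and `load_halves` are hypothetical helpers, and `mnist` is only an illustrative dataset name.

```python
def half_bounds(split_name="test"):
    """Return keyword arguments describing each half of a split, in percent."""
    first = dict(split_name=split_name, from_=0, to=50, unit="%")
    second = dict(split_name=split_name, from_=50, to=100, unit="%")
    return first, second


def load_halves(dataset="mnist", split_name="test"):
    """Load both halves of a split using ReadInstruction objects.

    Requires tensorflow_datasets and access to the dataset files.
    """
    import tensorflow_datasets as tfds  # imported lazily: only needed here

    first, second = half_bounds(split_name)
    # Assumption: the installed version exposes tfds.core.ReadInstruction.
    instructions = [
        tfds.core.ReadInstruction(**first),
        tfds.core.ReadInstruction(**second),
    ]
    return tfds.load(dataset, split=instructions)


# Usage (downloads the dataset on first call):
# validation_ds, test_ds = load_halves()
```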