Is your feature request related to a problem? Please describe.
It's a bit painful to have to go searching for the class names. For example, I had no idea where to get the tf_flowers class names.
Describe the solution you'd like
It would be great to have the class names as part of the Dataset info, and optionally class descriptions as well. Perhaps also some statistics, like the total number of instances per class (and maybe also the number of instances per class in each split)?
Something like this:
>>> import tensorflow_datasets as tfds
>>> dataset, info = tfds.load("tf_flowers", with_info=True)
>>> info.class_names
["daisy", "dandelion", "roses", "sunflowers", "tulips"]
>>> info.class_descriptions
["Bellis perennis", "Taraxacum", "Rosa", "Helianthus annuus", "Tulipa"]
>>> info.class_counts
[898, 633, 799, 699, 641]
>>> info.splits["train"].class_counts
[898, 633, 799, 699, 641]
Describe alternatives you've considered
One alternative is to at least document the class names on the page that lists the datasets, so we can have a simple place to get them. It would be much better to have that data in the Dataset info, though.
Also, instead of the API proposed above, you could have a ClassInfo class (or protobuf) containing all the details of a single class, such as its name and description, and perhaps extra details such as the total count.
>>> info.classes
[ClassInfo(name="daisy", description="Bellis perennis", count=898), ...]
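To make the ClassInfo idea a bit more concrete, here is a rough sketch as a plain Python dataclass. Both ClassInfo and build_class_infos are hypothetical names used only for illustration, not an existing TFDS API, and the sample values are the ones from the example above.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class ClassInfo:
    """Hypothetical per-class record; not part of TFDS."""
    name: str
    description: Optional[str] = None
    count: Optional[int] = None


def build_class_infos(
    names: Sequence[str],
    descriptions: Sequence[str],
    counts: Sequence[int],
) -> List[ClassInfo]:
    """Zip parallel per-class lists into one ClassInfo per class."""
    if not (len(names) == len(descriptions) == len(counts)):
        raise ValueError("names, descriptions and counts must have equal length")
    return [ClassInfo(n, d, c) for n, d, c in zip(names, descriptions, counts)]


# Illustrative values taken from the proposal above.
classes = build_class_infos(["daisy", "roses"], ["Bellis perennis", "Rosa"], [898, 799])
print(classes[0])
```

Keeping name, description and count in one record avoids having to keep three parallel lists in sync, which is the main appeal over the separate class_names / class_descriptions / class_counts attributes.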
Additional context
I love TFDS. :)
Oh wow, I just realized the class names are not in alphabetical order. The current order is:
class_names = ["dandelion", "daisy", "tulip", "sunflower", "rose"]
I hope it is consistent every time I download the dataset?
@ageron I will send a PR correcting it :+1:
I think this is already possible:
dataset, info = tfds.load("tf_flowers", with_info=True)
print(info.features['label'].names)
Which outputs:
[u'dandelion', u'daisy', u'tulips', u'sunflowers', u'roses']
Note that you also have info.features['label'].str2int('roses') == 4 and .int2str for easy conversion.
For the statistics, we plan to have info.stats computed using TensorFlow Data Validation (https://www.tensorflow.org/tfx/guide/tfdv), which should give more insight into the label distribution and the like.
I'm closing this; don't hesitate to re-open if this didn't address your use case.
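In the meantime, per-class counts can be tallied by hand. A minimal sketch, assuming the labels are available as plain integers (for example, collected while iterating over the dataset); count_per_class is a hypothetical helper, not part of TFDS, and the labels below are toy data.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def count_per_class(names: Sequence[str], labels: Iterable[int]) -> Dict[str, int]:
    """Map each class name to the number of times its index occurs in `labels`."""
    tally = Counter(labels)
    # Include classes with zero examples so every name appears in the result.
    return {name: tally.get(index, 0) for index, name in enumerate(names)}


# Toy labels for illustration only. With TFDS, the names could come from
# info.features['label'].names and the labels from iterating over the dataset.
names = ["dandelion", "daisy", "tulips", "sunflowers", "roses"]
labels = [0, 0, 1, 4, 2, 0, 4]
print(count_per_class(names, labels))
```

Running the same helper once per split would give the per-split counts asked for in the original request.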
Fantastic, thanks! 👍 Perhaps this should be more prominent in the documentation?