Datasets: Wrong labels in coil100 dataset, has 72 instead of 100

Created on 17 Nov 2020 · 6 Comments · Source: tensorflow/datasets

Hi, I believe the labels are wrong in the coil100 dataset.

If you examine the "label" feature, it reports 72 classes; it should be 100.
info.features["label"].num_classes
72
72 is the number of images per class; 100 is the number of objects/classes.

If you list the label names, they are the 5-degree rotation steps rather than object names.
info.features["label"].names
'0' '5' '10' '15' '20' '25' '30' ...

Should be:
'obj1', 'obj2', 'obj3', 'obj4', 'obj5' ...

A possible fix in coil100.py:

_LABELS = ['obj'+str(x) for x in range(1, 101)]

And in _generate_examples I think label should be:
label = int(file_name.split("__")[0][3:])-1
instead of:
label = file_name.split("_")[2].split(".")[0]

(I am also not sure whether object_id is still needed.)

bug contributions welcome

All 6 comments

As far as I understand the implementation, label corresponds to the angle, while object_id corresponds to the object ID.
I agree this is confusing.

  • object_id should be changed to a ClassLabel
  • label should be renamed angle_label
  • There should likely be an additional 'angle': tf.int64 feature to contain the raw angle.

Don't hesitate to send a PR to update the dataset.

@Conchylicultor Hi, I was working on this issue and ran into a problem. After I renamed the key label to angle_label, I could load the dataset with load, but I could not iterate over it with take.

Code: (Note: label is changed to angle_label)

import tensorflow as tf
import tensorflow_datasets as tfds

def peek(ds):
  print(ds)
  return ds

ds_train, info = tfds.load("coil100", split='train', with_info=True)
ds_ = ds_train.map(peek, num_parallel_calls=tf.data.experimental.AUTOTUNE)  # works fine

ds = ds_train.shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE)
for d in ds.take(1):  # raises error
  print(d)
  • The peek function prints
{'angle_label': <tf.Tensor 'args_0:0' shape=() dtype=int64>, 'image': <tf.Tensor 'args_1:0' shape=(128, 128, 3) dtype=uint8>, 'object_id': <tf.Tensor 'args_2:0' shape=() dtype=string>}
  • The error take raises is
tensorflow.python.framework.errors_impl.InvalidArgumentError: Feature: angle_label (data type: int64) is required but could not be found.

(Note: The same error is raised even when the dataset is loaded as supervised.)

But angle_label seems to be of dtype int64.

The same happens with object_id when it is converted to a ClassLabel.

Thank You.

Thanks for looking into this. Could you open a draft PR so we can take a look?

Could you make sure the data is fully regenerated by deleting ~/tensorflow_datasets/coil100/?
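
A small sketch of how the cached copy could be removed from Python, assuming the data lives under ~/tensorflow_datasets as in the path above. The helper `clear_cached_dataset` is hypothetical and not part of TFDS; it only deletes a directory so that the dataset is rebuilt on the next load.

```python
import shutil
from pathlib import Path


def clear_cached_dataset(name, data_dir="~/tensorflow_datasets"):
    """Delete the prepared copy of a dataset so it is rebuilt on next load.

    Returns True if a directory was removed, False if nothing was cached.
    """
    dataset_dir = Path(data_dir).expanduser() / name
    if dataset_dir.is_dir():
        shutil.rmtree(dataset_dir)
        return True
    return False
```

Calling `clear_cached_dataset("coil100")` before loading again would force regeneration with the renamed features; records written before the rename do not contain the new feature keys, which is consistent with the "required but could not be found" error above.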

Regenerating the dataset seems to have resolved the issue. Thank you very much. I am committing the rest of the changes and opening the PR for review.

Thank You.

@duran67 Hi, I hope PR #2769 resolved your issue.
Now, label has become angle_label with 72 classes (0, 5, 10, 15, ... labelled as 0, 1, 2, 3, ...).
We have converted object_id to a ClassLabel with 100 classes (obj1, obj2, obj3, ... labelled as 0, 1, 2, 3, ...).
We have also added an angle feature, which stores the raw angles (0, 5, 10, 15, ...).

Thank You.
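
The mapping described in the comment above can be illustrated in plain Python. This is only a sketch of the index arithmetic, assuming class names are ordered numerically as listed; the names `ANGLES`, `OBJECT_NAMES`, `angle_to_label`, and `object_name_to_id` are hypothetical and do not come from the PR.

```python
# Raw angles in 5-degree steps: 0, 5, ..., 355 (72 values).
ANGLES = list(range(0, 360, 5))

# Object class names: obj1 ... obj100 (100 values).
OBJECT_NAMES = ["obj" + str(i) for i in range(1, 101)]


def angle_to_label(angle):
    """Return the class index for a raw angle, e.g. 15 -> 3."""
    return ANGLES.index(angle)


def object_name_to_id(name):
    """Return the class index for an object name, e.g. 'obj4' -> 3."""
    return OBJECT_NAMES.index(name)


print(len(ANGLES), len(OBJECT_NAMES))  # 72 100
print(angle_to_label(15), object_name_to_id("obj4"))  # 3 3
```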
