Models: Explanation of keys present in TFRecord

Created on 14 Sep 2017 · 13 Comments · Source: tensorflow/models

I have searched extensively but cannot find answers to the following questions. The scripts that generate TFRecords (e.g. create_pet_tf_record.py) use a few keys for which I cannot find any documentation.
1) I do not understand image/object/difficult and image/object/truncated. For image/object/truncated, I believe the value is set to 1 if the object to be detected appears cropped in the image, and 0 otherwise. Is that correct?

2) Can somebody list all the possible values of image/object/view? Right now I only know of the value Frontal.

3) Many of the blog posts I looked at also create a key for occlusion, which I cannot find in any of the official TFRecord generation examples. Is there a key for occlusion or not?

4) Can somebody point me to a link where all the keys present in a TFRecord are explained in detail?

help wanted docs

Most helpful comment

Although I agree that TFRecord generation need not follow the specification set by Pascal VOC-style XML annotation files, I searched the Pascal VOC website for answers, on the assumption that TFRecords follow a similar pattern, since the example in the documentation uses Pascal VOC-style XML annotation files.
The following are the answers (leads) I have found so far.

  1. image/object/difficult: The difficult field being set to 1 indicates that the object has been annotated as "difficult", for example an object which is clearly visible but difficult to recognize without substantial use of context.
    image/object/truncated: The truncated field being set to 1 indicates that the object is "truncated" in the image. Truncated means that the specified bounding box does not correspond to the full extent of the object, e.g. an image of a person from the waist up, or a view of a car extending outside the image.
  2. Different values of image/object/view: The view field contains the view: Frontal, Rear, Left (side view, facing left of image), Right (side view, facing right of image), or an empty string indicating another or un-annotated view.
  3. There is a key for occlusion, named occluded. The occluded field being set to 1 indicates that the object is significantly occluded by another object.
  4. The documentation is not neatly presented, but it can be read on the Pascal VOC website.

These answers can only be confirmed once the TFRecord keys are explained in the documentation, which still needs to be written.
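
As a rough illustration of the fields described above, the following sketch reads the per-object flags from a Pascal VOC-style XML annotation and groups them under the key names mentioned in this thread. It uses only the Python standard library. The XML snippet and the helper names (`read_flag`, `collect_object_fields`) are made up for this example, and the thread does not confirm a TFRecord key name for occlusion, so that field is kept under the plain VOC tag name `occluded`. In a real conversion script these lists would presumably be wrapped in `tf.train.Feature` values before being written.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC XML style (values invented for illustration).
VOC_XML = """
<annotation>
  <object>
    <name>person</name>
    <pose>Frontal</pose>
    <truncated>1</truncated>
    <occluded>0</occluded>
    <difficult>0</difficult>
  </object>
  <object>
    <name>car</name>
    <pose>Left</pose>
    <truncated>0</truncated>
    <difficult>1</difficult>
  </object>
</annotation>
"""


def read_flag(obj, tag, default=0):
    """Return the integer value of a 0/1 child tag, or `default` if it is absent or empty."""
    text = obj.findtext(tag)
    if text is None or not text.strip():
        return default
    return int(text)


def collect_object_fields(xml_text):
    """Gather per-object annotation fields into parallel lists, one entry per object."""
    root = ET.fromstring(xml_text)
    fields = {
        "image/object/difficult": [],
        "image/object/truncated": [],
        "image/object/view": [],
        # Assumption: the thread names only the VOC tag, not a TFRecord key, for occlusion.
        "occluded": [],
    }
    for obj in root.iter("object"):
        fields["image/object/difficult"].append(read_flag(obj, "difficult"))
        fields["image/object/truncated"].append(read_flag(obj, "truncated"))
        # An empty string stands for "other or un-annotated view".
        fields["image/object/view"].append(obj.findtext("pose", default=""))
        fields["occluded"].append(read_flag(obj, "occluded"))
    return fields


fields = collect_object_fields(VOC_XML)
for key, values in fields.items():
    print(key, values)
```

The second object has no `<occluded>` tag, so the sketch falls back to 0 for it. Whether 0 is the right placeholder for a missing annotation is a choice made here, not something the thread settles.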

All 13 comments

I'm going to mark this as "docs", as it seems like a reasonable request.

In the meantime, may I also refer you to StackOverflow (http://stackoverflow.com/questions/tagged/tensorflow)? There is also a larger community that reads questions there. Thanks!

@derekjchow, could you PTAL. Thanks!

Any word on this? Specifically, I am wondering what is the difference between truncated and occluded.

Any update on this?
Also, I wonder whether I can either

  1. put a dummy value in each field for which information is not available (for example, difficult/view/truncated),

or

  2. omit that field when creating the tf example.

Thank you!
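
The two options in the question above can be made concrete with a small sketch. It is plain Python that only builds a dictionary of optional per-object fields. The placeholder values in `DEFAULTS` and the helper name `build_object_fields` are assumptions chosen for illustration, and the sketch says nothing about which option the Object Detection API actually requires.

```python
# Assumed placeholder values for fields with no annotation (not confirmed by any documentation).
DEFAULTS = {
    "image/object/difficult": 0,
    "image/object/truncated": 0,
    "image/object/view": "",  # empty string = other or un-annotated view in Pascal VOC
}


def build_object_fields(annotation, fill_missing):
    """Return the optional fields for one object.

    fill_missing=True  -> option 1: write a dummy value for every missing field.
    fill_missing=False -> option 2: leave missing fields out entirely.
    """
    result = {}
    for key, default in DEFAULTS.items():
        if key in annotation:
            result[key] = annotation[key]
        elif fill_missing:
            result[key] = default
    return result


# Hypothetical object for which only the truncated flag was annotated.
annotation = {"image/object/truncated": 1}

with_dummies = build_object_fields(annotation, fill_missing=True)
without_missing = build_object_fields(annotation, fill_missing=False)

print(with_dummies)
print(without_missing)
```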

Thanks, @Prasad9 , that seems like a good summary of the fields in question. Docs would be useful; care to submit a PR?
@jerowe - truncated means the object is not fully contained in the image or bounding box, but occluded means the object is blocked by something else in the scene.
@willSapgreen - I'm not exactly sure what you mean, but presuming you are asking about new records you want to create, the extra fields are likely helpful for training, but not critical during prediction/inference.

Thanks for the clarification. ;-)

I came across the key image/object/weight and cannot find any explanation of what exactly it does. I imagine the following: say I train a model to classify an object as either human or not. I have a training set of 1000 images with humans and only 500 with non-humans. I would then set the weight of all human images to 1/3 and of all non-human images to 2/3, so that during training the two classes contribute equally.

Is my assumption correct?
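
The arithmetic behind this assumption can be checked with a short sketch. It only verifies that weights of 1/3 and 2/3 give both classes the same total weight for the counts in the question. It does not show how the Object Detection API consumes image/object/weight, which this thread does not confirm.

```python
from fractions import Fraction

# Class counts taken from the question above.
counts = {"human": 1000, "non_human": 500}
total = sum(counts.values())

# For two classes, weight each class by the share of examples in the other class.
weights = {name: Fraction(total - n, total) for name, n in counts.items()}

# Total weight contributed by each class = number of examples * per-example weight.
effective = {name: counts[name] * weights[name] for name in counts}

print(weights)
print(effective)
```

Both classes end up with a total weight of 1000/3, so under this weighting scheme neither class dominates the sum.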

How exactly are these fields (difficult, truncated, view, occlusion) treated in the TensorFlow Object Detection API? Are they used in the training process?

I would also like to know whether the TensorFlow API automatically includes labels marked as difficult in the training process, or whether we have to do something additional for them to be included.

Hi There,
We are checking to see if you still need help on this, as this seems to be considerably old issue. Please update this issue with the latest information, code snippet to reproduce your issue and error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing this.

Does the XML file take part in the training process, given that we generate the TFRecords from a CSV file that contains only the name, size, class, and bounding box information?

Does the SSD network use the truncated or occluded fields if they were to be set in the tfrecord example?
