The new Object Detection model has some great tutorials for training on PASCAL VOC and Pets, and example scripts for creating TFRecords. The documentation and tutorials reference scripts for creating your own custom datasets... but these scripts focus only on the creation of TFRecords. I would argue that the biggest problem people, including myself, have with creating custom datasets for a model like this is that it's apparently assumed we already know how to create the bounding box annotations for our images. This is anything but simple, and there is a huge void around how to accomplish it. The best I've found so far is for something called Yolo with Darknet (which seems remarkably similar to, and predates, the object detection model here). The information about creating bounding box annotations for Yolo involves using an application to create an individual text file for each image, and then running the Yolo/Darknet code that turns those individual text files into a PASCAL VOC style annotation file.
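For reference, the per-image Yolo/Darknet text files usually hold one line per object as `class_id x_center y_center width height`, with the four box values normalized by the image size, while PASCAL VOC stores absolute pixel corners (`xmin`, `ymin`, `xmax`, `ymax`). A minimal sketch of converting one such line, assuming that layout and that the image size is known; the function name `yolo_line_to_voc` is hypothetical:

```python
def yolo_line_to_voc(line, img_width, img_height):
    """Convert 'class_id xc yc w h' (normalized) to (class_id, xmin, ymin, xmax, ymax) in pixels.

    ASSUMPTION: the line follows the usual Yolo/Darknet layout described above.
    """
    parts = line.split()
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    # Center/size -> corners, then scale from [0, 1] to pixels
    xmin = round((xc - w / 2) * img_width)
    ymin = round((yc - h / 2) * img_height)
    xmax = round((xc + w / 2) * img_width)
    ymax = round((yc + h / 2) * img_height)
    # Clamp to the image bounds in case a label slightly overshoots the edge
    xmin, xmax = max(0, xmin), min(img_width, xmax)
    ymin, ymax = max(0, ymin), min(img_height, ymax)
    return class_id, xmin, ymin, xmax, ymax


print(yolo_line_to_voc("0 0.5 0.5 0.25 0.5", 640, 480))  # (0, 240, 120, 400, 360)
```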
My feature request is that a tutorial, or an application for creating the annotations, be added to the object detection model. One of the most frustrating assumptions made by TensorFlow developers appears to be that everyone is born understanding how to create their labels.
Cheers, and thank you!
Hi, I share your concern. I'm in the process of implementing bounding box prediction for the KITTI dataset; I wrote the whole code from scratch and will share it here once I'm done.
That would be fantastic! Thank you in advance!
This is exactly where I'm stuck. I ran the examples, but I can't create my own dataset because it's not clear to me how to do it. Your problem description was perfect. I already posted an issue about it, but nobody answered. Please, when someone has a clear example, post it here; it will help the community.
Will do. By the way, in the meantime you can look at lines 96-120 here: https://github.com/tensorflow/models/blob/master/object_detection/create_pascal_tf_record.py
I used this script and tried to adapt it to my images, but I didn't succeed, because it raises an error saying the annotations are missing. I see that the piece you suggested is meant to generate the annotations, but how do I use it? Could you give a simple example with a folder containing the images a.jpeg, b.jpeg and c.jpeg?
Thanks man!
_Sent from my Motorola XT1058 using FastHub_
Thanks vxy, that's very helpful. Dark, if you download the PASCAL VOC dataset, you can use its images and annotations as examples (there are thousands of them). The trick is to create or find an application that lets you draw bounding boxes on your images and saves them in the PASCAL VOC annotation (XML) format.
I haven't tried this one yet, but it seems promising: https://github.com/tzutalin/labelImg
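To make the target format concrete, here is a minimal sketch that writes a PASCAL VOC-style XML file for one image using only the standard library. It assumes the box coordinates are already known (in practice they come from drawing boxes in a tool like the one above); the function name `make_voc_annotation` and the sample box are hypothetical. The `folder`, `pose`, `truncated` and `difficult` fields are filled with placeholders because the PASCAL conversion script appears to read them; check that script for the exact fields it expects.

```python
import xml.etree.ElementTree as ET


def make_voc_annotation(folder, filename, width, height, objects, depth=3):
    """Build a minimal PASCAL VOC-style annotation tree (hypothetical helper).

    `objects` is a list of (class_name, xmin, ymin, xmax, ymax) tuples in
    pixel coordinates. pose/truncated/difficult are placeholder values.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "folder").text = folder
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = str(depth)
    for name, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "pose").text = "Unspecified"
        ET.SubElement(obj, "truncated").text = "0"
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), (xmin, ymin, xmax, ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.ElementTree(root)


# Hypothetical box for a.jpeg; real coordinates come from a labeling tool.
tree = make_voc_annotation("myimages", "a.jpeg", 640, 480, [("dog", 48, 240, 195, 371)])
xml_text = ET.tostring(tree.getroot(), encoding="unicode")
print(xml_text)
# tree.write("a.xml") would save the annotation next to the image.
```

One XML file per image, with the same base name, mirrors how the PASCAL VOC dataset itself is laid out.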
Thanks aloerch. My question, then, is: how can I create PASCAL VOC bounding box annotations (XML files) from my folder "/myimages" containing "a.jpeg, b.jpeg and c.jpeg"?
Similar to what @aloerch suggests, Sloth is an easy GUI for creating bounding box annotations in a JSON-like format. Once you have annotations in any form, you can write a custom script that reads them however you have them stored and turns them into what the TFRecords need; converting them to PASCAL VOC format first is an unnecessary middleman.
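A minimal sketch of such a custom script, going straight from Sloth-style JSON to the normalized box lists a TFRecord needs. It ASSUMES each entry looks like `{"filename": ..., "annotations": [{"class": ..., "x": ..., "y": ..., "width": ..., "height": ...}]}` with pixel units; verify that against your own Sloth output and config before relying on it. The function name `sloth_to_boxes` and the `image_sizes` mapping are hypothetical.

```python
import json


def sloth_to_boxes(json_text, image_sizes):
    """Convert Sloth-style JSON into normalized box lists per image.

    ASSUMPTION: entries use "filename" plus "annotations" with x/y/width/height
    in pixels. `image_sizes` maps filename -> (width, height) in pixels.
    """
    result = {}
    for entry in json.loads(json_text):
        width, height = image_sizes[entry["filename"]]
        boxes = {"xmin": [], "ymin": [], "xmax": [], "ymax": [], "class_text": []}
        for ann in entry.get("annotations", []):
            # Top-left corner plus size -> corners, normalized to [0, 1]
            boxes["xmin"].append(ann["x"] / width)
            boxes["ymin"].append(ann["y"] / height)
            boxes["xmax"].append((ann["x"] + ann["width"]) / width)
            boxes["ymax"].append((ann["y"] + ann["height"]) / height)
            boxes["class_text"].append(ann["class"])
        result[entry["filename"]] = boxes
    return result


example = ('[{"class": "image", "filename": "a.jpeg", "annotations": '
           '[{"class": "dog", "x": 100, "y": 50, "width": 200, "height": 100}]}]')
print(sloth_to_boxes(example, {"a.jpeg": (400, 200)}))
```

The image sizes are passed in rather than read from the files so the sketch needs no imaging library; in a real script you would get them from the JPEGs themselves.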
Thanks Micahprice, it's awesome! Is Sloth's output a .json file, and can it be converted into PASCAL VOC-compliant XML? I think there's a solution here :)
We're working on documenting how you can bring in your own dataset. It should be coming soon!
For those who can't wait, take a look at the create_pascal_tf_record.py and create_pet_tf_record.py files in the object_detection directory (specifically the dict_to_tf_example function). These scripts show how we convert data from the PASCAL VOC format to the TFRecord format used by the object_detection API.
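The core of that conversion is parsing each XML file and dividing the pixel box corners by the image size. A minimal, TensorFlow-free sketch of that step, assuming the standard VOC XML layout; the feature key names follow the `image/object/bbox/...` convention used by the object_detection scripts, but `voc_xml_to_features` and the `label_map` dict are hypothetical. The real dict_to_tf_example also stores the encoded image bytes and wraps everything in a `tf.train.Example`, which is left out here.

```python
import xml.etree.ElementTree as ET


def voc_xml_to_features(xml_text, label_map):
    """Parse one PASCAL VOC annotation into per-object feature lists.

    Box corners are divided by the image width/height so they fall in [0, 1].
    `label_map` (hypothetical here) maps class names to integer ids.
    """
    root = ET.fromstring(xml_text)
    width = float(root.find("size/width").text)
    height = float(root.find("size/height").text)
    features = {
        "image/filename": root.find("filename").text,
        "image/width": int(width),
        "image/height": int(height),
        "image/object/bbox/xmin": [],
        "image/object/bbox/ymin": [],
        "image/object/bbox/xmax": [],
        "image/object/bbox/ymax": [],
        "image/object/class/text": [],
        "image/object/class/label": [],
    }
    for obj in root.findall("object"):
        name = obj.find("name").text
        box = obj.find("bndbox")
        # x corners scale by width, y corners by height
        for corner, scale in (("xmin", width), ("ymin", height), ("xmax", width), ("ymax", height)):
            features["image/object/bbox/" + corner].append(float(box.find(corner).text) / scale)
        features["image/object/class/text"].append(name)
        features["image/object/class/label"].append(label_map[name])
    return features


sample = """<annotation>
  <filename>a.jpeg</filename>
  <size><width>400</width><height>200</height><depth>3</depth></size>
  <object><name>dog</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>300</xmax><ymax>150</ymax></bndbox>
  </object>
</annotation>"""
print(voc_xml_to_features(sample, {"dog": 1}))
```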
Thanks for the help, derekjchow, this is very important to us. How the TFRecord script reads PASCAL VOC data now seems quite clear to me. The question is:
How do I get from images in JPEG format to PASCAL VOC bounding box data that the TFRecord script can read?
Dark, like I mentioned, you can use the program I linked to (I'm trying it out now; it's not my program), which writes annotations natively in PASCAL VOC XML format. You get a JPEG and an XML file. At that point, you can use or modify the PASCAL TFRecord creation script to create the records. Using Sloth would work too, but it doesn't actually cut out a middleman... it makes you create your own. If you're struggling to figure out how to create the annotations and TFRecords, then you probably don't want to start by writing your own custom script.
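Since a missing-annotation error usually means an image has no matching XML file, a quick check before running the conversion script can help. A minimal sketch, assuming each annotation shares its image's base name (`a.jpeg` and `a.xml`); the function name `pair_images_with_annotations` is hypothetical:

```python
from pathlib import Path


def pair_images_with_annotations(image_dir, annotation_dir=None):
    """Match each .jpg/.jpeg in image_dir with a same-named .xml file.

    Returns (pairs, missing): pairs is a list of (image_path, xml_path) and
    missing lists images that have no annotation yet. XML files are looked
    up in annotation_dir, or next to the images if it is not given.
    """
    image_dir = Path(image_dir)
    annotation_dir = Path(annotation_dir) if annotation_dir else image_dir
    pairs, missing = [], []
    for image_path in sorted(image_dir.iterdir()):
        if image_path.suffix.lower() not in (".jpg", ".jpeg"):
            continue  # skip non-image files such as the XMLs themselves
        xml_path = annotation_dir / (image_path.stem + ".xml")
        if xml_path.exists():
            pairs.append((image_path, xml_path))
        else:
            missing.append(image_path)
    return pairs, missing
```

Usage would be something like `pairs, missing = pair_images_with_annotations("/myimages")`; anything in `missing` still needs boxes drawn before the records can be built.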
Has anyone tried to build a convolutional NN on top of a trained object classifier? I'm struggling with the same problem as you 😢
Now the tutorial is available. Thank you @derekjchow!
Thank you, that tutorial is very helpful. I'm closing this issue :+1:
@vxy10, did you ever finish the KITTI project? I'm writing the same converter now.
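For anyone writing that converter: each line of a KITTI object label file starts with the class name, truncation, occlusion and alpha, followed by the 2D box as left, top, right, bottom in pixels, and then 3D fields. A minimal sketch that pulls out a VOC-style box, assuming that layout and skipping `DontCare` regions; the function name `kitti_line_to_voc` is hypothetical and the sample line is illustrative:

```python
def kitti_line_to_voc(line, skip_classes=("DontCare",)):
    """Convert one KITTI object label line to (class_name, xmin, ymin, xmax, ymax).

    Fields 0-3 are type, truncated, occluded, alpha; fields 4-7 are the 2D
    box (left, top, right, bottom) in pixels; the 3D fields after that are
    ignored. Returns None for classes listed in skip_classes.
    """
    fields = line.split()
    name = fields[0]
    if name in skip_classes:
        return None
    left, top, right, bottom = (float(v) for v in fields[4:8])
    return name, left, top, right, bottom


line = "Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59"
print(kitti_line_to_voc(line))  # ('Car', 587.01, 173.33, 614.12, 200.12)
```

The resulting tuples can then be written out as PASCAL VOC XML or normalized directly for the TFRecord.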
the tutorial link @korrawat mentions is broken