Models: I want to train my own image dataset

Created on 3 Jul 2017 · 30 comments · Source: tensorflow/models

Hi.

I want to train on my own image dataset. Can I train it with the API code?

If so, how can I do it?

Labels: community support, docs


All 30 comments

Do you want to train a new attention_ocr model?

I want to train a model on different categories of image data that I collected, in particular with the object_detection API.

What kind of training data is needed? (For example, folders of images arranged by category?)

How do I use train.py? I'd like to see an example.

I have the same question. I want to use my own images to train and my own images to test, and I don't know how.

I have the same question just like everyone else. I am working on my own solution to accomplish my goal.

I currently use LabelImg to create the required files. I copied the create_pascal_tf_record.py file to use it with my dataset. Look, for example, at the VOC dataset and its file setup.

I'm slowly progressing towards a dataset with the right source files. To me it looks like we only have to change the files in the root folder of object_detection.

And you have to create a pipeline config. It is basically all documented in https://github.com/tensorflow/models/blob/master/object_detection/g3doc/running_locally.md.

For me it isn't the most helpful piece of documentation because it doesn't describe the referenced config files, nor does it give an example of the config file (there are some examples in the models repository).

Probably, once I finish my work on my own dataset I will write something about my journey to help others.
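The LabelImg annotations mentioned above are Pascal-VOC-style XML files. As a rough sketch (the function and the returned field names are my own, not from the repo), reading one with only the standard library could look like this:

```python
import xml.etree.ElementTree as ET

def parse_labelimg_xml(xml_string):
    """Extract filename, image size, and bounding boxes from a
    LabelImg / Pascal-VOC style annotation XML string."""
    root = ET.fromstring(xml_string)
    size = root.find('size')
    objects = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        objects.append({
            'name': obj.find('name').text,
            'xmin': int(box.find('xmin').text),
            'ymin': int(box.find('ymin').text),
            'xmax': int(box.find('xmax').text),
            'ymax': int(box.find('ymax').text),
        })
    return {
        'filename': root.find('filename').text,
        'width': int(size.find('width').text),
        'height': int(size.find('height').text),
        'objects': objects,
    }
```

From a dict like this it is a small step to the feature encoding that create_pascal_tf_record.py performs.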

@ArjanSchouten, please do, that would be awesome. Thanks for helping @Heidisnaps!

We've just added documentation about using your own data. This assumes that you have your dataset (images and box labels) in a format you're familiar with and guides you on how to write a script that will convert it to TF Record.

We don't have a preferred labelling tool, but other users have brought up a few reasonable suggestions.
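That documentation boils down to emitting one tf.train.Example per image. As an illustration only (the helper below is hypothetical; the feature key names follow the object_detection docs), the per-image fields look roughly like this, shown as a plain dict before being wrapped into an Example:

```python
def make_example_fields(filename, encoded_jpeg, width, height, boxes, labels):
    """Collect the per-image fields the Object Detection API expects.

    boxes: list of (xmin, ymin, xmax, ymax) in pixels.
    labels: list of (class_name, class_id) pairs, one per box.
    """
    return {
        'image/filename': filename.encode('utf8'),
        'image/encoded': encoded_jpeg,   # raw JPEG bytes
        'image/format': b'jpeg',
        'image/width': width,
        'image/height': height,
        # box coordinates must be normalized to [0, 1]
        'image/object/bbox/xmin': [b[0] / width for b in boxes],
        'image/object/bbox/ymin': [b[1] / height for b in boxes],
        'image/object/bbox/xmax': [b[2] / width for b in boxes],
        'image/object/bbox/ymax': [b[3] / height for b in boxes],
        'image/object/class/text': [n.encode('utf8') for n, _ in labels],
        'image/object/class/label': [i for _, i in labels],
    }
```

Each value then goes into the matching tf.train.Feature (BytesList, Int64List, or FloatList) before serialization to the TF Record file.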

@derekjchow Nice! Exactly as I did it so far. Sure, tooling doesn't need to be mentioned there.

I went ahead and shared my experience on how to do this on SO

Thanks everyone!
I will try it!!

I tried to use my own dataset. It has 20 smartphone and 20 gun images. I know that is a small dataset, but it is just for trying it out. I did the following steps:

Download the git project, extract it, go to the models-master/ directory, and run:

protoc object_detection/protos/*.proto --python_out=.
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
python object_detection/builders/model_builder_test.py
cd object_detection/

I created the directory annotations/xmls with the XMLs for the images, and the directory images with the .jpg files. Then I created the trainval.txt file in the annotations directory with the list of all .jpg files:

ls images/ | sed -e 's/\..*$//' > annotations/trainval.txt
cp create_pet_tf_record.py create_detector_tf_record.py

Create the file detector_label_map.pbtxt in the data/ directory with the names of the classes. In create_detector_tf_record.py, change the word pet to detector, and run:

python create_detector_tf_record.py --data_dir=`pwd` --output_dir=`pwd`
cp samples/configs/faster_rcnn_resnet101_pets.config ./faster_rcnn_resnet101_detector.config
nano faster_rcnn_resnet101_detector.config
Change:

  • num_classes: 2
  • fine_tune_checkpoint: "./object_detection/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt"
  • In train_input_reader: input_path: "./object_detection/detector_train.record"
    label_map_path: "./object_detection/data/detector_label_map.pbtxt"
  • num_examples: 5
  • In eval_input_reader: input_path: "./object_detection/detector_val.record"
    label_map_path: "./object_detection/data/detector_label_map.pbtxt"
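Pieced together, the edits above correspond to a pipeline config whose relevant sections look roughly like this (paths and values are taken from the list; everything else stays as in the sample config, and the `...` marks the unchanged parts):

```
model {
  faster_rcnn {
    num_classes: 2
    ...
  }
}
train_config {
  fine_tune_checkpoint: "./object_detection/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt"
  ...
}
train_input_reader {
  tf_record_input_reader {
    input_path: "./object_detection/detector_train.record"
  }
  label_map_path: "./object_detection/data/detector_label_map.pbtxt"
}
eval_config {
  num_examples: 5
  ...
}
eval_input_reader {
  tf_record_input_reader {
    input_path: "./object_detection/detector_val.record"
  }
  label_map_path: "./object_detection/data/detector_label_map.pbtxt"
}
```

A two-class detector_label_map.pbtxt for this dataset would contain (class names assumed from the comment above):

```
item {
  id: 1
  name: 'smartphone'
}
item {
  id: 2
  name: 'gun'
}
```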

wget http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz
tar -xvf faster_rcnn_resnet101_coco_11_06_2017.tar.gz
cd ..
python object_detection/train.py --logtostderr --pipeline_config_path=object_detection/faster_rcnn_resnet101_detector.config --train_dir=object_detection

This last step is very slow, and I get the following error/output:

INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Summary name Learning Rate is illegal; using Learning_Rate instead.
INFO:tensorflow:Summary name /clone_loss is illegal; using clone_loss instead.
/home/fperez/.local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2017-07-14 18:08:42.027459: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-14 18:08:42.027483: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-14 18:08:42.027489: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-14 18:08:42.027494: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-14 18:08:42.027499: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-07-14 18:08:42.175137: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-07-14 18:08:42.175498: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.645
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.64GiB
2017-07-14 18:08:42.175510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-07-14 18:08:42.175532: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-07-14 18:08:42.175537: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
2017-07-14 18:08:43.004349: I tensorflow/core/common_runtime/simple_placer.cc:675] Ignoring device specification /device:GPU:0 for node 'prefetch_queue_Dequeue' because the input edge from 'prefetch_queue' is a reference connection and already has a device field set to /device:CPU:0
INFO:tensorflow:Restoring parameters from ./object_detection/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path object_detection/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 0.
Terminado (killed)

I don't know what my error is. I did the same steps with the other dataset, as I described here: https://github.com/tensorflow/models/issues/1843#issuecomment-315126074
Thanks everyone!

@FPerezHernandez92 Did you find a solution to your problem? I'm getting the same error and followed the same procedure.

@mattryles I couldn't solve the problem.

Sorry for the inconvenience @ArjanSchouten, but do you know what the problem is? Thanks for the help.

Hey @FPerezHernandez92,

I solved the problem this morning: it was running out of memory. I set up some swap (100 GB) on my hard drive and it runs fine now.

I deleted some files so I have more free space. Now when I run the last command, I get the following problem, but I don't know what it is.

https://pastebin.com/3fymSeFF

Thanks for the help.

I hit this problem too but cannot remember how I fixed it; I think my config file was using incorrect file paths.
Sorry I can't be of any more help.

@FPerezHernandez92 Try deleting the contents in your training directory. At first glance it seems some parameters have changed between your training jobs.

Finally, I used the pet detector example and replaced one of the dog classes with the smartphone class. But I get the following problem while training:

INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Summary name Learning Rate is illegal; using Learning_Rate instead.
INFO:tensorflow:Summary name /clone_loss is illegal; using clone_loss instead.
/home/fperez/.local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2017-07-20 16:54:34.769730: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-20 16:54:34.769756: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-20 16:54:34.769768: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-20 16:54:34.769777: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-20 16:54:34.769787: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-07-20 16:54:34.910759: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-07-20 16:54:34.911087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.645
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.57GiB
2017-07-20 16:54:34.911101: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-07-20 16:54:34.911106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2017-07-20 16:54:34.911114: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
2017-07-20 16:54:35.746541: I tensorflow/core/common_runtime/simple_placer.cc:675] Ignoring device specification /device:GPU:0 for node 'prefetch_queue_Dequeue' because the input edge from 'prefetch_queue' is a reference connection and already has a device field set to /device:CPU:0
INFO:tensorflow:Restoring parameters from ./object_detection/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path object_detection/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 0.
INFO:tensorflow:global step 1: loss = 5.3051 (6.512 sec/step)
2017-07-20 16:54:48.659001: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2383 get requests, put_count=1964 evicted_count=1000 eviction_rate=0.509165 and unsatisfied allocation rate=0.637432
2017-07-20 16:54:48.659028: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
INFO:tensorflow:global step 2: loss = 5.0217 (1.308 sec/step)
INFO:tensorflow:global step 3: loss = 4.5842 (1.009 sec/step)
INFO:tensorflow:global step 4: loss = 4.6999 (1.271 sec/step)
INFO:tensorflow:global step 5: loss = 4.5648 (0.493 sec/step)
INFO:tensorflow:global step 6: loss = 4.4153 (0.434 sec/step)
INFO:tensorflow:global step 7: loss = 3.3839 (0.421 sec/step)
INFO:tensorflow:global step 8: loss = 2.8255 (0.582 sec/step)
INFO:tensorflow:global step 9: loss = 2.2235 (1.274 sec/step)
INFO:tensorflow:global step 10: loss = 1.9660 (0.422 sec/step)
INFO:tensorflow:global step 11: loss = 1.8010 (0.413 sec/step)
INFO:tensorflow:global step 12: loss = 1.8422 (0.406 sec/step)
INFO:tensorflow:global step 13: loss = 1.5978 (1.202 sec/step)
INFO:tensorflow:global step 14: loss = 1.7833 (1.278 sec/step)
INFO:tensorflow:global step 15: loss = 1.3990 (0.447 sec/step)
INFO:tensorflow:global step 16: loss = 0.9678 (1.264 sec/step)
INFO:tensorflow:global step 17: loss = 1.6968 (0.462 sec/step)
2017-07-20 16:55:01.549106: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2110 get requests, put_count=2062 evicted_count=1000 eviction_rate=0.484966 and unsatisfied allocation rate=0.507583
2017-07-20 16:55:01.549129: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
INFO:tensorflow:global step 18: loss = 1.3028 (1.311 sec/step)
INFO:tensorflow:global step 19: loss = 0.8693 (0.752 sec/step)
INFO:tensorflow:global step 20: loss = 0.9911 (1.217 sec/step)
INFO:tensorflow:global step 21: loss = 1.0741 (0.404 sec/step)
INFO:tensorflow:global step 22: loss = 2.2188 (0.771 sec/step)
INFO:tensorflow:global step 23: loss = 5.9905 (1.197 sec/step)
INFO:tensorflow:global step 24: loss = 2.6961 (0.410 sec/step)
Improper call to JPEG library in state 203
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Invalid JPEG data, size 3210028
  [[Node: case/If_0/decode_image/cond_jpeg/DecodeJpeg = DecodeJpeg[acceptable_fraction=1, channels=3, dct_method="", fancy_upscaling=true, ratio=1, try_recover_truncated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](case/If_0/decode_image/cond_jpeg/DecodeJpeg/Switch:1, ^case/Assert/AssertGuard/Merge, ^case/If_0/decode_image/cond_jpeg/Assert/Assert)]]
INFO:tensorflow:global step 25: loss = 2.3532 (0.444 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
  File "object_detection/train.py", line 198, in <module>
    tf.app.run()
  File "/home/fperez/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "object_detection/train.py", line 194, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "/home/fperez/Documentos/PruebasDosClases/5ModificarTrain/object_detection/trainer.py", line 290, in train
    saver=saver)
  File "/home/fperez/.local/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 759, in train
    sv.saver.save(sess, sv.save_path, global_step=sv.global_step)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/fperez/.local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/home/fperez/.local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "/home/fperez/.local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/home/fperez/.local/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 238, in _run
    enqueue_callable()
  File "/home/fperez/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1063, in _single_operation_run
    target_list_as_strings, status, None)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/fperez/.local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid JPEG data, size 3210028
  [[Node: case/If_0/decode_image/cond_jpeg/DecodeJpeg = DecodeJpeg[acceptable_fraction=1, channels=3, dct_method="", fancy_upscaling=true, ratio=1, try_recover_truncated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](case/If_0/decode_image/cond_jpeg/DecodeJpeg/Switch:1, ^case/Assert/AssertGuard/Merge, ^case/If_0/decode_image/cond_jpeg/Assert/Assert)]]

I don't know what the problem is.

When I run python train.py --checkpoint=model.ckpt-399731 I get this error:

Traceback (most recent call last):
  File "train.py", line 30, in <module>
    import common_flags
  File "E:\Workplace\tensorflow\result\models-master\attention_ocr\python\common_flags.py", line 22, in <module>
    import datasets
  File "E:\Workplace\tensorflow\result\models-master\attention_ocr\python\datasets\__init__.py", line 16, in <module>
    import fsns
ImportError: No module named 'fsns'

@fistix You are probably in the wrong directory. You should run from the models/research directory, like this: python object_detection/train.py ...

@fistix you need to set up the PYTHONPATH system variable. E.g.
bash:
export PYTHONPATH=$PYTHONPATH:<your-path>/models/research/attention_ocr/python/datasets
fish-shell:
set -x PYTHONPATH <your-path>/models/research/attention_ocr/python/datasets

@FPerezHernandez92, I am having a similar problem when I train models with the Google TensorFlow Object Detection API. May I ask if the problem has been solved? What was the problem and how was it solved? Thank you!

I think it may have run out of memory. Try halving the size of the images; if it runs then, that was the cause.

@clovking The problem in his case was InvalidArgumentError: Invalid JPEG data, size 3210028 (among the last lines of the error). He probably had a corrupt JPEG, so check your input data.
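One quick way to catch such files before training is to scan the image folder for entries that lack the JPEG start/end markers. This is a crude stdlib-only sketch of my own (it won't catch every form of corruption the way a full decode would, but it flags truncated or mislabeled files):

```python
import os

def find_suspect_jpegs(image_dir):
    """Return paths under image_dir whose contents do not start with the
    JPEG SOI marker (FF D8) and end with the EOI marker (FF D9)."""
    suspect = []
    for name in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, name)
        with open(path, 'rb') as f:
            head = f.read(2)
            f.seek(0, os.SEEK_END)
            if f.tell() >= 4:
                f.seek(-2, os.SEEK_END)
                tail = f.read(2)
            else:
                tail = b''  # file too small to be a valid JPEG
        if head != b'\xff\xd8' or tail != b'\xff\xd9':
            suspect.append(path)
    return suspect
```

Anything this reports should be re-exported or removed before regenerating the TF Record.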

When I run python train.py I get this error:

Traceback (most recent call last):
  File "train.py", line 397, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "train.py", line 249, in main
    FLAGS.dataset, FLAGS.train_split, dataset_dir=FLAGS.dataset_dir)
  File "/home/mxx/deeplabv3+/models-master/research/deeplab/datasets/segmentation_dataset.py", line 141, in get_dataset
    raise ValueError('The specified dataset is not supported yet.')
ValueError: The specified dataset is not supported yet.

I don't know what the problem is.

Thanks for the help.

Still need help with training own image dataset? @Heidisnaps

Hi everyone,

I worked with the vid2depth package for a whole summer and have some insight into the process of training a model with a private dataset. I have a bit of information at https://github.com/mrevsine/vid2depth_modifications. Feel free to reach out with any questions!

Hi There,
We are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing this.
