It would be really great to see a step-by-step guide on how to use DeepLabV3 to train on your own dataset, including:
the format of the image annotations (e.g. one folder containing the original images, another containing their color-indexed segmentation masks),
the parameter settings for tf_initial_checkpoint, initialize_last_layer, and fine_tune_batch_norm, and when each of them should be set to True or False. I also wonder whether these parameters should be set differently when training on a small dataset, e.g. 100 images with 10 classes?
I've noticed two unsolved issues, both of which I have encountered: please Google "deeplab CUDA_ERROR_LAUNCH_FAILED" and "deeplab Tensor had NaN values". Adjusting the parameters above helps, so it would be really useful to know how to use them.
Could you please provide the versions of CUDA, cuDNN, TensorFlow, and Python that you have successfully tested?
Also, which initial checkpoint should be used for which segmentation task? Is it deeplabv3_xception? The recent update for ADE20K mentions deeplabv3_pascal_aug, which is a bit confusing.
Finally, why is the colormap removed from the ground-truth annotations for VOC but not for ADE20K?
Thanks a lot!
Thank you, khcy82dyc, for the suggestion.
Regarding the question related to ADE20K, @walkerlala Could you please help answer those questions? Thanks!
I will try to submit a pull request to update the FAQ in the near future. For now, @khcy82dyc
Regarding preparing new datasets: I think reading the code in build_voc2012_data.py and build_ade20k_data.py will help a great deal. Basically, they convert the source images into files in TFRecord format (which is essentially an SSTable... yes, like LevelDB).
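As a minimal sketch of the first step those conversion scripts perform, here is the image/mask pairing logic in plain Python. The folder layout, extensions, and function name are my assumptions for illustration, not requirements of the scripts; a missing mask is the most common cause of a broken dataset build, so it is worth checking explicitly:

```python
import os

def pair_images_and_masks(image_dir, mask_dir,
                          image_ext=".jpg", mask_ext=".png"):
    """Match each source image with its mask by shared base name.

    Hypothetical helper (not part of DeepLab): raises early if any
    image lacks a corresponding mask, instead of failing later
    during TFRecord conversion or training.
    """
    images = sorted(f for f in os.listdir(image_dir)
                    if f.endswith(image_ext))
    pairs = []
    for image_name in images:
        base = os.path.splitext(image_name)[0]
        mask_path = os.path.join(mask_dir, base + mask_ext)
        if not os.path.exists(mask_path):
            raise FileNotFoundError("no mask found for " + image_name)
        pairs.append((os.path.join(image_dir, image_name), mask_path))
    return pairs
```

The actual serialization into TFRecord shards is then done per pair, as in the build scripts.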
Regarding the parameter settings: please see aquariusjay's answers in this issue. It is a bit difficult to explain all the parameters if one doesn't know how DeepLab works. I will try to add more here in the future once I have read all the source code.
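My understanding of the usual rule of thumb from those answers, encoded as a small hypothetical helper (please double-check against aquariusjay's comments; this function is not part of DeepLab, and the batch-size threshold is an assumption):

```python
def suggest_flags(same_num_classes_as_checkpoint, train_batch_size):
    """Hypothetical rule-of-thumb flag settings for fine-tuning.

    - initialize_last_layer: reuse the checkpoint's last layer only
      when your number of classes matches the checkpoint's.
    - fine_tune_batch_norm: fine-tuning batch-norm statistics is
      usually only stable with a reasonably large batch (assumed
      here to be > 12); with a small batch, keep it frozen.
    """
    return {
        "initialize_last_layer": same_num_classes_as_checkpoint,
        "fine_tune_batch_norm": train_batch_size > 12,
    }
```

So for a small custom dataset (e.g. 100 images, 10 classes, small batch), this rule would give initialize_last_layer=False and fine_tune_batch_norm=False.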
Regarding which checkpoint to use: whichever you pick will be fine. Checkpoints are just starting points; you base your training on them so that you don't have to train the model from scratch. For the usage of each checkpoint, please see the descriptions in the model zoo. Currently I can't produce a fine-tuned checkpoint for ADE20K. Hopefully Google can do that in the future! @aquariusjay. One thing you may want to know: when referring to checkpoints on the command line, use model.ckpt instead of model.ckpt-30000; DeepLab will automatically find the most recent checkpoint for you.
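To illustrate why the bare model.ckpt prefix works: TensorFlow writes each checkpoint as several files (e.g. model.ckpt-30000.index, model.ckpt-30000.data-...), and the trainer resolves the prefix to the newest step. The real mechanism is tf.train.latest_checkpoint; the function below is only a rough pure-Python imitation of that lookup for illustration:

```python
import re

def latest_checkpoint_prefix(filenames, prefix="model.ckpt"):
    """Return the newest checkpoint prefix, e.g. 'model.ckpt-30000',
    by scanning a directory listing for .index files.

    Illustrative stand-in for tf.train.latest_checkpoint, which is
    what TensorFlow actually uses; returns None if no checkpoint
    files are found.
    """
    pattern = re.compile(re.escape(prefix) + r"-(\d+)\.index$")
    steps = [int(m.group(1)) for f in filenames
             if (m := pattern.match(f))]
    if not steps:
        return None
    return "%s-%d" % (prefix, max(steps))
```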
And regarding the two issues you mentioned, I am sorry I can't help much right now. Please search this repo with the tag 'deeplab' for similar issues, for example, this one.
I currently use CUDA 9.0, cuDNN 7.0.4, TensorFlow 1.6.0, and Python 2.7.5 on a CentOS 7 Linux box. Below is my GPU info:
```
| NVIDIA-SMI 390.42                 Driver Version: 390.42                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080Ti  Off  | 00000000:05:00.0 Off |                  N/A |
| 25%   46C    P0    54W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080Ti  Off  | 00000000:09:00.0 Off |                  N/A |
|  0%   45C    P0    53W / 250W |      0MiB / 11178MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
```
Can we also specify how much memory is required?
Hi There,
We are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.