Models: [deeplab] [feature request] update FAQ for 'train the model on other datasets'

Created on 10 Apr 2018 · 4 Comments · Source: tensorflow/models

It would be really great to see a step-by-step guide on how to use DeepLabv3 to train on your own datasets, including:

The format of the image annotations (e.g. one folder contains the original images and another folder contains their multi-color image masks).

Parameter settings for tf_initial_checkpoint, initialize_last_layer, and fine_tune_batch_norm, and when to set each of them to True or False. Also, I wonder if these parameters should be set differently when training on a small dataset, e.g. 100 images with 10 classes?

I notice that these two issues are still unsolved, and I've encountered both: please search for "deeplab CUDA_ERROR_LAUNCH_FAILED" and "deeplab Tensor had NaN values". Adjusting the above parameters helps, so it would be really useful to know how to use them.

Could you please provide the versions of CUDA, cuDNN, TensorFlow, and Python that you have successfully tested?

Also, which initial checkpoint should be used for a different segmentation task? Is it deeplabv3_xception? The recent update for ADE20K mentions deeplabv3_pascal_aug, which is a bit confusing.

Finally, why is the colormap removed from the ground-truth annotations for VOC but not for ADE20K?

Thanks a lot!

Most helpful comment

I will try to submit a pull request to update the FAQ in the near future. For now, @khcy82dyc

  1. Regarding preparing new datasets, I think reading the code in build_voc2012_data.py and build_ade20k_data.py will be a great help. Basically, what they do is convert the source images into files in TFRecord format (a simple sequential file of serialized binary records).

  2. Regarding the parameter settings, please see aquariusjay's answers in this issue. It is a bit difficult to explain all the parameters to someone who doesn't know how DeepLab works. I will try to add more here in the future once I have read all the source code.

  3. Regarding which checkpoint to use, whichever you use will be fine. Checkpoints are just starting points: you base your training on them so that you don't have to train the model from scratch. For the intended usage of each checkpoint, please see the descriptions in the model zoo. Currently I can't produce a fine-tuned checkpoint for ADE20K. Hopefully Google can do it in the future! @aquariusjay. One more thing you may want to know: when referring to the checkpoints on the command line, use model.ckpt instead of model-30000.ckpt. DeepLab will automatically find the most recent checkpoint for you.
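To make item 1 a bit more concrete: as far as I understand, the label images that go into the TFRecords are expected to be single-channel, with each pixel holding a class index rather than a color. Below is a minimal, dependency-free sketch of that conversion for a multi-color mask. The palette, the class names, and the ignore value of 255 are assumptions made up for this example, and a real pipeline would load the PNG with an image library instead of using nested lists.

```python
# Hypothetical palette: maps each RGB color in the color masks to a class index.
# Replace it with the colors of your own dataset; these entries are made up.
PALETTE = {
    (0, 0, 0): 0,      # background
    (128, 0, 0): 1,    # hypothetical class 1
    (0, 128, 0): 2,    # hypothetical class 2
}
IGNORE_LABEL = 255     # assumed value for pixels excluded from training


def color_mask_to_labels(rgb_rows, palette=PALETTE, ignore_label=IGNORE_LABEL):
    """Convert rows of (R, G, B) pixels into rows of integer class indices.

    Colors that are missing from the palette are mapped to ignore_label.
    """
    return [
        [palette.get(tuple(pixel), ignore_label) for pixel in row]
        for row in rgb_rows
    ]


if __name__ == "__main__":
    mask = [
        [(0, 0, 0), (128, 0, 0)],
        [(0, 128, 0), (1, 2, 3)],  # the last color is not in the palette
    ]
    print(color_mask_to_labels(mask))  # [[0, 1], [2, 255]]
```

Mapping unknown colors to an ignore value rather than raising an error is a design choice for this sketch; raising may be safer if your masks are supposed to contain only palette colors.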

As for the two issues you referred to, I am sorry I can't help much right now. Please search this repo with the tag 'deeplab' for similar issues, for example, this one.
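For items 2 and 3 above, here is a rough sketch of how the three parameters from the question fit into a train.py invocation. It only builds the argument list and runs nothing. The helper name build_train_args, the paths, the dataset name, and the step and batch values are all placeholders I made up. The two boolean choices follow a commonly repeated suggestion (not confirmed in this thread, so please check it against the repo's FAQ): set initialize_last_layer to false when your class count differs from the checkpoint's, and set fine_tune_batch_norm to false when the batch size is small.

```python
def build_train_args(initial_checkpoint, dataset, dataset_dir, train_logdir,
                     num_steps, batch_size,
                     initialize_last_layer, fine_tune_batch_norm):
    """Build the argument list for deeplab/train.py (all values are placeholders)."""

    def flag(name, value):
        # Flags are passed as --name=value; booleans are written as true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        return "--{}={}".format(name, value)

    return [
        "python", "deeplab/train.py",
        "--logtostderr",
        flag("model_variant", "xception_65"),
        # Checkpoint prefix, written as model.ckpt as advised in item 3.
        flag("tf_initial_checkpoint", initial_checkpoint),
        flag("initialize_last_layer", initialize_last_layer),
        flag("fine_tune_batch_norm", fine_tune_batch_norm),
        flag("training_number_of_steps", num_steps),
        flag("train_batch_size", batch_size),
        flag("dataset", dataset),
        flag("dataset_dir", dataset_dir),
        flag("train_logdir", train_logdir),
    ]


if __name__ == "__main__":
    args = build_train_args(
        initial_checkpoint="/path/to/pretrained/model.ckpt",  # placeholder path
        dataset="my_dataset",              # hypothetical dataset name
        dataset_dir="/path/to/tfrecord",   # placeholder path
        train_logdir="/path/to/train_logs",  # placeholder path
        num_steps=30000,
        batch_size=4,
        initialize_last_layer=False,  # assumed: class count differs from checkpoint
        fine_tune_batch_norm=False,   # assumed: batch size too small for batch norm
    )
    print(" ".join(args))
```

A custom dataset name would most likely also have to be registered in the repo's dataset code before train.py accepts it; that step is not shown here.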

I currently use CUDA 9.0, cuDNN 7.0.4, TensorFlow 1.6.0, and Python 2.7.5 on a CentOS 7 Linux box. Below is my GPU info:

| NVIDIA-SMI 390.42                 Driver Version: 390.42                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080Ti  Off  | 00000000:05:00.0 Off |                  N/A |
| 25%   46C    P0    54W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080Ti  Off  | 00000000:09:00.0 Off |                  N/A |
|  0%   45C    P0    53W / 250W |      0MiB / 11178MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

All 4 comments

Thank you, khcy82dyc, for the suggestion.

Regarding the questions related to ADE20K, @walkerlala, could you please help answer them? Thanks!

Can we also specify how much memory is required?

Hi There,
We are checking to see if you still need help with this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help with this issue any more, please consider closing it.
