Deeplabcut: Network convergence criterion

Created on 5 May 2019 · 6 Comments · Source: DeepLabCut/DeepLabCut

This is more of a question than an issue, but how does DeepLabCut determine when training is complete? I left a network to train overnight, and the GPU is still churning after 12+ hours. Currently it's on iteration 900,000 with a loss of 0.0003. The loss has been oscillating between 0.0003 and 0.0004 for the past 200,000 iterations. I think it should be fine to stop now--it's my first time, so I just wanted a rough idea of how well DLC works before tweaking some parameters.

jchutrue

All 6 comments

The network currently stops at the iteration you set in pose_config.yaml (default is ~1M). From the NN paper you can see that it typically converges around 200-400K (I personally just let it run to about 650K).
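Because training stops at a fixed iteration count rather than at a convergence test, judging a plateau is left to the user. A minimal sketch of one way to do that by hand is shown here, assuming you have copied recent loss values out of the training log. The function name, window size, and tolerance are hypothetical choices for illustration and are not part of DeepLabCut.

```python
def has_plateaued(losses, window=5, tolerance=2e-4):
    """Return True if the last `window` losses span no more than `tolerance`.

    `losses` is a list of loss values in the order they were logged.
    This is a hand-rolled check, not a DeepLabCut feature.
    """
    if len(losses) < window:
        return False  # not enough history to judge
    recent = losses[-window:]
    return max(recent) - min(recent) <= tolerance


# Loss values like those in the question: oscillating between 0.0003 and 0.0004.
flat = [0.0004, 0.0003, 0.0004, 0.0003, 0.0004]
# A loss that is still clearly falling.
falling = [0.01, 0.005, 0.002, 0.001, 0.0005]

print(has_plateaued(flat))
print(has_plateaued(falling))
```

A flat loss is only a hint that training can stop; evaluating the network on held-out frames is still the real check.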

Please see Box 2 in the docs for details on setting network parameters.

Also, for questions rather than bug reports, we ask that you please use the Image.sc forum, thanks!

https://github.com/AlexEMG/DeepLabCut/blob/master/README.md

https://forum.image.sc/tags/deeplabcut

MMathisLab on 5 May 2019

I did make sure to consult the usage guide at https://www.biorxiv.org/content/biorxiv/early/2018/11/24/476531.full.pdf, specifically box 2 for relevant parameters, but I couldn't find a description for maximum number of iterations or something similar.

Nor did I see a parameter like that in my pose_config.yaml. (Note: I had to change the file extension so GitHub would allow me to upload it, but the content hasn't changed.) If I totally missed something in the manual, I'd love to know!

And definitely, next time I will post in the appropriate place!

pose_cfg.txt

jchutrue on 5 May 2019

No worries.

The variable “multi_step” lists each learning rate followed by a number of iterations.

If you want it to run for fewer than 1030000 iterations, you can change that value to 40000 or 650000, etc.
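To make the idea concrete, here is a small sketch of shortening such a schedule. It assumes multi_step is a list of [learning_rate, iteration] pairs and that the last pair sets where training ends. Only the final value 1030000 comes from this thread; the other numbers and the helper name are illustrative assumptions, so check them against your own pose_config.yaml.

```python
# Illustrative schedule of [learning_rate, last_iteration] pairs.
# Only the final boundary (1030000) is taken from the thread;
# the remaining values are assumptions for the sake of the example.
MULTI_STEP = [[0.005, 10000], [0.02, 430000], [0.002, 730000], [0.001, 1030000]]


def truncate_schedule(multi_step, max_iters):
    """Return a copy of the schedule that ends at `max_iters`.

    Stages that finish before `max_iters` are kept unchanged; the stage
    that would run past it is cut short, and later stages are dropped.
    If `max_iters` lies beyond the last stage, the schedule is returned as is.
    """
    truncated = []
    for learning_rate, last_iter in multi_step:
        if last_iter >= max_iters:
            truncated.append([learning_rate, max_iters])
            break
        truncated.append([learning_rate, last_iter])
    return truncated


shorter = truncate_schedule(MULTI_STEP, 650000)
print(shorter)         # the schedule you would write back into the YAML file
print(shorter[-1][1])  # iteration at which training would now stop
```

Editing only the last number by hand, as suggested above, gives the same result whenever the new value still falls inside the final stage.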

MMathisLab on 5 May 2019

Ahhh, that makes sense. I think that would be a great parameter to include in Box 2! Thanks for the clarification.

jchutrue on 5 May 2019

👍2

Good point; I’ll double check it’s in the final version of the manuscript before it comes out, but it is on GitHub:

https://github.com/AlexEMG/DeepLabCut/blob/master/docs/functionDetails.md#f-create-training-dataset

https://camo.githubusercontent.com/663a60c8ec32c35be321b4c827fe415a839a4e05/68747470733a2f2f737461746963312e73717561726573706163652e636f6d2f7374617469632f3537663664353163396637343536366635356563663237312f742f3563343065666265626261323233373330383832633264652f313534373735393635363332342f426f78322d30312e706e673f666f726d61743d3130303077

MMathisLab on 5 May 2019

@jchutrue there is also already a variable called maxiters, so you can set this by passing it to deeplabcut.train_network (i.e., set maxiters=200000).

From the docstring, i.e., if you run

deeplabcut.train_network?

Signature: deeplabcut.train_network(config, shuffle=1, trainingsetindex=0, gputouse=None, max_snapshots_to_keep=5, autotune=False, displayiters=None, saveiters=None, maxiters=None)
Docstring:
Trains the network with the labels in the training dataset.

Parameter
----------
config : string
    Full path of the config.yaml file as a string.

shuffle: int, optional
    Integer value specifying the shuffle index to select for training. Default is set to 1

trainingsetindex: int, optional
    Integer specifying which TrainingsetFraction to use. By default the first (note that TrainingFraction is a list in config.yaml).

gputouse: int, optional. Natural number indicating the number of your GPU (see number in nvidia-smi). If you do not have a GPU put None.
See: https://nvidia.custhelp.com/app/answers/detail/a_id/3751/~/useful-nvidia-smi-queries

Additional parameters:

max_snapshots_to_keep: int, or None. Sets how many snapshots are kept, i.e. states of the trained network. Every savinginteration many times 
a snapshot is stored, however only the last max_snapshots_to_keep many are kept! If you change this to None, then all are kept. 
See: https://github.com/AlexEMG/DeepLabCut/issues/8#issuecomment-387404835

autotune: property of TensorFlow, somehow faster if 'false' (as Eldar found out, see https://github.com/tensorflow/tensorflow/issues/13317). Default: False

displayiters: this variable is actually set in pose_config.yaml. However, you can overwrite it with this hack. Don't use this regularly, just if you are too lazy to dig out 
the pose_config.yaml file for the corresponding project. If None, the value from there is used, otherwise it is overwritten! Default: None

saveiters: this variable is actually set in pose_config.yaml. However, you can overwrite it with this hack. Don't use this regularly, just if you are too lazy to dig out 
the pose_config.yaml file for the corresponding project. If None, the value from there is used, otherwise it is overwritten! Default: None

maxiters: this variable is actually set in pose_config.yaml. However, you can overwrite it with this hack. Don't use this regularly, just if you are too lazy to dig out 
the pose_config.yaml file for the corresponding project. If None, the value from there is used, otherwise it is overwritten! Default: None

Example
--------
for training the network for first shuffle of the training dataset.
>>> deeplabcut.train_network('/analysis/project/reaching-task/config.yaml')
--------

for training the network for second shuffle of the training dataset.
>>> deeplabcut.train_network('/analysis/project/reaching-task/config.yaml',shuffle=2)
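The override rule described in the docstring (None keeps the value from pose_config.yaml, anything else replaces it) can be sketched as follows. The helper names resolve_setting and train_with_cap are hypothetical and not part of DeepLabCut; only deeplabcut.train_network and its maxiters argument come from the signature above. The config path is the one used in the docstring examples.

```python
def resolve_setting(cfg_value, override=None):
    """Mimic the documented rule for displayiters, saveiters and maxiters.

    None means "use the value from pose_config.yaml";
    any other value replaces it for this run.
    """
    return cfg_value if override is None else override


def train_with_cap(config_path, maxiters):
    """Start training with an iteration cap. Requires DeepLabCut to be installed."""
    import deeplabcut  # imported lazily so the helper above runs without it

    deeplabcut.train_network(config_path, maxiters=maxiters)


# The rule itself can be checked without DeepLabCut:
print(resolve_setting(1030000))          # no override: config value is used
print(resolve_setting(1030000, 200000))  # override: 200000 is used

# With DeepLabCut installed, a capped run would look like:
# train_with_cap('/analysis/project/reaching-task/config.yaml', maxiters=200000)
```

As the docstring itself notes, this override is meant as a convenience; editing pose_config.yaml remains the regular way to change the value.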

MMathisLab on 7 May 2019

👍1
