This is more of a question than an issue, but how does DeepLabCut determine when training is complete? I left a network to train overnight, and the GPU is still churning after 12+ hours. Currently it's on iteration 900,000 with a loss of 0.0003. The loss has been oscillating between 0.0003 and 0.0004 for the past 200,000 iterations. I think it should be fine to stop now; it's my first time, so I just wanted a rough idea of how well DLC works before tweaking some parameters.
The network currently stops at the iteration you set in pose_config.yaml (the default is ~1M). From the Nature Neuroscience (NN) paper you can see that it typically converges around 200-400K iterations (I personally just let it run to about 650K).
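DeepLabCut itself does not stop on convergence; as described above, it runs to a fixed iteration count. If you want to formalize the "loss has flattened out" judgment from the question, here is a minimal sketch of one possible plateau check. The helper name has_plateaued, the window size, and the 10% relative tolerance are hypothetical choices, not anything DeepLabCut provides, and the loss values are illustrative rather than real training output.

```python
from statistics import mean


def has_plateaued(losses, window=5, rel_tol=0.1):
    """Hypothetical helper (not part of DeepLabCut).

    Returns True if the mean of the last `window` losses differs from the
    mean of the `window` losses before them by at most `rel_tol` (relative).
    """
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    recent = mean(losses[-window:])
    previous = mean(losses[-2 * window:-window])
    return abs(previous - recent) <= rel_tol * previous


# Illustrative loss values, one per display interval (not real DLC output).
log = [0.02, 0.01, 0.005, 0.002, 0.001,
       0.0004, 0.0003, 0.0004, 0.0003, 0.0004,
       0.0003, 0.0004, 0.0003, 0.0004, 0.0003]

print(has_plateaued(log))  # True: the last two windows average ~0.00036 and ~0.00034
```

A flat loss is only a rough signal; evaluating the snapshots on held-out frames is still the more reliable way to decide whether a network is good enough.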
Please see Box 2 in the docs for details on setting network parameters.
Also, for questions rather than bug reports, please use the Image Forum. Thanks!
I did make sure to consult the usage guide at https://www.biorxiv.org/content/biorxiv/early/2018/11/24/476531.full.pdf, specifically Box 2 for the relevant parameters, but I couldn't find a description of the maximum number of iterations or anything similar.
Nor did I see such a parameter in my pose_config.yaml. (Note: I had to change the file extension so GitHub would let me upload it, but the content is unchanged.) If I totally missed something in the manual, I'd love to know!
And definitely, next time I will post in the appropriate place!
No worries.
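The variable “multi_step” lists pairs of a learning rate followed by a number of iterations; training stops at the last iteration number in that list.
If you want it to run for fewer than 1030000 iterations, you can change that final value to, e.g., 40000 or 650000.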
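To make the multi_step reply concrete, here is a minimal sketch, assuming the schedule is a list of [learning_rate, iteration] pairs where each rate is used until its iteration is reached and training ends at the last one. The MULTI_STEP values are assumed to resemble the default, and truncate_schedule is a hypothetical helper for shortening the schedule while keeping the iteration numbers increasing; it is not a DeepLabCut function.

```python
# Assumed shape of multi_step in pose_config.yaml:
# [learning_rate, iteration] pairs; each rate applies until its iteration.
MULTI_STEP = [[0.005, 10000], [0.02, 430000], [0.002, 730000], [0.001, 1030000]]


def max_iterations(schedule):
    """Training stops at the iteration number of the last entry."""
    return schedule[-1][1]


def learning_rate_at(schedule, iteration):
    """Learning rate in effect at `iteration`, or None once training is over."""
    for lr, until in schedule:
        if iteration < until:
            return lr
    return None


def truncate_schedule(schedule, new_max):
    """Hypothetical helper: stop at `new_max`, keeping earlier steps intact."""
    if new_max > max_iterations(schedule):
        raise ValueError("new_max exceeds the original schedule")
    kept = [[lr, until] for lr, until in schedule if until < new_max]
    # The rate in effect just before new_max now runs until new_max.
    kept.append([learning_rate_at(schedule, new_max - 1), new_max])
    return kept


print(max_iterations(MULTI_STEP))  # 1030000
print(truncate_schedule(MULTI_STEP, 650000))
# [[0.005, 10000], [0.02, 430000], [0.002, 650000]]
```

Under these assumed semantics, editing only the final value gives the same stopping point and the same rates up to it; the helper just avoids leaving a later step with a smaller iteration number than an earlier one.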
Ahhh, that makes sense. I think that would be a great parameter to include in Box 2! Thanks for the clarification.
Good point; I’ll double-check that it’s in the final version of the manuscript before it comes out, but it is already documented on GitHub:
https://github.com/AlexEMG/DeepLabCut/blob/master/docs/functionDetails.md#f-create-training-dataset
@jchutrue there is also a maxiters argument, so you can set this directly when calling deeplabcut.train_network (i.e., maxiters=200000).
From the docstring, i.e., what you see if you run:
deeplabcut.train_network?
Signature: deeplabcut.train_network(config, shuffle=1, trainingsetindex=0, gputouse=None, max_snapshots_to_keep=5, autotune=False,
displayiters=None, saveiters=None, maxiters=None)
Docstring:
Trains the network with the labels in the training dataset.
Parameters
----------
config : string
Full path of the config.yaml file as a string.
shuffle: int, optional
Integer value specifying the shuffle index to select for training. Default is set to 1
trainingsetindex: int, optional
Integer specifying which TrainingsetFraction to use. By default the first (note that TrainingFraction is a list in config.yaml).
gputouse: int, optional. Natural number indicating the number of your GPU (see number in nvidia-smi). If you do not have a GPU, put None.
See: https://nvidia.custhelp.com/app/answers/detail/a_id/3751/~/useful-nvidia-smi-queries
Additional parameters:
max_snapshots_to_keep: int, or None. Sets how many snapshots, i.e. saved states of the trained network, are kept. A snapshot is stored every saveiters iterations, but only the last max_snapshots_to_keep of them are kept! If you change this to None, all are kept.
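The retention rule for max_snapshots_to_keep can be simulated in a few lines. This is a minimal sketch assuming a snapshot is written exactly every saveiters iterations up to maxiters and only the newest N survive; kept_snapshots is a hypothetical helper and the numbers are illustrative, not DeepLabCut defaults.

```python
from collections import deque


def kept_snapshots(maxiters, saveiters, max_snapshots_to_keep):
    """Hypothetical simulation of which snapshot iterations remain on disk.

    Assumes a snapshot every `saveiters` iterations; a deque with
    maxlen=N drops the oldest entry once full, and maxlen=None keeps all.
    """
    kept = deque(maxlen=max_snapshots_to_keep)
    for iteration in range(saveiters, maxiters + 1, saveiters):
        kept.append(iteration)
    return list(kept)


print(kept_snapshots(1000000, 50000, 5))
# [800000, 850000, 900000, 950000, 1000000]
print(len(kept_snapshots(1000000, 50000, None)))  # 20
```

This is why stopping a run early, as in the question, still leaves the most recent few snapshots available.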
See: https://github.com/AlexEMG/DeepLabCut/issues/8#issuecomment-387404835
autotune: property of TensorFlow, apparently faster if False (as Eldar found out, see https://github.com/tensorflow/tensorflow/issues/13317). Default: False
displayiters: this variable is normally set in pose_config.yaml, but you can override it here. This is a convenience for when you don't want to dig out the pose_config.yaml file for the corresponding project, not something to use regularly. If None, the value from pose_config.yaml is used; otherwise it is overwritten. Default: None
saveiters: this variable is normally set in pose_config.yaml, but you can override it here. This is a convenience for when you don't want to dig out the pose_config.yaml file for the corresponding project, not something to use regularly. If None, the value from pose_config.yaml is used; otherwise it is overwritten. Default: None
maxiters: this variable is normally set in pose_config.yaml, but you can override it here. This is a convenience for when you don't want to dig out the pose_config.yaml file for the corresponding project, not something to use regularly. If None, the value from pose_config.yaml is used; otherwise it is overwritten. Default: None
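The "None means use pose_config.yaml, anything else overwrites it" rule shared by displayiters, saveiters and maxiters can be sketched as below. This is not DeepLabCut's code: resolve_overrides is hypothetical, the dict keys display_iters and save_iters are illustrative assumptions, and the configured maximum is assumed to be the last multi_step iteration.

```python
def resolve_overrides(pose_cfg, displayiters=None, saveiters=None, maxiters=None):
    """Hypothetical sketch of the docstring's override rule.

    None -> keep the value from the pose config; anything else replaces it.
    """
    from_cfg = {
        "displayiters": pose_cfg["display_iters"],  # assumed key name
        "saveiters": pose_cfg["save_iters"],        # assumed key name
        # Assumed: the configured maximum is the last multi_step iteration.
        "maxiters": pose_cfg["multi_step"][-1][1],
    }
    overrides = {"displayiters": displayiters, "saveiters": saveiters, "maxiters": maxiters}
    return {key: from_cfg[key] if overrides[key] is None else overrides[key]
            for key in from_cfg}


# Illustrative config values, not DeepLabCut defaults.
pose_cfg = {
    "display_iters": 1000,
    "save_iters": 50000,
    "multi_step": [[0.005, 10000], [0.02, 430000], [0.002, 730000], [0.001, 1030000]],
}

print(resolve_overrides(pose_cfg, maxiters=200000))
# {'displayiters': 1000, 'saveiters': 50000, 'maxiters': 200000}
```

Passing maxiters=200000 to train_network, as suggested in the thread, corresponds to the last line: only that one value changes and the rest still come from the config file.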
Example
--------
To train the network on the first shuffle of the training dataset:
>>> deeplabcut.train_network('/analysis/project/reaching-task/config.yaml')
--------
To train the network on the second shuffle of the training dataset:
>>> deeplabcut.train_network('/analysis/project/reaching-task/config.yaml', shuffle=2)