DeepLabCut: Inconsistency with dataset sizes

Created on 3 May 2020 · 5 Comments · Source: DeepLabCut/DeepLabCut

Ubuntu 19.10 with a Conda env and DeepLabCut 2.1.7.

With a dataset of n=959 images and a TrainingFraction of 95%, the training dataset stored in the Matlab file has only 824 entries. Shouldn't it be 911? If not, why is the value lower?

The command was: deeplabcut.create_training_dataset(config_path). The console output lists the indices correctly: 911 for training and 48 for testing.
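
For reference, the expected numbers can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper name split_sizes is hypothetical, and it assumes the training count is obtained by truncating n * TrainingFraction to an integer, which matches the 911/48 split printed on the console.

```python
def split_sizes(n_images, training_fraction):
    """Return (train, test) counts.

    Assumption: the training count is n_images * training_fraction
    truncated to an integer; the rest goes to the test set.
    """
    n_train = int(n_images * training_fraction)
    return n_train, n_images - n_train


n_train, n_test = split_sizes(959, 0.95)
print(n_train, n_test)

# Difference between the expected training size and the size
# actually found in the stored training dataset (824).
missing = n_train - 824
print(missing)
```

So 87 of the 911 training indices are unaccounted for in the stored training set.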

All 5 comments

Still investigating, but this seems to result from images that have no annotated bodyparts. I have images with no target to annotate and images where the target is only partially visible. Are images without any annotations left out of the Matlab file? And is the shuffle done before the images are excluded?

Yes, the training-set code excludes images that have no label whatsoever, see: https://github.com/AlexEMG/DeepLabCut/blob/master/deeplabcut/generate_training_dataset/trainingsetmanipulation.py#L561

(Note that the distribution of bodyparts per image is otherwise flexible, and the training code is written so that it works no matter how many labels are present.)

This design choice was made to ensure that images someone forgot to label are excluded. I do agree, though, that for a well-annotated dataset, also using "empty" images is advantageous. (For cases with multiple animals, this will, for instance, be changed.)
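
To illustrate the effect described above, here is a minimal, self-contained sketch. It is not DeepLabCut's actual code. The function names are hypothetical, missing labels are assumed to be represented as NaN, and the split is assumed to happen on all image indices before unlabeled images are dropped. That order would be consistent with the console listing 911 indices while only 824 entries are stored, but it is not confirmed in this thread.

```python
import math
import random


def is_unlabeled(coords):
    """True if every coordinate of an image is missing (NaN)."""
    return all(math.isnan(c) for c in coords)


def split_then_filter(labels, training_fraction, seed=0):
    """Shuffle and split all indices, then drop fully unlabeled images."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    n_train = int(len(indices) * training_fraction)
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    kept_train = [i for i in train_idx if not is_unlabeled(labels[i])]
    kept_test = [i for i in test_idx if not is_unlabeled(labels[i])]
    return train_idx, test_idx, kept_train, kept_test


nan = float("nan")
# Toy data: 20 images, two bodyparts each, stored as (x1, y1, x2, y2).
labels = [[1.0, 2.0, 3.0, 4.0] for _ in range(20)]
labels[3] = [nan, nan, nan, nan]    # no target in the image: dropped
labels[7] = [5.0, 6.0, nan, nan]    # partially visible target: kept
labels[12] = [nan, nan, nan, nan]   # no target in the image: dropped

train_idx, test_idx, kept_train, kept_test = split_then_filter(labels, 0.75)
print(len(train_idx), len(test_idx))                  # sizes of the index split
print(len(kept_train) + len(kept_test), "images kept after filtering")
```

In this toy example the index split still reports 15 and 5 images, while only 18 images survive the filter, so the stored sets come out smaller than the printed index counts whenever some images carry no labels at all.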

Okay, thank you for verifying this; that line slipped under my radar. Do you think it is safe to just use all images if I make a workaround for this? Or is there something I should take into account?

(For cases with multiple animals, this will, for instance, be changed)

Hi @lauritk - in the next update this behavior will indeed change.

Good to know, thanks! Will test that when available.
