Deeplabcut: Letter 'c' in file name of labeled images results in no training set being generated (not using cropping)

Created on 5 Aug 2020 · 6 Comments · Source: DeepLabCut/DeepLabCut

_This issue may be due to non-standard use of DLC:_
Having the letter 'c' in the file name of every labeled image (when not using cropping) results in no training set being generated. This is caused by the call filename.split("c")[0] at line 107, followed by img_names = Data.index.map(strip_cropped_image_name).unique() at line 109 of multiple_individuals_trainingsetmanipulation.py.
When all images have a 'c' in the file name and share the same text before it, len(img_names) = 1, so all frames are assigned to the test set and none to the training set. No error is raised, but the resulting training set is empty.
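To illustrate the collapse described above, here is a minimal sketch. The function is a simplified stand-in for strip_cropped_image_name (an assumption based on the split call quoted above, not a copy of the DLC source), and the file names are hypothetical.

```python
import os

def strip_at_first_c(path):
    # Stand-in for the helper described above (assumption):
    # keep only the part of the file name before the first "c".
    filename = os.path.basename(path)
    return filename.split("c")[0]

# Hypothetical frame names produced by a custom extraction script.
frames = [
    "labeled-data/session1/seq_cam_0001.png",
    "labeled-data/session1/seq_cam_0002.png",
    "labeled-data/session1/seq_cam_0003.png",
]

# Every name is cut at the "c" in "cam", so all frames look identical.
unique_names = sorted({strip_at_first_c(f) for f in frames})
print(unique_names)       # ['seq_']
print(len(unique_names))  # 1
```

With a single unique name there is nothing left to split between train and test, which matches the empty training set reported here.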

Workaround:
While it is easy to remove the letter 'c' from all file names if one is aware of this issue beforehand, it is cumbersome to run into it after all frames have been labeled. A workaround is to change the search string 'c' in filename.split("c")[0] to something compatible with the user's file naming scheme. To keep this compatible with cropimagesandlabels, the new image name generated at line 361 of trainingsetmanipulation.py needs to be changed accordingly.
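A rough sketch of that workaround follows. The marker "_crop" and the function name are hypothetical choices for illustration; the cropping step would have to write file names using the same marker for this to group crops correctly.

```python
import os

# Hypothetical marker; pick something that never occurs in your own file names.
CROP_MARKER = "_crop"

def strip_crop_marker(path, marker=CROP_MARKER):
    # Cut the file name at the custom marker instead of at a bare "c".
    filename = os.path.basename(path)
    return filename.split(marker)[0]

# Hypothetical names: two crops of one image and one crop of another.
names = [
    "seq_cam_0001_crop0.png",
    "seq_cam_0001_crop1.png",
    "seq_cam_0002_crop0.png",
]

groups = sorted({strip_crop_marker(n) for n in names})
print(groups)  # ['seq_cam_0001', 'seq_cam_0002']
```

Names without the marker come back unchanged, so uncropped frames that contain a 'c' remain distinct.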

  • OS: Ubuntu 18.04 and macOS
  • DeepLabCut Version: 2.2b7

To Reproduce
Steps to reproduce the behavior, i.e.:

  1. Create custom code to extract frames, with a 'c' in the file names.
  2. Run deeplabcut.create_multianimaltraining_dataset(config_path).
  3. An empty training set is generated.

TRACEBACK

import os
from pathlib import Path
import deeplabcut

config_path = '/Users/felix/biteOscope/DLC_aedes/aedes-gefe-2020-07-23/config.yaml'

deeplabcut.create_multianimaltraining_dataset(config_path) 


Expected behavior
I understand that this issue would not occur when extracting frames using the standard code, so this is likely not a problem for typical use cases. However, it would be nice if e.g. custom file names in the labeled frames would not result in issues downstream.

Additional context
I use custom code to extract frames because my data comes as image sequences (TIFFs) instead of video files, which is currently not supported by extract_frames.

All 6 comments

Hey @felixhol, thanks for reporting this issue! I think I have a fix, I will update you soon 😊

good catch, thanks @felixhol (but do note, we really recommend running cropandlabel, it should be required honestly).

@MMathisLab, although in that specific case, things will get wild as image names will then contain two Cs 😅

awesome - thanks a lot for the (fast) fix!

hi! getting back to this (and to the already labeled data that has a 'c' in the file name) - the patch only produces the desired result when training on uncropped data. On cropped data it truncates all filenames at the first occurrence of a 'c', and depending on file naming that gives very strange results. Splitting on the last 'c' of the filename would be preferable, as that would (pretty much always) result in the desired string.
Workaround for now is to use filename.split("c")[0] + filename.split("c")[1] if cfg["croppedtraining"], as all my cropped data has one 'c' in the filename and another one added during cropping.
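The "split on the last 'c'" idea can be sketched with str.rsplit. This is only an illustration: the helper name is hypothetical, and it assumes cropping appends "c" plus an index just before the extension, which is inferred from this thread rather than checked against the DLC source.

```python
import os

def strip_last_c(path, cropped):
    # Hypothetical helper: for cropped data, cut at the LAST "c" so that
    # any earlier "c" in the original file name is left untouched.
    filename = os.path.basename(path)
    if cropped:
        return filename.rsplit("c", 1)[0]
    # Uncropped names have no crop suffix to strip.
    return filename

# Hypothetical cropped names: original name has a "c", cropping adds another.
crops = ["cam_0001c0.png", "cam_0001c1.png", "cam_0002c0.png"]

groups = sorted({strip_last_c(p, cropped=True) for p in crops})
print(groups)  # ['cam_0001', 'cam_0002']
```

The cropped flag matters: applying rsplit to an uncropped name such as "cam_0001.png" would still cut at the 'c' in "cam".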
