The bug
Previously I commented glitches in the label positions when relabeling or labeling new frames https://github.com/AlexEMG/DeepLabCut/issues/574#issuecomment-582221262
I tracked down the bug. In my case, I tried to create a project in Windows but I had problems when creating a symbolic link or copying over the videos. To save time, I created the project and labeled frames in Unix. Then I trained the model, and then refining labels or labeling frames, I would often see the labels of another frame into the wrong frame. As commented in #574 the frame number would even go to negative indexes.
I did not dive into the code, but the reason is: if I have 5 images in a video labeled folder, the script reads the 5th image and calls the label file (h5/csv) and looks for the 5th row (+3 header rows), without taking into account the file name that is present in the first column and same raw. In addition, the path in the name is hardcoded with system-dependent slashes ('/', '\').
This bug leads to many unexpected behaviors. In my case, labels were done in linux, re-labeling in Windows. The label file is sorted in descending order, '/' precedes '\', so all the linux created images were before all the windows ones, leading to unmatched label-frames. If an image is deleted, the code fails to get the right label positions.
This unexpected behavior has previously caused issues, and there are some patch functions such as convertannotationdata_fromwindows2unixstyle or dropannotationfileentriesduetodeletedimages. However, these do not address the problematic code. In my case, dropannotationfileentriesduetodeletedimages doesn't work as it just flips the slash to the opposite side without taking into account the O.S.
Solution
Don't hard code the row index of the frame label positions, or match the first col. name with the file name being read, or when creating the label file, sort the images names by taking the base name, not the hardcoded path with forw./back. slashes.
To Reproduce
Create two labeled frames in linux and two in windows. The label positions will be wrong.
Fix for unsorted rows in the CollectedData.csv files
I provide the code I used to change all the paths in the first column of the collectedData.csv. Then it
This sorts the rows without changing the names in the first column:
import pandas as pd
from glob import glob
import numpy as np
from pathlib import Path
path_root = "D:/projects/Ataxia/FingerTapping-Adonay-2020-01-29/labeled-data/"
path_labels = glob(path_root + '*Finger_Tapping')
csv_first_cols = ['scorer', 'bodyparts', 'coords']
nfirst = len(csv_first_cols)
REMOVE_FIRST = True # case repeated remove the first entry, sorted: '/', '\'
for p in path_labels:
path_csv = glob(p + '/CollectedData_*.csv')
for c in path_csv:
df = pd.read_csv(c,index_col=None,header=None)
# make sure right file/order columns
for i, n in enumerate(csv_first_cols):
assert(df.iloc[i,0] == n)
frames_names = df.iloc[3:,0].to_list()
f = [Path(f).name for f in frames_names]
ix_sort = np.argsort(f)
# Remove repeated frame entries (one created when getting frames,
# other when getting frames for relabeling)
assert(len(f) == len(set(f)))
if not len(f) == len(set(f)):
print('Fixing folder: ', c[len(path_root):])
rep_ix = []
for rep in set([x for x in f if f.count(x) > 1]):
print ('Found repeted labeled frame image:', rep)
rep_ix.append([ i for i, x in enumerate(f) if x == rep])
ix_out = np.vstack([i[:-1] if REMOVE_FIRST
else i[1:] for i in rep_ix])
ix_sort = ix_sort[np.all(ix_sort != ix_out, axis=0)]
if not np.all(ix_sort == np.arange(len(f))):
print('Sorting entries from: \n\t', c[len(path_root):])
ix_sort += nfirst
ix_sort = np.append(np.arange(nfirst), ix_sort)
df = df.reindex(ix_sort)
df.to_csv(c,index=None, header=None, sep=",")
This changes the names in the first row of the CollectedData.csv using the current system slash type:
import pandas as pd
from glob import glob
from pathlib import Path
path_root = "D:/projects/Ataxia/FingerTapping-Adonay-2020-01-29/labeled-data/"
path_labels = glob(path_root + '*Finger_Tapping')
csv_first_cols = ['scorer', 'bodyparts', 'coords']
nfirst = len(csv_first_cols)
REMOVE_FIRST = True # case repeated remove the first entry, sorted: '/', '\'
for p in path_labels:
print(p)
path_csv = glob(p + '/CollectedData_*.csv')
for c in path_csv:
df = pd.read_csv(c,index_col=None,header=None)
for i, cv in enumerate(df.iloc[:, 0]):
df.iloc[i, 0] = Path(cv)
df.to_csv(c,index=None, header=None, sep=",")
cfg = 'path/config.yaml'
Then:
dp.convertcsv2h5(cfg)
Ok, now that might be true. I have never tested the case where one labels and refines on different platforms for the same project (i.e. linux/mac vs. windows). We'll test this problem. I think for non cross-OS projects this problem does not exist.
yes, if labels for the same video are created in windows and unix, it will have this problem. Same if one frame image is deleted in the folder, or whenever the assumption that the nth frame image in the folder corresponds to the nth + nheaders row in the label file does not hold, it will also fail.
‚Same if one frame image is deleted in the folder, or whenever the assumption that the nth frame image in the folder corresponds to the nth + nheaders row in the label file does not hold, it will also fail.‘
That is not true, as it uses the address of frame rather than the index position....
From: Adonay Nunes notifications@github.com
Sent: Sunday, February 9, 2020 14:52
To: AlexEMG/DeepLabCut
Cc: Mathis, Alexander Thomas; Comment
Subject: Re: [AlexEMG/DeepLabCut] Wrong indexing of label positions (#581)
yes, if labels for the same video are created in windows and unix, it will have this problem. Same if one frame image is deleted in the folder, or whenever the assumption that the nth frame image in the folder corresponds to the nth + nheaders row in the label file does not hold, it will also fail.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_AlexEMG_DeepLabCut_issues_581-3Femail-5Fsource-3Dnotifications-26email-5Ftoken-3DAE7CMXSXLMOXC34HNYN57BLRCAC77A5CNFSM4KRYCEHKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOELGM7BA-23issuecomment-2D583847812&d=DwMCaQ&c=WO-RGvefibhHBZq3fL85hQ&r=11wEEDBv3Ke3n3b8dICjuQC5vgZ23dfGPax018VOZ2g&m=gW8eSylxe-BKcgV3bjuv1CuYO3i4iQiQm9ekpchTZkk&s=9g7g4jDn7hc9WYwr96DY2Aq-3xmPem6uMam-kpQm26M&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AE7CMXXCYBV54PXSLAEFFNDRCAC77ANCNFSM4KRYCEHA&d=DwMCaQ&c=WO-RGvefibhHBZq3fL85hQ&r=11wEEDBv3Ke3n3b8dICjuQC5vgZ23dfGPax018VOZ2g&m=gW8eSylxe-BKcgV3bjuv1CuYO3i4iQiQm9ekpchTZkk&s=yEq9I4kRtAnEzrWoKagNr_gZ5wdJ9NpbTgLX75Y7in0&e=.
@AlexEMG you will not better the cause of the bug, based on what I have experienced I see that somewhere there is a function that mismatches the indexes. If, say, image 10 is deleted, when loading the label frames, the now 10th image (previously the 11th) will have the labels of the deleted image, and the following images will have their previous image labels.
Same problem if the rows of the label files are not sorted.
I haven't gone through the code to find where this comes from, but it seems the reason comes from an assumption broadly along these lines.
If you delete an image there is a function to delete the index from the annotation file!
From: Adonay Nunes notifications@github.com
Sent: Monday, February 10, 2020 2:15:31 PM
To: AlexEMG/DeepLabCut DeepLabCut@noreply.github.com
Cc: Mathis, Alexander Thomas amathis@fas.harvard.edu; Mention mention@noreply.github.com
Subject: Re: [AlexEMG/DeepLabCut] Wrong indexing of label positions (#581)
@AlexEMGhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_AlexEMG&d=DwMCaQ&c=WO-RGvefibhHBZq3fL85hQ&r=11wEEDBv3Ke3n3b8dICjuQC5vgZ23dfGPax018VOZ2g&m=yZp8GaqK854w3fehURR9QR1pAQ-PUF6CFrafD2WRcsc&s=agbwx8Zdwtc7oYrVcCqGfu9H1ckJkGe23b4w2RR1Hgk&e= you will not better the cause of the bug, based on what I have experienced I see that somewhere there is a function that mismatches the indexes. If, say, image 10 is deleted, when loading the label frames, the now 10th image (previously the 11th) will have the labels of the deleted image, and the following images will have their previous image labels.
Same problem if the rows of the label files are not sorted.
I haven't gone through the code to find where this comes from, but it seems the reason comes from an assumption broadly along these lines.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_AlexEMG_DeepLabCut_issues_581-3Femail-5Fsource-3Dnotifications-26email-5Ftoken-3DAE7CMXQOBMM5YAOGY3ZIWOLRCFHPHA5CNFSM4KRYCEHKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOELIOUIQ-23issuecomment-2D584116770&d=DwMCaQ&c=WO-RGvefibhHBZq3fL85hQ&r=11wEEDBv3Ke3n3b8dICjuQC5vgZ23dfGPax018VOZ2g&m=yZp8GaqK854w3fehURR9QR1pAQ-PUF6CFrafD2WRcsc&s=QcENwjDbwSrK8yB2IBEhO7t2vDfc4n-5d-ypMzYhvGI&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AE7CMXRO4U7C4CCJKAUEIPTRCFHPHANCNFSM4KRYCEHA&d=DwMCaQ&c=WO-RGvefibhHBZq3fL85hQ&r=11wEEDBv3Ke3n3b8dICjuQC5vgZ23dfGPax018VOZ2g&m=yZp8GaqK854w3fehURR9QR1pAQ-PUF6CFrafD2WRcsc&s=PpNo6_K5LHFdf2xRUtIdhsFvoWcuDQBVLCNl94eoyQU&e=.
^ yes, if you delete images, you must run this function.