I am getting an OpenCV-related error while training the network (see the traceback below). The error appears at a different iteration every time I run it (e.g. at iteration 22k the first time and at 140k the second time).
To Reproduce
Steps to reproduce the behavior, i.e.:
deeplabcut.train_network(config_path,shuffle=1, maxiters=600000)
See error:
TRACEBACK
deeplabcut.create_training_dataset(config_path,augmenter_type = 'default')
C:\Users\Mehmet Keles\Desktop\Dros Sleep Analysis\LiquidFoodDLC-Mehmet-2020-03-23\training-datasets\iteration-8\UnaugmentedDataSet_LiquidFoodDLCMar23 already exists!
C:\Users\Mehmet Keles\Desktop\Dros Sleep Analysis\LiquidFoodDLC-Mehmet-2020-03-23\dlc-models\iteration-8\LiquidFoodDLCMar23-trainset95shuffle1 already exists!
C:\Users\Mehmet Keles\Desktop\Dros Sleep Analysis\LiquidFoodDLC-Mehmet-2020-03-23\dlc-models\iteration-8\LiquidFoodDLCMar23-trainset95shuffle1/train already exists!
C:\Users\Mehmet Keles\Desktop\Dros Sleep Analysis\LiquidFoodDLC-Mehmet-2020-03-23\dlc-models\iteration-8\LiquidFoodDLCMar23-trainset95shuffle1/test already exists!
The training dataset is successfully created. Use the function 'train_network' to start training. Happy training!
[(0.95, 1, (array([487, 809, 290, 68, 693, 80, 727, 938, 710, 404, 84, 35, 875,
219, 864, 895, 309, 534, 496, 263, 441, 144, 249, 409, 772, 406,
798, 509, 613, 330, 591, 318, 495, 949, 642, 760, 32, 644, 916,
940, 700, 905, 342, 787, 483, 865, 269, 25, 951, 777, 6, 694,
374, 277, 23, 196, 773, 306, 359, 42, 334, 102, 828, 353, 801,
358, 100, 299, 257, 505, 384, 55, 363, 229, 179, 171, 799, 356,
573, 605, 90, 890, 245, 7, 455, 248, 603, 649, 540, 857, 894,
482, 830, 776, 267, 719, 682, 803, 560, 22, 421, 571, 609, 207,
899, 696, 233, 685, 667, 855, 794, 281, 759, 923, 543, 952, 183,
388, 582, 282, 900, 664, 769, 662, 936, 119, 722, 557, 54, 775,
570, 117, 699, 217, 191, 222, 910, 291, 389, 435, 587, 745, 860,
339, 653, 604, 265, 58, 453, 95, 59, 351, 848, 273, 198, 227,
391, 670, 817, 643, 75, 451, 184, 650, 39, 514, 539, 947, 246,
166, 188, 111, 146, 209, 430, 462, 390, 382, 598, 193, 873, 702,
165, 368, 324, 446, 466, 574, 596, 508, 285, 636, 366, 608, 69,
765, 407, 445, 346, 896, 681, 735, 542, 558, 492, 130, 310, 764,
563, 242, 311, 520, 753, 476, 922, 51, 278, 292, 810, 194, 195,
550, 897, 403, 88, 428, 412, 264, 750, 427, 400, 26, 592, 930,
795, 838, 422, 725, 398, 918, 1, 796, 413, 262, 203, 121, 552,
732, 711, 683, 645, 325, 237, 172, 806, 617, 528, 548, 463, 63,
71, 294, 510, 150, 555, 926, 46, 827, 16, 532, 878, 724, 628,
704, 43, 96, 730, 834, 580, 754, 124, 354, 442, 599, 646, 431,
914, 317, 943, 244, 867, 82, 231, 154, 271, 516, 920, 601, 235,
815, 276, 783, 52, 648, 296, 385, 381, 13, 525, 110, 338, 387,
303, 908, 674, 181, 72, 740, 182, 515, 881, 606, 298, 186, 948,
721, 825, 709, 185, 162, 48, 569, 341, 526, 307, 283, 29, 93,
347, 141, 846, 616, 232, 655, 641, 879, 224, 583, 708, 836, 475,
419, 874, 247, 675, 489, 280, 630, 418, 477, 768, 300, 142, 680,
123, 824, 579, 230, 47, 652, 457, 70, 170, 274, 86, 692, 302,
41, 44, 458, 350, 378, 842, 305, 932, 405, 935, 659, 561, 448,
200, 720, 790, 370, 15, 396, 638, 595, 887, 880, 220, 862, 461,
502, 863, 73, 76, 452, 107, 212, 739, 877, 157, 672, 168, 712,
527, 610, 87, 472, 327, 364, 493, 362, 755, 728, 486, 797, 835,
151, 201, 160, 8, 703, 729, 98, 594, 955, 340, 633, 695, 886,
28, 572, 78, 89, 625, 469, 337, 125, 888, 829, 950, 511, 335,
494, 687, 218, 156, 660, 174, 519, 780, 820, 575, 129, 623, 393,
858, 420, 602, 270, 614, 852, 627, 581, 686, 45, 568, 933, 187,
931, 928, 744, 774, 882, 915, 497, 238, 131, 164, 847, 450, 169,
392, 854, 236, 147, 945, 639, 903, 439, 531, 293, 297, 621, 116,
833, 841, 149, 240, 844, 733, 288, 506, 18, 118, 766, 544, 414,
177, 562, 103, 429, 211, 812, 253, 589, 661, 743, 677, 718, 221,
637, 503, 20, 444, 679, 27, 946, 158, 761, 206, 369, 891, 866,
634, 250, 410, 813, 40, 499, 243, 79, 345, 523, 402, 134, 112,
355, 109, 401, 792, 586, 786, 180, 261, 808, 352, 551, 449, 564,
437, 459, 4, 460, 541, 657, 524, 136, 205, 31, 3, 61, 884,
252, 161, 807, 367, 697, 12, 789, 756, 74, 859, 779, 138, 898,
360, 823, 856, 684, 97, 590, 173, 34, 436, 612, 83, 329, 717,
295, 190, 375, 36, 851, 481, 259, 53, 741, 676, 736, 688, 872,
713, 671, 840, 818, 99, 30, 738, 635, 105, 819, 845, 287, 416,
705, 893, 651, 313, 176, 941, 108, 800, 415, 716, 279, 284, 512,
921, 849, 258, 223, 62, 752, 467, 104, 394, 839, 837, 737, 706,
673, 804, 911, 440, 239, 566, 9, 814, 885, 411, 584, 934, 423,
266, 668, 811, 314, 417, 748, 593, 767, 91, 480, 376, 372, 770,
275, 892, 691, 361, 904, 577, 656, 155, 81, 0, 434, 901, 468,
343, 163, 545, 507, 620, 822, 954, 57, 473, 576, 192, 241, 175,
344, 624, 143, 92, 701, 159, 485, 726, 38, 333, 565, 547, 791,
365, 529, 549, 535, 64, 470, 189, 301, 734, 373, 869, 944, 65,
909, 902, 522, 559, 315, 474, 94, 49, 843, 618, 793, 669, 816,
889, 782, 145, 747, 632, 308, 132, 101, 208, 883, 433, 471, 214,
321, 210, 251, 831, 447, 578, 758, 690, 929, 678, 876, 491, 597,
395, 312, 906, 917, 167, 611, 254, 498, 771, 399, 50, 272, 927,
762, 615, 626, 533, 464, 204, 349, 731, 953, 106, 942, 37, 484,
386, 268, 784, 19, 629, 380, 714, 113, 178, 788, 640, 537, 56,
871, 67, 426, 139, 234, 924, 10, 490, 289, 432, 260, 500, 554,
140, 202, 316, 85, 319, 665, 443, 348, 133, 654, 868, 746, 781,
870, 504, 424, 778, 304, 919, 17, 286, 225, 377, 135, 5, 2,
137, 456, 538, 658, 763, 826, 619, 751, 199, 226, 488, 663, 631,
479, 322, 913, 332, 861, 937, 383, 256, 115, 126, 600, 148, 802,
513, 120, 546, 518, 33, 939, 397, 213, 152, 785, 127, 114, 122,
521, 465, 454, 60, 698, 723, 647, 707, 326, 153, 408]), array([689, 66, 832, 585, 536, 328, 11, 14, 216, 197, 805, 425, 21,
530, 438, 749, 379, 228, 588, 757, 255, 907, 331, 336, 128, 357,
215, 320, 742, 821, 853, 715, 567, 553, 925, 912, 850, 666, 478,
77, 371, 501, 323, 517, 24, 607, 556, 622])))]
>>> deeplabcut.train_network(config_path,shuffle=1, maxiters=600000)
Selecting single-animal trainer
Config:
{'all_joints': [[0],
[1],
[2],
[3],
[4],
[5],
[6],
[7],
[8],
[9],
[10],
[11],
[12],
[13],
[14],
[15],
[16],
[17],
[18],
[19],
[20],
[21],
[22],
[23],
[24],
[25],
[26],
[27],
[28]],
'all_joints_names': ['a6',
'a5',
'atip',
'headtip',
'head_r',
'head_l',
'thor_ant',
'thor_post',
't1_r_tip',
'joint1_r',
'joint1_rtop',
't1_l_tip',
'joint1_l',
'joint1_ltop',
't2_r_tip',
'joint2_r',
'joint2_rtop',
't2_l_tip',
'joint2_l',
'joint2_ltop',
't3_r_tip',
'joint3_r',
'joint3_rtop',
't3_l_tip',
'joint3_l',
'joint3_ltop',
'halt_r',
'halt_l',
'prob'],
'batch_size': 1,
'crop_pad': 0,
'cropratio': 0.4,
'dataset': 'training-datasets\\iteration-8\\UnaugmentedDataSet_LiquidFoodDLCMar23\\LiquidFoodDLC_Mehmet95shuffle1.mat',
'dataset_type': 'default',
'deterministic': False,
'display_iters': 1000,
'fg_fraction': 0.25,
'global_scale': 0.8,
'init_weights': 'C:\\Users\\Mehmet '
'Keles\\.conda\\envs\\DLC-GPU\\lib\\site-packages\\deeplabcut\\pose_estimation_tensorflow\\models\\pretrained\\resnet_v1_50.ckpt',
'intermediate_supervision': False,
'intermediate_supervision_layer': 12,
'location_refinement': True,
'locref_huber_loss': True,
'locref_loss_weight': 0.05,
'locref_stdev': 7.2801,
'log_dir': 'log',
'max_input_size': 1500,
'mean_pixel': [123.68, 116.779, 103.939],
'metadataset': 'training-datasets\\iteration-8\\UnaugmentedDataSet_LiquidFoodDLCMar23\\Documentation_data-LiquidFoodDLC_95shuffle1.pickle',
'min_input_size': 64,
'mirror': False,
'multi_step': [[0.005, 10000],
[0.02, 430000],
[0.002, 730000],
[0.001, 1030000]],
'net_type': 'resnet_50',
'num_joints': 29,
'optimizer': 'sgd',
'pairwise_huber_loss': False,
'pairwise_predict': False,
'partaffinityfield_predict': False,
'pos_dist_thresh': 17,
'project_path': 'C:\\Users\\Mehmet Keles\\Desktop\\Dros Sleep '
'Analysis\\LiquidFoodDLC-Mehmet-2020-03-23',
'regularize': False,
'rotation': 25,
'rotratio': 0.4,
'save_iters': 50000,
'scale_jitter_lo': 0.5,
'scale_jitter_up': 1.25,
'scoremap_dir': 'test',
'shuffle': True,
'snapshot_prefix': 'C:\\Users\\Mehmet Keles\\Desktop\\Dros Sleep '
'Analysis\\LiquidFoodDLC-Mehmet-2020-03-23\\dlc-models\\iteration-8\\LiquidFoodDLCMar23-trainset95shuffle1\\train\\snapshot',
'stride': 8.0,
'weigh_negatives': False,
'weigh_only_present_joints': False,
'weigh_part_predictions': False,
'weight_decay': 0.0001}
Starting with imgaug pose-dataset loader (=default).
Batch Size is 1
Initializing ResNet
Loading ImageNet-pretrained resnet_50
2020-09-20 19:41:46.554921: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-09-20 19:41:46.933615: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2080 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.83
pciBusID: 0000:08:00.0
totalMemory: 8.00GiB freeMemory: 6.55GiB
2020-09-20 19:41:46.937859: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-09-20 19:41:48.677640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-20 19:41:48.680598: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-09-20 19:41:48.682523: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-09-20 19:41:48.707675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6281 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 SUPER, pci bus id: 0000:08:00.0, compute capability: 7.5)
Max_iters overwritten as 600000
Training parameter:
{'stride': 8.0, 'weigh_part_predictions': False, 'weigh_negatives': False, 'fg_fraction': 0.25, 'mean_pixel': [123.68, 116.779, 103.939], 'shuffle': True, 'snapshot_prefix': 'C:\\Users\\Mehmet Keles\\Desktop\\Dros Sleep Analysis\\LiquidFoodDLC-Mehmet-2020-03-23\\dlc-models\\iteration-8\\LiquidFoodDLCMar23-trainset95shuffle1\\train\\snapshot', 'log_dir': 'log', 'global_scale': 0.8, 'location_refinement': True, 'locref_stdev': 7.2801, 'locref_loss_weight': 0.05, 'locref_huber_loss': True, 'optimizer': 'sgd', 'intermediate_supervision': False, 'intermediate_supervision_layer': 12, 'regularize': False, 'weight_decay': 0.0001, 'crop_pad': 0, 'scoremap_dir': 'test', 'batch_size': 1, 'dataset_type': 'default', 'deterministic': False, 'mirror': False, 'pairwise_huber_loss': False, 'weigh_only_present_joints': False, 'partaffinityfield_predict': False, 'pairwise_predict': False, 'all_joints': [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28]], 'all_joints_names': ['a6', 'a5', 'atip', 'headtip', 'head_r', 'head_l', 'thor_ant', 'thor_post', 't1_r_tip', 'joint1_r', 'joint1_rtop', 't1_l_tip', 'joint1_l', 'joint1_ltop', 't2_r_tip', 'joint2_r', 'joint2_rtop', 't2_l_tip', 'joint2_l', 'joint2_ltop', 't3_r_tip', 'joint3_r', 'joint3_rtop', 't3_l_tip', 'joint3_l', 'joint3_ltop', 'halt_r', 'halt_l', 'prob'], 'cropratio': 0.4, 'dataset': 'training-datasets\\iteration-8\\UnaugmentedDataSet_LiquidFoodDLCMar23\\LiquidFoodDLC_Mehmet95shuffle1.mat', 'display_iters': 1000, 'init_weights': 'C:\\Users\\Mehmet Keles\\.conda\\envs\\DLC-GPU\\lib\\site-packages\\deeplabcut\\pose_estimation_tensorflow\\models\\pretrained\\resnet_v1_50.ckpt', 'max_input_size': 1500, 'metadataset': 'training-datasets\\iteration-8\\UnaugmentedDataSet_LiquidFoodDLCMar23\\Documentation_data-LiquidFoodDLC_95shuffle1.pickle', 'min_input_size': 64, 'multi_step': [[0.005, 10000], [0.02, 430000], [0.002, 730000], 
[0.001, 1030000]], 'net_type': 'resnet_50', 'num_joints': 29, 'pos_dist_thresh': 17, 'project_path': 'C:\\Users\\Mehmet Keles\\Desktop\\Dros Sleep Analysis\\LiquidFoodDLC-Mehmet-2020-03-23', 'rotation': 25, 'rotratio': 0.4, 'save_iters': 50000, 'scale_jitter_lo': 0.5, 'scale_jitter_up': 1.25, 'covering': True, 'elastic_transform': True, 'motion_blur': True, 'motion_blur_params': {'k': 7, 'angle': [-90, 90]}}
Starting training....
iteration: 1000 loss: 0.0296 lr: 0.005
iteration: 2000 loss: 0.0232 lr: 0.005
iteration: 3000 loss: 0.0221 lr: 0.005
iteration: 4000 loss: 0.0205 lr: 0.005
iteration: 5000 loss: 0.0198 lr: 0.005
iteration: 6000 loss: 0.0189 lr: 0.005
iteration: 7000 loss: 0.0185 lr: 0.005
iteration: 8000 loss: 0.0175 lr: 0.005
iteration: 9000 loss: 0.0168 lr: 0.005
iteration: 10000 loss: 0.0165 lr: 0.005
iteration: 11000 loss: 0.0176 lr: 0.02
iteration: 12000 loss: 0.0165 lr: 0.02
iteration: 13000 loss: 0.0158 lr: 0.02
iteration: 14000 loss: 0.0149 lr: 0.02
iteration: 15000 loss: 0.0143 lr: 0.02
iteration: 16000 loss: 0.0140 lr: 0.02
iteration: 17000 loss: 0.0135 lr: 0.02
iteration: 18000 loss: 0.0134 lr: 0.02
iteration: 19000 loss: 0.0132 lr: 0.02
iteration: 20000 loss: 0.0128 lr: 0.02
iteration: 21000 loss: 0.0126 lr: 0.02
iteration: 22000 loss: 0.0122 lr: 0.02
Exception in thread Thread-1:
Traceback (most recent call last):
File "C:\Users\Mehmet Keles\.conda\envs\DLC-GPU\lib\threading.py", line 926, in _bootstrap_inner
self.run()
File "C:\Users\Mehmet Keles\.conda\envs\DLC-GPU\lib\threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\Mehmet Keles\.conda\envs\DLC-GPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\train.py", line 89, in load_and_enqueue
batch_np = dataset.next_batch()
File "C:\Users\Mehmet Keles\.conda\envs\DLC-GPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\dataset\pose_dataset_imgaug.py", line 298, in next_batch
) = self.get_batch()
File "C:\Users\Mehmet Keles\.conda\envs\DLC-GPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\dataset\pose_dataset_imgaug.py", line 233, in get_batch
image = imread(os.path.join(self.cfg.project_path, im_file), mode="RGB")
File "C:\Users\Mehmet Keles\.conda\envs\DLC-GPU\lib\site-packages\deeplabcut\utils\auxfun_videos.py", line 324, in imread
return cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-k3ngdfuh\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'
Expected behavior
I trained on the same dataset in 2.1.8 without this problem; it first appeared when I switched to 2.2b8. I expected train_network to finish once it reached maxiters. Instead, I receive the error above. The error is not consistently reproducible on my system; it appears sporadically at different points in each run.
That happens when you have a corrupt image; in the earlier version we would just skip it during loading. @jeylau, did that change?
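For context on the traceback: cv2.imread returns None instead of raising when a file is missing or cannot be decoded, and the following cv2.cvtColor call is what raises the `!_src.empty()` assertion. A minimal sketch of a loader that skips such files might look like this. Note that safe_imread and its reader/to_rgb parameters are hypothetical names used for illustration, not part of DeepLabCut's API.

```python
import os


def safe_imread(path, reader=None, to_rgb=None):
    """Return an RGB image, or None if the file is missing or undecodable.

    `reader` and `to_rgb` are hypothetical hooks (so the logic can be tried
    without OpenCV); by default they map to cv2.imread and cv2.cvtColor.
    """
    if reader is None or to_rgb is None:
        import cv2  # imported lazily; only needed for the default hooks

        if reader is None:
            reader = cv2.imread
        if to_rgb is None:
            to_rgb = lambda img: cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    if not os.path.isfile(path):
        return None  # nonexistent file
    img = reader(path)
    if img is None:
        return None  # cv2.imread signals failure with None, not an exception
    return to_rgb(img)
```

A caller would then check for None and move on to the next image rather than letting the loader thread die.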
I have just checked, and neither loading nor training dataset creation changed. The error happens when attempting to read a nonexistent image; what I find strange, though, is that it did not fail when trying to get the image shape here https://github.com/DeepLabCut/DeepLabCut/blob/17daea7513371d69ee9b4979ac457e40b542b2cc/deeplabcut/generate_training_dataset/trainingsetmanipulation.py#L793
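One way to rule out unreadable images before a long training run is to scan the project up front. The sketch below assumes the labeled frames live under a labeled-data folder inside the project directory; find_unreadable_images and its reader parameter are hypothetical names for illustration. It only checks files that are present on disk, so image paths that the training dataset references but that are missing from disk would need a separate check.

```python
import os


def find_unreadable_images(project_path, reader=None,
                           extensions=(".png", ".jpg", ".jpeg")):
    """List image files under <project_path>/labeled-data that fail to load.

    `reader` is a hypothetical hook defaulting to cv2.imread, which returns
    None for files it cannot decode.
    """
    if reader is None:
        import cv2  # imported lazily; only needed for the default reader

        reader = cv2.imread

    bad = []
    root = os.path.join(project_path, "labeled-data")  # assumed folder layout
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # An empty file or a None result both mean the image is unusable.
            if os.path.getsize(path) == 0 or reader(path) is None:
                bad.append(path)
    return sorted(bad)
```

Calling find_unreadable_images with the project_path value from the config and printing the result should give an empty list for a healthy project.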
Ahh, I think I got it. I initially did not run deeplabcut.check_labels(config_path) after refining labels and merging datasets. I ran it and tried training again: no more errors.
Hmm, nope. I still got the error after 500k iterations.
So that happens during re-training (i.e., after merging the datasets)? Or did you face the issue while training for the very first time too?
It didn't happen in 2.1.8; I did 4 iterations fine. It started happening once I switched to 2.2b8 and merged datasets. I didn't try training without refining after switching to 2.2b8.
Now it happens after downgrading to 2.1.8 as well.
I tried on Google Colab and it worked fine, then did a fresh install on the system where I was getting errors. I was able to reach 600k iterations without a hitch. I think I broke something in my previous installation.
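When comparing a working and a broken installation like this, it can help to record the package versions in each environment. The following is a small sketch using the standard library (Python 3.8+). The package names in the default list are assumptions about how the environment was installed (for example, TensorFlow may be installed as tensorflow-gpu instead), and installed_versions is a hypothetical helper name.

```python
from importlib import metadata  # available in Python 3.8 and later


def installed_versions(packages=("deeplabcut", "opencv-python",
                                 "tensorflow", "imgaug")):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed under this name
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running it in both environments and comparing the output shows which packages differ between them.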