Got the following error:
$ python train.py --data data/coco.data --cfg cfg/yolov3.cfg
Namespace(accumulate=2, adam=False, arc='default', batch_size=32, bucket='', cache_images=False, cfg='cfg/yolov3.cfg', data='data/coco.data', device='', epochs=273, evolve=False, img_size=416, img_weights=False, multi_scale=False, name='', nosave=False, notest=False, prebias=False, rect=False, resume=False, transfer=False, var=None, weights='weights/ultralytics49.pt')
Using CUDA device0 _CudaDeviceProperties(name='GeForce GTX 1070', total_memory=8116MB)
Traceback (most recent call last):
  File "train.py", line 444, in <module>
    train() # train normally
  File "train.py", line 111, in train
    chkpt['model'] = {k: v for k, v in chkpt['model'].items() if model.state_dict()[k].numel() == v.numel()}
  File "train.py", line 111, in <dictcomp>
    chkpt['model'] = {k: v for k, v in chkpt['model'].items() if model.state_dict()[k].numel() == v.numel()}
KeyError: 'module_list.85.Conv2d.weight'
@Samjith888 your command automatically loads the ultralytics49.pt backbone, which requires yolov3-spp.cfg. You must remove the backbone by using --weights '', or specify a weights-cfg combination that is compatible.
This error is caused by a user supplying incompatible --weights and --cfg arguments. To solve this you must specify no weights (i.e. random initialization of the model) using --weights '' and any --cfg, or use a --cfg that is compatible with your --weights. If none are specified, the defaults are --weights ultralytics49.pt and --cfg cfg/yolov3-spp.cfg.
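The KeyError in the traceback comes from the dictionary comprehension on line 111 of train.py, which looks up model.state_dict()[k] for every checkpoint key k, including keys that the model built from the given cfg does not contain. A minimal sketch of a more defensive filter follows, using plain dicts of parameter shapes in place of real tensors. The names filter_compatible, ckpt_shapes and model_shapes, and the toy shapes, are hypothetical and not part of the repo.

```python
def numel(shape):
    """Number of elements in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def filter_compatible(ckpt_shapes, model_shapes):
    """Keep checkpoint entries whose key exists in the model with an equal element count."""
    return {
        key: shape
        for key, shape in ckpt_shapes.items()
        if key in model_shapes and numel(model_shapes[key]) == numel(shape)
    }


# Toy stand-ins (assumed values): the checkpoint has a layer the model lacks.
ckpt_shapes = {
    "module_list.0.Conv2d.weight": (32, 3, 3, 3),
    "module_list.85.Conv2d.weight": (256, 512, 1, 1),
}
model_shapes = {
    "module_list.0.Conv2d.weight": (32, 3, 3, 3),
}

kept = filter_compatible(ckpt_shapes, model_shapes)
print(sorted(kept))
```

Skipping missing keys only avoids the crash; the skipped layers would stay randomly initialized, so passing a matching --weights and --cfg pair remains the actual fix.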
Compatible --weights --cfg combinations:
python3 train.py --weights yolov3.pt --cfg cfg/yolov3.cfg
python3 train.py --weights yolov3.weights --cfg cfg/yolov3.cfg
python3 train.py --weights yolov3-spp.pt --cfg cfg/yolov3-spp.cfg
python3 train.py --weights ultralytics49.pt --cfg cfg/yolov3-spp.cfg
python3 train.py --weights ultralytics68.pt --cfg cfg/yolov3-spp.cfg
To train from scratch (randomly initialized weights), use:
python3 train.py --weights '' --cfg cfg/*.cfg # any cfg will work here
ultralytics49.pt is currently the highest performing YOLOv3 model (trained from scratch using this repo) available at the default img-size of 416 (see https://github.com/ultralytics/yolov3/issues/310), which is the reason it is used as the default backbone.
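The compatible combinations listed above could be captured in a small lookup and checked before training starts. This is only a sketch: COMPATIBLE_CFG and check_pair are hypothetical helpers, not part of train.py, and the table holds just the pairs named in this thread.

```python
import os

# Weights file name -> cfg file name, taken from the combinations listed in this thread.
COMPATIBLE_CFG = {
    "yolov3.pt": "yolov3.cfg",
    "yolov3.weights": "yolov3.cfg",
    "yolov3-spp.pt": "yolov3-spp.cfg",
    "ultralytics49.pt": "yolov3-spp.cfg",
    "ultralytics68.pt": "yolov3-spp.cfg",
}


def check_pair(weights, cfg):
    """Return True for an empty weights value (train from scratch) or a listed pair."""
    if weights == "":
        return True  # random initialization works with any cfg
    expected = COMPATIBLE_CFG.get(os.path.basename(weights))
    # Weights files missing from the table are reported as not known-compatible.
    return expected is not None and expected == os.path.basename(cfg)


print(check_pair("weights/ultralytics49.pt", "cfg/yolov3.cfg"))
print(check_pair("weights/ultralytics49.pt", "cfg/yolov3-spp.cfg"))
```

The first call mirrors the failing command at the top of this issue (default weights with cfg/yolov3.cfg); the second is the default pair.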
If I don't want pretrained weights, what should I do?
As @glenn-jocher said,
You must remove the backbone by using --weights ''
Thanks, bro.
I ran this: python3 train.py --data data/custom.data --cfg cfg/yolov3-spp-r.cfg
And got:
assert c.max() <= model.nc, 'Target classes exceed model classes'
AssertionError: Target classes exceed model classes
What am I missing?
I'll close this issue for now as the original issue appears to have been resolved, and/or no activity has been seen for some time. Feel free to comment if this is not the case.
Hi guys,
I'm trying to train on a custom CFG (therefore should be using a random initialization of weights). I understand that to do this we should set --weights ''
Unfortunately, even when I do that, it keeps trying to download the weights and I get this error:
Exception: '' missing, try downloading from https://drive.google.com/open?id=1LezFG5g3BCW6iYaV89B2i64cqEUZD7e0
This is the full command I am using to train:
python train.py --weights '' --cfg cfg/yolov3-custom.cfg --data data/coco1.data
Any help would be great - thanks!
@rohan-pradhan no space: --weights ''
$ python3 train.py --weights '' --data coco16.data
Namespace(accumulate=4, adam=False, arc='default', batch_size=16, bucket='', cache_images=False, cfg='cfg/yolov3-spp.cfg', data='coco16.data', device='', epochs=273, evolve=False, img_size=[416], multi_scale=False, name='', nosave=False, notest=False, rect=False, resume=False, single_cls=False, var=None, weights='')
Using CPU
Caching labels (16 found, 0 missing, 0 empty, 0 duplicate, for 16 images): 100%|██████████| 16/16 [00:00<00:00, 2515.70it/s]
Caching labels (16 found, 0 missing, 0 empty, 0 duplicate, for 16 images): 100%|██████████| 16/16 [00:00<00:00, 5567.35it/s]
Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
Using 8 dataloader workers
Starting training for 273 epochs...
Epoch gpu_mem GIoU obj cls total targets img_size
0/272 0G 7.7 13.3 7.87 28.9 211 416: 100%|██████████| 1/1 [01:05<00:00, 65.12s/it]
Class Images Targets P R mAP@0.5 F1: 0%| | 0/1 [00:00<?, ?it/s]
Thanks for the quick response, Glenn. Unfortunately, even when I copy and paste your command it still gives the same error.
>python train.py --weights '' --data coco1.data
Namespace(accumulate=4, adam=False, arc='default', batch_size=16, bucket='', cache_images=False, cfg='cfg/yolov3-spp.cfg', data='coco1.data', device='', epochs=273, evolve=False, img_size=416, img_weights=False, multi_scale=False, name='', nosave=False, notest=False, prebias=False, rect=False, resume=False, transfer=False, var=None, weights="''")
Using CUDA device0 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11264MB)
2020-01-23 11:02:59.119516: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Downloading https://pjreddie.com/media/files/''
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (22) The requested URL returned error: 404 Not Found
'rm' is not recognized as an internal or external command,
operable program or batch file.
Traceback (most recent call last):
  File "train.py", line 463, in <module>
    train() # train normally
  File "train.py", line 108, in train
    attempt_download(weights)
  File "C:\Users\Rohan\Documents\Development\Thesis\yolov3\models.py", line 454, in attempt_download
    raise Exception(msg)
Exception: '' missing, try downloading from https://drive.google.com/open?id=1LezFG5g3BCW6iYaV89B2i64cqEUZD7e0
Not sure why it is treating '' as a literal string rather than an empty one.
Figured it out! Changed it to --weights "" and it seemed to work.
Thanks again!
@rohan-pradhan ah interesting. What's your OS?
@glenn-jocher I'm running Windows 10 in a Conda environment (Anaconda Prompt).
@rohan-pradhan hmm ok. Perhaps it's a Windows thing.
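The Namespace output above shows weights="''", meaning the script received the two quote characters themselves. The likely explanation is that cmd.exe (which the 'rm' is not recognized message points to) does not treat single quotes as quoting characters, so '' is passed through literally, while "" yields an empty string. A minimal sketch that reproduces this with argparse follows, plus a hypothetical normalize_weights helper (not part of the repo) that strips stray quotes. The default value is the one shown in the first Namespace output of this thread.

```python
import argparse


def build_parser():
    """Parser with only the --weights option, mirroring the default seen in this thread."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--weights", type=str, default="weights/ultralytics49.pt")
    return parser


def normalize_weights(value):
    """Strip surrounding quote characters; a value made only of quotes becomes ''."""
    return value.strip("'\"")


parser = build_parser()

# What a POSIX shell delivers for --weights '' (quotes removed by the shell).
posix_style = parser.parse_args(["--weights", ""])

# What cmd.exe delivers for --weights '' (quotes passed through untouched).
cmd_style = parser.parse_args(["--weights", "''"])

print(repr(posix_style.weights))
print(repr(cmd_style.weights))
print(repr(normalize_weights(cmd_style.weights)))
```

On Windows, --weights "" avoids the issue without any code change, as found above.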
Hi guys,
When I run any of the following commands:
python train.py --data data/rbc.data --cfg cfg/yolov3.cfg --weights ""
python train.py --data data/rbc.data --cfg cfg/yolov3.cfg --weights ''
python train.py --data data/rbc.data --cfg cfg/yolov3.cfg --weights weights/yolov3.pt
python train.py --data data/rbc.data --cfg cfg/yolov3.cfg --weights weights/yolov3.weights
the same error occurs, as shown below.
My setup is PyTorch 1.5.1 with torchvision 0.6.0.
Traceback (most recent call last):
  File "train.py", line 431, in <module>
    train(hyp) # train normally
  File "train.py", line 164, in train
    model, optimizer = amp.initialize(model, optimizer, opt_level='O1', verbosity=0)
  File "/home/anaconda2/envs/Maskrcnn_Benchmark/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/amp/frontend.py", line 339, in initialize
    return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
  File "/home/anaconda2/envs/Maskrcnn_Benchmark/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/amp/_initialize.py", line 228, in _initialize
    handle = amp_init(loss_scale=properties.loss_scale, verbose=(_amp_state.verbosity == 2))
  File "/home/anaconda2/envs/Maskrcnn_Benchmark/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/amp/amp.py", line 101, in init
    try_caching, verbose)
  File "/home/anaconda2/envs/Maskrcnn_Benchmark/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/amp/wrap.py", line 33, in cached_cast
    if not utils.has_func(mod, fn):
  File "/home/anaconda2/envs/Maskrcnn_Benchmark/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/amp/utils.py", line 132, in has_func
    if isinstance(mod, torch.nn.backends.backend.FunctionBackend):
AttributeError: module 'torch.nn' has no attribute 'backends'
@sunset326 update torch to latest version.
Thanks, brother.
I have solved the problem. The requirements.txt says Python >= 3.7; I updated my Python and the problem no longer occurs.
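Since the traceback above ran under python3.6 while the requirements reportedly call for Python >= 3.7, a quick interpreter check before training can catch this early. A minimal sketch follows; meets_minimum is a hypothetical helper, not part of the repo, and the 3.7 floor is taken from the comment above.

```python
import sys

# Minimum version as reported in this thread (assumption: taken from the comment, not verified here).
MIN_PYTHON = (3, 7)


def meets_minimum(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if not meets_minimum():
    print("Python %d.%d found, but >= %d.%d is expected"
          % (sys.version_info[0], sys.version_info[1], MIN_PYTHON[0], MIN_PYTHON[1]))
```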