i train a model in my custom data, can get the weights (last.pt and best.pt)
i run:
python test.py --img 640 --batch 16 --data ./data/patrol.yaml --weights weights/last.pt --device 4
python test.py --img 640 --batch 16 --data ./data/patrol.yaml --weights weights/best.pt --device 4
both raise the error:
Traceback (most recent call last):
File "test.py", line 277, in
opt.verbose)
File "test.py", line 86, in test
names = model.names if hasattr(model, 'names') else model.module.names
File "/home/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 594, in __getattr__
type(self).__name__, name))
AttributeError: 'Model' object has no attribute 'module'
However, i can run with the default weight yolov5s.pt
python test.py --img 640 --batch 16 --data ./data/patrol.yaml --device 4
pytorch = 1.5
Also, train with my weights (last.pt adn best.pt) is not avaliable:
Traceback (most recent call last):
File "train.py", line 408, in
train(hyp)
File "train.py", line 366, in train
print('%g epochs completed in %.3f hours.\n' % (epoch - start_epoch + 1, (time.time() - t0) / 3600))
UnboundLocalError: local variable 'epoch' referenced before assignment
am i missing something to do with the weight(.pt) after traning done?
Not 100% sure but most likely using the --resume flag with last.pt when running train.py should fix it
@ml5ah
i see the reason why i can't train with my weights now.
seems like the --resume is same as use --weights last.pt
the problem is in the --epochs n, n should be the total epochs not the additional epochs
ex:
i train 300 epochs get the weights(last.pt), if i want to train 200 more epochs on last.pt,
thing i should do is set --epoch 500, not --epoch 200
still, have no idea how to test the last.pt by test.py
Duplicate of https://github.com/ultralytics/yolov5/issues/94?
@glenn-jocher
i guess i slove the problem
in the 337 line in train.py
'model': ema.ema.module if hasattr(model, 'module') else ema.ema,
i train with the muti-gpu, so the model save is ema.ema.module
however, model.names = data_dict['names'] is define after the
model = torch.nn.ParellelDate(model)
instead of ema.ema.module.names we only have ema.ema.names which is not save if use muti-gpu
@yxNONG ah, ok thanks for the insight! Can you test your fix on single and multi-gpu and submit a PR with the proposed changes if all the tests pass? Thank you!
@glenn-jocher just see the message now, i will try