I followed all the steps to train with 1 class, but when I start training I get the error shown in the screenshot.
I set batch=64 and subdivisions=8 in the .cfg file, and no bad.list file is created after running the command
./darknet detector train data/obj.data yolo-obj.cfg darknet53.conv.74
from the /darknet-master/build/darknet/x64 path.
Can anyone help me figure out why this is happening?


@AlexeyAB can probably point to some causes for that.
In my experience, the training process sometimes stops suddenly, which is a problem especially overnight while you are asleep. To avoid losing hours of training to a sudden error or stop, I created a simple bash file:
#!/bin/bash
cd /home/...the location of your darknet executable
while :
do
./darknet detector train data/obj.data yolov3.cfg backup/yolov3_last.weights
done
This works very well for me because the weights are saved as yolov3_last.weights every 100 iterations. Whenever the training process stops, it is restarted from the last saved weights.
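One caveat with a bare `while :` loop is that it also restarts immediately and forever when darknet fails at startup (for example because of a wrong path), and it restarts even after training finishes. A minimal sketch of a bounded variant follows. The function name `run_with_restart` and the variables `MAX_RESTARTS` and `RESTART_DELAY` are hypothetical names chosen for this example, not Darknet settings, and the sketch assumes darknet returns a zero exit status only when training ends normally.

```shell
#!/bin/bash
# Hypothetical helper: rerun a command until it exits cleanly
# or the retry budget is used up.
MAX_RESTARTS="${MAX_RESTARTS:-50}"    # illustrative default
RESTART_DELAY="${RESTART_DELAY:-10}"  # seconds to wait between attempts

run_with_restart() {
    local attempt=0
    while [ "$attempt" -lt "$MAX_RESTARTS" ]; do
        attempt=$((attempt + 1))
        echo "[ $(date) ] attempt $attempt: $*"
        # Run the command given as arguments; stop looping on success.
        if "$@"; then
            echo "[ $(date) ] command exited cleanly, stopping"
            return 0
        fi
        # Pause so a command that fails instantly does not spin the CPU.
        sleep "$RESTART_DELAY"
    done
    echo "[ $(date) ] giving up after $MAX_RESTARTS attempts" >&2
    return 1
}

# Usage, from the directory that contains the darknet executable:
# run_with_restart ./darknet detector train data/obj.data yolov3.cfg backup/yolov3_last.weights
```

The delay and the retry limit are the only differences from the loop above; the training command itself is unchanged.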
@lfares
How much CPU RAM do you have?
Are you using the latest version of Darknet?
What parameters did you set in the Makefile?
> I put batch = 64 and subdivision = 8 on the .cfg file

Your screenshot shows batch=32. Does the error occur with both batch=64 and batch=32?
Do you get this issue with batch=16 subdivisions=2?
On an Nvidia Jetson Xavier, changing batch to 8 and subdivisions to 1 allowed the training to run.
@kevinrev26 It looks like batch=8 is required for training due to low CPU RAM capacity on Jetson.
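For reference, `batch` is the number of images used per training iteration, and `subdivisions` splits that batch into smaller chunks that are processed one at a time. A minimal sketch for changing both values from the command line is shown below. The helper name `set_cfg_value` is hypothetical, and the file name `yolo-obj.cfg` is taken from the original question; editing the file by hand works just as well.

```shell
#!/bin/bash
# Hypothetical helper: set "key=value" in a Darknet .cfg file.
# It tolerates spaces around "=" and leaves commented-out lines alone.
set_cfg_value() {
    local file="$1" key="$2" value="$3"
    # Only lines that start with the key are rewritten;
    # the previous version of the file is kept as <file>.bak
    sed -i.bak -E "s/^${key}[[:space:]]*=.*/${key}=${value}/" "$file"
}

# Usage (values reported to work on the Jetson Xavier in this thread):
# set_cfg_value yolo-obj.cfg batch 8
# set_cfg_value yolo-obj.cfg subdivisions 1
```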
I was able to train, but detection is not working properly. Is there a minimum number of dataset images or something?
@kevinrev26 You must train for at least 4000 iterations with batch=64, or 32 000 iterations with batch=8. You should also use a pre-trained weights file for training.
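Both figures correspond to the same number of images processed: 4000 × 64 = 32 000 × 8 = 256 000. Assuming the rule is meant to scale inversely with the batch size (an interpretation of the two numbers given, not something stated explicitly in the thread), the minimum for other batch sizes can be estimated with a small sketch like this. The names `REF_ITERATIONS`, `REF_BATCH`, and `iterations_for_batch` are hypothetical.

```shell
#!/bin/bash
# Reference point from the thread: at least 4000 iterations at batch=64.
REF_ITERATIONS=4000
REF_BATCH=64

# Hypothetical helper: minimum iterations that process the same number
# of images (iterations * batch) as the reference point.
iterations_for_batch() {
    local batch="$1"
    echo $(( REF_ITERATIONS * REF_BATCH / batch ))
}

iterations_for_batch 64   # prints 4000
iterations_for_batch 8    # prints 32000
```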
Funny, I also did it via a bash script in the crontab a few weeks ago :-D
The crontab looks like this:
crontab -l
```
...
...
*/1 * * * * /opt/start_training.sh
...
...
```
/opt/start_training.sh looks like this:
```
#!/bin/bash
if [ -e "/srv/storage/training/608_weights/608_weights_final.weights" ];
then
    echo "[ $(date) ] Final Weights already reached - Exit :-)"
    exit 0
fi
if [ "$(ps auxfw | grep -v grep | grep "detector train" -q; echo $?)" -ne 0 ];
then
    echo -e "[ $(date) --- Start training ]";
    telinit 2
    cd /home/user/computer-vision/darknet2/;
    # Check if we got a "..._last.weights" file, if yes, we will use it to start our detector
    # from this position, so we use the last "..._last.weights" file as checkpoint
    # If such a file doesnt exist, we know we can start from zero
    if [ -e "/srv/storage/training/608_weights/608_weights_last.weights" ];
    then
        ./darknet detector train data/
    else
        ./darknet detector train data/
    fi
else
    echo -e "[ $(date) --- already training ]";
fi
```
Hi, can you tell me where I should create this bash file? In the darknet directory or in some system directory?
Hi, this is the code to write in a bash file, which should be saved as TheNameYouWant.sh
You can save it anywhere on your disk; it doesn't matter as long as the path to the darknet executable is written correctly in the code.
Once it's saved, open a Linux terminal, go to the directory where you saved the file, and type:
sh TheNameYouWant.sh
Thank you so much.