I would like to visualize the training of this model as it progresses in order to visually determine when the loss plateaus. Is there a way to do this? Perhaps there's some way to send log messages someplace where TensorBoard could be used for monitoring the training?
Use the flags -mjpeg_port 8090 -map at the end of the training command.
Then either open the chart.png file or open http://127.0.0.1:8090 in a browser.
Currently TensorBoard isn't supported.
This hasn't worked for me yet, maybe because I've failed to set up the tunnel correctly between my local machine and the AWS EC2 instance where the training takes place?
I am running my training like so (on an AWS EC2 instance):
$ nohup ./darknet detector train build/darknet/x64/data/obj.data cfg/yolov3-tiny.cfg backup/yolov3-tiny_last.weights -mjpeg_port 8090 -map 2>&1 >> log_train_yolov3-tiny_20190924.txt &
On my local machine I am opening a tunnel to port 8090 like so:
$ ssh -i ~/.ssh/aws_keys/james.pem -L 127.0.0.1:8090:<ec2_ip_address>:8090 ubuntu@<ec2_ip_address>
If the reporting mechanism is working as advertised then I must be doing something wrong with the port 8090 tunnel and/or with the security group settings for the EC2 instance, no?
BTW every so often in the training output I see messages like the following, so it appears to be calculating some mAP values:
calculation mAP (mean average precision)...
detections_count = 5262, unique_truth_count = 1853
class_id = 0, name = handgun, ap = 75.64% (TP = 519, FP = 41)
class_id = 1, name = rifle, ap = 57.57% (TP = 296, FP = 45)
for conf_thresh = 0.25, precision = 0.90, recall = 0.44, F1-score = 0.59
for conf_thresh = 0.25, TP = 815, FP = 86, FN = 1038, average IoU = 68.44 %
IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.666059, or 66.61 %
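In case it helps with tracking progress from the log alone, here is a minimal sketch that pulls the mAP value out of each evaluation so the series can be checked by eye for a plateau. It assumes the log keeps summary lines in the form shown above ("mean average precision (...) = <fraction>, or <percent> %"). The function name extract_map is made up for illustration and is not part of darknet.

```shell
#!/bin/sh
# Hypothetical helper (not part of darknet): print one mAP fraction per
# evaluation found in a training log, in the order they were written.
# Assumes summary lines shaped like:
#   mean average precision (mAP@0.50) = 0.666059, or 66.61 %
extract_map() {
    # Keep only the summary lines, then capture the number between "= " and ","
    grep 'mean average precision' "$1" |
        sed -n 's/.*= \([0-9][0-9.]*\),.*/\1/p'
}
```

Running extract_map log_train_yolov3-tiny_20190924.txt (the log file from the training command above) should print one value per line. When successive values stop increasing, the model has probably plateaued.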
The README has a bullet point "Train on Amazon EC2" that mentions the use of -dont_show, which I didn't include, as well as a compilation of the darknet executable with OpenCV, which I didn't do. Perhaps one or both of these is at play? I will try to find out...
Yes, when I recompiled with OpenCV and included the -dont_show flag on my training command I was then able to see the chart in my browser, and without using the SSH tunneling I was trying before (I just use the EC2 instance public IP address and port 8090 and it shows up).