Darknet: How to visualize training progress (TensorBoard possible?)

Created on 13 Sep 2019 · 6 Comments · Source: AlexeyAB/darknet

I would like to visualize the training of this model as it progresses in order to visually determine when the loss plateaus. Is there a way to do this? Perhaps there's some way to send log messages someplace where TensorBoard could be used for monitoring the training?

Labels: Solved, want enhancement

All 6 comments

Add the flags -mjpeg_port 8090 -map at the end of the training command.

Then:

  • look at the Loss & accuracy window
  • look at the generated chart.png file
  • open the URL http://127.0.0.1:8090 in Chrome/Firefox

Currently TensorBoard isn't supported.

This hasn't worked for me yet, maybe because I've failed to set up the tunnel correctly between my AWS EC2 instance, where the training takes place, and my local machine?

I am running my training like so (on an AWS EC2 instance):

$ nohup ./darknet detector train build/darknet/x64/data/obj.data cfg/yolov3-tiny.cfg backup/yolov3-tiny_last.weights -mjpeg_port 8090 -map 2>&1 >> log_train_yolov3-tiny_20190924.txt &

On my local machine I am opening a tunnel to port 8090 like so:

$ ssh -i ~/.ssh/aws_keys/james.pem -L 127.0.0.1:8090:<ec2_ip_address>:8090 ubuntu@<ec2_ip_address>

If the reporting mechanism is working as advertised then I must be doing something wrong with the port 8090 tunnel and/or with the security group settings for the EC2 instance, no?
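One thing that may be worth checking, offered as an assumption rather than a diagnosis: with ssh -L, the destination host is resolved from the remote end of the connection, so a forward that targets the instance's public IP leaves the instance and comes back in through the security group. Forwarding to localhost on the instance keeps the traffic on the loopback interface, so only the SSH port has to be reachable. A minimal sketch follows; the function name open_chart_tunnel and the SSH_BIN override are hypothetical, while the ubuntu user and port 8090 are taken from the command above.

```shell
# Hypothetical helper: forward a local port to the same port on the instance itself.
# Arguments: <private key file> <instance address> [port, default 8090]
open_chart_tunnel() {
    key=$1; host=$2; port=${3:-8090}
    # -N: do not run a remote command, only forward the port.
    # "localhost" is resolved on the instance, not on the local machine.
    # SSH_BIN is an assumed override for the ssh binary (handy for dry runs).
    "${SSH_BIN:-ssh}" -i "$key" -N -L "127.0.0.1:$port:localhost:$port" "ubuntu@$host"
}
```

Calling open_chart_tunnel with the key file and the instance address keeps the tunnel open in the foreground; the chart would then be expected at http://127.0.0.1:8090, provided darknet is actually serving it on the instance.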

BTW every so often in the training output I see messages like the following, so it appears to be calculating some mAP values:

 calculation mAP (mean average precision)...

 detections_count = 5262, unique_truth_count = 1853  
class_id = 0, name = handgun, ap = 75.64%        (TP = 519, FP = 41) 
class_id = 1, name = rifle, ap = 57.57%      (TP = 296, FP = 45) 

 for conf_thresh = 0.25, precision = 0.90, recall = 0.44, F1-score = 0.59 
 for conf_thresh = 0.25, TP = 815, FP = 86, FN = 1038, average IoU = 68.44 % 

 IoU threshold = 50 %, used Area-Under-Curve for each unique Recall 
 mean average precision (mAP@0.50) = 0.666059, or 66.61 % 
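Since these summary lines end up in the training output, the mAP history can also be pulled out of a saved log without any chart at all. The sketch below assumes the summary line keeps the "mean average precision (...) = <value>" shape shown above; the function name extract_map is hypothetical.

```shell
# Hypothetical helper: print one mAP value per evaluation found in a training log.
# Argument: <log file>
extract_map() {
    # Keep only the matching part of each summary line,
    # then print its last field, which is the number after "= ".
    grep -o 'mean average precision ([^)]*) = [0-9.]*' "$1" | awk '{print $NF}'
}
```

Running extract_map on the log file written by the training command prints the values in the order they were computed, one per line, which is enough to see whether mAP is still improving.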

The README has a bullet point "Train on Amazon EC2" that mentions using -dont_show, which I didn't include, as well as compiling the darknet executable with OpenCV, which I didn't do. Perhaps one or both of these is at play? I will try to find out...

Yes. After I recompiled with OpenCV and added the -dont_show flag to my training command, I was able to see the chart in my browser, and without the SSH tunneling I was trying before: I just use the EC2 instance's public IP address and port 8090 and it shows up.
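The two changes described here can be sketched as follows. This assumes the repository's Makefile exposes an OPENCV variable that can be overridden from the command line, and that OpenCV development files are already installed on the instance. The function names and the MAKE_BIN and DARKNET_BIN overrides are hypothetical; the flags are the ones named in this thread.

```shell
# Hypothetical helper: rebuild darknet with OpenCV enabled (run from the repository root).
rebuild_with_opencv() {
    # OPENCV=1 on the command line takes precedence over the value set in the Makefile.
    "${MAKE_BIN:-make}" clean && "${MAKE_BIN:-make}" OPENCV=1
}

# Hypothetical helper: train on a headless machine with the chart served over HTTP.
# Arguments: <data file> <cfg file> <weights file> [port, default 8090]
train_headless() {
    data=$1; cfg=$2; weights=$3; port=${4:-8090}
    # -dont_show: do not try to open a GUI window on the server.
    # -mjpeg_port: serve the chart over HTTP on the given port.
    # -map: compute mAP periodically during training.
    "${DARKNET_BIN:-./darknet}" detector train "$data" "$cfg" "$weights" \
        -dont_show -mjpeg_port "$port" -map
}
```

To reach the chart directly at the instance's public IP, the security group would also need to allow inbound traffic on the chosen port.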
