Darknet: Weights not portable

Created on 27 Feb 2018 · 10 Comments · Source: pjreddie/darknet

I trained my custom net on an NVIDIA Jetson TK1 with GPU=1 in the Makefile and successfully ran my test on the trained net. I then transferred the weights to my machine (macOS High Sierra 10.13.1) and recompiled darknet with GPU=0 in the Makefile. Surprisingly, detector test doesn't detect any objects on my local machine.

Any ideas?

Schrodinger1926

All 10 comments

Try testing with the threshold set to 0. If you get boxes on the predicted image, the training confidence is still low at the moment; try different thresholds and compare the results. If you still don't get any detections at thresh=0, something went wrong with the training.
./darknet detector test yourFile.data yourCFG.cfg yourTrainModel.weights file -thresh 0

ahsan856jalal on 27 Feb 2018

I'm running detector test on my local machine with the same cfg files that I used on my GPU server. No objects are detected on my local machine, but the test ran successfully on my GPU server.

My concern is whether there is a problem with the trained weights or with the darknet executable, because darknet on my local machine is compiled with GPU=0, while the darknet that trained those weights was compiled with GPU=1.

Schrodinger1926 on 27 Feb 2018

@Schrodinger1926

Do you use tiny-yolo?
Do you use cuDNN on machine with GPU?
Try compressing your Yolo source code (with the cfg and weights files) on the computer with the GPU, then copy this zip file to the other computer, uncompress it, set GPU=0, run make, and run the test (so all paths are left untouched).

AlexeyAB on 27 Feb 2018

Hey @ahsan856jalal @AlexeyAB, thanks for your replies.

No, I'm using the YOLO net, but with a different set of classes on my custom data. It gave successful test results on my GPU machine, where I did the training.
No, cuDNN=0 on my GPU machine.
I did that; basically, I tried running the same darknet executable on a machine without a GPU. It outright gave an error - cannot execute executable.

If I understand it right, weights are ultimately just numeric values, so it shouldn't matter how the weights file was made.
Are you guys able to reproduce this error?

Schrodinger1926 on 28 Feb 2018

I use my fork, and I can successfully train a weights file on one machine with a GPU and run it on another machine with or without a GPU.

AlexeyAB on 28 Feb 2018

Could it be anything related to system architecture? The NVIDIA Jetson actually has an ARM processor, and my local machine is x86.

Schrodinger1926 on 28 Feb 2018

@Schrodinger1926

Did you train on x86 with GPU or on nVidia Jetson ARM?
Do you use 64 bit OS on both x86 and Jetson ARM?

https://devtalk.nvidia.com/default/topic/1019947/install-jetpack/

Jetson is 32-bit ARM architecture (armhf)...this is normal, the JTK1 is 32-bit.

Probably related to the 64/32-bit version of the OS.

The original fork doesn't support transfer between 32-bit <-> 64-bit machines because of these 2 lines; size_t is 64-bit on a 64-bit OS and 32-bit on a 32-bit OS, so the weights file isn't portable:

You can try using my fork to train and detect. It supports transfer between 32-bit <-> 64-bit machines, so the weights file is portable: https://github.com/AlexeyAB/darknet

AlexeyAB on 28 Feb 2018

👍3
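
The size_t issue described in the comment above can be illustrated in isolation. The following is only a minimal sketch, not darknet's actual code: the header layout (three 32-bit version fields followed by a counter) and the names `header_bytes_native`, `header_bytes_fixed`, `write_seen_fixed`, and `read_seen_fixed` are hypothetical. It contrasts a counter stored with the platform's size_t width against one stored as a fixed-width 64-bit value, and assumes both machines share the same byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical header: three 32-bit version fields, then a counter. */

/* Non-portable layout: the counter takes sizeof(size_t) bytes,
   typically 4 on a 32-bit build and 8 on a 64-bit build. */
size_t header_bytes_native(void) {
    return 3 * sizeof(int32_t) + sizeof(size_t);
}

/* Portable layout: the counter always takes 8 bytes. */
size_t header_bytes_fixed(void) {
    return 3 * sizeof(int32_t) + sizeof(uint64_t);
}

/* Write the counter as a fixed-width 64-bit value. Returns 0 on success. */
int write_seen_fixed(FILE *fp, size_t seen) {
    uint64_t v = (uint64_t)seen;
    return fwrite(&v, sizeof v, 1, fp) == 1 ? 0 : -1;
}

/* Read the fixed-width counter back. On a 32-bit build, values that do
   not fit in size_t are truncated by the cast. Returns 0 on success. */
int read_seen_fixed(FILE *fp, size_t *seen) {
    uint64_t v = 0;
    if (fread(&v, sizeof v, 1, fp) != 1) return -1;
    *seen = (size_t)v;
    return 0;
}
```

With the native layout, a 64-bit reader consumes 8 bytes where a 32-bit writer emitted only 4, so every value after the header is read from the wrong offset. With the fixed-width layout, the header has the same length on both builds.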

I checked the size of size_t on my GPU machine and on my local machine. Bullseye, it is indeed a 32-bit vs 64-bit problem.

Thanks man.

Schrodinger1926 on 1 Mar 2018
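
A check like the one described above might look as follows. This is a small sketch; the function names `size_t_bits` and `print_size_t_width` are made up for illustration and are not part of darknet.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Width of size_t, in bits, for the platform this file is compiled for. */
unsigned size_t_bits(void) {
    return (unsigned)(sizeof(size_t) * CHAR_BIT);
}

/* Print the width so the two machines can be compared. */
void print_size_t_width(void) {
    printf("size_t is %u bits\n", size_t_bits());
}
```

Calling `print_size_t_width()` from `main` on each machine, built with the same compiler settings used for darknet, shows whether the two builds disagree on the width.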

Why can't the weights be stored as plain ASCII? Storing them as binary might be more compact but there are endianness issues, as well as the issue of being able to use them with other frameworks.
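
On the endianness point, one middle ground between ASCII and raw memory dumps is a binary format with an explicitly fixed byte order. The sketch below is an assumption for illustration, not something darknet or the fork is known to do: the names `encode_f32_le` and `decode_f32_le` are hypothetical, and the code assumes `float` is a 32-bit IEEE 754 value whose byte order matches that of 32-bit integers on the host.

```c
#include <stdint.h>
#include <string.h>

/* Encode a 32-bit float as 4 bytes, least significant byte first,
   independent of the host's byte order. */
void encode_f32_le(float x, unsigned char out[4]) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret without aliasing issues */
    out[0] = (unsigned char)(bits & 0xFF);
    out[1] = (unsigned char)((bits >> 8) & 0xFF);
    out[2] = (unsigned char)((bits >> 16) & 0xFF);
    out[3] = (unsigned char)((bits >> 24) & 0xFF);
}

/* Decode 4 little-endian bytes back into a float. */
float decode_f32_le(const unsigned char in[4]) {
    uint32_t bits = (uint32_t)in[0]
                  | ((uint32_t)in[1] << 8)
                  | ((uint32_t)in[2] << 16)
                  | ((uint32_t)in[3] << 24);
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

This keeps the file as compact as a raw dump while removing the dependence on host byte order. It does not address the separate question of sharing weights with other frameworks.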