Darknet: How to train on grayscale images

Created on 30 Jan 2018 · 23 comments · Source: AlexeyAB/darknet

Hi all,

We are running into trouble using this model with a grayscale training set, and would like to ask for some help with this setup.

We have tried several ways to make our grayscale training set work with this DL model:
1) set channels=3 in the .cfg and convert the ch=1 grayscale images to ch=3 via Xnview
2) set channels=1 in the .cfg and work on the ch=1 grayscale images directly
Yet neither of these works. We still see the probability of "object" dropping at every iteration, while this does not happen with the same executable on a color dataset.

Also, we found there are two functions for loading color or grayscale images ( load_image_paths() / load_image_paths_gray() ), but we didn't see either of them being called during training.

If you have met the same problem and already solved it, please let us know. Thanks a lot!
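
For reference, a sketch of what our step-1 conversion does, in Python with OpenCV (directory names are placeholders):

    import glob
    import os
    import cv2

    os.makedirs("train_rgb", exist_ok=True)
    # Read each 1-channel grayscale image and save a 3-channel copy
    # with the gray plane duplicated into all three channels.
    for path in glob.glob("train_gray/*.jpg"):          # placeholder input dir
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)   # H x W, 1 channel
        bgr = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)    # H x W x 3, equal planes
        cv2.imwrite(os.path.join("train_rgb", os.path.basename(path)), bgr)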

Label: question


All 23 comments

I am not sure, but this channel value is just multiplied by the batch size, and that gives the total number of images in one iteration.
For my training, one iteration completes after exactly 192 (64*3) images.

@horngjason have you solved it? How did you do it? Thanks!

@ss199302 Sorry for the late response.

We have found a possible solution to this question in "pjreddie/darknet". Please refer to: https://github.com/pjreddie/darknet/issues/468 It worked for us.

Please also try AlexeyAB's solution in this thread, although I haven't tested that approach myself yet.

@horngjason thanks!

@ss199302 Just train with grayscale images as usual, in the same way as with color images, without any changes.

The speed of training and detection with grayscale images is similar to that with color images.

@horngjason @AlexeyAB hello, when I used AlexeyAB's solution to train, the results were IoU=58.68%, recall=70.40%, precision=83.98%; when I used pjreddie#468, the values were 40.99%, 49%, 83.12%. But when I call the demo function, I can hardly detect any objects.

@AlexeyAB A grayscale image has 1/3 of the information of a color image, so I would assume that the operations required would also be ~1/3, and the processing time correspondingly lower.

This is not the case. Is this because Yolo is only implemented for color images, so if you use grayscale ones it transforms them into color images by increasing the number of channels without adding any relevant information?

What is the channel field for, then?

Thanks!

@ssanzr
If you use a 1-channel input instead of a 3-channel input, only the 1st convolutional layer changes; all the rest stays the same.
As you can see, the 1st layer takes only 2.14% of the total computation, so if the total time is 100 ms (with the 1st layer taking 2.14 ms), then using 1 channel instead of 3 reduces this to 98.6 ms (1st layer 0.71 ms), so the speedup is only about 1.4%.
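
A quick check of that arithmetic, assuming the 1st layer's cost is proportional to the number of input channels (the 2.14% share comes from the profile shown below):

    # Sanity-check, assuming the 1st layer's cost scales
    # linearly with the number of input channels.
    total = 100.0                  # hypothetical total forward time, ms
    layer1_3ch = 0.0214 * total    # 2.14 ms for the 1st layer with 3 channels
    layer1_1ch = layer1_3ch / 3.0  # ~0.71 ms with 1 channel
    new_total = total - layer1_3ch + layer1_1ch
    print(new_total)                        # ~98.57 ms
    print((total - new_total) / total)      # ~0.014, i.e. ~1.4% speedup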

[screenshot: per-layer timing profile; the 1st convolutional layer accounts for 2.14% of the total computation]

What is the channel field for, then?

There were once plans to use the ch= param in the cfg-file for grayscale images.

@ss199302
With AlexeyAB's suggested solution, I can detect targets in test videos that are either ch=1 or ch=3. On my side, IoU=70% / recall=0.75 / mAP=80%. Detection performance is not as good as when training on a color dataset.

Here are some checkpoints:
1) Keep training and testing in the same color space: for example, both in grayscale (a quick check for this is sketched below)
2) Overfitting: train again on the same training dataset but in ch=3 format, then run detection again
3) Add some samples to the training set if the content and object size are completely different from your original dataset, even if you already trained with the "random" setting

*My testing uses the same net cfg-file as in training.
*My code base is YOLOv2 with the mAP function enabled.
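
For checkpoint 1, a small sanity check (a sketch; folder names are placeholders, using Pillow):

    import glob
    from PIL import Image

    # Confirm train and test images are in the same mode
    # ("L" = 8-bit grayscale, "RGB" = 3-channel color).
    for folder in ("train/", "test/"):          # placeholder dirs
        modes = {Image.open(p).mode for p in glob.glob(folder + "*.jpg")}
        print(folder, modes)                    # both should print the same set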

@horngjason If your training images are grayscale, is your test video also processed as grayscale?

@ss199302
I have tried it on a grayscale video sequence as well as on an IR-filtered video sequence. Both of them actually work.

@AlexeyAB hi Alexey, I'm sorry to bother you. My training pictures are all 8-bit grayscale images, and I just want the network to learn texture information, so the three channels are not very important. I don't want to waste resources converting grayscale images into three-channel images because I am running on ARM, so the network input needs to be only 416x416x1. However, when I changed the channel in the cfg to 1 in the original author's framework, it reported an error, so I was forced to modify load_data_detection() and change it to "d.x.cols = h * w", which seems to work. I want to know: if I want to make the network input size 416x416x1, is there any problem with this operation? Thanks!

@xujunjie96 Hi,

This repository supports channels=1 in the cfg-file.

For what you would need to change in the original framework, see: https://github.com/AlexeyAB/darknet/pull/936/files
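
The switch is a single line in the [net] section of the cfg-file; a sketch with the other params abbreviated:

    [net]
    batch=64
    subdivisions=8
    width=416
    height=416
    # 1 = grayscale input, 3 = color (default)
    channels=1
    ...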

@AlexeyAB Thank you for your quick answer. Wow, your upgraded version is perfect. Are you planning to update the ‘depthwise’ layer in your project?

@xujunjie96

Are you planning to update the ‘depthwise’ layer in your project?

Yes, and after that XNOR-net will be optimized further.

@AlexeyAB Hi, I'm back again. I previously modified the original author's source, following "https://blog.csdn.net/Chen_yingpeng/article/details/82682188", to add a 'depthwise_convolution' layer. I don't know whether that site's method is suitable for your framework; if there is nothing wrong with it, I can try to do it. By the way, when are you going to add the 'depthwise_convolution' layer? Thank you for your friendly answer!

@xujunjie96 Hi,

Unfortunately, that implementation does not support the groups parameter.
Also, I do not see it using the cudnnConvolutionForward() function, which may be faster than a custom implementation.
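
For context, a depthwise convolution is a grouped convolution with groups equal to the number of channels, so each kernel sees only one input channel. A minimal numpy sketch of that idea (not darknet's implementation):

    import numpy as np

    def depthwise_conv(x, w):
        # x: (C, H, W) input, w: (C, k, k) -- one kernel per channel.
        # groups == C: each output channel sees only its own input channel.
        C, H, W = x.shape
        k = w.shape[1]
        out = np.zeros((C, H - k + 1, W - k + 1))
        for c in range(C):
            for i in range(H - k + 1):
                for j in range(W - k + 1):
                    out[c, i, j] = np.sum(x[c, i:i+k, j:j+k] * w[c])
        return out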

@AlexeyAB Yes, you are right. So when are you going to update it?

Hello, @AlexeyAB. Thank you for the nice software.

I succeeded in training custom objects using yolov3-tiny. Now I want to train again using grayscale images, because my camera output is grayscale and I don't want to spend extra computing resources converting it to 3 channels.

Can I use the same pre-trained weights obtained from yolov3-tiny.weights?

./darknet partial cfg/yolov3-tiny.cfg yolov3-tiny.weights yolov3-tiny.conv.15 15

Or should I train all the weights from scratch for a 1-channel yolov3-tiny?

@ledmonster

If you feed the network grayscale images and set channels=1 in the cfg-file, then you can't use the default pre-trained weights.
Just train without a pre-trained weights file.
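
Training from scratch then just means omitting the weights argument, e.g. (file names are placeholders; the cfg is assumed to have channels=1):

    ./darknet detector train data/obj.data cfg/yolov3-tiny-gray.cfg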

@AlexeyAB I see. Thank you.

Hello @AlexeyAB and all,
I am excited to try the yolo algorithm in the medical domain and would be grateful to hear your opinion.
I have 3 grayscale images (neighboring slices of a 3D medical image) that I would like to use as the RGB channels in yolov3. Do the 3 jpg images need to have the same name for that? Can you please help me make the needed changes in the code?

This is how one slice looks:
[image: LNDb-0001_finding1_rad1]

@hbiserinska

The best way is to use additional software to convert 3xGrayScale -> 1xRGB or 4xGrayScale -> 1xRGBA (you should find or write this software yourself, using Python/C/C++/...).

Then train it with the hue= and saturation= lines commented out in the cfg-file.
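
A sketch of the 3xGrayScale -> 1xRGB packing in Python with OpenCV (file names are placeholders). Commenting out hue= and saturation= makes sense because those augmentations mix the channels, which here encode distinct slices rather than real color:

    import cv2
    import numpy as np

    # Pack three neighboring grayscale slices into the three channels of
    # one image, so darknet can train on it with the default channels=3.
    names = ("slice_10.png", "slice_11.png", "slice_12.png")   # placeholders
    slices = [cv2.imread(n, cv2.IMREAD_GRAYSCALE) for n in names]
    packed = np.dstack(slices)          # H x W x 3
    cv2.imwrite("finding1_packed.png", packed)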

