Darknet: How to train 4 channels image using yolov3?

Created on 23 Dec 2018 · 16 Comments · Source: AlexeyAB/darknet

@AlexeyAB Hi~

As the title says.

want enhancement

MingPap

All 16 comments

@AlexeyAB Hi

I want to combine depth maps and RGB images into 4-channel data as input for YOLOv3 training.

Which code do I need to modify to train on 4-channel images?

Best wishes!

MingPap on 26 Dec 2018

@MingPap Hi,

You should make changes in several places, as was done for channel=1: https://github.com/AlexeyAB/darknet/pull/936/files

But you should be careful with the RGBA->HSV->RGBA transformation in these 2 places:

for OPENCV=1: https://github.com/AlexeyAB/darknet/blob/95773cfb423266b9ac6aeea54e862db5817b5447/src/http_stream.cpp#L304-L323
for OPENCV=0: https://github.com/AlexeyAB/darknet/blob/95773cfb423266b9ac6aeea54e862db5817b5447/src/image.c#L1751-L1767

Add this line if(c == 4) flag = CV_LOAD_IMAGE_UNCHANGED; here: https://github.com/AlexeyAB/darknet/pull/936/files#diff-2ceac7e68fdac00b370188285ab286f7R708
as described here: https://docs.opencv.org/2.4/modules/highgui/doc/reading_and_writing_images_and_video.html?highlight=imread#imread
Maybe you should convert RGBA -> first_image_RGB + second_image_Depth, and then
apply the code for 3 channels to first_image_RGB
apply the code for 1 channel to second_image_Depth

I.e. add code something like this (if OPENCV=1); I did not check it for errors:

 if (ipl->nChannels == 4)
 {
     std::vector<cv::Mat> rgba_vec;
     cv::split(sized, rgba_vec);  // split into 4 channels

     std::vector<cv::Mat> rgb_vec = rgba_vec;
     cv::Mat depth_mat = rgba_vec[3];  // keep the 4th (depth) channel
     rgb_vec.resize(3);  // drop the 4th channel (assuming the 4th channel is depth)

     cv::Mat rgb;
     cv::merge(rgb_vec, rgb);  // use the channel=3 image as usual
     cv::Mat hsv_src;
     cv::cvtColor(rgb, hsv_src, CV_BGR2HSV);  // also BGR -> RGB

     std::vector<cv::Mat> hsv;
     cv::split(hsv_src, hsv);  // use the channel=3 HSV image as usual

     hsv[1] *= dsat;  // data augmentation: saturation
     hsv[2] *= dexp;  // data augmentation: exposure
     hsv[0] += 179 * dhue;  // data augmentation: hue

     cv::merge(hsv, hsv_src);
     cv::cvtColor(hsv_src, rgb, CV_HSV2RGB);  // now RGB instead of BGR

     depth_mat *= dexp;  // data augmentation for the depth channel

     cv::split(rgb, rgb_vec);  // split into 3 channels
     rgba_vec = rgb_vec;
     rgba_vec.resize(4);
     rgba_vec[3] = depth_mat;
     cv::merge(rgba_vec, sized);  // recombine into a channel=4 image
     // all 4 channels are now augmented
 }
 else if (ipl->nChannels >= 3)
 {
     cv::Mat hsv_src;
     cv::cvtColor(sized, hsv_src, CV_BGR2HSV);  // also BGR -> RGB

     std::vector<cv::Mat> hsv;
     cv::split(hsv_src, hsv);

     hsv[1] *= dsat;
     hsv[2] *= dexp;
     hsv[0] += 179 * dhue;

     cv::merge(hsv, hsv_src);

     cv::cvtColor(hsv_src, sized, CV_HSV2RGB);  // now RGB instead of BGR
 }
 else
 {
     sized *= dexp;
 }
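
For builds with OPENCV=0, the same split can be sketched without OpenCV. This is a minimal sketch, not darknet's actual code: it assumes a planar float image with values in [0, 1] whose channels are stored one after another (R, G, B, then depth), and the names PlanarImage and distort_rgbd are hypothetical. Only the first three channels go through RGB -> HSV -> RGB; the depth channel is only scaled by dexp, as in the snippet above. Here dhue is a fraction of a full hue turn, not a multiple of 179.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical planar image: channel k of pixel i is data[k * w * h + i],
// values in [0, 1]. Channels 0..2 are R, G, B; channel 3 (if present) is depth.
struct PlanarImage {
    int w, h, c;
    std::vector<float> data;
};

static float clamp01(float x) { return std::min(1.0f, std::max(0.0f, x)); }

// RGB -> HSV for one pixel; h, s, v all end up in [0, 1].
static void rgb_to_hsv(float r, float g, float b, float &h, float &s, float &v) {
    const float mx = std::max({r, g, b});
    const float mn = std::min({r, g, b});
    const float d = mx - mn;
    v = mx;
    s = (mx > 0.0f) ? d / mx : 0.0f;
    if (d <= 0.0f) { h = 0.0f; return; }  // grey pixel: hue is undefined, use 0
    if (mx == r)      h = (g - b) / d;
    else if (mx == g) h = 2.0f + (b - r) / d;
    else              h = 4.0f + (r - g) / d;
    h /= 6.0f;
    if (h < 0.0f) h += 1.0f;
}

// HSV -> RGB for one pixel; inverse of rgb_to_hsv.
static void hsv_to_rgb(float h, float s, float v, float &r, float &g, float &b) {
    const float hh = h * 6.0f;
    const int sector = static_cast<int>(std::floor(hh)) % 6;
    const float f = hh - std::floor(hh);
    const float p = v * (1.0f - s);
    const float q = v * (1.0f - s * f);
    const float t = v * (1.0f - s * (1.0f - f));
    switch (sector) {
        case 0:  r = v; g = t; b = p; break;
        case 1:  r = q; g = v; b = p; break;
        case 2:  r = p; g = v; b = t; break;
        case 3:  r = p; g = q; b = v; break;
        case 4:  r = t; g = p; b = v; break;
        default: r = v; g = p; b = q; break;
    }
}

// Hue/saturation/exposure augmentation on RGB; exposure only on depth.
void distort_rgbd(PlanarImage &im, float dhue, float dsat, float dexp) {
    const std::size_t n = static_cast<std::size_t>(im.w) * im.h;
    assert(im.c >= 3 && im.data.size() == n * static_cast<std::size_t>(im.c));
    float *R = im.data.data();
    float *G = R + n;
    float *B = G + n;
    for (std::size_t i = 0; i < n; ++i) {
        float h, s, v;
        rgb_to_hsv(R[i], G[i], B[i], h, s, v);
        h += dhue;
        h -= std::floor(h);      // wrap hue back into [0, 1)
        s = clamp01(s * dsat);
        v = clamp01(v * dexp);
        hsv_to_rgb(h, s, v, R[i], G[i], B[i]);
    }
    if (im.c == 4) {             // the depth channel never goes through HSV
        float *D = B + n;
        for (std::size_t i = 0; i < n; ++i) D[i] = clamp01(D[i] * dexp);
    }
}
```

Calling distort_rgbd(im, 0.0f, 1.0f, 1.0f) should leave the image unchanged up to rounding, which is a quick way to check that the 4th channel survives the round trip.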

AlexeyAB on 26 Dec 2018

@AlexeyAB Thank you.

I will try and give feedback soon

MingPap on 27 Dec 2018

Any update on this? I'll try something similar soon.

magistri on 26 Jan 2019

Any update on this topic? I just converted the 4 channels using RGBA2BGRA and ran into a problem that looks like a stack overflow. I am using Windows x64, where the stack for each thread is maybe 1 MB. Is that enough?

candisjesus on 26 Jul 2019

Any updates on this? I want to try something similar to this.

sourabhyadav on 4 Nov 2019

Hi @AlexeyAB,
I see that this topic comes up repeatedly and that it is on the roadmap.
Anyhow, I would like to ask 2-3 things:

1. I tried to follow the suggestions above, but the code has changed a lot since then. Can you confirm that those instructions are no longer valid?
https://github.com/AlexeyAB/darknet/issues/2094#issuecomment-449971358
2. Issue #5008 mentions that the number of channels is a parameter in the cfg file. Can you confirm that changing this parameter to 4 (or k > 3) is not enough? I do get a segmentation fault with 4-channel TIFFs.
3. Should I also change the dimensions of the first conv layer?

Thanks!

scaramouche88 on 13 Mar 2020

1. Those instructions should still be valid.
2. You should set [net] channels=4.
3. The dimensions of the first conv layer will be set automatically.

Maybe some other changes are required.
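
To see what "set automatically" implies for the first layer, here is a small arithmetic sketch. It assumes a plain convolution without groups, where the weight count is input channels x filters x kernel size squared, and it uses the first layer of the stock yolov3.cfg (filters=32, size=3) as the example. The helper name conv_weight_count is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Number of weights in a plain (groups = 1) convolutional layer:
// one size x size kernel per (input channel, filter) pair, biases not counted.
std::size_t conv_weight_count(int in_channels, int filters, int size) {
    assert(in_channels > 0 && filters > 0 && size > 0);
    return static_cast<std::size_t>(in_channels) * filters * size * size;
}
```

With these assumptions, conv_weight_count(3, 32, 3) gives 864 and conv_weight_count(4, 32, 3) gives 1152, so a first layer built for channels=4 holds more weights than one built for channels=3. One likely consequence is that first-layer weights taken from a 3-channel model will not fit a 4-channel network as-is.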

AlexeyAB on 13 Mar 2020

2-3. OK, thanks.

OK, after some searching I found it.
It is no longer in src/http_stream.cpp but in src/image_opencv.cpp. Also, the condition is no longer if (ipl->nChannels >= 3) but if (img.channels() >= 3).

I will post updates if I find any other changes.

scaramouche88 on 13 Mar 2020

For me this wasn't enough.
I am also getting a segmentation fault, maybe because the RGB -> HSV conversion is not possible on 13-layer TIFF files. I will try to keep you up to date if I succeed.

fatal69100 on 16 Mar 2020

@scaramouche88 any updates?

fatal69100 on 17 Mar 2020

I was able to train, but something is definitely wrong: I am getting bad results, and it does not depend on the data.

I will look into it tomorrow (I hope). I would say that the problem is in the various conversions, but I'm not sure.

@fatal69100 what kind of error do you get? Right now I'm looking at how to extend to 4 channels. Afterwards I will look at how to generalise it.

scaramouche88 on 17 Mar 2020

I am getting a segmentation fault at the beginning of training. I think my 13-channel TIFF files are too much for it to handle.

fatal69100 on 17 Mar 2020

Small update:

I no longer use data augmentation (it is useless for my data anyway); that was causing the bad results. Now my results with 4 channels are in line with expectations.
@fatal69100 have you solved it? Do you have problems when you compile? I had some segfaults with versions that had compilation issues.
@AlexeyAB While calculating the metrics I get "OpenCV can't force load with 4 channels", which comes from here:
https://github.com/AlexeyAB/darknet/blob/0e063371500bc998584aa58313cee04b5cf354c4/src/image_opencv.cpp#L136-L156
If I understand correctly, this means that images are loaded with the flag cv::IMREAD_UNCHANGED, right? Does OpenCV read multichannel images properly?

Thanks!
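
The behaviour asked about can be sketched as a small decision function. This is only a sketch under the assumption stated in the question: the loader can force grayscale for 1 channel and colour for 3 channels, and for any other count it falls back to an unchanged load and trusts the file. The names LoadMode, LoadPlan, plan_load and channels_ok are hypothetical stand-ins, not OpenCV or darknet identifiers.

```cpp
#include <cassert>

// Hypothetical stand-ins for the image-reading modes discussed above.
enum class LoadMode { Grayscale, Color, Unchanged };

struct LoadPlan {
    LoadMode mode;  // how the file would be decoded
    bool forced;    // true if the decoder can guarantee the channel count
};

// Pick a decode mode for the channel count requested in the cfg file.
LoadPlan plan_load(int requested_channels) {
    assert(requested_channels > 0);
    if (requested_channels == 1) return {LoadMode::Grayscale, true};
    if (requested_channels == 3) return {LoadMode::Color, true};
    // Any other count cannot be forced: decode the file as stored.
    return {LoadMode::Unchanged, false};
}

// After an unforced load, the decoded channel count still has to be checked.
bool channels_ok(int requested_channels, int decoded_channels) {
    return requested_channels == decoded_channels;
}
```

Under this assumption, plan_load(4) yields an unforced, unchanged load, so the message would be a warning that the file itself must already hold 4 channels; a check like channels_ok(4, decoded) after loading would catch a 3-channel or 13-channel file before it reaches the network.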

scaramouche88 on 31 Mar 2020

Hi,

Can someone please summarize all the changes required to read and train with 4-channel images?

Thanks