Yolov3: Working with grayscale images

Created on 16 Oct 2019 · 13 Comments · Source: ultralytics/yolov3

I would like to convert a custom Darknet model to ONNX, using PyTorch as an intermediate step. My model was trained on single-channel (grayscale) images, with a custom image size and only one class.

I am not familiar with PyTorch but I can read some code. I can change the image size in the Darknet class, but I can't figure out how to set the number of channels and the number of classes to 1.

Stale bug

nscotto

All 13 comments

@nscotto you set this in the cfg file. For greyscale you can set channels=1.
https://github.com/ultralytics/yolov3/blob/d1271941ad158dafd9385f4a9cfd997b51fcda5f/cfg/yolov3-spp.cfg#L10
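
As a rough illustration of where that setting lives, here is a minimal sketch that reads the [net] section of a Darknet-style cfg. The helper name read_net_options and the example cfg text are made up for this illustration; this is not the parser the repo uses.

```python
def read_net_options(cfg_text):
    """Return the key/value pairs found in the [net] section of a Darknet-style cfg."""
    options = {}
    in_net = False
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if line.startswith("["):
            in_net = line == "[net]"  # only collect keys while inside [net]
            continue
        if in_net and "=" in line:
            key, value = line.split("=", 1)
            options[key.strip()] = value.strip()
    return options


# Hypothetical cfg excerpt for a single-channel model.
example_cfg = """
[net]
# Testing
batch=1
width=416
height=416
channels=1

[convolutional]
filters=32
size=3
"""

net = read_net_options(example_cfg)
channels = int(net["channels"])
print(channels)
```

With channels=1 in the cfg, the first convolution expects single-channel input, so the data loader has to deliver single-channel images as well.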

glenn-jocher on 16 Oct 2019

👍1

@glenn-jocher I tried your solution, but it throws the error shown below. With channels=3 there is no error. My dataset consists of grayscale images.

RuntimeError: Given groups=1, weight of size 32 1 3 3, expected input[32, 3, 416, 416] to have 1 channels, but got 3 channels instead

rlgalvez on 17 Oct 2019

@glenn-jocher

@nscotto you set this in the cfg file. For greyscale you can set channels=1.
https://github.com/ultralytics/yolov3/blob/d1271941ad158dafd9385f4a9cfd997b51fcda5f/cfg/yolov3-spp.cfg#L10

Thanks, it works like a charm!
After trying conversion to several popular frameworks, I can say this repo is the best solution for converting YOLO from Darknet! I converted my custom tiny-yolov3 to PyTorch and then to ONNX.
I was misled by the img_size parameter in the Darknet class, since the size could be read from the cfg as well. I probably have not inspected the code closely enough.

@rlgalvez

@glenn-jocher I tried your solution, but it throws the error shown below. With channels=3 there is no error. My dataset consists of grayscale images.

RuntimeError: Given groups=1, weight of size 32 1 3 3, expected input[32, 3, 416, 416] to have 1 channels, but got 3 channels instead

What is happening is that when you train or test in Python, the images are opened with OpenCV. For grayscale images you need to pass the flag cv2.IMREAD_GRAYSCALE to imread; otherwise the returned arrays will have 3 channels.

So change every instance of cv2.imread(fname) to cv2.imread(fname, cv2.IMREAD_GRAYSCALE) and it should work.

nscotto on 17 Oct 2019

@glenn-jocher Thanks for the response. Which Python file should I modify? I tried editing train.py, datasets.py and utils.py, but it still throws an error. My dataset is already grayscale; I thought IMREAD_GRAYSCALE only loads a color image as grayscale.

rlgalvez on 17 Oct 2019

If you are on Linux, grep is your friend for this type of task.
Go to the main folder and run:

grep -nR imread

output:

utils/utils.py:631:        img = cv2.imread(file)  # BGR
utils/datasets.py:97:            img0 = cv2.imread(path)  # BGR
utils/datasets.py:354:                        img = cv2.imread(str(p))
utils/datasets.py:379:                img = cv2.imread(img_path)  # BGR
utils/datasets.py:393:                    _ = io.imread(file)
utils/datasets.py:511:        img = cv2.imread(img_path)  # BGR
utils/datasets.py:766:    # cv2.imread() jpg at 230 img/s, *.bmp at 400 img/s
utils/datasets.py:776:            cv2.imwrite(save_name, cv2.imread(f))
train.py:254:                    tb_writer.add_image(fname, cv2.imread(fname)[:, :, ::-1], dataformats='HWC')

The thing you need to understand is that if you load a grayscale image with only one channel and don't pass cv2.IMREAD_GRAYSCALE, OpenCV will still return an array with three channels; that is how it is designed.
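
The same search can be narrowed to calls that still lack a read flag. Below is a minimal sketch using only the standard library; the helper name imread_calls_without_flag is hypothetical, and the regex only copes with one level of nested parentheses and strips comments naively.

```python
import re

# Matches cv2.imread(...) with at most one level of nested parentheses in the arguments.
IMREAD_CALL = re.compile(r"cv2\.imread\(([^()]*(?:\([^()]*\)[^()]*)*)\)")


def imread_calls_without_flag(source):
    """Return (line_number, line) pairs whose cv2.imread call has a single argument."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # naive comment stripping, good enough for a sketch
        for match in IMREAD_CALL.finditer(code):
            # Remove nested calls such as str(p) so only top-level commas remain.
            top_level_args = re.sub(r"\([^()]*\)", "", match.group(1))
            if "," not in top_level_args:
                hits.append((lineno, line.strip()))
    return hits


sample = "\n".join([
    "img = cv2.imread(file)  # BGR",
    "img = cv2.imread(str(p))",
    "_ = io.imread(file)",
    "# cv2.imread() jpg at 230 img/s",
    "img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)",
])
for lineno, line in imread_calls_without_flag(sample):
    print(lineno, line)
```

In practice you would pass the contents of each .py file instead of the sample string and then edit the reported lines by hand.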

nscotto on 17 Oct 2019

@nscotto I see. Thanks for the info. I tried that as well, but it throws a different error. I tried removing _ in line 515 below, but the error persists.

File "/home/yolov3/utils/datasets.py", line 515, in load_image
    h, w, _ = img.shape
ValueError: not enough values to unpack (expected 3, got 2)

rlgalvez on 17 Oct 2019

This is because in OpenCV, images with 1 channel are stored as 2-D arrays, as a tradeoff between efficiency and convenience.
Replace the line with:

h, w = img.shape

You will probably have to make this kind of modification in other parts of the code, because the code was written for 3-D arrays and you are providing 2-D arrays.
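
One way to avoid editing each unpacking by hand is to read only the first two entries of the shape, which works for both layouts. A minimal sketch follows; the helper name height_width is made up, and plain tuples stand in for img.shape.

```python
def height_width(shape):
    """Return (h, w) from a 2-D (h, w) or 3-D (h, w, c) image shape."""
    if len(shape) not in (2, 3):
        raise ValueError(f"expected a 2-D or 3-D image shape, got {shape!r}")
    h, w = shape[:2]  # slicing ignores the channel entry when it is present
    return h, w


print(height_width((416, 416)))     # grayscale array as returned with IMREAD_GRAYSCALE
print(height_width((416, 416, 3)))  # regular BGR array
```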

nscotto on 18 Oct 2019

@nscotto Thanks! I tried your suggestion, but there are still errors in my code because the variable _ was not used. I don't know what to modify next. Have you tried training on grayscale images with this repo?

rlgalvez on 18 Oct 2019

No, I just wanted to convert a trained model to ONNX so I have not tried anything with training in pytorch.

As stated by glenn-jocher, the model is created with the number of channels read from the corresponding cfg file.

What you just said:

there are still errors in my code because the variable _ was not used.

Since _ is not used and img.shape is a 2-tuple, omitting it should fix this error without creating new ones.

I am not sure how channels are handled in YOLO and PyTorch. My guess is that PyTorch expects 3-D images (even though the number of channels is set to 1), so you can try artificially converting the 2-D images into 3-D images with one channel.

What you can try is replacing every cv2.imread(fname, cv2.IMREAD_GRAYSCALE) with:

np.expand_dims(cv2.imread(fname, cv2.IMREAD_GRAYSCALE), axis=0)

Then you should leave h, w, _ = img.shape as it is, since you are now working with 3-D arrays.

I hope that will do the trick.

nscotto on 18 Oct 2019

I tried np.expand_dims(cv2.imread(fname, cv2.IMREAD_GRAYSCALE), axis=0), but there are still errors and I don't know what to do next. Anyway, thank you for your help, @nscotto. I will study the code again.

rlgalvez on 19 Oct 2019

@rlgalvez, @glenn-jocher @nscotto

I got the same problems when working on grayscale images.

First of all, I think every np.expand_dims(cv2.imread([..], cv2.IMREAD_GRAYSCALE), axis=0) mentioned above should be replaced with np.expand_dims(cv2.imread([..], cv2.IMREAD_GRAYSCALE), axis=2), because you want to add the channel dimension at the end of the array (a regular RGB image also has the dimensions h, w, c).
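
The difference between the two axis choices is easy to see on a dummy array. This is a small sketch with NumPy only; the zero array is a stand-in for what cv2.imread(..., cv2.IMREAD_GRAYSCALE) returns, and the 416x320 size is arbitrary.

```python
import numpy as np

# Stand-in for a grayscale image read by OpenCV: a 2-D array of shape (h, w).
gray = np.zeros((416, 320), dtype=np.uint8)

front = np.expand_dims(gray, axis=0)  # channel axis first
back = np.expand_dims(gray, axis=2)   # channel axis last, like an h, w, c image

print(front.shape)  # (1, 416, 320)
print(back.shape)   # (416, 320, 1)

# Code written for h, w, c arrays only unpacks sensibly with axis=2.
h, w, _ = back.shape
print(h, w)  # 416 320
```

With axis=0 the same unpacking would assign 1 to h and 416 to w, which is why the h, w, c convention matters here.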

Then, I got the same error as you:

RuntimeError: Given groups=1, weight of size 32 1 3 3, expected input[8, 3, 416, 416] to have 1 channels, but got 3 channels instead

I figured out that the tensor constructed in the train function is not yet correct. As the error above shows, the second dimension should be 1 instead of 3, since we have one channel instead of three; the layout is [batch_size, number_of_channels, height, width].

I tested the code by initializing a tensor of size [8, 1, 416, 416] with random values, and then training starts properly. This implies that the problem is in the LoadImagesAndLabels class of datasets.py.

I altered the following:
In the __getitem__ function of LoadImagesAndLabels, I first disabled the mosaic and the augmentation function (as both are not super crucial for this test):
mosaic = False
self.augment = False

Then load_image is called and returns a grayscale image (remember that in load_image the line img = cv2.imread(img_path) # BGR was replaced by img = np.expand_dims(cv2.imread(img_path, cv2.IMREAD_GRAYSCALE), axis=2)).

Because the letterbox function returns a 2D array, I added one line of code, img = np.expand_dims(img, axis=2), above the line img = img[:, :, ::-1].transpose(2, 0, 1) in the __getitem__ function of LoadImagesAndLabels in datasets.py. This makes img a 3D array again that the train function can process (printing the tensor size gives torch.Size([8, 1, 416, 416])). Training on gray images can then start properly. Please note that I have not yet completed the training...
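
The shape handling described above can be checked in isolation with NumPy. This is only a sketch of the array manipulation, not the repo's data loader: the helper name to_chw is made up, and zero arrays stand in for letterboxed images.

```python
import numpy as np


def to_chw(img):
    """Convert an (h, w) or (h, w, c) image into a contiguous (c, h, w) array."""
    if img.ndim == 2:
        # A 2-D grayscale array needs its channel axis back before the transpose.
        img = np.expand_dims(img, axis=2)
    # Same slicing and transpose as the existing line; reversing one channel is a no-op.
    chw = img[:, :, ::-1].transpose(2, 0, 1)
    return np.ascontiguousarray(chw)


# Stand-ins for eight letterboxed grayscale images of size 416x416.
images = [np.zeros((416, 416), dtype=np.uint8) for _ in range(8)]
batch = np.stack([to_chw(img) for img in images])
print(batch.shape)  # (8, 1, 416, 416)
```

The resulting layout matches the [batch_size, number_of_channels, height, width] order the model expects when channels=1.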

@glenn-jocher I'm sorry to bother you, but could you please do a sanity check on the approach / code alterations described above? And how can I alter the load_mosaic function (in datasets.py) so that it works on grayscale images (I'm a bit confused by img4 and labels4)?

I guess that disabling the augment_hsv function makes sense, because we are working on gray images.

pieterbl86 on 20 Jan 2020

👍1

@pieterbl86 for HSV augmentation on single channel images you would probably just retain the V augmentation.
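
A value-only augmentation on a single-channel image could look roughly like the sketch below. It does not reproduce the repo's augment_hsv; the function name augment_value, the gain parameter and its default are assumptions made for illustration.

```python
import numpy as np


def augment_value(img, gain=0.5, rng=None):
    """Scale the brightness of a grayscale uint8 image by a random factor in [1 - gain, 1 + gain]."""
    if rng is None:
        rng = np.random.default_rng()
    scale = 1.0 + rng.uniform(-gain, gain)
    # Work in float to avoid uint8 overflow, then clip back into the valid range.
    scaled = np.clip(img.astype(np.float32) * scale, 0, 255)
    return scaled.astype(np.uint8)


gray = np.full((416, 416), 128, dtype=np.uint8)  # stand-in for a loaded grayscale image
augmented = augment_value(gray, gain=0.5, rng=np.random.default_rng(0))
print(augmented.shape, augmented.dtype)
```

Hue and saturation have no meaning for a single channel, so only the brightness gain is kept.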

The basic premise for greyscale training is simply to pass [bs, 1, h, w] rather than [bs, 3, h, w], as you mentioned, and also to set channels=1 in the cfg. I don't have time to work on this now, but I'll leave the issue open here. Alternatively, if this is for commercial product development, I can send you a quote. I don't imagine the work would take more than a few hours.

Of course, the proposed changes would need to retain the augmentation functions, which greatly improve training results for most custom datasets under most circumstances.

glenn-jocher on 20 Jan 2020

This issue is stale because it has been open 30 days with no activity. Remove Stale label or comment or this will be closed in 5 days.