Yolov3: Working with grayscale images

Created on 16 Oct 2019  路  13Comments  路  Source: ultralytics/yolov3

I would like to convert a custom Darknet model to ONNX using PyTorch as intermediate, my model has been trained with only one channel (grayscale images), custom image size and only one class.

I am not familiar with PyTorch but I can read some code. I can change the image size in the Darknet class, but I can't figure out how to set the number of channels and the number of classes to 1.

Stale bug

All 13 comments

@nscotto you set this is the cfg file. For greyscale you can set channels=1.
https://github.com/ultralytics/yolov3/blob/d1271941ad158dafd9385f4a9cfd997b51fcda5f/cfg/yolov3-spp.cfg#L10

@glenn-jocher I tried your solution but throws an error shown below, but when I use channels=3 there is no error. My dataset is a grayscale image.

RuntimeError: Given groups=1, weight of size 32 1 3 3, expected input[32, 3, 416, 416] to have 1 channels, but got 3 channels instead

@glenn-jocher

@nscotto you set this is the cfg file. For greyscale you can set channels=1.
https://github.com/ultralytics/yolov3/blob/d1271941ad158dafd9385f4a9cfd997b51fcda5f/cfg/yolov3-spp.cfg#L10

Thanks, it works like a charm!
After trying conversion to different popular frameworks, I can say this repo is the best solution to convert yolo from darknet! I converted my custom tiny-yolov3 to pytorch then ONNX.
I was mislead by the fact that there is a img_size parameter somewhere in the Darknet class where it could be read in the cfg as well. I probably have not inspected the code enough.

@rlgalvez

@glenn-jocher I tried your solution but throws an error shown below, but when I use channels=3 there is no error. My dataset is a grayscale image.

RuntimeError: Given groups=1, weight of size 32 1 3 3, expected input[32, 3, 416, 416] to have 1 channels, but got 3 channels instead

What is happening is that if you are training/testing in python, the images are opened with opencv so if you don't know you need to pass the argument cv2.IMREAD_GRAYSCALE to imread for grayscale images, otherwise the resulting objects will have 3 channels.

So modify every instances of cv2.imread from cv2.imread(fname) to cv2.imread(fname, IMREAD_GRAYSCALE) and it should work.

@glenn-jocher Thanks for the response, what python file should I modify? I tried editing train.py, datasets.py, utils.py but still throws an error. My dataset is already a grayscale image, I think the IMREAD_GRAYSCALE only loads the color image as grayscale.

If you are on linux, grep is your friend for this type of tasks:
Go to the main folder and:

grep -nR imread

output:

utils/utils.py:631:        img = cv2.imread(file)  # BGR
utils/datasets.py:97:            img0 = cv2.imread(path)  # BGR
utils/datasets.py:354:                        img = cv2.imread(str(p))
utils/datasets.py:379:                img = cv2.imread(img_path)  # BGR
utils/datasets.py:393:                    _ = io.imread(file)
utils/datasets.py:511:        img = cv2.imread(img_path)  # BGR
utils/datasets.py:766:    # cv2.imread() jpg at 230 img/s, *.bmp at 400 img/s
utils/datasets.py:776:            cv2.imwrite(save_name, cv2.imread(f))
train.py:254:                    tb_writer.add_image(fname, cv2.imread(fname)[:, :, ::-1], dataformats='HWC')

The thing you need to understand is that if you load a grayscale image with only one channel, and you don't pass the argument cv2.IMREAD_GRAYSCALE, opencv will still return an object with three channels, that is how it is designed.

@nscotto I see. Thanks for the info. I tried that also but throws a different error, I tried removing _ in line 515 below but the error still exists.

File "/home/yolov3/utils/datasets.py", line 515, in load_image h, w, _ = img.shape ValueError: not enough values to unpack (expected 3, got 2)

This is because in opencv images with 1 channel are stored in 2-D array, as a tradeoff between efficiency and convenience.
replace by:

h, w = img.shape

You will probably have to do this type of modification in other part of the code, this will all come because the code will be written for 3-D array and you are providing 2-D arrays.

@nscotto Thanks! I tried your suggestion but there are still errors on my code because the variable _ was not used. I don't know what to modify after this. Have you tried training a grayscale image using this repo?

No, I just wanted to convert a trained model to ONNX so I have not tried anything with training in pytorch.

As stated by glenn-jocher, the model will be created with the number of channels read in corresponding the cfg file.

What you just said:

there are still errors on my code because the variable _ was not used.

Because it's _ is not used and img.shape is a 2-D tuple, omiting should solve this error and not create other ones.

I am not sure how the channels are handled in YOLO and pytorch, I guess maybe pytorch is expecting 3-D images (despite the fact that the number of channel is set to 1), so you can try artificially converting the 2-D images into 3-D images with one channel.

what you can try is replacing every cv2.imread(fname, cv2.IMG_GRAYSCALE) by:

np.expand_dims(cv2.imread(fname, cv2.IMG_GRAYSCALE), axis=0)

Then you should let h, w, _ = img.shape as it is as you are now working with 3-D arrays.

I hope that will do the trick.

No, I just wanted to convert a trained model to ONNX so I have not tried anything with training in pytorch.

As stated by glenn-jocher, the model will be created with the number of channels read in corresponding the cfg file.

What you just said:

there are still errors on my code because the variable _ was not used.

Because it's _ is not used and img.shape is a 2-D tuple, omiting should solve this error and not create other ones.

I am not sure how the channels are handled in YOLO and pytorch, I guess maybe pytorch is expecting 3-D images (despite the fact that the number of channel is set to 1), so you can try artificially converting the 2-D images into 3-D images with one channel.

what you can try is replacing every cv2.imread(fname, cv2.IMG_GRAYSCALE) by:

np.expand_dims(cv2.imread(fname, cv2.IMG_GRAYSCALE), axis=0)

Then you should let h, w, _ = img.shape as it is as you are now working with 3-D arrays.

I hope that will do the trick.

I tried np.expand_dims(cv2.imread(fname, cv2.IMG_GRAYSCALE), axis=0) but there are still errors and I don't know what will do next, anyway thank you for your help @nscotto . I will study the code again.

No, I just wanted to convert a trained model to ONNX so I have not tried anything with training in pytorch.
As stated by glenn-jocher, the model will be created with the number of channels read in corresponding the cfg file.
What you just said:

there are still errors on my code because the variable _ was not used.

Because it's _ is not used and img.shape is a 2-D tuple, omiting should solve this error and not create other ones.
I am not sure how the channels are handled in YOLO and pytorch, I guess maybe pytorch is expecting 3-D images (despite the fact that the number of channel is set to 1), so you can try artificially converting the 2-D images into 3-D images with one channel.
what you can try is replacing every cv2.imread(fname, cv2.IMG_GRAYSCALE) by:

np.expand_dims(cv2.imread(fname, cv2.IMG_GRAYSCALE), axis=0)

Then you should let h, w, _ = img.shape as it is as you are now working with 3-D arrays.
I hope that will do the trick.

I tried np.expand_dims(cv2.imread(fname, cv2.IMG_GRAYSCALE), axis=0) but there are still errors and I don't know what will do next, anyway thank you for your help @nscotto . I will study the code again.

@rlgalvez, @glenn-jocher @nscotto

I got the same problems when working on grayscale images.

First of all, I think that all above mentioned functions with np.expand_dims(cv2.imread([..], cv2.IMG_GRAYSCALE), axis=0) should be replaced to np.expand_dims(cv2.imread([..], cv2.IMG_GRAYSCALE), axis=2) (as you want to add an additional channel at the end of the array, a regular RGB image also has the dimensions of h,w,c)

Then, I got the same error as you:

RuntimeError: Given groups=1, weight of size 32 1 3 3, expected input[8, 3, 416, 416] to have 1 channels, but got 3 channels instead

I figured out that the constructed tensor in the train function is not correct yet (as you can see in the error above, the second argument should be 1 instead of 3, as we'll have one channel instead of 3, [batch_size, number_of_channels, height, width])

I've tested the code by initializing a tensor with random values of size [8, 1, 416, 416] and then the training properly starts. Implicitly this means that the problem was in LoadImagesAndLabels function of datasets.py.

I altered the following:
In the __getitem__ function of LoadImagesAndLabels, I first disabled the mosaic and the augmentation function (as both are not super crucial for this test):
mosaic = False
self.augment = False

Then, the load_image function is activated returning a gray-scaled image (remember that in the load_image function the img = cv2.imread(img_path) # BGR was replaced by img = np.expand_dims(cv2.imread(img_path, cv2.IMREAD_GRAYSCALE), axis=2))

Because the letterbox function returns a 2D array I've added an additional line of code (img = np.expand_dims(img, axis=2)) above the line img = img[:, :, ::-1].transpose(2, 0, 1) (in the __getitem__ function of LoadImagesAndLabels in datasets.py). This will make the img a 3D array again that can be properly processed in train function (if I print the tensor size it will give torch.Size([8, 1, 416, 416])). Then the training on gray images can properly start. Please note, that I've not yet completed the training...

@glenn-jocher I'm sorry to bother you, but you can please do a sanity check on the above mentioned approach / code alterations? And how can I alter the load_mosaic function (in datasets.py) so that it can work on gray-scaled images (I'm a bit confused with the img4 and labels4)?

I guess that the disabling of the augment_hsv function makes sense, because we are working on gray images.

@pieterbl86 for HSV augmentation on single channel images you would probably just retain the V augmentation.

The basic premise for grescale training is to simply pass [bs, 1, h, w] rather than [bs, 3, h, w] as you mentioned, and also to set channels=1 in the cfg. I don't have time to work on this now, but I'll leave the issue open here. Alternatively if this is for commercial product development, I can send you a quote. I don't image the work would take more than a few hours.

Of course, the proposed changes would need to retain the augmentation functions, which greatly improve training results for most custom datasets under most circumstances.

This issue is stale because it has been open 30 days with no activity. Remove Stale label or comment or this will be closed in 5 days.

Was this page helpful?
0 / 5 - 0 ratings