Scikit-image: raw data from skimage.io.imread differs from opencv cv2.imread

Created on 8 Sep 2016 · 38 Comments · Source: scikit-image/scikit-image

Hi,

I load a jpeg file with both scikit-image (skimage.io.imread) and opencv (cv2.imread), but the raw data differs.

scikit-image == 0.12.3

import cv2
from skimage import io
import numpy as np

im1 = io.imread('0000001.jpg')
# convert RGB to BGR
im1 = im1[:,:,::-1]
print(im1[:,:,2])


im2 = cv2.imread('0000001.jpg')
assert im1.shape == im2.shape
print(im2[:,:,2])


print(np.array_equal(im1, im2))

result

[[43 40 38 ..., 54 47 46]
 [44 41 39 ..., 54 50 49]
 [41 41 39 ..., 56 52 53]
 ...,
 [53 57 58 ..., 14 13 13]
 [53 56 56 ..., 10  9  9]
 [60 64 63 ...,  6  6  9]]
[[43 40 39 ..., 54 47 46]
 [44 41 39 ..., 54 50 49]
 [41 41 39 ..., 56 52 53]
 ...,
 [53 57 58 ..., 14 13 13]
 [53 56 56 ..., 10  9  9]
 [60 64 63 ...,  6  6  7]]
False

I am not sure of the root cause.

needs decision bug

twmht

❤1

All 38 comments

looks like a rounding issue...

sciunto on 8 Sep 2016

Have you tried different plugins? 'pil', 'matplotlib', 'qt', 'freeimage'

sciunto on 8 Sep 2016

pil, matplotlib:

(mismatch 44.079356201839424%)
 x: array([[[ 50, 132,  43],
        [ 51, 131,  40],
        [ 50, 131,  38],...
 y: array([[[ 52, 132,  43],
        [ 51, 131,  40],
        [ 50, 130,  39],...

freeimage:

(mismatch 76.13787685096409%)
 x: array([[[ 49, 132,  40],
        [ 48, 131,  39],
        [ 49, 131,  36],...
 y: array([[[ 52, 132,  43],
        [ 51, 131,  40],
        [ 50, 130,  39],...

soupault on 8 Sep 2016

@twmht the image as read by scikit-image renders correctly when displayed using matplotlib.pyplot.imshow. Do you have any indication that scikit-image's result is any less correct than cv2's?

jni on 8 Sep 2016

(As another diagnostic, you could compute the correlation coefficient between the CV2 and skimage arrays. I bet it will be very very high.)
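
A minimal sketch of that diagnostic, written in plain Python so it runs without either library. It computes the Pearson correlation coefficient for the nine values visible in the 'pil, matplotlib' mismatch report above (labelled x and y as in that report). The `pearson` helper is made up for this illustration; on the full arrays, `np.corrcoef` on the flattened images would be the usual route.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# First three pixels (nine channel values) from the mismatch report above.
x = [50, 132, 43, 51, 131, 40, 50, 131, 38]
y = [52, 132, 43, 51, 131, 40, 50, 130, 39]

r = pearson(x, y)
print(f"correlation: {r:.4f}")
```

Nine values are far too few to say anything about the whole image, but even here the coefficient comes out above 0.99, which fits the "very very high" guess.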

jni on 8 Sep 2016

@jni I'm not sure if it helps, but I've tried to open the image in GIMP, and the editor shows the same data as cv2.imread.
Saving the image without compression to .png and rerunning the test on it works fine (no differences detected). So, there must be something wrong with our jpeg backend.

soupault on 8 Sep 2016

👀1

@soupault ah, thanks! That's indeed worrisome! =\

jni on 8 Sep 2016

@jni

Yup. Both displayed results are very close (the human eye cannot tell the difference).

The uploaded jpeg file comes from io.imsave('0000001.jpg', ndarray, quality=90)

I tried another png file, and it works as expected (the raw data from skimage and opencv is the same).

Only jpeg images (or maybe other compressed images) have this problem.

twmht on 8 Sep 2016

👀1

Let's ping @almarklein and @blink1073 so they know about this issue.

sciunto on 8 Sep 2016

@twmht : out of curiosity, if you compare the data used to create the uploaded jpeg to the data read by cv2 and skimage.io, which one is closer?

jmetz on 20 Sep 2016

@jmetz : as @soupault said, he compared the data between GIMP and cv2.imread, and both raw data are the same.

So cv2.imread is much closer than skimage.io.

twmht on 20 Sep 2016

@twmht - I don't think I was asking the same thing; what @soupault showed was that whatever library GIMP uses gives the same result as cv2.imread.

What I was asking is which one is closer to the original data that was used to generate the jpeg.

jmetz on 20 Sep 2016

👍3

OK, I added my own test to answer my question about which is closer to the original data:

import cv2
import skimage.io as skio
import numpy as np
import tempfile

# Data
data = np.random.randint(0, 255, (100,100,3), dtype="uint8")

# Save to temporary file
filename = tempfile.mktemp(".jpg")
skio.imsave(filename, data, quality=90)

# Now load using cv2 and skimage.io and compare
datacv2 = cv2.imread(filename)
dataskio = skio.imread(filename)

print("Comparing cv2 with skio")
print(np.array_equal(datacv2, dataskio))
print(np.sum(np.abs(datacv2-dataskio)))
print("Comparing original with cv2")
print(np.array_equal(data, datacv2))
print(np.sum(np.abs(datacv2-data)))
print("Comparing original with skio")
print(np.array_equal(data, dataskio))
print(np.sum(np.abs(data-dataskio)))

which on one run yielded

Comparing cv2 with skio
False
2532864
Comparing original with cv2
False
3828855
Comparing original with skio
False
3809161

which is interesting; the sum of the absolute differences between the skio.imread and the original data is slightly smaller than the same quantity for cv2.imread....

jmetz on 20 Sep 2016

Update

Having realised my SAD (sum of absolute differences) might be off because of the uint8 dtype, I added in a little casting for that measure, different values but the same result:

...
print("Comparing cv2 with skio")
print(np.array_equal(datacv2, dataskio))
print(np.sum(np.abs(datacv2.astype(int)-dataskio.astype(int))))
print("Comparing original with cv2")
print(np.array_equal(data, datacv2))
print(np.sum(np.abs(datacv2.astype(int)-data.astype(int))))
print("Comparing original with skio")
print(np.array_equal(data, dataskio))
print(np.sum(np.abs(data.astype(int)-dataskio.astype(int))))

yields

Comparing cv2 with skio
False
543666
Comparing original with cv2
False
1582326
Comparing original with skio
False
1397248
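
To see why the cast matters: NumPy's uint8 subtraction wraps modulo 256, so a difference of -1 is stored as 255 and `np.abs` has nothing left to fix. The sketch below imitates that wraparound in plain Python (an assumption made so it runs without NumPy), using the three leading values of the two printouts at the top of the thread. Both helper names are made up for the illustration.

```python
def sad_uint8(a, b):
    """SAD the way uint8 arithmetic computes it: each difference wraps modulo 256."""
    return sum((x - y) % 256 for x, y in zip(a, b))


def sad_int(a, b):
    """SAD after casting to a wider signed integer type first."""
    return sum(abs(x - y) for x, y in zip(a, b))


a = [43, 40, 38]  # leading values of the first printout
b = [43, 40, 39]  # leading values of the second printout

print(sad_uint8(a, b))  # 255: the single -1 difference wrapped around
print(sad_int(a, b))    # 1
```

A one-level difference in the "wrong" direction costs 255 instead of 1, which is why the uncast totals are so much larger.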

jmetz on 20 Sep 2016

👍3

Saving with cv2

When saving the image with cv2 (using cv2.imwrite(filename, data, [int(cv2.IMWRITE_JPEG_QUALITY), 90])), the situation is reversed, e.g.

SAME AGAIN BUT SAVING WITH cv2
Comparing cv2 with skio
False
537232
Comparing original with cv2
False
1386014
Comparing original with skio
False
1567090

jmetz on 20 Sep 2016

👍2

OpenCV and PIL/Matplotlib/etc. may not use the same encoder, so it's not surprising that the results are slightly different. But if you average per pixel, the quantization errors look reasonable.

A better comparison would be to save to PNG and read back; that round trip should be flawless (or close to).

stefanv on 15 Feb 2017

👍3

I don't think this is a bug, but if you disagree please reopen.

stefanv on 15 Feb 2017

👍2

I noticed the same thing. I had a CNN model trained using skimage io as input, and now due to other needs, I need to use opencv. However, the model just does not work at all with opencv.

bshao001 on 18 Nov 2017

😄6 👍3

@bshao001 Can you elaborate on what doesn't work with your model when you read with cv2 (and do proper reshaping)? I am running a model where I read the inputs with cv2. It seems to work, but the accuracy is much lower than in the state-of-the-art paper. I am trying to improve the preprocessing, though, as that might be the reason.

duygusar on 4 Jan 2018

@duygusar Nothing special. The CNN model was trained early last year, and all the image inputs were based on skimage. A few months later, I just converted everything to CV2 without retraining the model, and the model's predictions were very poor. As long as your training and prediction both use skimage, or both use CV2, your issue should not be related to this.

bshao001 on 4 Jan 2018

Also, be aware of: http://scikit-image.org/docs/stable/user_guide/data_types.html#working-with-opencv

stefanv on 4 Jan 2018

@stefanv what causes this difference? Is it due to a different encoder?
I have another question: I read @jmetz's answer, but I am still confused. Which library (opencv/skimage) reads the image closer to the original for the jpg format?

ujsyehao on 2 Mar 2018

If you care at all about preserving data, do not use jpeg.

stefanv on 2 Mar 2018

❤1

AFAIK, opencv reads images in BGR order, so you always need a step to convert BGR to RGB, like below:
cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

tispratik on 30 Sep 2018

Thanks for the catch @tispratik - unfortunately it doesn't change anything.

The updated version (which also compares to saving with cv2 to begin with):

import cv2
import skimage.io as skio
import numpy as np
import tempfile

# Data
data = np.random.randint(0, 255, (100, 100, 3), dtype="uint8")

# Save to temporary file
filename = tempfile.mktemp(".jpg")
filename2 = tempfile.mktemp(".jpg")
skio.imsave(filename, data, quality=90)

# Now load using cv2 and skimage.io and compare
datacv2 = cv2.imread(filename)
datacv2 = datacv2[..., ::-1]
dataskio = skio.imread(filename)

print("Comparing cv2 with skio")
print(np.array_equal(datacv2, dataskio))
print(np.sum(np.abs(datacv2.astype(int)-dataskio.astype(int))))
print("Comparing original with cv2")
print(np.array_equal(data, datacv2))
print(np.sum(np.abs(datacv2.astype(int)-data.astype(int))))
print("Comparing original with skio")
print(np.array_equal(data, dataskio))
print(np.sum(np.abs(data.astype(int)-dataskio.astype(int))))

print("SAME AGAIN BUT SAVING WITH cv2")
cv2.imwrite(filename2, data, [int(cv2.IMWRITE_JPEG_QUALITY), 90])
# Now load using cv2 and skimage.io and compare
datacv2 = cv2.imread(filename2)
datacv2 = datacv2[..., ::-1]
dataskio = skio.imread(filename2)

print("Comparing cv2 with skio")
print(np.array_equal(datacv2, dataskio))
print(np.sum(np.abs(datacv2.astype(int)-dataskio.astype(int))))
print("Comparing original with cv2")
print(np.array_equal(data, datacv2))
print(np.sum(np.abs(datacv2.astype(int)-data.astype(int))))
print("Comparing original with skio")
print(np.array_equal(data, dataskio))
print(np.sum(np.abs(data.astype(int)-dataskio.astype(int))))

This gives very similar output to before (i.e. the SAD score is still non-negligible for all comparisons):

Comparing cv2 with skio
False
360851
Comparing original with cv2
False
1382338
Comparing original with skio
False
1312803
SAME AGAIN BUT SAVING WITH cv2
Comparing cv2 with skio
False
358669
Comparing original with cv2
False
1562336
Comparing original with skio
False
1636675

jmetz on 2 Oct 2018

@jmetz I can't seem to be able to reproduce.

I'm using conda+conda-forge to install everything and scikit-image 0.14.0

➤ tools/build_versions.py
docs.txt
              sphinx is not installed
            numpydoc is not installed
      sphinx-gallery is not installed
        scikit-learn is not installed
default.txt
               numpy 1.15.2
               scipy 1.1.0
          matplotlib 2.2.2
            networkx 2.2
              pillow 5.2.0
             imageio 2.3.0
          PyWavelets 1.0.1
                dask 0.19.2
         cloudpickle 0.5.6
build.txt
              Cython is not installed
               wheel 0.32.0
               numpy 1.15.2
 requirements-parser 0.2.0
test.txt
              pytest is not installed
          pytest-cov is not installed
              flake8 is not installed
             codecov is not installed
optional.txt
              imread is not installed
           SimpleITK is not installed
             astropy is not installed
            tifffile is not installed
                qtpy is not installed

OpenCV version is 3.4.3.

In [2]:  
   ...: print("Comparing cv2 with skio") 
   ...: print(np.array_equal(datacv2, dataskio)) 
   ...: print(np.sum(np.abs(datacv2.astype(int)-dataskio.astype(int)))) 
Comparing cv2 with skio
True
0


In [3]: print("SAME AGAIN BUT SAVING WITH cv2") 
   ...: cv2.imwrite(filename2, data, [int(cv2.IMWRITE_JPEG_QUALITY), 90]) 
   ...: # Now load using cv2 and skimage.io and compare 
   ...: datacv2 = cv2.imread(filename2) 
   ...: datacv2 = datacv2[..., ::-1] 
   ...: dataskio = skio.imread(filename2)
SAME AGAIN BUT SAVING WITH cv2

In [4]: print("Comparing cv2 with skio")
   ...: print(np.array_equal(datacv2, dataskio))
   ...: print(np.sum(np.abs(datacv2.astype(int)-dataskio.astype(int))))
Comparing cv2 with skio
True
0

hmaarrfk on 2 Oct 2018

@hmaarrfk why not 0.14.1? ;)

jni on 2 Oct 2018

🎉1

Hi @hmaarrfk - curious. I have the same skimage and cv2 versions, though I imagine it simply means that on my system skimage and cv2 use a different encoder, as pointed out by @stefanv.

IMO this is more a curiosity rather than anything else and simply related to the JPEG library used under the hood by cv2 vs skimage - I replied to @tispratik with the updated code as there seemed to be the suggestion that this could be the cause of the issue, but it wasn't.

On your system it seems both cv2 and skimage are using either the same or at least quantitatively equivalent JPEG libraries.

jmetz on 2 Oct 2018

@jmetz, what OS are you on? I am on linux+conda-forge. If jpeg is really important this seems like a serious issue. You can try conda and conda-forge for more updated versions of packages.

@jni I didn't use 0.14.1 because we changed the default plugin backend. That said, 0.14.1 also doesn't see any differences between scikit-image and opencv.

hmaarrfk on 2 Oct 2018

@hmaarrfk - I'm on linux also (Linux 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4 (2018-08-21) x86_64 GNU/Linux), but use standard pip and a python install alongside the system python.
My guess is that my less polished approach to python package management is why opencv and scikit-image are using different JPEG libraries, whereas you installed everything via conda's ecosystem, so they're probably both using the same JPEG library.

To verify I just ran the code using anaconda and I get the same results as you.

jmetz on 2 Oct 2018

👀1

Seems like a pretty serious error to have. Even if we attribute 360_851 to rounding errors, it is a lot more than the allowable 100 * 100 * 3 = 30_000 rounding errors.
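
For scale, the totals can be turned into an average per channel value. A small sketch using only the SAD numbers reported in the run above (the one saved with skimage) and the 100 x 100 x 3 array size from the test script:

```python
# SAD totals copied from the reported output (saved with skimage).
sad_totals = {
    "cv2 vs skio": 360851,
    "original vs cv2": 1382338,
    "original vs skio": 1312803,
}

n_values = 100 * 100 * 3  # one budget unit per channel value

mean_abs_diff = {name: total / n_values for name, total in sad_totals.items()}
for name, value in mean_abs_diff.items():
    print(f"{name}: {value:.2f} levels per value")
# cv2 vs skio: 12.03 levels per value
# original vs cv2: 46.08 levels per value
# original vs skio: 43.76 levels per value
```

So the two readers disagree by about 12 levels per value on average in that run, well beyond a one-level rounding budget.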

Glad to hear that vanilla anaconda also works. I specified conda-forge because they basically repackaged most things and some C libraries might differ and be incompatible.

hmaarrfk on 2 Oct 2018

You are totally right, the pip install is strange! I'm on Ubuntu 16.04.

Well if somebody wants to get to the bottom of this they should likely file an issue with imageio/PIL or OpenCV

hmaarrfk on 2 Oct 2018

@hmaarrfk - to point again to @stefanv's comment,

If you care at all about preserving data, do not use jpeg.

it highlights why such a big discrepancy isn't that strange. JPEG basically throws away what it deems unimportant information (by quantizing the discrete cosine transform coefficients) in order to compress the data, so it's not that surprising that the sum of the pixel-level differences is much, much higher than rounding error: we're not dealing with rounding error, but with having picked a slightly different compression algorithm.

jmetz on 2 Oct 2018

*Albeit slightly different versions of (in theory) the same compression algorithm, that is!

jmetz on 2 Oct 2018

I'm only interested in the line: print(np.sum(np.abs(datacv2.astype(int)-dataskio.astype(int))))

Decompression should yield identical results, no?

hmaarrfk on 2 Oct 2018

Ahh I see your point, and why that could potentially be the case.
I wasn't sure either but found this discussion: https://groups.google.com/forum/#!topic/rec.photo.digital/yAxoW9HyHPQ

To summarise; there should be only one way to decode a JPEG, but apparently some libraries / applications take short-cuts to optimise the decode speed, creating different decodings.
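
As a rough illustration of how a shortcut can change the output, here is a toy 1-D sketch in plain Python. It decodes the same coefficients twice: once with a floating-point inverse DCT, and once with the cosine constants rounded to 4 fractional bits. The coefficient values and the 4-bit precision are hypothetical and deliberately coarse to make the effect visible; this is not how any particular JPEG library is implemented.

```python
import math

N = 8  # one 8-sample row stands in for an 8x8 JPEG block


def idct_table(fraction_bits=None):
    """Inverse-DCT matrix, optionally rounded to fixed-point precision."""
    table = []
    for n in range(N):
        row = []
        for k in range(N):
            value = math.sqrt((1 if k == 0 else 2) / N) * math.cos(
                math.pi * (2 * n + 1) * k / (2 * N))
            if fraction_bits is not None:
                # Keep only `fraction_bits` binary digits after the point.
                value = round(value * 2 ** fraction_bits) / 2 ** fraction_bits
            row.append(value)
        table.append(row)
    return table


def decode(coeffs, table):
    """Apply the inverse transform and round to integer sample values."""
    return [round(sum(t * c for t, c in zip(row, coeffs))) for row in table]


coeffs = [176, -16, -6, 0, 0, 0, 0, 0]  # hypothetical dequantized coefficients

reference = decode(coeffs, idct_table())
shortcut = decode(coeffs, idct_table(fraction_bits=4))

print("reference:", reference)
print("shortcut: ", shortcut)
print("difference:", [s - r for s, r in zip(shortcut, reference)])
```

Both decoders start from identical file contents, yet the first sample already comes out as 52 in one and 55 in the other. Real decoders use far more precision than this, so their disagreements would be expected to be much smaller.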

jmetz on 2 Oct 2018

👍3

@jmetz whoa, I didn't know that. That is super useful to know. I always thought the loss process was only during the encoding stage! Thanks

hmaarrfk on 2 Oct 2018

For those who are curious about the "lossy" part of JPEG encoding, it occurs when JPEG's perceptual model is applied: the part that says our eyes care less about certain frequencies than others. More about JPEG quantization factors at https://en.wikipedia.org/wiki/JPEG#Quantization
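
A toy 1-D sketch of that quantization step, in plain Python. The sample row and the step sizes are hypothetical (real JPEG uses 8x8 blocks and tables chosen by the encoder); the only point is that dividing the DCT coefficients by a step, rounding, and multiplying back cannot restore what the rounding discarded.

```python
import math

N = 8  # one 8-sample row stands in for an 8x8 JPEG block


def scale(k):
    """Normalisation factor of the orthonormal DCT."""
    return math.sqrt((1 if k == 0 else 2) / N)


def dct(samples):
    return [
        scale(k) * sum(s * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                       for n, s in enumerate(samples))
        for k in range(N)
    ]


def idct(coeffs):
    return [
        sum(scale(k) * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for k, c in enumerate(coeffs))
        for n in range(N)
    ]


samples = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical pixel row
steps = [4, 4, 6, 8, 12, 16, 24, 32]        # hypothetical steps, coarser at high frequencies

coeffs = dct(samples)
lossless = [round(v) for v in idct(coeffs)]                 # no quantization
levels = [round(c / q) for c, q in zip(coeffs, steps)]      # what gets stored
dequantized = [level * q for level, q in zip(levels, steps)]
restored = [round(v) for v in idct(dequantized)]

print("original:", samples)
print("restored:", restored)
print("max error:", max(abs(a - b) for a, b in zip(samples, restored)))
```

Without the quantization step the transform round-trips exactly (`lossless` equals `samples`), so the loss comes from the rounding in `levels`, not from the DCT itself.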

@jmetz I never knew about the decoding approximations & shortcuts; good to know!

stefanv on 3 Oct 2018
