
Hi,
I load a JPEG file with scikit-image's skimage.io.imread and OpenCV's cv2.imread, but the raw data differs.
scikit-image == 0.12.3
import cv2
from skimage import io
import numpy as np
im1 = io.imread('0000001.jpg')
# skimage returns RGB; reverse the channel axis to match OpenCV's BGR order
im1 = im1[:,:,::-1]
print(im1[:,:,2])
im2 = cv2.imread('0000001.jpg')
assert im1.shape == im2.shape
print(im2[:,:,2])
print(np.array_equal(im1, im2))
result
[[43 40 38 ..., 54 47 46]
[44 41 39 ..., 54 50 49]
[41 41 39 ..., 56 52 53]
...,
[53 57 58 ..., 14 13 13]
[53 56 56 ..., 10 9 9]
[60 64 63 ..., 6 6 9]]
[[43 40 39 ..., 54 47 46]
[44 41 39 ..., 54 50 49]
[41 41 39 ..., 56 52 53]
...,
[53 57 58 ..., 14 13 13]
[53 56 56 ..., 10 9 9]
[60 64 63 ..., 6 6 7]]
False
I am not sure of the root cause.
looks like a rounding issue...
Have you tried different plugins? 'pil', 'matplotlib', 'qt', 'freeimage'
pil, matplotlib:
(mismatch 44.079356201839424%)
x: array([[[ 50, 132, 43],
[ 51, 131, 40],
[ 50, 131, 38],...
y: array([[[ 52, 132, 43],
[ 51, 131, 40],
[ 50, 130, 39],...
freeimage:
(mismatch 76.13787685096409%)
x: array([[[ 49, 132, 40],
[ 48, 131, 39],
[ 49, 131, 36],...
y: array([[[ 52, 132, 43],
[ 51, 131, 40],
[ 50, 130, 39],...
@twmht the image as read by scikit-image renders correctly when displayed using matplotlib.pyplot.imshow. Do you have any indication that scikit-image's result is any less correct than cv2's?
(As another diagnostic, you could compute the correlation coefficient between the CV2 and skimage arrays. I bet it will be very very high.)
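For anyone who wants to try the correlation diagnostic, here is a minimal sketch. It uses synthetic arrays as stand-ins for the two decoded images (an assumption, so that it runs without any image file); in practice you would pass the arrays returned by cv2.imread and skimage.io.imread. The helper name `correlation` is hypothetical.

```python
import numpy as np

def correlation(a, b):
    """Pearson correlation between two equally shaped image arrays."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return np.corrcoef(a, b)[0, 1]

# Synthetic stand-ins for the two decoded images (assumption: real use would
# pass the arrays from cv2.imread and skimage.io.imread instead).
rng = np.random.default_rng(0)
im_a = rng.integers(0, 256, size=(100, 100, 3), dtype=np.uint8)
noise = rng.integers(-2, 3, size=im_a.shape)  # small per-value deviations
im_b = np.clip(im_a.astype(int) + noise, 0, 255).astype(np.uint8)

r = correlation(im_a, im_b)
print(r)
```

A value very close to 1 would suggest the two decoders agree up to small per-value deviations, even when np.array_equal returns False.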
@jni I'm not sure if it helps, but I've tried to open the image in GIMP, and the editor shows the same data as cv2.imread.
Saving the image without compression to .png and re-running the test on it works fine (no differences detected). So, there must be something with our JPEG backend.
@soupault ah, thanks! That's indeed worrisome! =\
@jni
Yup. Both displayed results are very close (the human eye cannot tell the difference).
The uploaded jpeg file comes from io.imsave('0000001.jpg', ndarray, quality=90)
I tried another PNG file, and it works as expected (the raw data from skimage and OpenCV is the same).
Only jpeg images (or maybe other compressed images) have this problem.
Let's ping @almarklein and @blink1073 so they know about this issue.
@twmht : out of curiosity, if you compare the data used to create the uploaded jpeg to the data read by cv2 and skimage.io, which one is closer?
@jmetz : as @soupault said, he compared the data from GIMP and cv2.imread, and the raw data is the same.
So cv2.imread is much closer than skimage.io.
@twmht - I don't think I was asking the same thing; what @soupault showed was that whatever library GIMP uses gives the same result as cv2.imread.
What I was asking is which one is closer to the original data that was used to generate the jpeg.
OK, I added my own test to answer my question about which is closer to the original data:
import cv2
import skimage.io as skio
import numpy as np
import tempfile
# Data
data = np.random.randint(0, 255, (100,100,3), dtype="uint8")
# Save to temporary file
filename = tempfile.mktemp(".jpg")
skio.imsave(filename, data, quality=90)
# Now load using cv2 and skimage.io and compare
datacv2 = cv2.imread(filename)
dataskio = skio.imread(filename)
print("Comparing cv2 with skio")
print(np.array_equal(datacv2, dataskio))
print(np.sum(np.abs(datacv2-dataskio)))
print("Comparing original with cv2")
print(np.array_equal(data, datacv2))
print(np.sum(np.abs(datacv2-data)))
print("Comparing original with skio")
print(np.array_equal(data, dataskio))
print(np.sum(np.abs(data-dataskio)))
which on one run yielded
Comparing cv2 with skio
False
2532864
Comparing original with cv2
False
3828855
Comparing original with skio
False
3809161
which is interesting; the sum of the absolute differences between the skio.imread and the original data is slightly smaller than the same quantity for cv2.imread....
Update
Having realised that my SAD (sum of absolute differences) might be off because uint8 subtraction wraps around, I cast the arrays to int before subtracting. The values change, but the conclusion is the same:
...
print("Comparing cv2 with skio")
print(np.array_equal(datacv2, dataskio))
print(np.sum(np.abs(datacv2.astype(int)-dataskio.astype(int))))
print("Comparing original with cv2")
print(np.array_equal(data, datacv2))
print(np.sum(np.abs(datacv2.astype(int)-data.astype(int))))
print("Comparing original with skio")
print(np.array_equal(data, dataskio))
print(np.sum(np.abs(data.astype(int)-dataskio.astype(int))))
yields
Comparing cv2 with skio
False
543666
Comparing original with cv2
False
1582326
Comparing original with skio
False
1397248
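To illustrate why the cast matters, here is a small sketch with two made-up arrays (the values are chosen only for illustration). Subtracting uint8 arrays wraps modulo 256, so a true difference of -2 is stored as 254 and inflates the SAD.

```python
import numpy as np

a = np.array([10, 200], dtype=np.uint8)
b = np.array([12, 100], dtype=np.uint8)

# uint8 arithmetic wraps modulo 256: 10 - 12 becomes 254, not -2.
wrapped = a - b
# Casting to a signed integer type first gives the true differences.
signed = a.astype(int) - b.astype(int)

sad_wrapped = int(np.sum(np.abs(wrapped)))
sad_signed = int(np.sum(np.abs(signed)))
print(wrapped, signed, sad_wrapped, sad_signed)
```

Here the wrapped SAD is 354 while the correct one is 102, which is the same kind of inflation seen between the first and second set of numbers above.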
Saving with cv2
When saving the image with cv2 (using cv2.imwrite(filename, data, [int(cv2.IMWRITE_JPEG_QUALITY), 90])), the situation is reversed, e.g.
SAME AGAIN BUT SAVING WITH cv2
Comparing cv2 with skio
False
537232
Comparing original with cv2
False
1386014
Comparing original with skio
False
1567090
OpenCV and PIL/Matplotlib/etc. may not use the same encoder, so it's not surprising that the results are slightly different. But if you average per pixel, the quantization errors look reasonable.
A better comparison would be to save to PNG and read back; that round trip should be flawless (or close to).
I don't think this is a bug, but if you disagree please reopen.
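A sketch of the PNG round trip suggested above, next to the same round trip through JPEG. It assumes Pillow is installed and encodes in memory rather than to a temporary file; the helper name `round_trip` is hypothetical and not part of any library.

```python
import io
import numpy as np
from PIL import Image

def round_trip(data, fmt, **save_kwargs):
    """Encode an RGB uint8 array in memory and decode it again with Pillow."""
    buf = io.BytesIO()
    Image.fromarray(data).save(buf, format=fmt, **save_kwargs)
    buf.seek(0)
    return np.array(Image.open(buf))

rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(100, 100, 3), dtype=np.uint8)

png_back = round_trip(data, "PNG")
jpeg_back = round_trip(data, "JPEG", quality=90)

print(np.array_equal(data, png_back))   # lossless format
print(np.array_equal(data, jpeg_back))  # lossy format
```

Random noise is close to a worst case for JPEG, so the JPEG differences here will be larger than for a typical photograph.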
I noticed the same thing. I had a CNN model trained using skimage io as input, and now due to other needs, I need to use opencv. However, the model just does not work at all with opencv.
@bshao001 Can you elaborate on what doesn't work with your model when you read with cv2 (and do proper reshaping)? I am running a model where I read inputs with cv2. It seems to work, although the accuracy is much lower than in the state-of-the-art paper. I am trying to improve the preprocessing, though, as that might be the reason.
@duygusar Nothing special. The CNN model was trained early last year, and all the image inputs were read with skimage. A few months later, I converted everything to cv2 without retraining the model, and the model's predictions were very poor. As long as your training and prediction both use skimage, or both use cv2, your issue should not be related to this.
Also, be aware of: http://scikit-image.org/docs/stable/user_guide/data_types.html#working-with-opencv
@stefanv what causes this difference? Is it due to the different encoders?
I have another question: I read @jmetz's answer, but I am still confused. Which library (OpenCV or skimage) reads a JPEG image closer to the original?
If you care at all about preserving data, do not use jpeg.
Afaik, OpenCV reads images in BGR format, so you always need a step that converts BGR to RGB, like below:
cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
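For a 3-channel image, the same channel swap can be done with plain NumPy slicing. A minimal sketch on a made-up 1x2 array (no image file or OpenCV needed):

```python
import numpy as np

# A 1x2 "image" in BGR order, as cv2.imread would return it.
bgr = np.array([[[255, 0, 0], [0, 128, 64]]], dtype=np.uint8)

# Reversing the last axis swaps the first and third channels (BGR -> RGB).
rgb = bgr[..., ::-1]

print(rgb[0, 0])  # the pure-blue BGR pixel, now in RGB order
```

Note that the slice is a view with a negative stride; np.ascontiguousarray(rgb) makes a compact copy if a downstream function needs one.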
Thanks for the catch @tispratik - unfortunately it doesn't change anything.
The updated version (which also compares to saving with cv2 to begin with):
import cv2
import skimage.io as skio
import numpy as np
import tempfile
# Data
data = np.random.randint(0, 255, (100, 100, 3), dtype="uint8")
# Save to temporary file
filename = tempfile.mktemp(".jpg")
filename2 = tempfile.mktemp(".jpg")
skio.imsave(filename, data, quality=90)
# Now load using cv2 and skimage.io and compare
datacv2 = cv2.imread(filename)
datacv2 = datacv2[..., ::-1]
dataskio = skio.imread(filename)
print("Comparing cv2 with skio")
print(np.array_equal(datacv2, dataskio))
print(np.sum(np.abs(datacv2.astype(int)-dataskio.astype(int))))
print("Comparing original with cv2")
print(np.array_equal(data, datacv2))
print(np.sum(np.abs(datacv2.astype(int)-data.astype(int))))
print("Comparing original with skio")
print(np.array_equal(data, dataskio))
print(np.sum(np.abs(data.astype(int)-dataskio.astype(int))))
print("SAME AGAIN BUT SAVING WITH cv2")
cv2.imwrite(filename2, data, [int(cv2.IMWRITE_JPEG_QUALITY), 90])
# Now load using cv2 and skimage.io and compare
datacv2 = cv2.imread(filename2)
datacv2 = datacv2[..., ::-1]
dataskio = skio.imread(filename2)
print("Comparing cv2 with skio")
print(np.array_equal(datacv2, dataskio))
print(np.sum(np.abs(datacv2.astype(int)-dataskio.astype(int))))
print("Comparing original with cv2")
print(np.array_equal(data, datacv2))
print(np.sum(np.abs(datacv2.astype(int)-data.astype(int))))
print("Comparing original with skio")
print(np.array_equal(data, dataskio))
print(np.sum(np.abs(data.astype(int)-dataskio.astype(int))))
This gives very similar output to before (i.e. the SAD score is still non-negligible for all comparisons):
Comparing cv2 with skio
False
360851
Comparing original with cv2
False
1382338
Comparing original with skio
False
1312803
SAME AGAIN BUT SAVING WITH cv2
Comparing cv2 with skio
False
358669
Comparing original with cv2
False
1562336
Comparing original with skio
False
1636675
@jmetz I can't seem to be able to reproduce.
I'm using conda+conda-forge to install everything and scikit-image 0.14.0
➤ tools/build_versions.py
docs.txt
sphinx is not installed
numpydoc is not installed
sphinx-gallery is not installed
scikit-learn is not installed
default.txt
numpy 1.15.2
scipy 1.1.0
matplotlib 2.2.2
networkx 2.2
pillow 5.2.0
imageio 2.3.0
PyWavelets 1.0.1
dask 0.19.2
cloudpickle 0.5.6
build.txt
Cython is not installed
wheel 0.32.0
numpy 1.15.2
requirements-parser 0.2.0
test.txt
pytest is not installed
pytest-cov is not installed
flake8 is not installed
codecov is not installed
optional.txt
imread is not installed
SimpleITK is not installed
astropy is not installed
tifffile is not installed
qtpy is not installed
OpenCV version is 3.4.3.
In [2]:
...: print("Comparing cv2 with skio")
...: print(np.array_equal(datacv2, dataskio))
...: print(np.sum(np.abs(datacv2.astype(int)-dataskio.astype(int))))
Comparing cv2 with skio
True
0
In [3]: print("SAME AGAIN BUT SAVING WITH cv2")
...: cv2.imwrite(filename2, data, [int(cv2.IMWRITE_JPEG_QUALITY), 90])
...: # Now load using cv2 and skimage.io and compare
...: datacv2 = cv2.imread(filename2)
...: datacv2 = datacv2[..., ::-1]
...: dataskio = skio.imread(filename2)
SAME AGAIN BUT SAVING WITH cv2
In [4]:
In [4]: print("Comparing cv2 with skio")
...: print(np.array_equal(datacv2, dataskio))
...: print(np.sum(np.abs(datacv2.astype(int)-dataskio.astype(int))))
Comparing cv2 with skio
True
0
@hmaarrfk why not 0.14.1? ;)
Hi @hmaarrfk - curious. I have the same skimage and cv2 versions, though I imagine it simply means that on my system skimage and cv2 use different encoders, as pointed out by @stefanv.
IMO this is more a curiosity rather than anything else and simply related to the JPEG library used under the hood by cv2 vs skimage - I replied to @tispratik with the updated code as there seemed to be the suggestion that this could be the cause of the issue, but it wasn't.
On your system it seems both cv2 and skimage are using either the same or at least quantitatively equivalent JPEG libraries.
@jmetz, what OS are you on? I am on linux+conda-forge. If jpeg is really important this seems like a serious issue. You can try conda and conda-forge for more updated versions of packages.
@jni I didn't use 0.14.1 because we changed the default plugin backend. That said, 0.14.1 also doesn't see any differences between scikit-image and opencv.
@hmaarrfk - I'm on linux also (Linux 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4 (2018-08-21) x86_64 GNU/Linux), but use standard pip and a python install alongside the system python.
My guess is that my less polished approach to Python package management is why opencv and scikit-image are using different JPEG libraries; since you installed everything via conda's ecosystem, they're probably both using the same JPEG library.
To verify I just ran the code using anaconda and I get the same results as you.
Seems like a pretty serious error to have. Even if we attribute 360_851 to rounding errors, it is a lot more than the 100 * 100 * 3 = 30_000 that one rounding error per value would allow.
Glad to hear that vanilla Anaconda also works. I specified conda-forge because they basically repackaged most things and some C libraries might differ and be incompatible.
You are totally right, the pip install is strange! I'm on Ubuntu 16.04.
Well if somebody wants to get to the bottom of this they should likely file an issue with imageio/PIL or OpenCV
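The arithmetic behind that comparison, spelled out with the numbers reported earlier in the thread (the cv2 vs skio SAD of 360851 on a 100 x 100 x 3 image):

```python
# Numbers taken from the run reported earlier in this thread.
height, width, channels = 100, 100, 3
sad_cv2_vs_skio = 360_851

samples = height * width * channels
mean_abs_diff = sad_cv2_vs_skio / samples

print(samples)                  # number of uint8 values compared
print(round(mean_abs_diff, 2))  # average absolute difference per value
```

That works out to roughly 12 grey levels per value on average, well beyond an off-by-one rounding difference, although the test image was pure random noise.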
@hmaarrfk - to point again to @stefanv's comment,
If you care at all about preserving data, do not use jpeg.
it highlights why such a big discrepancy isn't that strange. JPEG basically throws away what it deems unimportant information (via the discrete cosine transform) in order to compress the data, so it's not that surprising that the sum of the pixel-level differences is much, much higher than rounding error: we're not dealing with rounding error, but with slightly different compression algorithms.
*Albeit slightly different versions of (in theory) the same compression algorithm, that is!
I'm only interested in the line: print(np.sum(np.abs(datacv2.astype(int)-dataskio.astype(int))))
Decompression should yield identical results, no?
Ahh I see your point, and why that could potentially be the case.
I wasn't sure either but found this discussion: https://groups.google.com/forum/#!topic/rec.photo.digital/yAxoW9HyHPQ
To summarise: there should be only one way to decode a JPEG, but apparently some libraries / applications take shortcuts to optimise the decode speed, creating different decodings.
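As a toy illustration of how a decoding shortcut could change pixel values, the sketch below inverts the same 8x8 block of DCT coefficients twice: once with a full-precision cosine table and once with a table rounded to 8 fractional bits. This is an assumption-laden example, not the code of any real decoder; `C_fast` and `decode_block` are hypothetical names and the coefficients are random.

```python
import numpy as np

N = 8
k = np.arange(N).reshape(-1, 1)   # frequency index
n = np.arange(N).reshape(1, -1)   # sample index
# Orthonormal 8-point DCT-II matrix; its transpose is the inverse transform.
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0, :] = np.sqrt(1.0 / N)

# Hypothetical "fast" decoder: same matrix, but stored with 8 fractional bits.
C_fast = np.round(C * 256) / 256

def decode_block(coeffs, basis):
    """Inverse 2-D DCT of one 8x8 block, level-shifted and stored as uint8."""
    pixels = basis.T @ coeffs @ basis + 128
    return np.clip(np.round(pixels), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
coeffs = rng.integers(-100, 101, size=(N, N)).astype(np.float64)

exact = decode_block(coeffs, C)
fast = decode_block(coeffs, C_fast)
diff = np.abs(exact.astype(int) - fast.astype(int))
print(diff.max(), int((diff > 0).sum()))
```

The two printed numbers are the largest per-pixel difference and how many of the 64 pixels differ, if any. Both decoders read identical coefficients, so any difference comes purely from the arithmetic used on the decoding side.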
@jmetz whoa, I didn't know that. That is super useful to know. I always thought the loss process was only during the encoding stage! Thanks
For those who are curious about the "lossy" part of JPEG encoding, it occurs when JPEG's perceptual model is applied: the part that says our eyes care less about certain frequencies than others. More about JPEG quantization factors at https://en.wikipedia.org/wiki/JPEG#Quantization
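A minimal sketch of that quantization step, assuming a single made-up step size `q` for all coefficients (real JPEG uses an 8x8 table with a different step per frequency) and random stand-in coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
coeffs = rng.uniform(-200, 200, size=(8, 8))   # stand-in DCT coefficients

# Hypothetical quantization step: larger steps discard more detail.
q = 16
quantized = np.round(coeffs / q).astype(int)   # what the encoder stores
restored = quantized * q                       # what any decoder can recover

error = np.abs(coeffs - restored)
print(error.max())  # bounded by q / 2
```

Whatever the decoder does afterwards, the information removed by the rounding is gone, which is why the round trip through JPEG never matches the original exactly in the tests above.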
@jmetz I never knew about the decoding approximations & shortcuts; good to know!