Hello all,
I have found that in at least certain modes, skimage.feature.match_template calculates very incorrect cross-correlation values, well outside of the [-1,1] range that is expected. I have attached a simple test case, with a large image and a small template. This old StackOverflow answer by @stefanv seems to suggest that such a bug was fixed a while ago, but if so, it's back.
Different padding modes change the scale of the misbehavior, but even pad_input=False still yields wrong cross-correlation values.
import skimage
skimage.__version__ # '0.12dev'
from skimage import feature
from skimage import io
template = io.imread('template.png')
image = io.imread('image.png')
peaks = feature.match_template(image, template, pad_input=True, mode='edge')
peaks.min() # -26.632982
peaks.max() # 1.7565135
Strangely, if I crop the image a little, to include the positions of the erroneous min and max values, then the values change. But cross-correlation should be a local operation! Maybe the problem appears if the image is too large?
import numpy
numpy.unravel_index(peaks.argmin(), peaks.shape) # (2449, 885)
numpy.unravel_index(peaks.argmax(), peaks.shape) # (2494, 404)
image.shape # (2500, 1660)
template.shape # (76, 76)
small = image[2000:, :1200]
peaks_small = feature.match_template(small, template, pad_input=True, mode='edge')
peaks_small.min() # -0.5584954
peaks_small.max() # 0.82173121


Well, it's a precision issue!
Changing skimage/feature/template.py:116 from:
image = np.array(image, dtype=np.float32, copy=False)
to
image = np.array(image, dtype=np.float64, copy=False)
fixes the problem.
That is rather surprising.
I agree, and I don't really know what to make of it. Where could the imprecision be creeping in? And in what way would cropping the image solve the problem? That suggests the integral image, but that seems odd too.
So, ah, should I submit a PR for increasing the internal precision to float64? It feels like switching precision is just papering over some more fundamental error... but I don't know this code well enough to figure out what that might be. Who originally wrote this? Maybe he or she has a sense.
I took a cursory glance at the problem and it does seem to me to be a problem with the round off in the integral images. So there is no error per say, its the drawback of the implemented approach. With large images even float64 is going to start rounding off.
This problem would probably be solved by either a direct implementation of template matching. This will remove any roundoff errors, however it will be very slow. The problem could also be solved using a fourier transform (DFT), which should be faster. DFT will round off as well, however the result of that error will not be as catastrophic.
I think the problem isn't the integral image cumulative imprecision per se -- I suspect it is in the fact that the numerator (based on the integral of image) and the denominator (the integral of image**2) have different cumulative errors, so dividing them can yield unexpected results.
Is there a way to compute only a single integral image, and then derive both the numerator and denominator from that? If so, the cumulative imprecision would very likely not be a problem. Given that a sum of squares is definitely not equal to the square of a sum, this might not be trivial.
What if we use convolution instead of integral images in window _window_sum_*d functions to calculate the window sums?
That seems likely to work. Hopefully not drastically slower than the integral images! But correct results are probably a little more important than fast ones.
This is a remarkable bug. If I slice the images even ever so slightly, it does not appear!
Slightly smaller test example:
from skimage import feature, io
import numpy as np
image = io.imread('/tmp/image_.png')
template = np.ones((11, 6), dtype=bool)
template[:3, :] = False
feature.match_template(image, template).max()

@stefanv, I'm not sure what to make of your example. Guess I'll just have to run it.
Do you agree with the proposed solution?
@MartinSavc I was trying to find something we could include in the test suite.
I'm not sure if what you suggest will fix the problem. Have you tried?
I've replaced the function with:
def _window_sum_2d(image, window_shape):
window_sum = fftconvolve(image, np.ones(window_shape), mode='full')
N,M = window_shape
window_sum = window_sum[N:-N,M:-M]
return window_sum
This fixed the former problem: my results were -0.566894 for min, and 0.931608 for max. However I still get unexpected results for your example: 77.1218 (3.87901 is the original result I've gotten).
This feels like a floating point precision issue. Perhaps somewhere, we are subtracting two floating point numbers (bad), and then amplifying the result.
This could potentially be avoided by finding a different expression that calculates the same thing.
Your example uses a boolean template - if a change it to float32, then using the current match template example I get 6.64278, but with my modification I get 0.98226.
Can you calculate the differences between the convolution method and the table method of getting the window sum so we can gauge the magnitude of that error? What worries me is that we may simply be hiding the error, without a really good understanding of what is happening under the hood. Changing almost anything about this example makes it work under the old code as well.
I think the boolean template doesn't work, because when summing the values from the template we simply get True which equals to 1. This might be something else to fix.
So these are the differences between the windows sum of the original method and the one using convolution. On the left is sum of values, on the right is some of squared values.

The absolute differences are very small, on the scale of 1e-11. So there must be some later division, that upsets the calculation. I'm looking into it further.
Thanks for your persistence, Martin. It's so much better not to slog through these types of things alone :) I'm re-reading the original paper to see if there is anything that can be improved in the formulation.
So I've compared the denominator for both approaches and compared those:

The differences really manifested here. And they are also very large, order of +/- 1e19.
OK, great, I'll focus my attention there.
So, is this denomenator calculated using that input differing by 1e-11, or are you calculating the denomenator itself in a different way?
The denominator is calculated using those inputs - once using inputs from integral images, once from convolution.
So I've done a mistake - I've forgot to cast and pad the image appropriately, before calculating the sums. I've corrected this now (I added the first part of the script, and got somewhat different results, which are more in line with my previous assumptions about integral images. These are the comparisons of 'sum of value' and 'sum of squared value' images calculated using integral image and convolution. It is infact the sum of squares that has a large error at the bottom, where the integral image is losing precision in a 32 bit floating point value. Convolution does not have the same
problem, as it doesn't need to sum all the values in those columns, but sums only the values inside the windows (either directly or using DFT).

Is that the right image? Those values are still very small.
IMHO:
The difference image on the left shows a pattern, consistent with random round-off errors (not white noise random, but little correlation with image content). Even so these have produced a non random difference in the calculated denominator, so I would still need to take a look into that.
The difference image on the right show a much more structured error, that is most likely the result of accumulating round-off errors in the integral image. It is highly dependant on image content.
Edit: These are all absolute errors. Do you have a suggestion for calculating a relative error? In relation to what should we calculate it? Maybe against a calculation using double data.
So, if you add noise with magnitude 1e-11 to the convolution-calculated result, does it also mess up the results? If such small errors have such large effects (answers going up to 35, I've seen so far!) one should be extremely cautious.
Well, I no longer think it's 1e-11, but -40 (the integral is smaller than convolution). And the difference isn't in the form of noise, but integral is consistently smaller than convolution.
The results with 1e-11 were produced using a 8 bit image without padding. I'm doing all of this is a notebook, If you wish I can share it with you (a sharing platform with skimage would be great).
A 1D demonstration of the problem (as I perceive it) with the code:
win_size = 100
s = np.float32((np.random.randn(10000)*100)**2)
s_int = s.cumsum()
s_int_win = (s_int[win_size:] - s_int[:-win_size])
s_conv_win = fftconvolve(s, np.ones(win_size))
s_conv_win = s_conv_win[win_size:-win_size+1]
clf()
subplot(211)
plot(s_int_win, label='filtered using int')
plot(s_conv_win, label='filtered using conv')
legend(loc='best')
subplot(212)
plot(s_int_win - s_conv_win,label='s_int_win - s_conv_win')
legend(loc='best')

As you can see from the figure, the error between using convolution and integral images is not large. But the problem is, the error is growing with the number of samples. In template matching the results of this calculation (filtering) is used as a denominator. It's only when the sum is large enough does this manifest as an observable error.
If the convolution is calculated using DFT the issue is resolved by using the Cooley-Tukey or similar algorithm. Using that algorithm you avoid summing the values of the image sequentially (adding a small value to an already large partial sum). A direct DFT would most likely have a similar problem, when calculating the mean.
So I've looked into stable summation in numpy. I guess the best solution would replace the np.cumsum with a stable cumulative summation method, such as Hakan algorithm. That should fix the problem here with minimal changes in the existing code. However I have not found any such replacement in numpy. For some methods this might also be dependent on the underlying BLAS and LAPACK libraries used, which might have similar issues.
I've also identified an issue with using Boolean or integer templates on Python 2.x. The template is not converted to floating point values. In line 144 of scikit-image/skimage/feature/template.py
nom = xcorr - image_window_sum * (template.sum() / template_volume)
the template values are summed and divided by volume. For boolean and integer arrays the sum will be an integer, the template volume is also an integer. In python 2.x this results in integer division, while in python 3.x all division are converted to floating point. Using:
from __future__ import division
fixed it.
@MartinSavc Take a look at https://github.com/scikit-image/scikit-image/pull/1829
I tried to replace cumsum with your Kahan implementation. However your implementation requires double data type, while match_template uses image data type float. Our current examples are solved if double is used. I will try to find a more suitable example, that will fail regardless of data type.
Is there a way to fix your implementation to work with both float and double? (I'm not that familiar with cython so I don't know).
I've uploaded a new version with fused types that should do the trick.
By using:
t = np.arange(0, 1024)/1024.
x = np.sin(t*np.pi*5).reshape(-1, 1)
x += np.sin(t*np.pi*11).reshape(-1, 1)*0.5
y = np.sin(t*np.pi*7).reshape(1, -1)
y += np.sin(t*np.pi*4).reshape(1, -1)
img = x.dot(y)
img **= 8
img /= img.max()
to generate an image, I've gotten wrong results for template matching using double precision images and using fftconvolve (which uses double internally) instead of integral images. It looks like:

Once the image is quantized to 8 bits, than I get similar results as already observed - wrong with single precision (float), correct with double and using fftconvolve. I have not yet tested Kahan summation method.
@MartinSavc Did you get a chance to try the new version of Kahan I sent?
Not yet. Sorry for being somewhat slow. I planned on testing it by the end of the week.
Ok, I've tried Kahan summation. It fixed the problems with both real image samples (first and second error example) but still fails with my generated one (the third example). Ill post some code later today, so you may check it out.
That's excellent--it means we're on the right track. We can try to implement one of the more accurate summation methods, and see if that helps even on your pathological example.
Ok, I've just realized I can simply copy the raw *.ipynb file to gist: here
The uploaded notebook has match_template modified, so that it converts image to float64 instead of float32. That means that the examples loading from image files work. So in the notebook the only example that fails is the generated synthetic example at the end (blocks 47 and 48).
The example has multiple _window_sum_2d implementations such as _window_sum_2d_conv, _window_sum_2d_int and _window_sum_2d_raw. The _window_sum_2d is set to one of them as a reference before calling match_template.
The _window_sum_2d_raw implementation is a slow direct implementation that is supposed to work as a reference. Based on the last test it seems to work better than the other implementations.
Any news on this? I observed values larger than one with scikit-image version 0.14.2.
@bitzl thanks for surfacing this. It doesn't look like it was ever resolved. I agree with @stefanv though — the quick fix of using a standard convolution is just a band-aid, and a computationally expensive one at that. Our integral image code is used elsewhere and we really need to ensure it's accurate.
This paper seems relevant. Anyone interested, please email me if you don't have access to it:
https://ieeexplore.ieee.org/document/8249556
Surprisingly, we use the algorithm they call MRCLSEQ, which has decent numerical accuracy compared to the naive implementation. However there is an order of magnitude accuracy improvement by using the hierarchical "sweep" algorithms, so it might be worth implementing those.
I intend to work on this but it might take a while... Contributions would be welcome!
The short term easy fix is to use double precision — that will get rid of a bunch of error cases.
Errmm
The short term easy fix is to use double precision — that will get rid of a bunch of error cases.
Nope, we are already using double precision:
https://github.com/scikit-image/scikit-image/blob/6d048c17b96b38979bc689c67609aa29d02a5502/skimage/feature/template.py#L123
Another easy-fix option is a slow_but_accurate flag that can be turned on to use standard convolution, but is off by default...
Can't reproduce
>>> import skimage
>>> skimage.__version__
'0.17.dev0'
>>> from skimage import feature
>>> from skimage import io
>>> template = io.imread('template.png')
>>> image = io.imread('image.png')
>>> peaks = feature.match_template(image, template, pad_input=True, mode='edge')
>>> peaks.min()
-0.566894118051496
>>> peaks.max()
0.9316078746009169
Closing ;-)
@rfezzani we typically don't close when we can't reproduce until the user confirms that their problem has gone away also. In this case, it might be that NumPy has begun using a more accurate summation algorithm. @stefanv do you know whether this is the case?
@zpincus can you still reproduce this issue with current skimage and NumPy?
I for sure been too fast closing this one, sorry... :pray: Reopening!
Need to verify that the failure no longer exists for the example in https://github.com/scikit-image/scikit-image/issues/1712#issuecomment-164779091
OK, I hope that I correctly understood https://github.com/scikit-image/scikit-image/issues/1712#issuecomment-164779091. Here are the results that I obtain
>>> t = np.arange(0, 1024)/1024.
>>> x = np.sin(t*np.pi*5).reshape(-1, 1)
>>> x += np.sin(t*np.pi*11).reshape(-1, 1)*0.5
>>> y = np.sin(t*np.pi*7).reshape(1, -1)
>>> y += np.sin(t*np.pi*4).reshape(1, -1)
>>> img = x.dot(y)
>>> img **= 8
>>> img /= img.max()
>>> img_u8 = skimage.img_as_ubyte(img)
>>> img.shape
(1024, 1024)
>>> peaks = feature.match_template(img, template, pad_input=True, mode='edge')
>>> peaks.min()
-0.4641918323737716
>>> peaks.max()
0.7053328286042656
>>> peaks = feature.match_template(img_u8, template, pad_input=True, mode='edge')
>>> peaks.min()
-0.39587459317094925
>>> peaks.max()
0.7278294380600715
>>> peaks = feature.match_template(img_u8.astype(np.float32), template, pad_input=True, mode='edge')
>>> peaks.max()
0.7278294380600715
>>> peaks.min()
-0.39587459317094925
>>>
Results seems correct.
@rfezzani @stefanv but the failure might be architecture or compiler specific, hence why I wanted to be cautious about closing it.