Following this thread on the mailing list, I initiate this issue to enhance the peak detection in scikit image.
Currently, we have two functions:
In particular, peak_local_max (and derivatives) is used in scikit-image at different places.
Our functions seem to be sub-optimal and lead to questionable peak detection, especially for plateaus as pointed out on the mailing list. It is also questioned to keep two separate functions.
Recently, @lagru committed to scipy https://github.com/scipy/scipy/pull/8264 a new function to detect peaks based on their prominence. This function is particularly easy to use and efficient for 1D data. Then, it would be worth to consider whether it can be generalized to 2D data. @lagru, would you like to share your expertise on this and eventually contribute to scikit-image if you wish and if it makes sense to you?
Thanks.
Hello.
First, I think that it is perfectly okay to have several peak detection algorithms, because you can have different definitions.
The morphological local_maxima definition will detect local maxima (and the h-local maxima) irrespective of their proximity to other local maxima, just based on the actual grey levels. This may or may not be what is wanted in a certain circumstance. Nevertheless, it is normally the preferable solution when dealing with other morphological operators, like the watershed transformation.
Hi @ThomasWalter, good to see you again! =)
I fully agree that multiple algorithms are totally ok, just like we have multiple segmentation or thresholding algorithms. However, we have more than once found that whatever algorithm peak_local_maxima implements, it's doesn't seem very good. If you follow @sciunto's link to the mailing list, you'll see an example I came up with that gave very surprising results, to say the least.
In short, I vote for multiple functional algorithms. =P
I'd like peak detection algorithms to at least do what is intuitively expected. And there really aren't that many solutions that pass that test. Flat peaks? Oh, ok, it highlights all pixels. Or: Flat peaks? Oh, ok, it highlights the center of the peak.
Both of those are valid options, and the documentation should specify what each algorithm delivers. No included algorithm should ever return a value that is not in a maximum or maximum plateau though, imho.
Other than that, there was the issue with computational efficiency. The local_maxima relies solely on the reconstruction algorithm. It seems that there are some developers who think that the current version is too slow, and they seem to have a better solution. As far as I understand, the actual strategy of maxima detection is not touched by this, and it is rather an improvement of the reconstruction algorithm.
Recently, @lagru committed to scipy scipy/scipy#8264 a new function to detect peaks based on their prominence. This function is particularly easy to use and efficient for 1D data. Then, it would be worth to consider whether it can be generalized to 2D data. @lagru, would you like to share your expertise on this and eventually contribute to scikit-image if you wish and if it makes sense to you? -- @sciunto
First of all, I'm always happy to help and learn something new in the process. I already thought about whether the function in question _argmaxima1d could be generalized to the 2D case. I haven't tried to implement it yet but I think it might be possible by using it in a two step algorithm:
This should't be too hard (I think?) because it's at actually a very straightforward approach. This could work for flat maxima as well and return all positions in the plateau.
I'll think on it and play around with it. However I haven't looked at the solutions already present in skimage and whether this would have any advantages over the existing approaches. I'll report back if I find anything useful to add.
Evaluating the prominence of maxima is an entirely different beast in this context and unfortunately I don't think my approach can be generalized to the 2D case in an easy way.
Furthermore it would be useful to agree on a definition and terminology on how a peak prominence is measured before moving on. What I used for the 1D case is actually derived from the 2D case: topographic prominence. That basically requires an algorithm that calculates the prominence by finding the lowest contour line for each peak in question. The 1D algorithm might be useful as a step in that but I currently have no idea how to build on that for the 2D case.
Flat peaks? Oh, ok, it highlights all pixels. Or: Flat peaks? Oh, ok, it highlights the center of the peak. -- @stefanv
Both of those are valid options, and the documentation should specify what each algorithm delivers. No included algorithm should ever return a value that is not in a maximum or maximum plateau though, imho. -- @jni
For the 1D case we decided to approximate the center of the flat peak and return that. But in the future one could provide a keyword to control that behavior.
However I don't think this maps well to the 2D space. A solution might be to return all samples in a flat peak as the default case and extend on that by providing several ways / functions that can approximate a center.
That's an interesting read @ThomasWalter. I'll try to chime in with my perspective coming from the simpler 1D space.
The only uncertainty in this definition is thus what happens with an entirely constant image (I would just say, that in this case, there is no maximum).
This can be solved if we use read the definition for peaks like this: A peak must be surrounded on all sides by smaller values. For a flat image this is obviously never the case. That's how find_peaks handles the 1D case.
Some people also want to use distance to other local maxima. This is of course also an ambiguous criterion, as if two or more local maxima are too close to each other, it is unclear which to remove.
I solved that ambiguity partly by assigning a priority to each peak. Currently this priority is based on the height of each peak meaning smaller peaks are removed until all remaining peaks fulfill the distance criterion. However if the priority value is the same for two peaks the ambiguity resurfaces.
The way I implemented it it's possible to use other peak properties to define the priority which I plan to add in the future.
On a final note, I think a simple peak finding function for the 2D case might actually be more useful as an addition to SciPy instead of scikit-image. But that's not really an important decision.
Hello.
Thanks for your comments.
I haven't tried to implement it yet but I think it might be possible by using it in a two step algorithm:
- Consider only the 1D case use the existing algorithm to find all local maxima in each row.
- Evaluate each maxima in the other direction. How this evaluation (diagonal / direct neighbors) might look would be something to figure out.
This should't be too hard (I think?) because it's at actually a very straightforward approach.
I think that passing from 1D to 2D is not so straightforward. Points belonging to a local maximum in 1D do not necessarily belong to a local maximum in 2D. You could argue that once you have local maxima in 1D, they will lead you to a local maximum in 2D, if you follow the signal upstream. Whether this leads to an efficient algorithm, I do not know. For instance, the following matrix has only one local maximum:
A = np.zeros((8,8))
A[np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6]), np.array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6])] = np.arange(0, 13)
This can be solved if we use read the definition for peaks like this: A peak must be surrounded on all sides by smaller values. For a flat image this is obviously never the case. That's how find_peaks handles the 1D case.
That means, we would disregard any local maximum that touches the border. For instance, if you consider:
A = np.zeros((8,8))
A[2,7] = 10
A[np.array([1, 1, 2, 3, 3]), np.array([7, 6, 6, 7, 6])] = 5
the matrix A would not have any local maximum.
That means, we would disregard any local maximum that touches the border. For instance, if you consider: [...] the matrix A would not have any local maximum.
Correct. It depends on what definition is used for a peak: "must be surrounded by lower values" vs "must not border a higher value".
@lagru I would argue that both those conditions are essential in defining a "peak".
Sorry for the closing, wrong button!
Follow-up here: #3022
Please take a look if you're interested.
Now that #3022 is merged I'd like to continue this conversation. In my opinion there is still some functionality missing. It would be nice if users could filter detected peaks with conditions like in scipy.signal.find_peaks. Possible conditions are:
local_maxima (I already have a working draft). However I'd like that function to focus on maxima detection only and not on peak measurement / filtering. Especially when we introduce other conditions there should be a consistent way to evaluate them independently of local_maxima's implementation.In my opinion filtering maxima by there distance and prominence are really desirable features. Perhaps there is a different way (as an alternative to prominence) to measure how much a peak stands out compared to its surroundings...
Hi @lagru!
Awesome that you're already thinking about this!
I agree about the two most important features: prominence and distance. If we discount maximum efficiency, both can be done reasonably efficiently, I think. Here are my proposals for initial algorithms:
prominence: invert the image; perform watershed with the local maxima as seeds; find the highest (originally lowest) point in the border of the watershed basin corresponding to that peak; the difference in height between the seed and the border pixel is the prominence.
distance between maxima: initialise empty "accepted" set of maxima; sort the maxima by decreasing prominence / height (this choice should be a parameter to be set, I think); traverse this list: when a maximum is considered, check the distance to any accepted maxima; if any are too close, reject this maximum; otherwise, accept it. Continue until all maxima have been considered.
The second algorithm is complicated by plateaus, but I would be ok with ignoring this complication for now. Ideally you would compute the distance as the shortest distance between any two points in the plateaus being considered. This can be done naively in n^2 complexity, which might be ok initially — plateaus are quite pathological. I think there was an algorithm for O(n) computation of Hausdorff distance, which "smells" similar to this problem =P, so probably there is interesting stuff to be done. But again, I wouldn't worry about it initially: let's get the simple stuff moving and iterate. That worked really well last time! =) e.g. for a start, you could measure the distance from an arbitrary point in the plateau to another.
@jni Those are really exciting ideas!
I think I'll start working on the prominence measurement as soon as I find the time. I'll raise a PR when I have something worth showing.
Hello,
I agree that the two easy criteria are value and plateau size, the latter being probably less useful.
Then there is the distance. The problem is that if you want to suppress maxima that are too close to each other, it is not clear which of the maxima is to be suppressed. The idea of @jni is therefore to use prominence to choose the minima which are to be suppressed.
Which leads to the prominence. What you propose is close (albeit not identical) to morphological dynamics I think. Morphological dynamics work as follows: we look at all paths that connect a maximum to any higher maximum in the image. Along this path, we will need to decrease by some value. Morphological dynamics are the minimum of this decrease (minimised over all possible paths). That sounds laborious, but it can be efficiently calculated by making a watershed like operation or by max-trees.
What you suggested, @jni, is very close to this except for the fact that it is symmetric: if a maximum M1 is connected by path to M2 and there is only little decrease on the path, both maxima will have a low prominence. That sounds attractive at first sight, but when you want to decide which to take, you have the same problem as for spatial distance. Importantly, while I like this notion of prominence, it is clearly dependent on the other maxima, i.e. you might find that both maxima M1 and M2 have low prominence according to your definition, but when you remove one of them, you will find that the remaining one has high prominence.
Other than that, max-trees and morphological dynamics work in nD.
@ThomasWalter I'm not sure whether I get the full meaning of your post but I'm aware of some of the problems from my work on the 1D case in SciPy. Maybe we should first find a definition of how to measure the prominence in n dimensions.
E.g. a defnition might be: The prominence is the difference between the height of a maximum M1 and the height of its contour line. Given all lines that encircle a maximum M1 and don't contain a maximum with a larger height than M1, the line with the lowest height is the contour line of M1. For the 3d case the contour line becomes a contour surface, for the 4d case a contour volume, ... (See also Wikipedia)
I think the watershed algorithm (as @jni suggests) can be used to find these contour lines.
if a maximum M1 is connected by path to M2 and there is only little decrease on the path, both maxima will have a low prominence.
Using the definition above this is not the case. Assuming that one maximum M1 is larger than the other M2, then the contour line of M1 encircles M2. M2's contour line musn't encirle M1 so both will have a different prominence. However if they have the same height, their contour lines are identical and encircle both maxima.
The problem is that if you want to suppress maxima that are too close to each other, it is not clear which of the maxima is to be suppressed.
I think we should let the user decide in which order (height, prominence, ...) maxima are removed when satisfying the distance condition. Both approaches have different drawbacks, advantages and applications. So I think it would be useful to make the basic algorithmic building blocks accessible. E.g. the API could look like this (function names are just suggestions):
def local_maxima(...):
...
def prominence(image, extrema, ...):
"""
Calculate prominence of each extrema.
"""
...
def select_by_distance(extrema, priority, min_distance, ...):
"""
Remove extrema in order of priority (can be height, prominence, ...) until
minimal distance is satisfied for all extrema.
"""
...
def select_maxima(image, height, prominence, min_distance, ...):
"""
Select maxima in an image that satisfy given conditions.
(Wraps / combines above functions for most common use cases not unlike
scipy.signal.find_peaks)
"""
...
I sent my post a bit too rapidly, there is an error. Of course what @jni proposed is not symmetric, i.e. the statement:
What you suggested, @jni, is very close to this except for the fact that it is symmetric: if a maximum M1 is connected by path to M2 and there is only little decrease on the path, both maxima will have a low prominence. That sounds attractive at first sight, but when you want to decide which to take, you have the same problem as for spatial distance.
is wrong.
@lagru thanks for the link. I think this is indeed very close to morphological dynamics (I do not have the time now to check whether it is truly identical though)
The idea is the following: you start the flooding on the inverted image (as @jni suggests). When two lakes meet (at the lowest point between them, i.e. the "col") you compare the depth of the lakes. To the higher-valued minimum, you assign the difference between the current point (meeting point) and the value of the minimum. Then the lakes are fused. You continue doing this until every minimum but one got a value assigned. The last one gets the maximal value.
@ThomasWalter I suspect "col" here is French, but it is confusing in the context of image processing in English, as it most frequently refers to "column". =) The term in English is "saddle".
Overall, a very good point regarding prominence, I had my definitions wrong, i.e. in "Given all lines that encircle a maximum M1 and don't contain a maximum with a larger height than M1," I was missing the emphasised part.
Are you saying that this is quick to solve with max-trees? Can you explain how max trees help? This is a good argument for getting #2680 finished ASAP! =D
I've put some thought into creating a function that can select extrema by their distance to each other. I have written up a quick demo of a naive algorithm that can handle the nd-case.
However this has revealed 3 problems:
skimage.measure.label doesn't support more than 3 dimensions. Maybe I could do this with a consecutive flood fill that fills each plateau with a unique value. Seems like porting label to nd would be a good idea...set-like container. I've done some googling but didn't find a satisfying solution. I think using Pythons set in Cython itself is our best bet. Although this prohibits us from releasing the GIL I think its the best solution compared to the other two I found / could think of:std::unordered_set which is included in Cython. This is a C++ feature and I'm not sure if this is wanted.@lagru just a brief comment because I have a massive backlog currently: since each plateau is by definition not touching, you could use scipy.ndimage.label, no?
@jni Good tip, seems like that particular problem is solved already.
just a brief comment because I have a massive backlog currently
No worries there. I know you guys are doing this in your free time. ;) Thank you very much!
Related: The function photutils.detection.find_peaks might have some overlap with what this issue is trying to achieve.
Most helpful comment
Other than that, there was the issue with computational efficiency. The local_maxima relies solely on the reconstruction algorithm. It seems that there are some developers who think that the current version is too slow, and they seem to have a better solution. As far as I understand, the actual strategy of maxima detection is not touched by this, and it is rather an improvement of the reconstruction algorithm.