Hi,
I'm working on a project using Shapely (currently 1.6.4.post2), and I tried to update my version of Shapely. However, I noticed a huge performance decrease using the most recent release.
After a quick investigation, I noticed this problem appears in all versions of Shapely >= 1.7a2.
Example :
from shapely.geometry import Polygon
import shapely
import time
p1 = Polygon([[0,0],[5,0],[5,5],[0,5]])
p2 = Polygon([[0,0],[5,0],[5,5],[0,5]])
t1 = time.time()
for i in range(10000):
    p1.difference(p2)
print('Execution time :',time.time()-t1,'seconds')
This small piece of code runs in ~0.6 seconds on my computer using older versions (<=1.7a1).
Using 1.7a2 and newer versions of Shapely, this same code runs in 2.6 seconds.
Fedora 30
Versions >= 1.6.4.post2 installed with pip.
I'm using python 3.7.5.
This might be related to #834
@ELIONET thank you for the report and the script. That method of timing isn't reliable. I recommend the timeit module. Below I'm using it in two different Python environments on my macbook.
$ python -m timeit -s "from shapely.geometry import Polygon" -s "p1 = Polygon([[0,0],[5,0],[5,5],[0,5]])" -s"p2 = Polygon([[0,0],[5,0],[5,5],[0,5]])" "p1.difference(p2)"
10000 loops, best of 3: 106 usec per loop
$ pip list | grep Shapely
Shapely 1.6.4.post2
$ python -m timeit -n 10000 -r 3 -s "from shapely.geometry import Polygon" -s "p1 = Polygon([[0,0],[5,0],[5,5],[0,5]])" -s"p2 = Polygon([[0,0],[5,0],[5,5],[0,5]])" "p1.difference(p2)"
10000 loops, best of 3: 157 usec per loop
$ pip list | grep Shapely
Shapely 1.7.0
I see a 50% slowdown. The wheels for 1.6.4.post2 include GEOS 3.6.2. The wheels for 1.7.0 include GEOS 3.8.0. It is possible that the newer GEOS is slower or I am building it in an unoptimized way, and I will check into both of these possibilities.
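For anyone who prefers to run the same measurement from a script rather than the command line, here is a minimal sketch using the standard-library timeit module. The helper names `best_usec_per_call` and `time_polygon_difference` are illustrative and not part of Shapely. The default `number` and `repeat` values simply mirror the `-n 10000 -r 3` flags used above.

```python
import timeit


def best_usec_per_call(func, number=10000, repeat=3):
    """Return the best-of-`repeat` average time per call, in microseconds."""
    # timeit.repeat returns one total (in seconds) per repetition;
    # the minimum is the least disturbed by other activity on the machine.
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number * 1e6


def time_polygon_difference():
    """Time the difference of the two squares from the original report."""
    # Imported lazily so the helper above can be used without Shapely installed.
    from shapely.geometry import Polygon

    p1 = Polygon([[0, 0], [5, 0], [5, 5], [0, 5]])
    p2 = Polygon([[0, 0], [5, 0], [5, 5], [0, 5]])
    return best_usec_per_call(lambda: p1.difference(p2))
```

Calling `time_polygon_difference()` in each environment should give numbers roughly comparable to the command-line results, although the lambda adds a small fixed call overhead to every iteration.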
In #834 I see signs that this is caused by GEOS 3.8.0, not by anything in Shapely.
For what it's worth, I've done the same timeit tests on a Windows computer with Anaconda, and here's a summary of my data:
conda create --no-default-packages -n geos-env python
conda activate geos-env
conda config --env --add channels conda-forge
conda install --yes shapely==1.6.4 geos==3.7.2
# 2000 loops, best of 5: 83.3 usec per loop
conda install --yes shapely==1.6.4 geos==3.8.0
# 5000 loops, best of 5: 62.7 usec per loop
conda install --yes shapely==1.7.0 geos==3.8.0
# 5000 loops, best of 5: 48.6 usec per loop
so I'm only seeing good news with newer versions. But this all depends on how the conda-forge folks have prepared the packages.
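Since the results depend on which GEOS a given Shapely install loads, it may help to print both versions next to each timing. The sketch below assumes that Shapely 2.x exposes the GEOS version as `shapely.geos_version` and Shapely 1.x as `shapely.geos.geos_version`. The function name `linked_geos_version` is made up for illustration. Note that this only reports the version, not whether the library was built with optimizations.

```python
import importlib


def linked_geos_version(package="shapely"):
    """Return (package_version, geos_version), or None if the package is missing."""
    try:
        mod = importlib.import_module(package)
    except ImportError:
        return None
    # Assumption: Shapely 2.x keeps the GEOS version tuple at the top level.
    geos_version = getattr(mod, "geos_version", None)
    if geos_version is None:
        # Assumption: Shapely 1.x keeps it in the shapely.geos submodule.
        try:
            geos_version = importlib.import_module(package + ".geos").geos_version
        except (ImportError, AttributeError):
            geos_version = None
    return getattr(mod, "__version__", None), geos_version


print(linked_geos_version())
```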
Thanks @mwtoews. How curious.
I also see a speed up in my environments (conda forge on linux):
Shapely 1.6.4, GEOS 3.7.2:
In [3]: %timeit p1.difference(p2)
42.1 µs ± 366 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Shapely 1.7.0, GEOS 3.8.0
In [3]: %timeit p1.difference(p2)
28.3 µs ± 322 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Haven't seen other reports of poor performance in GEOS 3.8, so a bit puzzled by this. GEOS CMake even defaults to a Release (optimized) build now, so I'm not sure how you could end up with an un-optimized build.
That said, the optimization focus is usually towards improved performance of expensive operations. Possible that newer versions are slower when taking the difference of two squares.
I'm going to test whether the lack of these lines https://github.com/rasterio/rasterio-wheels/blob/master/config.sh#L6-L7 in the shapely-wheels builds is involved. Will report here soon.
@dbaston could you have a look at the example in https://github.com/Toblerity/Shapely/issues/834#issuecomment-589570875 ?
Because that is one that I actually could reproduce with conda builds (in contrast to the example above), and it's a buffer of a complex polygon that is much slower in GEOS 3.8
I've tested some new wheels that incorporate these changes to the GEOS build https://github.com/shapely/shapely-wheels/commit/df1f48c8eaf677d7a5af5a47a414e1d71bb44eec and see this:
$ python -m timeit -n 10000 -r 3 -s "from shapely.geometry import Polygon" -s "p1 = Polygon([[0,0],[5,0],[5,5],[0,5]])" -s"p2 = Polygon([[0,0],[5,0],[5,5],[0,5]])" "p1.difference(p2)"
10000 loops, best of 3: 61.8 usec per loop
I'm going to close this issue. There's no performance regression in Shapely that I can see. If there is one in GEOS related to simple polygon differences, it only seems to occur in unoptimized builds.
When we publish wheels for shapely 1.7.1, users can expect some performance boosts.
To repeat: it's #834 that has an actual slowdown example (but that issue was closed in favor of this one).
@jorisvandenbossche that seems worth looking into on the GEOS side, I just don't have time to do so at the moment.
I suppose it is then worth opening an issue on the GEOS side (can I open an issue on github, or is trac preferred?)
Sure, either is fine
@jorisvandenbossche Actually went ahead and tried this in GEOS directly. Not seeing any slowdown in the buffer...57 usec in master and 70 usec in 3.7.
@dbaston thanks, that's interesting. I tried with another Python binding (pygeos), and I see the slowdown there too (with the only difference being the GEOS version).
The main difference with your example is that Shapely/PyGEOS are using GEOSBufferWithParams/GEOSBufferWithStyle, so it might depend on one of those defaults.
Maybe conda builds are also unoptimized. See https://github.com/conda-forge/geos-feedstock/pull/43
Indeed, I built GEOS locally, and then my equivalent PyGEOS example is also much faster:
import numpy as np
import pygeos
t = np.arange(1, 10000)
arr = pygeos.points(100000*np.sin(t), 100000*np.sin(4*t))
arr2 = pygeos.buffer(arr, radius=200, quadsegs=32)
union = pygeos.union_all(arr2)
%timeit pygeos.buffer(union, radius=1)
This takes 3+ seconds with GEOS 3.8 from conda-forge, and ~60 ms with GEOS master built from source (using the Release build type).
this might be interesting: http://blog.light42.com/wordpress/?p=3634