Splitting a linestring A with an intersecting linestring B does not produce the expected output (returns A). Splitting A with a multilinestring containing B and C (C intersecting A as well) produces the desired output (3 segments). This seems inconsistent.
from shapely.ops import split
from shapely.geometry import LineString, MultiLineString
A = LineString([(0, 0), (10, 0)])
B = LineString([(5, 0), (5, 5)])
C = LineString([(1, -1), (1, 1)])
assert (split(A, B).wkt == "GEOMETRYCOLLECTION (LINESTRING (0 0, 10 0))")
# Does not split
# Expected: "GEOMETRYCOLLECTION (LINESTRING (0 0, 5 0), LINESTRING (5 0, 10 0))"
assert (A.intersection(B).wkt == "POINT (5 0)")
# Although A and B do intersect
assert (split(A, MultiLineString([B, C])).wkt ==
"GEOMETRYCOLLECTION (LINESTRING (0 0, 1 0), LINESTRING (1 0, 5 0), LINESTRING (5 0, 10 0))")
# OK - A is split by both B and C - but A was not split by B in the previous example.
Ubuntu 18.04.2 LTS
Python 3.6.7 - Shapely 1.6.4.post2 (installed from PyPI using pip)
Ok. I see the problem here.
Visually, those linestrings look like this:
When you split the horizontal (A) by the large vertical on top (B) which only touches A, nothing happens. But splitting A by C does work because C crosses A.
Now, when you split A by MultiLineString([B, C])
, it splits into three linestrings, not two as you would expect. I agree that this seems inconsistent.
The code causing this is actually pretty accessible (permalink).
@staticmethod
def _split_line_with_line(line, splitter):
"""Split a LineString with another (Multi)LineString or (Multi)Polygon"""
# if splitter is a polygon, pick it's boundary
if splitter.type in ('Polygon', 'MultiPolygon'):
splitter = splitter.boundary
assert(isinstance(line, LineString))
assert(isinstance(splitter, LineString) or isinstance(splitter, MultiLineString))
if splitter.crosses(line):
# The lines cross --> return multilinestring from the split
return line.difference(splitter)
elif splitter.relate_pattern(line, '1********'):
# The lines overlap at some segment (linear intersection of interiors)
raise ValueError('Input geometry segment overlaps with the splitter.')
else:
# The lines do not cross --> return collection with identity line
return [line]
You can see that the splitter
is required to cross line
for any splitting to commence. But once that checkpoint is passed, a simple call to difference
is made to generate the split geometries. But difference
does not discriminate between crossing and merely touching! So if at any point splitter
crosses line
, then all merely touching points will be split on.
@Jeremiah-England thanks for digging into this! It seems like, for lines at least, we could change the crosses test to a touches test and get less surprising results? What do you think about that?
I think that would work fine, yes. And the behavior makes sense to me.
For a bit, I thought we could filter the goems in a MultiLineString splitter
for those crossing line
and take the difference of only those. But that would not solve the case where a single LineString splitter
hooks back to touch the line
it crosses at another point. I think the only way to get consistent behavior here while using difference
for the heavy lifting is to split touching linear geometries.
I suspect that maybe crosses
was used so the function returned early when only the boundary of line
touches the splitter
. In fact this case is tested for. But our current case, where the boundary of splitter
touches line
, is not in the tests, and may have been a blind spot. However, I'm a Shapely native without much experience with other simple feature libraries. And I don't know whether only splitting when crossing is standard behavior for linear geometries.
It was 5 years ago, but if @georgeouzou remembers maybe he could confirm or explain why crosses
was used?
We can have our cake and eat it too if we want to return early in the case tested for, but carry on with splitting in this issue's case. That would require using relate
directly. But I doubt that there would be a large performance difference either way.
@Jeremiah-England my memory is a little cloudy on this one, but seeing the code again i can confirm that this was a blind spot. The code should also test for the current case to be complete. Could you continue on with the changes?
Hey @georgeouzou, thanks for taking the time to look at that!
And, yes, if it's OK with everyone I would like to hammer out this case and submit a PR at some point.
Most helpful comment
I think that would work fine, yes. And the behavior makes sense to me.
For a bit, I thought we could filter the goems in a MultiLineString
splitter
for those crossingline
and take the difference of only those. But that would not solve the case where a single LineStringsplitter
hooks back to touch theline
it crosses at another point. I think the only way to get consistent behavior here while usingdifference
for the heavy lifting is to split touching linear geometries.I suspect that maybe
crosses
was used so the function returned early when only the boundary ofline
touches thesplitter
. In fact this case is tested for. But our current case, where the boundary ofsplitter
touchesline
, is not in the tests, and may have been a blind spot. However, I'm a Shapely native without much experience with other simple feature libraries. And I don't know whether only splitting when crossing is standard behavior for linear geometries.It was 5 years ago, but if @georgeouzou remembers maybe he could confirm or explain why
crosses
was used?We can have our cake and eat it too if we want to return early in the case tested for, but carry on with splitting in this issue's case. That would require using
relate
directly. But I doubt that there would be a large performance difference either way.