Splitting a linestring A with an intersecting linestring B does not produce the expected output (returns A). Splitting A with a multilinestring containing B and C (C intersecting A as well) produces the desired output (3 segments). This seems inconsistent.
from shapely.ops import split
from shapely.geometry import LineString, MultiLineString
A = LineString([(0, 0), (10, 0)])
B = LineString([(5, 0), (5, 5)])
C = LineString([(1, -1), (1, 1)])
assert (split(A, B).wkt == "GEOMETRYCOLLECTION (LINESTRING (0 0, 10 0))")
# Does not split
# Expected: "GEOMETRYCOLLECTION (LINESTRING (0 0, 5 0), LINESTRING (5 0, 10 0))"
assert (A.intersection(B).wkt == "POINT (5 0)")
# Although A and B do intersect
assert (split(A, MultiLineString([B, C])).wkt ==
"GEOMETRYCOLLECTION (LINESTRING (0 0, 1 0), LINESTRING (1 0, 5 0), LINESTRING (5 0, 10 0))")
# OK - A is split by both B and C - but A was not split by B in the previous example.
Ubuntu 18.04.2 LTS
Python 3.6.7 - Shapely 1.6.4.post2 (installed from PyPI using pip)
Ok. I see the problem here.
Visually, those linestrings look like this:

When you split the horizontal (A) by the large vertical on top (B) which only touches A, nothing happens. But splitting A by C does work because C crosses A.
Now, when you split A by MultiLineString([B, C]), it splits into three linestrings, not two as you would expect. I agree that this seems inconsistent.
The code causing this is actually pretty accessible (permalink).
@staticmethod
def _split_line_with_line(line, splitter):
"""Split a LineString with another (Multi)LineString or (Multi)Polygon"""
# if splitter is a polygon, pick it's boundary
if splitter.type in ('Polygon', 'MultiPolygon'):
splitter = splitter.boundary
assert(isinstance(line, LineString))
assert(isinstance(splitter, LineString) or isinstance(splitter, MultiLineString))
if splitter.crosses(line):
# The lines cross --> return multilinestring from the split
return line.difference(splitter)
elif splitter.relate_pattern(line, '1********'):
# The lines overlap at some segment (linear intersection of interiors)
raise ValueError('Input geometry segment overlaps with the splitter.')
else:
# The lines do not cross --> return collection with identity line
return [line]
You can see that the splitter is required to cross line for any splitting to commence. But once that checkpoint is passed, a simple call to difference is made to generate the split geometries. But difference does not discriminate between crossing and merely touching! So if at any point splitter crosses line, then all merely touching points will be split on.
@Jeremiah-England thanks for digging into this! It seems like, for lines at least, we could change the crosses test to a touches test and get less surprising results? What do you think about that?
I think that would work fine, yes. And the behavior makes sense to me.
For a bit, I thought we could filter the goems in a MultiLineString splitter for those crossing line and take the difference of only those. But that would not solve the case where a single LineString splitter hooks back to touch the line it crosses at another point. I think the only way to get consistent behavior here while using difference for the heavy lifting is to split touching linear geometries.
I suspect that maybe crosses was used so the function returned early when only the boundary of line touches the splitter. In fact this case is tested for. But our current case, where the boundary of splitter touches line, is not in the tests, and may have been a blind spot. However, I'm a Shapely native without much experience with other simple feature libraries. And I don't know whether only splitting when crossing is standard behavior for linear geometries.
It was 5 years ago, but if @georgeouzou remembers maybe he could confirm or explain why crosses was used?
We can have our cake and eat it too if we want to return early in the case tested for, but carry on with splitting in this issue's case. That would require using relate directly. But I doubt that there would be a large performance difference either way.
@Jeremiah-England my memory is a little cloudy on this one, but seeing the code again i can confirm that this was a blind spot. The code should also test for the current case to be complete. Could you continue on with the changes?
Hey @georgeouzou, thanks for taking the time to look at that!
And, yes, if it's OK with everyone I would like to hammer out this case and submit a PR at some point.
Most helpful comment
I think that would work fine, yes. And the behavior makes sense to me.
For a bit, I thought we could filter the goems in a MultiLineString
splitterfor those crossinglineand take the difference of only those. But that would not solve the case where a single LineStringsplitterhooks back to touch thelineit crosses at another point. I think the only way to get consistent behavior here while usingdifferencefor the heavy lifting is to split touching linear geometries.I suspect that maybe
crosseswas used so the function returned early when only the boundary oflinetouches thesplitter. In fact this case is tested for. But our current case, where the boundary ofsplittertouchesline, is not in the tests, and may have been a blind spot. However, I'm a Shapely native without much experience with other simple feature libraries. And I don't know whether only splitting when crossing is standard behavior for linear geometries.It was 5 years ago, but if @georgeouzou remembers maybe he could confirm or explain why
crosseswas used?We can have our cake and eat it too if we want to return early in the case tested for, but carry on with splitting in this issue's case. That would require using
relatedirectly. But I doubt that there would be a large performance difference either way.