Like Python, Julia's zip(a, b) ignores the tail end of its input sequences when they are of unequal length (but see #17928). I've yet to encounter a problem where I wanted that behavior, and I would much prefer seeing an error when the inputs are of unequal length. Are there applications where truncation is very convenient?
I believe that truncation is generally useful for combining infinite iterators with finite ones. I do agree that if two finite iterators have different lengths, an error is better.
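To make the infinite-plus-finite case concrete, here is a minimal sketch, assuming Julia 1.x. The labels vector is made up for illustration; Iterators.countfrom is the standard infinite counter.

```julia
# Made-up data for illustration.
labels = ["alpha", "beta", "gamma"]

# countfrom(1) never ends, so zip stops as soon as `labels` runs out.
numbered = collect(zip(Iterators.countfrom(1), labels))

for (i, label) in numbered
    println(i, ": ", label)
end
```

With a zip that required every input to be exhausted, this pattern would need an explicit opt-in.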
Duplicate of #17928?
I would close that one in favor of this since that was a specific confusing case, whereas this is explicitly a decision issue about whether the underlying general behavior is a good idea or not.
A similar issue arises from Generator(f, iters...), which on Discourse was noted to cause silent truncation for map(+, (1,2), 3).
My gut feeling is that if all the iterators have lengths (iteratorsize(itr)::Union{HasLength,HasShape}), then a length mismatch should be an exception in both zip and Generator.
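A rough sketch of what such a check could look like, written against Julia 1.x, where the trait is spelled Base.IteratorSize rather than iteratorsize. The name strict_zip and the alias KnownSize are hypothetical and not part of Base; this only illustrates the proposal and is not how zip is implemented.

```julia
# Hypothetical names: neither KnownSize nor strict_zip exists in Base.
const KnownSize = Union{Base.HasLength, Base.HasShape}

function strict_zip(iters...)
    # Only check when every input advertises a known length.
    if all(itr -> Base.IteratorSize(itr) isa KnownSize, iters)
        lens = map(length, iters)
        if any(n -> n != first(lens), lens)
            throw(DimensionMismatch("zipped iterators have lengths $lens"))
        end
    end
    # Infinite or unknown-length inputs fall through and truncate as before.
    return zip(iters...)
end
```

Under these assumptions, strict_zip(1:3, 2:5) throws, while strict_zip(Iterators.countfrom(1), 1:3) still truncates.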
I think that we should be even stricter and only allow truncation of infinite streams. Or does !haslength mean that a stream is infinite? I would interpret it as having unknown length, but I may be misunderstanding the trait.
No, you're right, !haslength just means the length is unknown. But what if you want to do something like zip the line iterators of two files, deciding whether to stop as you go? That should be allowed.
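As an illustration of the unknown-length case, here is a small sketch assuming Julia 1.x. The in-memory IOBuffers stand in for two files, and their contents are made up.

```julia
# IOBuffers stand in for two open files (contents are made up).
a = IOBuffer("x1\nx2\nx3\n")
b = IOBuffer("y1\ny2\n")

# eachline cannot know in advance how many lines it will produce.
@show Base.IteratorSize(eachline(a))

# Iteration ends when either stream runs out of lines.
line_pairs = collect(zip(eachline(a), eachline(b)))
```

Since neither length is known up front, an up-front length check has nothing to compare here.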
Yes, fair enough, although it might be better to opt into that sort of thing somehow.
I feel like providing both zip and truncating_zip is a cleaner, simpler solution than special-casing zip's behaviour.
It might be nice to provide a keyword argument to zip that dictates whether to truncate to a common length. For example, zip(1:10, 1:3, trunc=true) would iterate over (1,1), (2,2), (3,3) and zip(1:10, 1:3, trunc=false) would throw an error because the inputs are not the same length.
Since we allow isbits values for type parameters, perhaps we could even parameterize Zip based on whether it should truncate.
Anyway, just an idea.
I would spell that keyword out fully, i.e. zip(1:10, 1:3, truncate=true); otherwise it kind of looks like you want to apply the trunc function to every value (which makes no sense). Some gut feeling tells me that it would be better to have a function which does the truncation for you, but then that function ends up just being a truncating zip, at which point having a keyword seems better.
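For reference, the keyword idea can be mocked up outside Base in a few lines. This is a sketch only: Base's zip accepts no keyword arguments, the wrapper name zip_kw is hypothetical, and the default of truncate=false is an assumption, since no default was proposed above.

```julia
# Hypothetical wrapper around Base.zip; the default value is an assumption.
function zip_kw(iters...; truncate::Bool=false)
    if !truncate
        # Calling length means infinite or unknown-length inputs need truncate=true.
        lens = map(length, iters)
        if any(n -> n != first(lens), lens)
            throw(DimensionMismatch("inputs have lengths $lens"))
        end
    end
    return zip(iters...)
end

truncated = collect(zip_kw(1:10, 1:3, truncate=true))
```

With truncate=true this behaves like the current zip; with truncate=false the same call throws a DimensionMismatch.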
I think we should fix this for 0.7. See #25583 for a problematic illustration.
I believe this is only a problem with reverse. If reverse is the problematic case, it should throw the error instead of punishing all users of zip (such as myself).
My feeling is that both zip and reverse should have checks.
zip should throw an error for zipping HasLength iterators of unequal lengths, but should allow infinite or unknown lengths to truncate.
reverse should also throw an error for zip of infinite or unknown lengths.
I continue to find it hard to see why truncating zip is a problem. Maybe part of the reason is that iterators are inherently lazy; by design you can take as much or as little from an iterator as you want. So it seems weird for it to be an error not to consume all of an iterator.
This thread is very short on real, compelling arguments. I see an "I would much prefer" and a "gut feeling", and not much more than that. So I remain opposed to this change.
The rationale is that if zip is nearly always used on same-length arguments, then throwing an error on unequal-length arguments might catch several logic errors.
hcat([1,2], [3,4,5]) throws a DimensionMismatch, which is IMHO much nicer than silent truncation would be.
In my view, that's because hcat explicitly operates on the shapes of arguments, so they have to match in a certain way. But iterators just pull a sequence of values.
Triage: we're already getting late in the game here and there are significant design and feasibility questions about how to even make this require same length without breaking things. Python and Lisp have the same truncating behavior as we have now, so this seems like it can't be that bad.
If it was decided that zip should truncate, then why does it give an error in some cases?
julia> collect(zip(1:3, 2:5))
ERROR: DimensionMismatch("dimensions must match")
julia> [i for i in zip(1:3, 2:5)]
ERROR: DimensionMismatch("dimensions must match")
julia> for i in zip(1:3, 2:5)
    @show i
end
i = (1, 2)
i = (2, 3)
i = (3, 4)
julia> length(zip(1:3, 2:5))
3
The same issue was raised in #17928, but that one was closed as a duplicate, which it isn't.