Julia: Do not truncate zip inputs

Created on 7 Feb 2017 · 18 Comments · Source: JuliaLang/julia

Like Python, Julia's zip(a, b) ignores the tail end of its input sequences when they are of unequal length (but see #17928). I've yet to encounter a problem where I wanted that behavior, and I would much prefer seeing an error when the inputs are of unequal length. Are there applications where truncation is very convenient?

collections decision

cstjean

👎2

All 18 comments

I believe that truncation is generally useful for combining infinite iterators with finite ones. I do agree that if two finite iterators have different lengths, an error is better.
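A minimal sketch of the infinite-plus-finite case, assuming a current Julia 1.x where `Iterators.countfrom` is available; the variable names are made up for illustration:

```julia
# `Iterators.countfrom(1)` never ends, so zip has to stop
# when the finite input (`labels`) is exhausted.
labels = ["a", "b", "c"]
numbered = collect(zip(Iterators.countfrom(1), labels))
# numbered == [(1, "a"), (2, "b"), (3, "c")]
```

If zip required equal lengths unconditionally, this pairing could not be written without first cutting the counter down with something like `Iterators.take`.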

StefanKarpinski on 7 Feb 2017

👍4

Duplicate of #17928?

JeffBezanson on 7 Feb 2017

I would close that one in favor of this since that was a specific confusing case, whereas this is explicitly a decision issue about whether the underlying general behavior is a good idea or not.

StefanKarpinski on 8 Feb 2017

A similar issue arises from Generator(f, iters...), which was noted on Discourse to cause silent truncation for map(+, (1,2), 3).

My gut feeling is that if all the iterators have lengths (iteratorsize(itr)::Union{HasLength,HasShape}), then a length mismatch should be an exception in both zip and Generator.
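A rough sketch of what such a check could look like, written as a standalone helper rather than a change to Base. `checked_zip` and `haslen` are hypothetical names, and `Base.IteratorSize` is assumed to be the current (Julia 1.x) spelling of the `iteratorsize` trait mentioned above:

```julia
# Hypothetical helper, not part of Base: true when the iterator's
# size trait says its length is known up front.
haslen(itr) = Base.IteratorSize(itr) isa Union{Base.HasLength, Base.HasShape}

# Hypothetical strict zip: only when *all* inputs have a known length
# do we insist that the lengths agree; otherwise defer to Base.zip.
function checked_zip(first_itr, rest...)
    iters = (first_itr, rest...)
    if all(haslen, iters)
        n = length(first_itr)
        all(itr -> length(itr) == n, iters) ||
            throw(DimensionMismatch("zip arguments have unequal lengths"))
    end
    return zip(iters...)
end

ok = collect(checked_zip(1:3, 4:6))   # lengths agree, behaves like zip

mismatch_caught = try
    checked_zip(1:10, 1:3)            # both lengths known and unequal
    false
catch err
    err isa DimensionMismatch
end
```

Under this sketch, an input with infinite or unknown length still falls through to the ordinary truncating behavior.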

stevengj on 11 Apr 2017

👍2

I think that we should be even stricter and only allow truncation of infinite streams. Or does !haslength mean that a stream is infinite? I would interpret it as having unknown length, but I may be misunderstanding the trait.

StefanKarpinski on 13 Apr 2017

No, you're right, !haslength just means the length is unknown. But what if you want to do something like zip the line iterators of two files, deciding whether to stop as you go? That should be allowed.
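For instance, a sketch of that pattern using in-memory buffers in place of real files (the contents and the stopping rule here are invented for illustration):

```julia
# Two "files" of different length; eachline iterators have unknown length.
left  = IOBuffer("a\nb\nc\n")
right = IOBuffer("a\nb\n")

matched = Tuple{String,String}[]
for (l, r) in zip(eachline(left), eachline(right))
    l == r || break          # decide whether to stop as we go
    push!(matched, (l, r))
end
# matched holds the pairs seen before either input ran out
# or the lines stopped agreeing.
```

Neither input can report its length in advance, so an equal-length check would have nothing to compare before iteration starts.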

JeffBezanson on 13 Apr 2017

Yes, fair enough, although it might be better to opt into that sort of thing somehow.

StefanKarpinski on 13 Apr 2017

I feel like providing both zip and truncating_zip is a cleaner, simpler solution than special-casing zip's behaviour.

cstjean on 15 Apr 2017

It might be nice to provide a keyword argument to zip that dictates whether to truncate to a common length. For example, zip(1:10, 1:3, trunc=true) would iterate over (1,1), (2,2), (3,3) and zip(1:10, 1:3, trunc=false) would throw an error because the inputs are not the same length.

Since we allow isbits values for type parameters, perhaps we could even parameterize Zip based on whether it should truncate.

Anyway, just an idea.

ararslan on 13 Jul 2017

👍2

I would spell that keyword out fully, i.e. zip(1:10, 1:3, truncate=true); otherwise it kind of looks like you want to apply the trunc function to every value (which makes no sense). Some gut feeling tells me that it would be better to have a function which does the truncation for you, but then that function ends up just being a truncating zip, at which point having a keyword seems better.
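To make the keyword idea concrete, here is a sketch as a wrapper function. Base's zip accepts no such keyword; `zip_policy` is a hypothetical name, and with `truncate=false` it assumes every input supports `length`:

```julia
# Hypothetical wrapper illustrating the proposed keyword.
# truncate=true  : behave like Base.zip (stop at the shortest input).
# truncate=false : require all inputs to have the same length.
function zip_policy(first_itr, rest...; truncate::Bool=true)
    iters = (first_itr, rest...)
    if !truncate
        lens = map(length, iters)   # fails for iterators without a length
        all(l -> l == lens[1], lens) ||
            throw(DimensionMismatch("inputs have lengths $lens"))
    end
    return zip(iters...)
end

short = collect(zip_policy(1:10, 1:3; truncate=true))

strict_failed = try
    zip_policy(1:10, 1:3; truncate=false)
    false
catch err
    err isa DimensionMismatch
end
```

The default chosen for `truncate` is exactly the point under debate in this thread; the sketch keeps the existing behavior as the default only so that it is a drop-in illustration.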

StefanKarpinski on 18 Jul 2017

I think we should fix this for 0.7. See #25583 for a problematic illustration.

nalimilan on 16 Jan 2018

I believe this is only a problem with reverse. If reverse is the problematic case, it should throw the error instead of punishing all users of zip (such as myself).

JeffBezanson on 17 Jan 2018

My feeling is that both zip and reverse should have checks.

zip should throw an error for zipping HasLength iterators of unequal lengths, but should allow infinite or unknown lengths to truncate.

reverse should also throw an error for zip of infinite or unknown lengths.
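A sketch of the reverse half of that proposal, again as a hypothetical standalone helper (`reversed_zip` is not a Base function). It assumes the Julia 1.x trait names `Base.IteratorSize`, `Base.HasLength` and `Base.HasShape`, and for simplicity it also rejects unequal known lengths:

```julia
# Hypothetical helper: reverse a zip only when that is well defined,
# i.e. every input has a known length and the lengths agree.
function reversed_zip(first_itr, rest...)
    iters = (first_itr, rest...)
    known(itr) = Base.IteratorSize(itr) isa Union{Base.HasLength, Base.HasShape}
    all(known, iters) ||
        throw(ArgumentError("cannot reverse a zip of infinite or unknown-length inputs"))
    n = length(first_itr)
    all(itr -> length(itr) == n, iters) ||
        throw(DimensionMismatch("cannot reverse a zip of unequal-length inputs"))
    # With equal lengths, reversing each input and zipping the results
    # yields the same pairs as the forward zip, in reverse order.
    return zip(map(Iterators.reverse, iters)...)
end

backwards = collect(reversed_zip(1:3, 4:6))

infinite_rejected = try
    reversed_zip(1:3, Iterators.countfrom(1))
    false
catch err
    err isa ArgumentError
end
```

The equal-length requirement matters because reversing each input separately only lines up with the forward pairs when no input has a tail that the forward zip would have dropped.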

stevengj on 17 Jan 2018

I continue to find it hard to see why truncating zip is a problem. Maybe part of the reason is that iterators are inherently lazy; by design you can take as much or as little from an iterator as you want. So it seems weird for it to be an error not to consume all of an iterator.

This thread is very short on real, compelling arguments. I see an "I would much prefer" and a "gut feeling", and not much more than that. So I remain opposed to this change.

JeffBezanson on 17 Jan 2018

👍2

The rationale is that if zip is nearly always used on same-length arguments, then throwing an error on unequal-length arguments might catch several logic errors.

hcat([1,2], [3,4,5]) throws a DimensionMismatch, which is IMHO much nicer than silent truncation would be.
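The contrast can be checked directly. This sketch assumes a Julia 1.x session and only inspects the type of the exception, since the message text differs between versions:

```julia
# hcat refuses to combine vectors of different lengths...
err = try
    hcat([1, 2], [3, 4, 5])
catch e
    e
end
hcat_rejected = err isa DimensionMismatch

# ...while zip on the same inputs silently drops the extra element.
truncated = collect(zip([1, 2], [3, 4, 5]))
```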

cstjean on 18 Jan 2018

In my view, that's because hcat explicitly operates on the shapes of arguments, so they have to match in a certain way. But iterators just pull a sequence of values.

JeffBezanson on 18 Jan 2018

👍2

Triage: we're already getting late in the game here and there are significant design and feasibility questions about how to even make this require same length without breaking things. Python and Lisp have the same truncating behavior as we have now, so this seems like it can't be that bad.

StefanKarpinski on 18 Jan 2018

👍1

If it was decided that zip should truncate, why does it give an error in some cases?

julia> collect(zip(1:3, 2:5))
ERROR: DimensionMismatch("dimensions must match")

julia> [i for i in zip(1:3, 2:5)]
ERROR: DimensionMismatch("dimensions must match")

julia> for i in zip(1:3, 2:5)
           @show i
       end
i = (1, 2)
i = (2, 3)
i = (3, 4)

julia> length(zip(1:3, 2:5))
3

The same issue was raised in #17928, but it was closed as a duplicate, which it isn't.