We definitely need an unzip function in Base.

StefanKarpinski on 11 Nov 2015

👍13

Or even simpler

unzip(a) = zip(a...)

ivarne on 11 Nov 2015

No

yuyichao on 11 Nov 2015

Trying the zip(...) thing on any non-toy problem shows that it's not so good pretty fast, unfortunately.

StefanKarpinski on 11 Nov 2015

👍1

I just ran into this issue when I absentmindedly used the Python-esque zip(arr...) where length(arr)=6000. It was not pretty. A dedicated unzip would be nice. Maybe even throw a warning that you shouldn't splat anything nontrivial.

tlnagy on 6 May 2016

Maybe even throw a warning that you shouldn't splat anything nontrivial.

Like almost all other performance tips in julia, doing so is perfectly fine and sometimes wanted for various reasons. (OK, splatting 6000 elements is a bit crazy, but splatting unknown size can be useful....). The only thing that matters is to not put it in performance critical path.

It looks like this is not covered in the performance tip?

yuyichao on 6 May 2016

Like almost all other performance tips in julia, doing so is perfectly fine and sometimes wanted for various reasons. (OK, splatting 6000 elements is a bit crazy, but splatting unknown size can be useful....). The only thing that matters is to not put it in performance critical path.

Yeah, a warning is a bad idea.

It looks like this is not covered in the performance tip?

I think this should also be marked under differences from other languages. People coming from Python are probably used to the zip(*arr) syntax and zip(arr...) might be a tempting target.

tlnagy on 6 May 2016

You can dispatch only on arrays of tuples:

function unzip{T<:Tuple}(A::Array{T})
    res = map(x -> x[], T.parameters)
    res_len = length(res)
    for t in A
        for i in 1:res_len
            push!(res[i], t[i])
        end
    end
    res
end

hayd on 6 May 2016

Bump, we still need and unzip function.

StefanKarpinski on 7 Feb 2017

BUMP if anyone is bored...

StefanKarpinski on 28 Mar 2017


function unzip(zipped::Base.Iterators.Zip2{Array{T1,1}, Array{T2,1}}) where {T1,T2}
    n = length(zipped)
    vec1 = Vector{T1}(n)
    vec2 = Vector{T2}(n)
    i = 1
    for pair in enumerate(zipped)
        vec1[i], vec2[i] = pair[2]
        i = i+1
    end
    return vec1, vec2
end

JeffreySarnoff on 8 Aug 2017

maybe <= 1.5x is

function unzip(zipped::Base.Iterators.Zip2{Array{T1,1}, Array{T2,1}}) where {T1,T2}
    return map(first,zipped), map(last,zipped)
 end

JeffreySarnoff on 8 Aug 2017

unzip(zipped::Base.Iterators.Zip2) = (zipped.a, zipped.b)

But all those differ in how they treat length(a) != length(b). I think I'd favor the behavior of the first.

This trivial case apart, the desired semantics are not absolutely clear to me: Given an iterator itr, should unzip(itr) return an iterator unzipped, that generates iterators unzipped[1] to unzipped[N], where N is the length of the iterators generated by itr? What if the first 10000 values generated by itr are Tuples of length N==3, but then comes a Tuple of length 2? Throw when iterating at least one unzipped[n] up to that point? Or eagerly collect itr the moment unzip is called?

Think about

unzip(f(n) for n in Base.Iterators.countfrom(0))

for different fs to get an idea of my open questions.

martinholters on 30 Aug 2017

👍1

There seems to be multiple use cases for an unzip, with different requirements.

A common case is involves Arrays of Tuples, in which the size is known and array shape must be preserved. Some suggestions here do not obey this constraint.

I have come up with a solution for this specific case using Base.Cartesian. It's up to twice as fast as the alternative commonly suggested alternative for NTuples, and more than 10x for heterogeneous tuples.
See: https://github.com/spalato/Destruct.jl
Feel free to use and steal.

spalato on 19 Mar 2018

👍2

Can we get this for generalized iterators and not just arrays? Basically just a collect_many modeled after collect?

bramtayl on 15 Nov 2018

I could give it a stab? The trickiest bit (I think) will be getting an equivalent of default_eltype for zero length eltype unknown iterators

bramtayl on 15 Nov 2018

👍3

Go for it, @bramtayl!

StefanKarpinski on 15 Nov 2018

Tada, this seems to work. Great excuse not to finish my grading.

```julia
struct Unzip{Iterator}
iterator::Iterator
end

iterate(iterator::Unzip) = iterate(iterator.iterator)
iterate(iterator::Unzip, state) = iterate(iterator.iterator, state)

IteratorEltype(iterator::Unzip) = IteratorEltype(iterator.iterator)
eltype(iterator::Unzip) = eltype(iterator.iterator)

IteratorSize(iterator::Unzip) = IteratorSize(iterator.iterator)
axes(iterator::Unzip) = axes(iterator.iterator)
size(iterator::Unzip) = size(iterator.iterator)
length(iterator::Unzip) = length(iterator.iterator)

struct ModelArray{ElementType, NumberOfDimensions, Model, Rest <: Tuple} <:
AbstractArray{ElementType, NumberOfDimensions}
model::Model
rest::Rest
end

model_array(model, rest...) =
ModelArray{
Tuple{eltype(model), eltype.(rest)...},
ndims(model),
typeof(model),
typeof(rest)
}(model, rest)

axes(array::ModelArray) = axes(array.model)
size(array::ModelArray) = size(array.model)

IndexStyle(array::ModelArray) = IndexLinear()

@propagate_inbounds getindex(array::ModelArray, index::Int) = (
getindex(array.model, index),
getindex.(array.rest, index)...
)

@propagate_inbounds setindex!(array::ModelArray, value::Tuple, index::Int) = (
setindex!(array.model, value[1], index),
setindex!.(array.rest, tail(value), index)...
)

push!(array::ModelArray, value::Tuple) = (
push!(array.model, value[1]),
push!.(array.rest, tail(value))
)

TODO: use fieldtypes instead of value_field_types in a future with more constant propagation

value_field_types(a_type) =
ntuple(index -> Val(fieldtype(a_type, index)), fieldcount(a_type))

function similar(
array::Unzip,
::Type{ElementType},
dims::Dims
) where {ElementType, NumberOfDimensions}

inner_array(::Val{InnerElementType}) where InnerElementType =
    Array{InnerElementType}(undef, dims)
model_array(inner_array.(value_field_types(ElementType))...)

end

Unzip needs to replicate AbstractArray similar machinery

similar(a::Unzip, ::Type{T}) where {T} = similar(a, T, to_shape(axes(a)))
similar(a::Unzip{T}, dims::Tuple) where {T} = similar(a, T, to_shape(dims))
similar(a::Unzip{T}, dims::DimOrInd...) where {T} = similar(a, T, to_shape(dims))
similar(a::Unzip, ::Type{T}, dims::DimOrInd...) where {T} = similar(a, T, to_shape(dims))
similar(a::Unzip, ::Type{T}, dims::Tuple{Union{Integer, OneTo}, Vararg{Union{Integer, OneTo}}}) where {T} = similar(a, T, to_shape(dims))

similar(array::ModelArray, ::Type{ElementType}, dims::Dims) where ElementType =
similar(Unzip(nothing), ElementType, dims)

_similar_for(c::Unzip{Nothing}, T, itr, ::SizeUnknown) =
similar(c, T, 0)
_similar_for(c::Unzip{Nothing}, T, itr, ::HasLength) =
similar(c, T, Int(length(itr)::Integer))
_similar_for(c::Unzip{Nothing}, T, itr, ::HasShape) =
similar(c, T, axes(itr))

collect(iterator::Unzip) = _collect(
Unzip(nothing),
iterator,
IteratorEltype(iterator),
IteratorSize(iterator)
)

Examples

bramtayl on 16 Nov 2018

🎉6

The above strategy turns out to be extremely sub-optimal for EltypeUnknown iterators, but it still works. I'm not sure I have the programming wherewithall to design a better system.

bramtayl on 16 Nov 2018

👍1

I'm thinking it would probably just need specialized ::ModelArray methods for grow_to! and collect_to! to be efficient, I'll see what I can cook up

bramtayl on 18 Nov 2018

Ok now that #30076 has gone through here are the last two methods to make this efficient (I think):

import Base: setindex_widen_up_to

maybe_setindex_widen_up_to(dest::AbstractArray{T}, el, i) where T = 
    if isa(el, T)
        @inbounds dest[i] = el
        dest
    else 
        setindex_widen_up_to(dest, el, i)
    end

setindex_widen_up_to(dest::ModelArray, el, i) = 
    model_array(maybe_setindex_widen_up_to.((dest.model, dest.rest...), el, i)...)

Should this go into base? Live as a package? If so, any comments before I cook up a PR?

bramtayl on 11 Dec 2018

I'm leaving the code here for now https://github.com/bramtayl/Unzip.jl

bramtayl on 12 Dec 2018

Is there a reason that it does not build?

JeffreySarnoff on 12 Dec 2018

It should only work on master

bramtayl on 12 Dec 2018

Can this become a PR, and then get merged?
Some docs would also need to be updated

oxinabox on 15 Jan 2019

👍7

I'm not sure I have the programming chops to make a PR. Or rather, I have some questions: what file should the code go into? What modifications do I need to make to take into account bootstrap order? Do I need boundschecking in the ModelArray constructor to make sure all the inputs are the same size? How can I adjust the code to take into account offset axes? What would be a good way to effectively benchmark this? How can I tell if the widening optimizations are working as intended?

bramtayl on 15 Jan 2019

Also, I've run into a bit of a problem. Not sure what do in the case of EltypeUnknown, SizeUnknown iterators where @default_eltype yields Union{}. We need to know at least the length of the tuple to start unzipping... I suppose, for an inference free implementation, we'd have to require the tuple length explicitly... ugh...

bramtayl on 21 Jan 2019

I think it is okay to make some assumption on the iterator which is to be unzipped (e.g. not returning tuples of different length) and it would okay to give an error message in the case this contract is not fulfilled.

mschauer on 21 Jan 2019

Well, well. take that in python,
write this in julia:

s1,s2 = (1,2,3),('a','b','c')
s = tuple(zip(s1,s2)...)

@test tuple(zip(s...)...) == (s1,s2)
Test Passed

so

unzip(z) = tuple(zip(z...)...)

@test (unzip ∘ zip)(s1,s2) == (s1,s2)
Test Passed

One more impl, quickly usable.
Let's call it the haiku version of julia unzip

PS: edited v1 w array removed, was not type stable. v2 w tuple seems ok

o314 on 28 Jan 2019

https://github.com/JuliaLang/julia/issues/13942#issuecomment-155668226 and the comment below.

KristofferC on 5 Feb 2019

👍2

We have an implementation; all that's required is for someone to make a PR to add it to Base. The issue is already tagged as "good first issue" and "help wanted" so hopefully someone feels like trying it.

StefanKarpinski on 5 Feb 2019

@KristofferC

Sorry, code was so short it pass under my radar

using Test
unzip(z) = tuple(zip(z...)...)

s,t = [1:1_000_000...],[1:1_000_000...];
z = zip(s,t) |> collect;
@test_throws StackOverflowError unzip(z)

s,t = [1:100...],[1:100...];
z = zip(s,t) |> collect;
@time unzip(z);
# 4.105195 seconds (26.41 M allocations: 1.021 GiB, 17.28% gc time)

it's an useless toy

o314 on 5 Feb 2019

Yes, that's been noted:

Trying the zip(...) thing on any non-toy problem shows that it's not so good pretty fast, unfortunately.

I was referring to @bramtayl's Unzip package, which has a real implementation... but also appears to have been deleted from GitHub. So I guess we do not actually have an implementation. @bramtayl's is there any chance you can resurrect your Unzip code? Was there some reason you deleted it?

StefanKarpinski on 5 Feb 2019

Of course, that's was clearly understandable.
I was just answering to Kristoffer about the splat option.
Didn't figure numbers were so bad.

Thanks for pointing out the correct track.
Still a bit hard for me at this point to contribute to Base.
Good luck to other people...

o314 on 5 Feb 2019

No worries. I didn't bother measuring the splatting option because I knew it was going to be terrible 😬

StefanKarpinski on 5 Feb 2019

OK. That is surprisingly pretty good however in python

def unzip(z): return zip(*z)

def bench_zip_unzip():
    z = [*zip(range(1_000_000), range(1_000_000))]
    u = [*unzip(z)]

import timeit
timeit.timeit(bench_zip_unzip, number=1)
# 0.4816088059997128 second

o314 on 5 Feb 2019

one other thing that would be useful would be to allow it to take a function first argument, so that you could do e.g.

s,c = unzip(sincos, x)

for a vector (or array) x.

simonbyrne on 6 Feb 2019

👍1

Copying some thoughts from a thread on Slack, the utility of unzip is that it works like collect does on a single iterable, materializing it into a concrete vector but producing as many vectors as there are elements in each tuple that the iterable yields. The iterable _must_ yield the same sized tuple each time; it need not, however, yield the same element types each time; unzip should handle this the same way that collect or comprehensions do (they're slightly different, so I guess that's a design choice), for each vector that it outputs. So the result of unzip(itr) should be the same as

ntuple(length(first(itr))) do i
    [t[i] for t in itr]
end

except that it only consumes itr a single time and it should error if any of the tuples has a different length than the first one. That will be a bit tricky to write 🤔. Mocked up example:

julia> itr = [(1,2,3),(3,1.5,'x')]
2-element Array{Tuple{Int64,Real,Any},1}:
 (1, 2, 3)
 (3, 1.5, 'x')

julia> unzip(itr)
([1, 3], Real[2, 1.5], Any[3, 'x'])

StefanKarpinski on 7 Feb 2019

See #30987

bramtayl on 7 Feb 2019

👍1

Thanks, @bramtayl!

StefanKarpinski on 8 Feb 2019

It will be good if we can unzip Array{Pair}, Base.Iterators.Pairs too. eg.

unzip([k=>v for (k,v) in zip('a':'c',1:3)]) == unzip(zip('a':'c',1:3))

o314 on 11 Feb 2019

bump

JeffreySarnoff on 5 Jun 2019

https://gist.github.com/bramtayl/2d6d323582f7c3158f9a1b5299997d68

Not skilled enough to make a PR, but I think this should be a pretty robust implementation

bramtayl on 13 Jun 2019

❤2

Would this be considered an acceptable solution?

unzip(z::Iterators.Zip) = z.is
unzip(itr) = unzip(collect(itr))
unzip(::AbstractArray) = error("Can only unzip array of tuples.")

@generated function unzip(a::AbstractArray{T}) where {T<:Tuple}
    types = T.types
    types[end] <: Vararg && error("Tuples in the array should have the same length.")
    isempty(types) && return :(())
    n = length(types)
    quote
        @nexprs $n i -> out_i = similar(a,$types[i])
        @inbounds for j in eachindex(a)
            @nexprs $n i -> out_i[j] = a[j][i]
        end
        return @ntuple $n out
    end
end

90% of this implementation is taken from the package Destruct.jl by @spalato, who commented earlier in this thread.

This implementation is maximally efficient for dense arrays of tuples (which was the motivating example for this issue). It is slightly less efficient for arrays with non-standard indexing, due to reliance on eachindex.

For the Zip iterator, it just returns the underlying iterators without truncating them to a common size. I.e. a,b,c = unzip(zip(a,b,c)) is a no-op.

For other iterators and generators, it first collects the output into an array and unzips that. This is not ideal, since it requires twice as much memory, but I simply don't have enough experience with iterator internals to implement it better. (This is also why I haven't tried to ressurect @bramtayl's implementation: I simply couldn't wrap my head around it yet :smile:).

AFAICT, the above code fulfils all the functional requirements set out by @StefanKarpinski above. If this implementation is deemed acceptable, despite using generated functions and sacrificing efficiency in some cases, I can add tests and documentation and open a PR.

zzyxyzz on 11 Jul 2019

The code in the gist was written relatively recently, so I think it should be copy-pastable? It sounds like its a lot more fully featured than what your proposing (works with arbitrary iterators, returning a variable number of items). Just need someone to figure out the PR details (something I never seem to be able to do successfully)

bramtayl on 11 Jul 2019

I'm also happy to explain whatever is helpful. Lispy-tuple-recursion and iterative widening are both very useful to know how to do regardless.

bramtayl on 11 Jul 2019

It sounds like its a lot more fully featured than what your proposing (works with arbitrary iterators, returning a variable number of items).

Iterators are supported, but with some overhead. I think the main difference between the two variants is that you produce arrays with missing values if the tuples have different length. Stefan's comment implies that an error should be thrown in this case, but I can see how using missing values could be sometimes more useful. If this behavior is preferred, I think it can be implemented in the generating-function-based approach as well.

I'm also happy to explain whatever is helpful. Lispy-tuple-recursion and iterative widening are both very useful to know how to do regardless.

Thanks, I might have to take you up on that offer yet :smile:. But I really don't feel qualified to finish what you started there. I just spent several hours trying to understand your implementation and still only have a general idea of how it works. And that's without even touching on the missing and widening stuff.

I also have to ask, why is this style preferred in Julia (in array library code, at least)? As far as I can tell, the end result is that a specialized method is produced for every combination of tuple and container type that is invoked in user code. Which is more or less the same as what a generated function would do. On the other hand, a generated function does exactly what it advertises, while the other solution goes back and forth through twenty layers of abstraction (this is not a metaphor; I think the call chain really is this deep, even though it all gets inlined in the end). I'm sorry if this sounds argumentative, and I'm also starting to think this may be the wrong place to ask such questions, but I'm genuinely curious why layered abstractions relying on vararg splatting on every other step are preferred to gen-funcs in general.

zzyxyzz on 11 Jul 2019

Two things:

1) Collecting into a array first is unworkable when some of the columns are small unions, because the eltype will be Any
2) Because Julia is a dynamic language, there is no return type. So you can't look at an iterator and guess what types each of the columns will end up being, or for that matter, how many columns there are going to be in the first place.

bramtayl on 11 Jul 2019

Collecting into a array first is unworkable when some of the columns are small unions, because the eltype will be Any

You're right about that, of course. Though I think you are holding yourself to a higher standard than the standard library (no pun intended). Maybe it would be preferable to use your proposed changes to improve collect, so that it doesn't widen types more than necessary? Then less skilled people, such as myself, could get away with falling back on it without too high a cost :smile:.

Also, if the "iterator" passed to unzip is an actual array, there is the same problem: [(1,"a"), (2,3)] infers as Array{Tuple{Int64,Any},1}. Does your implementation tighten the output type to Tuple{Array{Int64,1}, Array{Union{Int64,String},1}} in this case as well?

zzyxyzz on 11 Jul 2019

Yes

bramtayl on 11 Jul 2019

Sorry, I didn't mean for that to sound short. I really appreciate all the work you're doing, and happy to collaborate on any solutions.

bramtayl on 11 Jul 2019

That's impressive. Can you explain, in general terms, how you do it? In my understanding that would require backtracking and converting some of the columns when you encounter a tuple with an unexpected length or type. And the result type could not be determined until the operation is complete, no?

zzyxyzz on 11 Jul 2019

Yes and yes. push_widen and setindex_widen_up_to already do this in collect for Base (and I outsource to them). Whether or not the result type can be determined depends on how smart inference is being that day, but the short answer is that because Julia is a dynamic language, it doesn't matter anyway.

bramtayl on 11 Jul 2019

But I do have a doctest for inference:

julia> stable(x) = (x, x + 0.0, x, x + 0.0, x, x + 0.0);

julia> @inferred unzip(Generator(stable, 1:4))

So, in some cases, inference can figure out what's going on just fine.

bramtayl on 11 Jul 2019

But in general, this is not type-stable, I guess, especially with arrays rather than generators?

zzyxyzz on 11 Jul 2019

Yes. Which is ok, that's par for the course in Julia.

bramtayl on 11 Jul 2019

Thanks for the explanations! Gives me a lot to think about.

Last question: Do you know why this behavior is not implemented for collect? Is it just that nobody got around to it yet, or is there some other hidden cost to consider?

zzyxyzz on 11 Jul 2019

I'm not sure I'm qualified to answer that question, but I think #25553 is the place to start looking

bramtayl on 11 Jul 2019

The answer is here https://github.com/JuliaLang/julia/pull/26501#issuecomment-373957902 I think

bramtayl on 11 Jul 2019

👍1

Hmm... correct me if I'm wrong, but #26501 (comment) and #26501 (comment) seems to suggest that diverging tuple fields widening to Any rather than Union is by design (because compile times). So the tradeoff here is partly in how much we want to tax the compiler.

@StefanKarpinski, could you chime in with an opinion?

zzyxyzz on 11 Jul 2019

Notice that there is some mismatch between what we expect from iterator protocol v1

annex - unzip as iterators dot

And how unzip works internally

annex - unzip as parallelized-iterator dot

Without seekable io / indexable iterator and/or buffering, that could never fit.

o314 on 2 Oct 2019

I took at stab at simplifying the code in my gist. I made a little headway but not much. If it's any comfort, many of the functions (map_unrolled, partial_map, zip_missing) would fit nicely into a unexported mini-module for unrolling tuple operations. I've been kicking around a plan for that for a long time.

bramtayl on 9 May 2020

❤3

It would be really nice to get this solved for 1.6.

oxinabox on 7 Sep 2020

👍4

@StefanKarpinski I remember that you had a full grasp of this -- is anything of deep import holding it back?

JeffreySarnoff on 8 Sep 2020

Only performance, Iirc.

StefanKarpinski on 8 Sep 2020

@JeffreySarnoff, I may be misremembering some things, but I think there were some unresolved questions aside from performance:

What to do if tuples have different length?
- Fill in missings?
- Throw error?
What to do if the same field has different types in different tuples?
- Error?
- Promote to Any?
- Promote to smallest possible Union?
- Promote to parent type?
- Widen numeric types?
Lazy or strict semantics?
- In particular, do we want unzip of zip be the identity? (Rather than throwing an error if the input arrays are of different size, as zip currently does).

Maybe some other things too, but it's been a while since I read through the entire thread.

Note that these issues only really arise in TypeUnknown iterators. For homogeneous tuple arrays there is the Destruct package, which implements a simple and efficient unzip operation.

zzyxyzz on 8 Sep 2020

33324 seems pretty good to me still. If we want to simplify the code I think the first step would be to solve #31909

bramtayl on 8 Sep 2020

I wrote up https://github.com/bramtayl/Unzip.jl and it should be registered 3 days from now.

bramtayl on 9 Sep 2020

👀2

_afaic a clear candidate after looking at_ dump(zip(avec, bvec))

unzip that which is zipped

    unzip(x::Base.Iterators.Zip) = x.is

JeffreySarnoff on 9 Sep 2020

👍1

Feel free to open up a PR over there, but I'm not so sure about that method. The behavior in my version unzip is to collect new vectors, not pass along old ones. z.is also pretty easy to type if you already have a zip.

bramtayl on 9 Sep 2020

_iknow uknow this is furiously obvious_
* `unzip` that which is `zip`ped
  unzip(x::Base.Iterators.Zip) = z.is

How is this obvious? It's one possible way to define unzip of zip, and has the (IMO major) disadvantage that unzip is not guaranteed to return arrays of equal length.

zzyxyzz on 9 Sep 2020

👍1

(it works better with z.is replaced by x.is)
It will return the arrays that were given to zip.
How is that a major disadvantage for unzip?
I conceived that its essential purpose is to recover what zip got.

Certainly, the unzip of a collected zip operates on equilength firsts and .. and lasts, irrespective of what zip got. And that is appropriate, as one can only process information that exists when the processing occur. The most performant approaches to unzip(x::Vector{Tuple{_}}) cannot be expected to perform as quickly as does doing x.is.

To forgo that very high throughput advancing some computation because it provides unequal length vectors iff unequal length vectors were given to zip seems unnecessary. Where that is undesirable, one may apply e.g.rtrim_longer_vecs(unzip(zipped)) and obtain equi-sized vectors.

JeffreySarnoff on 9 Sep 2020

@zzyxyzz I meant "obvious" in the sense that this is what dump(zip(a,b)) shows. I had not known about the inner structure of our zips until today .. so, I revised that phrase: "afaic a clear candidate after looking at dump(zip(avec, bvec))".

JeffreySarnoff on 10 Sep 2020

return arrays of equal length

good point

mschauer on 10 Sep 2020

Maybe parents would be a good name for what function (x::Base.Iterators.Zip) = x.is end does

bramtayl on 10 Sep 2020

@JeffreySarnoff, I'm not saying that making unzip(zip(...)) a no-op is necessarily a bad idea. It's just that unzip has a surprising number of loose ends and subtle trade-offs for something that is so simple in principle. This is marked as "good first issue", but it really isn't. Over the years several people have taken a stab at it (@bramtayl has come furthest), but it's no accident that there is still no generally accepted solution.

zzyxyzz on 10 Sep 2020

Maybe this issue should be renamed, following Julia tradition, "Taking unzip seriously". :stuck_out_tongue:

zzyxyzz on 10 Sep 2020

👍1

Registered!

bramtayl on 13 Sep 2020

❤2

For unzip(x::Base.Iterators.Zip) where the lengths of x.is are equal, would you rather return x.is or a copies of the constituents?

JeffreySarnoff on 13 Sep 2020

@JeffreySarnoff

For unzip(x::Base.Iterators.Zip) where the lengths of x.is are equal, would you rather return x.is or a copies of the constituents?

My personal take on this:

In any decent implementation, unzip should call its source iterator exactly once for each input tuple. This forces unzip to construct the output arrays in lock-step; it cannot finish the first array before starting on the second. Thus, unzip cannot act as a lazy iterator in any reasonable sense. It is fundamentally a collect-like operation. So the safe, consistent thing to do would be to emulate the behavior of collect and always return freshly allocated arrays.

But I do admit that the temptation to special-case unzip(::Base.Iterators.Zip) is strong. And it wouldn't be the first time that safe, predictable behavior is sacrificed for performance in Julia. :smile:

zzyxyzz on 13 Sep 2020

Julia: Base.unzip()

Most helpful comment

All 80 comments

TODO: use fieldtypes instead of value_field_types in a future with more constant propagation

Unzip needs to replicate AbstractArray similar machinery

Examples

33324 seems pretty good to me still. If we want to simplify the code I think the first step would be to solve #31909

Related issues