Julia: Simpler syntax for creating uninitialized arrays

Created on 16 Feb 2020 · 19 Comments · Source: JuliaLang/julia

I find the syntax for creating uninitialized arrays a bit verbose, while there are nice and short options for almost all other common cases of creating arrays:

# compare to
Float64[1, 2, 3, 4, 5]
zeros(Float64, 10)
ones(Float64, 10)
fill(1.0, 10)

But for uninitialized arrays you always have to use curly bracket syntax if I'm correct:

v = Vector{Float64}(undef, 10)

How about one of these two alternatives, both of which seem to be unused so far:

v = Float64[undef, 10]
arr = Int32[undef, 3, 4, 5]

# or

v = undef(Float64, 10)
arr = undef(Int32, 3, 4, 5)

The second one is actually easy to get via:

(::UndefInitializer)(T::Type, dims::Vararg{Int}) = Array{T}(undef, dims...)
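A quick way to try the one-liner above in a REPL, as a sketch only: adding a call method to Base's UndefInitializer from user code is type piracy, so this is meant for experimentation rather than for use in a package.

```julia
# Type piracy: this extends a Base type (UndefInitializer) from user code.
(::UndefInitializer)(T::Type, dims::Vararg{Int}) = Array{T}(undef, dims...)

v = undef(Float64, 10)        # same as Vector{Float64}(undef, 10)
arr = undef(Int32, 3, 4, 5)   # same as Array{Int32}(undef, 3, 4, 5)

# The contents are arbitrary; only the type and shape are defined.
println(typeof(v), " ", size(v))
println(typeof(arr), " ", size(arr))
```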

All 19 comments

See this issue and this Discourse discussion.

These posts are both very long and mostly discuss something other than your concrete proposal. However, there is at least one relevant point, namely that undef(Int, 10) returns an Array, when there are so many other AbstractArrays that could be usable.

Not sure I agree, though. The same could be said for zeros, ones and fill. Array still is by far the most used AbstractArray.

So I think your proposed undef(T, dims...) syntax would be nice. It's short and explicit, and would probably be used quite often.

would probably be used quite often.

...which might be a reason not to do it...

...which might be a reason not to do it...

It may be important to make the "uninitialized" part explicit, but I don't think it's necessary to make the syntax harder to use.

This is indeed somewhat intentional, to discourage uninitialized arrays. But we also wanted to move towards more general and regular syntax instead of all the special cases like zeros(...). The syntax undef(T, dims) is ok, but I question whether having more ways to write it is actually easier to use.

If we want to increase uniformity, one thought for 2.0: deprecate zeros, ones in favor of fill(v, axes), and consider allowing fill(Undef{T}, axes) for an uninitialized array with eltype T. (fill(T, axes) won't work because what if you want to create an array of types?)
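To make the idea concrete, here is a minimal sketch. The marker type Undef{T} and the helper fill_or_undef are hypothetical names used only for illustration; neither exists in Base, and an actual proposal would presumably add a method to fill itself.

```julia
# zeros and ones can already be written in terms of fill.
a = fill(zero(Float64), 2, 3)   # equal to zeros(Float64, 2, 3)
b = fill(one(Int), 4)           # equal to ones(Int, 4)

# fill(T, dims...) is already meaningful: it builds an array whose
# elements are the type itself, so it cannot mean "uninitialized".
types = fill(Int, 2)

# Hypothetical marker type standing in for the proposed Undef{T}.
struct Undef{T} end

# Hypothetical helper: the marker selects an uninitialized array,
# any other value is passed on to fill unchanged.
fill_or_undef(::Type{Undef{T}}, dims::Int...) where {T} = Array{T}(undef, dims...)
fill_or_undef(value, dims::Int...) = fill(value, dims...)

u = fill_or_undef(Undef{Float64}, 2, 3)   # uninitialized 2x3 Matrix{Float64}
z = fill_or_undef(0.0, 2, 3)              # ordinary fill
```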

deprecate zeros, ones

Wasn't this already discussed, cf https://github.com/JuliaLang/julia/issues/24444?

I guess I'm consistent!

I've actually often wanted a way to go in the other direction: factor out the initializer concept so that I can do things uniformly like this:

Array{T}(undef, m, n)
Array{T}(zeros, m, n)
Array{T}(ones, m, n)

Why? It makes it easier to swap out any of the properties of what's being done: it cleanly separates the container type, the element type, what to initialize it with and the dimensions.
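A rough sketch of that separation that leaves Base untouched. The helper name init_array is hypothetical; it takes the container type (which carries the element type), the initializer and the dimensions as separate arguments.

```julia
# Hypothetical helper, not part of Base: container type, initializer
# and dimensions are all independent arguments.
init_array(::Type{A}, ::UndefInitializer, dims::Int...) where {A<:AbstractArray} =
    A(undef, dims...)

init_array(::Type{A}, ::typeof(zeros), dims::Int...) where {A<:AbstractArray} =
    fill!(A(undef, dims...), zero(eltype(A)))

init_array(::Type{A}, ::typeof(ones), dims::Int...) where {A<:AbstractArray} =
    fill!(A(undef, dims...), one(eltype(A)))

m, n = 2, 3
u = init_array(Array{Float64}, undef, m, n)   # contents are arbitrary
z = init_array(Array{Float64}, zeros, m, n)
o = init_array(Array{Int}, ones, m, n)
```

Swapping the initializer or the element type then changes exactly one argument, which is the property being asked for.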

Note also that while ones, fill etc. make sense for most <: AbstractArray types, undef is the odd one out in the sense that it is only practical for mutable arrays.

undef is the odd one out in the sense that is only practical for mutable arrays.

Not really, because in reality undef means uninitialized (which is what it was originally called). I made a PR to rename it to undef (shame on me), but in hindsight, uninit would probably have been better.

I think the point is that making an uninitialized immutable array isn't very useful.

Oh, yeah, I misread that.

Comet topic! :)

I've actually often wanted a way to go in the other direction: factor out the initializer concept so that I can do things uniformly like this: [...]

For interested newcomers to this discussion, https://github.com/JuliaLang/julia/issues/24595#issue-273539562 discusses this direction at length as 'the second proposal':

The more general extension of this model is MyArray[{...}](contentspec[, modifierspec...]). Roughly, contentspec defines the result's contents, while modifierspec... (if given) provides qualifications, e.g. shape.

One thing we could do is:

  • make Array{T}(zeros, dims...) etc. work
  • make undef(T, dims...) and undef(dims...) work

That way we round out the collection of convenience constructors in a way that can always be expressed in terms of the fuller Container{Eltype}(initializer, dims...) form.

make Array{T}(zeros, dims...) etc. work

What I see from this syntax is that whatever initializer is put here should be as fast as undef. Since zeros is way sloweeer than undef, I think it's perhaps time to get some updates on https://github.com/JuliaLang/julia/issues/130

julia> @btime zeros(Float64, 1000, 1000);
  443.589 μs (2 allocations: 7.63 MiB)

julia> @btime Array{Float64}(undef, 1000, 1000);
  37.140 μs (2 allocations: 7.63 MiB)

I am trying to think about the implications of these proposals for generic code. It is not clear to me if

  1. these methods (zeros, ones, fill) were meant to be convenience constructors for Array{T,N}, or more generic (and if yes, how generic? should there be a unified API for various collections of homogeneous items? does that even make sense?)
  2. if the motivation for zeros and ones is syntactic convenience (shorter than fill(one(T), dims...)), or something more abstract (as zero and one are, for additive and multiplicative identities), or speed (we can do zeros faster for some types?)

As for (1) a lot of packages define Base.zeros etc for their own types, which are not even necessarily <:AbstractArray. Should they do the same for the proposed undef(...) (if applicable)?

Regarding (2), it would be nice for custom types to be able to rely on a default like

function zeros(S::Type{SomeCustomType{T}}, shape...) where T
    fill(S, zero(eltype(S)), shape...)
end

and define only this fill method; zeros only when that confers an extra advantage. Then we could unify syntax with the fallback

function undef(::Type{SomeCustomType{T}}, shape...) where T
    SomeCustomType{T}(undef, shape...)
end
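A self-contained sketch of this pattern for a toy type. Everything here is hypothetical: Wrapped, filled, zeros_like and uninit are illustration-only names, chosen so that no Base function is overloaded. The custom type defines just the undef constructor and one fill-like method, and the other two come from generic fallbacks.

```julia
# Hypothetical container type, used only for illustration.
struct Wrapped{T}
    data::Vector{T}
end

# What the custom type has to provide: an undef constructor and eltype.
Wrapped{T}(::UndefInitializer, n::Int) where {T} = Wrapped{T}(Vector{T}(undef, n))
Base.eltype(::Type{Wrapped{T}}) where {T} = T

# The one fill-like method the type defines.
function filled(::Type{S}, value, n::Int) where {S<:Wrapped}
    w = S(undef, n)
    fill!(w.data, value)
    return w
end

# Generic fallbacks in the spirit of the two definitions above.
zeros_like(::Type{S}, n::Int) where {S} = filled(S, zero(eltype(S)), n)
uninit(::Type{S}, n::Int) where {S} = S(undef, n)

w = zeros_like(Wrapped{Float64}, 3)
u = uninit(Wrapped{Int}, 4)     # contents are arbitrary
```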

What I see from this syntax is that whatever initializer is put here should be as fast as undef.

I don't understand why that should be the case. Yes, we want initializers to be as fast as we can make them, but some require more work than others. Why would we require that they all be as fast as doing nothing?

Since zeros is way sloweeer than undef

It is quite tricky to measure this since the OS can sometimes give out uninitialized memory "for free" and only commit to the actual allocation when the memory is used.
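One way to see the effect, as a sketch: time the allocation alone, then the allocation followed by a write to every element, which forces the memory to actually be committed. The function name measure is an assumption made for this example, and the numbers depend heavily on the OS, the allocator and the array size, so none are quoted here. BenchmarkTools' @btime would give more stable timings than a single @elapsed.

```julia
# Hypothetical helper for illustration; timings are in seconds.
function measure(n::Int)
    # Allocation only: the memory is never read or written.
    t_undef = @elapsed (a = Array{Float64}(undef, n, n))

    # Allocation plus a write to every element.
    t_touched = @elapsed (b = fill!(Array{Float64}(undef, n, n), 1.0))

    # zeros has to end up with a zero in every element.
    t_zeros = @elapsed (c = zeros(Float64, n, n))

    # Return the lengths as well, so the arrays are visibly used.
    return (undef_only = t_undef, undef_touched = t_touched, zeros = t_zeros,
            total_length = length(a) + length(b) + length(c))
end

measure(10)                 # warm-up call so compilation is not timed
timings = measure(1000)
println(timings)
```

If undef_touched comes out close to zeros, the gap in the benchmark above is mostly the cost of touching the memory rather than anything specific to zeros.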

May I suggest that Array{T}(zero, dims...) might be a more natural spelling, rather than zeros? And perhaps any function, filled with f(T), which would allow Array{T}(randn, dims...).
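A minimal sketch of that generalization, under two assumptions not stated in the comment: the helper is a standalone function with the hypothetical name array_from instead of a new Array constructor, and f(T) is called once per element (which is what makes randn useful).

```julia
# Hypothetical helper: fill a new array by calling f(T) for each element,
# where T is the element type of the requested container.
function array_from(f, ::Type{A}, dims::Int...) where {A<:AbstractArray}
    T = eltype(A)
    out = A(undef, dims...)
    for i in eachindex(out)
        out[i] = f(T)
    end
    return out
end

z = array_from(zero, Array{Float64}, 2, 3)    # all zeros
o = array_from(one, Array{Int}, 4)            # all ones
r = array_from(randn, Array{Float64}, 2, 2)   # independent normal samples
```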

I guess that clashes slightly with undef(T, dims...), which seems less obviously a good idea.
