Julia: Simpler syntax for creating uninitialized arrays

Created on 16 Feb 2020 · 19 comments · Source: JuliaLang/julia

I find the syntax for creating uninitialized arrays a bit verbose, while there are nice and short options for almost all other common cases of creating arrays:

# compare to
Float64[1, 2, 3, 4, 5]
zeros(Float64, 10)
ones(Float64, 10)
fill(1.0, 10)

But for uninitialized arrays you always have to use the curly bracket syntax, if I'm not mistaken:

v = Vector{Float64}(undef, 10)

How about one of these two alternatives, which both seem to be available:

v = Float64[undef, 10]
arr = Int32[undef, 3, 4, 5]

# or

v = undef(Float64, 10)
arr = undef(Int32, 3, 4, 5)

The second one is actually easy to get via:

(::UndefInitializer)(T::Type, dims::Vararg{Int}) = Array{T}(undef, dims...)
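
Because both UndefInitializer and Array are owned by Base, the one-liner above is type piracy. That is fine for experimenting in a session, but it is not something a package should do. Here is a minimal non-piratical sketch. It assumes a package-local function whose name, uninit, is purely hypothetical, and it also accepts the dimensions as a tuple:

```julia
# Hypothetical package-local spelling of the proposed `undef(T, dims...)`.
# It adds no methods to Base types, so it avoids type piracy.
uninit(::Type{T}, dims::Integer...) where {T} = Array{T}(undef, dims...)
uninit(::Type{T}, dims::Tuple{Vararg{Integer}}) where {T} = Array{T}(undef, dims)

v = uninit(Float64, 10)          # 10-element Vector{Float64}; contents are arbitrary
arr = uninit(Int32, 3, 4, 5)     # 3×4×5 Array{Int32, 3}; contents are arbitrary
m = uninit(Int, size(arr)[1:2])  # dims passed as a tuple: 3×4 Matrix{Int}
```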

jkrumbiegel

Most helpful comment

This is indeed somewhat intentional, to discourage uninitialized arrays. But we also wanted to move towards more general and regular syntax instead of all the special cases like zeros(...). The syntax undef(T, dims) is ok, but I question whether having more ways to write it is actually easier to use.

JeffBezanson on 17 Feb 2020

All 19 comments

See this issue and this Discourse discussion.

These posts are both very long and mostly discuss something other than your concrete proposal. However, there is at least one relevant point, namely that undef(Int, 10) would return an Array, when there are so many other AbstractArrays that could be used.

Not sure I agree, though. The same could be said for zeros, ones and fill. Array is still by far the most used AbstractArray.

So I think your proposed undef(T, dims...) syntax would be nice. It's short and explicit, and would probably be used quite often.

jakobnissen on 17 Feb 2020

would probably be used quite often.

...which might be a reason not to do it...

martinholters on 17 Feb 2020

...which might be a reason not to do it...

It may be important to make the "uninitialized" part explicit, but I don't think it's necessary to make the syntax harder to use.

yuyichao on 17 Feb 2020

If we want to increase uniformity, one thought for 2.0: deprecate zeros and ones in favor of fill(v, axes), and consider allowing fill(Undef{T}, axes) for an uninitialized array with eltype T. (fill(T, axes) won't work because what if you want to create an array of types?)
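
Here is a minimal sketch of this idea. It assumes a hypothetical marker type Undef{T}, which is not part of Base, and a local function myfill that stands in for fill, so nothing is added to Base.fill. It uses integer dimensions rather than axes. The last line shows the problem raised in the parenthesis: fill(Int, 3) already means "an array whose elements are the type Int".

```julia
# Hypothetical marker type (not part of Base): Undef{T} means "uninitialized, eltype T".
struct Undef{T} end

# Local stand-in for the proposed `fill(Undef{T}, dims)`; a new name is used so that
# the sketch adds no methods to Base.fill.
myfill(v, dims::Integer...) = fill(v, dims...)
myfill(::Type{Undef{T}}, dims::Integer...) where {T} = Array{T}(undef, dims...)

z = myfill(0.0, 2, 3)           # same as fill(0.0, 2, 3), i.e. the values of zeros(2, 3)
u = myfill(Undef{Int32}, 3, 4)  # uninitialized 3×4 Matrix{Int32}

# Why plain `fill(T, dims)` cannot mean "uninitialized": it builds an array of types.
types = fill(Int, 3)            # Vector{DataType} whose elements are all Int
```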

timholy on 29 Feb 2020

deprecate zeros, ones

Wasn't this already discussed, cf https://github.com/JuliaLang/julia/issues/24444?

KristofferC on 29 Feb 2020

I guess I'm consistent!

timholy on 29 Feb 2020

I’ve actually often wanted a way to go in the other direction: factor out the initializer concept so that I can do things uniformly like this:

Array{T}(undef, m, n)
Array{T}(zeros, m, n)
Array{T}(ones, m, n)

Why? It makes it easier to swap out any of the properties of what’s being done: it cleanly separates the container type, the element type, what to initialize it with and the dimensions.
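
This separation can be prototyped without adding methods to Base's own constructors by using a hypothetical function make(Container, initializer, dims...). The names make and initvalue are illustrative assumptions, not Base API. In this sketch the container type, element type, initializer and dimensions can each be swapped independently:

```julia
# Hypothetical helper (not Base API): the value each fill-style initializer stands for.
initvalue(::typeof(zeros), ::Type{T}) where {T} = zero(T)
initvalue(::typeof(ones), ::Type{T}) where {T} = one(T)

# Hypothetical stand-in for the proposed Container{Eltype}(initializer, dims...) form.
# `undef` only allocates; any other initializer allocates and then fills.
make(::Type{A}, ::UndefInitializer, dims::Integer...) where {A<:AbstractArray} =
    A(undef, dims...)
function make(::Type{A}, init, dims::Integer...) where {A<:AbstractArray}
    a = A(undef, dims...)                        # allocate via the container's own undef constructor
    return fill!(a, initvalue(init, eltype(A)))  # then fill with the initializer's value
end

u = make(Array{Float64}, undef, 2, 3)  # uninitialized 2×3 Matrix{Float64}
z = make(Array{Float64}, zeros, 2, 3)  # same values as zeros(Float64, 2, 3)
o = make(Matrix{Int}, ones, 2, 2)      # swap container and eltype, keep the initializer
```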

StefanKarpinski on 29 Feb 2020

Note also that while ones, fill, etc. make sense for most <: AbstractArray types, undef is the odd one out in the sense that it is only practical for mutable arrays.

tpapp on 29 Feb 2020

undef is the odd one out in the sense that it is only practical for mutable arrays.

Not really, because in reality undef means uninitialized (which is what it was originally called). I made a PR to rename it to undef (shame on me), but in hindsight, uninit would probably have been better.

KristofferC on 29 Feb 2020

I think the point is that making an uninitialized immutable array isn't very useful.

StefanKarpinski on 29 Feb 2020

Oh, yeah, I misread that.

KristofferC on 29 Feb 2020

Comet topic! :)

I’ve actually often wanted a way to go in the other direction: factor out the initializer concept so that I can do things uniformly like this: [...]

For interested newcomers to this discussion, https://github.com/JuliaLang/julia/issues/24595#issue-273539562 discusses this direction at length as 'the second proposal':

The more general extension of this model is MyArray[{...}](contentspec[, modifierspec...]). Roughly, contentspec defines the result's contents, while modifierspec... (if given) provides qualifications, e.g. shape.

Sacha0 on 29 Feb 2020

One thing we could do is:

- make Array{T}(zeros, dims...) etc. work
- make undef(T, dims...) and undef(dims...) work

That way we round out the collection of convenience constructors in a way that can always be expressed in terms of the fuller Container{Eltype}(initializer, dims...) form.

StefanKarpinski on 29 Feb 2020

make Array{T}(zeros, dims...) etc. work

What I see from this syntax is that whatever initializer is put here should be as fast as undef. Since zeros is way slower than undef, I think it's perhaps time for an update on https://github.com/JuliaLang/julia/issues/130

julia> using BenchmarkTools

julia> @btime zeros(Float64, 1000, 1000);
  443.589 μs (2 allocations: 7.63 MiB)

julia> @btime Array{Float64}(undef, 1000, 1000);
  37.140 μs (2 allocations: 7.63 MiB)

johnnychen94 on 1 Mar 2020

I am trying to think about the implications of these proposals for generic code. It is not clear to me

1. whether these methods (zeros, ones, fill) were meant to be convenience constructors for Array{T,N}, or something more generic (and if so, how generic? Should there be a unified API for various collections of homogeneous items? Does that even make sense?), or
2. whether the motivation for zeros and ones is syntactic convenience (shorter than fill(one(T), dims...)), something more abstract (as zero and one are, being the additive and multiplicative identities), or speed (perhaps zeros can be made faster for some types?).

As for (1), a lot of packages define Base.zeros etc. for their own types, which are not even necessarily <:AbstractArray. Should they do the same for the proposed undef(...) (if applicable)?

Regarding (2), it would be nice for custom types to be able to rely on a default like

function zeros(S::Type{SomeCustomType{T}}, shape...) where T
    fill(S, zero(eltype(S)), shape...)
end

and define only this fill method, defining zeros only when that confers an extra advantage. Then we could unify the syntax with the fallback

function (::UndefInitializer)(::Type{SomeCustomType{T}}, shape...) where T
    SomeCustomType{T}(undef, shape...)
end
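
Here is a self-contained sketch of this fallback scheme. It assumes a hypothetical container Grid{T} that is not an AbstractArray. It also assumes hypothetical generic functions gfill, gzeros and gundef, which stand in for fill, zeros and the proposed undef(...) so that no Base methods are touched. Grid implements only gfill and its own undef constructor; gzeros and gundef come from generic fallbacks:

```julia
# Hypothetical container that is *not* an AbstractArray.
struct Grid{T}
    data::Matrix{T}
end
Base.eltype(::Type{Grid{T}}) where {T} = T
# The type's own uninitialized constructor.
Grid{T}(::UndefInitializer, m::Integer, n::Integer) where {T} = Grid{T}(Matrix{T}(undef, m, n))

# The only fill-style method Grid has to provide (hypothetical generic function `gfill`).
gfill(::Type{Grid{T}}, v, m::Integer, n::Integer) where {T} = Grid{T}(fill(convert(T, v), m, n))

# Generic fallbacks that any such type gets for free.
gzeros(::Type{S}, shape...) where {S} = gfill(S, zero(eltype(S)), shape...)
gundef(::Type{S}, shape...) where {S} = S(undef, shape...)

g = gzeros(Grid{Float64}, 2, 3)  # Grid{Float64} wrapping a 2×3 matrix of zeros
h = gundef(Grid{Int}, 2, 2)      # Grid{Int} wrapping an uninitialized 2×2 matrix
```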

tpapp on 1 Mar 2020

What I see from this syntax is that whatever initializer is put here should be as fast as undef.

I don't understand why that should be the case. Yes, we want initializers to be as fast as we can make them, but some require more work than others. Why would we require that they all be as fast as doing nothing?

StefanKarpinski on 1 Mar 2020

Since zeros is way slower than undef

It is quite tricky to measure this since the OS can sometimes give out uninitialized memory "for free" and only commit to the actual allocation when the memory is used.
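
One way to account for this effect is to time the allocation together with a write to every element, so that lazily committed memory is paid for in both variants. This sketch uses Base's @elapsed rather than BenchmarkTools, and the function names are hypothetical. Single @elapsed runs are noisy, so repeat the measurement before drawing conclusions:

```julia
# Allocate zeroed memory, then write every element.
function zeros_then_write(n)
    a = zeros(Float64, n, n)
    fill!(a, 1.0)
    return a
end

# Allocate uninitialized memory, then write every element.
function undef_then_write(n)
    a = Array{Float64}(undef, n, n)
    fill!(a, 1.0)
    return a
end

# Warm up so that compilation time is not included in the measurements.
zeros_then_write(10); undef_then_write(10)

t_zeros = @elapsed zeros_then_write(1000)
t_undef = @elapsed undef_then_write(1000)
println("zeros + write: ", t_zeros, " s; undef + write: ", t_undef, " s")
```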

KristofferC on 1 Mar 2020

May I suggest that Array{T}(zero, dims...) might be a more natural spelling than zeros? And perhaps it could accept any function f, filling the array with f(T), which would allow Array{T}(randn, dims...).

I guess that clashes slightly with undef(T, dims...), which seems less obviously a good idea.
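
Here is a minimal sketch of the function-as-initializer idea. It is written as a hypothetical helper filledwith(f, T, dims...) rather than as a new Array constructor method. It assumes f(T) is called once per element, which is what makes randn produce independent samples rather than one repeated value:

```julia
using Random  # provides randn

# Hypothetical helper (not Base API): an Array{T} whose elements are each set to f(T).
function filledwith(f, ::Type{T}, dims::Integer...) where {T}
    a = Array{T}(undef, dims...)
    for i in eachindex(a)
        a[i] = f(T)  # call f separately for every element
    end
    return a
end

z = filledwith(zero, Float64, 2, 3)  # same values as zeros(Float64, 2, 3)
o = filledwith(one, Int, 3)          # same values as ones(Int, 3)
r = filledwith(randn, Float64, 4)    # four independent standard-normal draws
```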