I find the syntax for creating uninitialized arrays a bit verbose, while there are nice and short options for almost all other common cases of creating arrays:
# compare to
Float64[1, 2, 3, 4, 5]
zeros(Float64, 10)
ones(Float64, 10)
fill(1.0, 10)
But for uninitialized arrays you always have to use curly bracket syntax if I'm correct:
v = Vector{Float64}(undef, 10)
How about one of these two alternatives, which both seem to be available:
v = Float64[undef, 10]
arr = Int32[undef, 3, 4, 5]
# or
v = undef(Float64, 10)
arr = undef(Int32, 3, 4, 5)
The second one is actually easy to get via:
(::UndefInitializer)(T::Type, dims::Vararg{Int}) = Array{T}(undef, dims...)
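For anyone who wants to try the second alternative, here is a minimal sketch that reuses the one-line definition above. It adds a method to a type owned by Base (type piracy), so treat it as an experiment rather than something to put in a package.

```julia
# Same definition as in the post. It extends a Base-owned type (type piracy),
# so this is for experimentation only.
(::UndefInitializer)(T::Type, dims::Vararg{Int}) = Array{T}(undef, dims...)

v = undef(Float64, 10)        # Vector{Float64}, contents arbitrary
arr = undef(Int32, 3, 4, 5)   # Array{Int32, 3}, contents arbitrary

# The contents are only meaningful after they have been written.
v .= 0.0
```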
See this issue and this Discourse discussion.
These posts are both very long and mostly discuss something other than your concrete proposal. However, there is at least one relevant point, namely that undef(Int, 10) returns an Array, when there are so many other AbstractArrays that could be usable.
Not sure I agree, though. The same could be said for zeros, ones and fill. Array still is by far the most used AbstractArray.
So I think your proposed undef(T, dims...) syntax would be nice. It's short and explicit, and would probably be used quite often.
would probably be used quite often.
...which might be a reason not to do it...
...which might be a reason not to do it...
It may be important to make the "uninitialized" part explicit, but I don't think it's necessary to make the syntax harder to use.
This is indeed somewhat intentional, to discourage uninitialized arrays. But we also wanted to move towards more general and regular syntax instead of all the special cases like zeros(...). The syntax undef(T, dims) is ok, but I question whether having more ways to write it is actually easier to use.
If we want to increase uniformity, one thought for 2.0: deprecate zeros, ones in favor of fill(v, axes), and consider allowing fill(Undef{T}, axes) for an uninitialized array with eltype T. (fill(T, axes) won't work because what if you want to create an array of types?)
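A rough sketch of how that could look, for illustration only. Both `Undef` and `myfill` are hypothetical names: `Undef{T}` does not exist in Base, and `myfill` stands in for the proposed `fill` methods so that Base is left untouched. Plain integer dimensions are used here in place of axes.

```julia
# Hypothetical marker type carrying the element type; not part of Base.
struct Undef{T} end

# Hypothetical stand-in for the proposed `fill` methods.
myfill(::Type{Undef{T}}, dims::Integer...) where {T} = Array{T}(undef, dims...)
myfill(v, dims::Integer...) = fill(v, dims...)

a = myfill(Undef{Float64}, 2, 3)   # uninitialized 2x3 Matrix{Float64}
b = myfill(0.0, 2, 3)              # what zeros(Float64, 2, 3) gives today
c = myfill(Float64, 2)             # an array whose elements are the type Float64
```

The last line shows the ambiguity mentioned above: passing a bare type already means "fill with this type as the value", which is why a separate marker such as `Undef{T}` would be needed.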
deprecate zeros, ones
Wasn't this already discussed, cf https://github.com/JuliaLang/julia/issues/24444?
I guess I'm consistent!
I've actually often wanted a way to go in the other direction: factor out the initializer concept so that I can do things uniformly like this:
Array{T}(undef, m, n)
Array{T}(zeros, m, n)
Array{T}(ones, m, n)
Why? It makes it easier to swap out any of the properties of what's being done: it cleanly separates the container type, the element type, what to initialize it with and the dimensions.
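To make the separation concrete, here is a small sketch that emulates the idea with a free function instead of new `Array` constructors. The name `construct` is hypothetical and only used for illustration; only `undef`, `zeros` and `ones` are handled.

```julia
# `construct` is a hypothetical helper, not a Base function.
# Arguments: container type with element type, initializer, dimensions.
construct(::Type{Array{T}}, ::UndefInitializer, dims::Integer...) where {T} =
    Array{T}(undef, dims...)
construct(::Type{Array{T}}, ::typeof(zeros), dims::Integer...) where {T} =
    zeros(T, dims...)
construct(::Type{Array{T}}, ::typeof(ones), dims::Integer...) where {T} =
    ones(T, dims...)

m, n = 2, 3
A = construct(Array{Float64}, undef, m, n)   # contents arbitrary
Z = construct(Array{Float64}, zeros, m, n)
O = construct(Array{Int}, ones, m, n)
```

Swapping the initializer, the element type or the dimensions changes exactly one argument, which is the uniformity being asked for.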
Note also that while ones, fill etc. make sense for most <: AbstractArray types, undef is the odd one out in the sense that it is only practical for mutable arrays.
undef is the odd one out in the sense that it is only practical for mutable arrays.
Not really, because in reality undef means uninitialized (which is what it was originally called). I made a PR to rename it to undef (shame on me) but in hindsight, uninit would probably have been better.
I think the point is that making an uninitialized immutable array isn't very useful.
Oh, yeah, I misread that.
Comet topic! :)
I've actually often wanted a way to go in the other direction: factor out the initializer concept so that I can do things uniformly like this: [...]
For interested newcomers to this discussion, https://github.com/JuliaLang/julia/issues/24595#issue-273539562 discusses this direction at length as 'the second proposal':
The more general extension of this model is MyArray[{...}](contentspec[, modifierspec...]). Roughly, contentspec defines the result's contents, while modifierspec... (if given) provides qualifications, e.g. shape.
One thing we could do is:
1. make Array{T}(zeros, dims...) etc. work
2. make undef(T, dims...) and undef(dims...) work
That way we round out the collection of convenience constructors in a way that can always be expressed in terms of the fuller Container{Eltype}(initializer, dims...) form.
make Array{T}(zeros, dims...) etc. work
What I see from this syntax is that whatever initializer is put here should be as fast as undef. Since zeros is way sloweeer than undef, I think it's perhaps time to get some updates on https://github.com/JuliaLang/julia/issues/130
julia> @btime zeros(Float64, 1000, 1000);
  443.589 μs (2 allocations: 7.63 MiB)
julia> @btime Array{Float64}(undef, 1000, 1000);
  37.140 μs (2 allocations: 7.63 MiB)
I am trying to think about the implications of these proposals for generic code. It is not clear to me if
1. these functions (zeros, ones, fill) were meant to be convenience constructors for Array{T,N}, or more generic (and if yes, how generic? should there be a unified API for various collections of homogeneous items? does that even make sense?)
2. the point of zeros and ones is syntactic convenience (shorter than fill(one(T), dims...)), or something more abstract (as zero and one are, for additive and multiplicative identities), or speed (we can do zeros faster for some types?)
As for (1), a lot of packages define Base.zeros etc. for their own types, which are not even necessarily <:AbstractArray. Should they do the same for the proposed undef(...) (if applicable)?
Regarding (2), it would be nice for custom types to be able to rely on a default like
function zeros(S::Type{SomeCustomType{T}}, shape...) where T
    fill(S, zero(eltype(S)), shape...)
end
and define only this fill method; zeros only when that confers an extra advantage. Then we could unify syntax with the fallback
function (::UndefInitializer)(S::Type{SomeCustomType{T}}, shape...) where T
    S(undef, shape...)
end
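As a self-contained illustration of such a fallback, here is a sketch with a toy container. `Wrapped` and `uninit` are hypothetical names introduced only for this example; `uninit` is a free function so that nothing in Base is extended. It assumes the custom type already provides a `T(undef, dims...)` constructor.

```julia
# Toy custom container (hypothetical), wrapping a Vector.
struct Wrapped{T} <: AbstractVector{T}
    data::Vector{T}
end

# The "full" constructor form that the fallback relies on.
Wrapped{T}(::UndefInitializer, n::Integer) where {T} = Wrapped{T}(Vector{T}(undef, n))

# Minimal AbstractVector interface.
Base.size(w::Wrapped) = size(w.data)
Base.getindex(w::Wrapped, i::Int) = w.data[i]
Base.setindex!(w::Wrapped, v, i::Int) = (w.data[i] = v)

# Hypothetical generic fallback: works for any array type that accepts
# `A(undef, dims...)`, so custom types need no extra method.
uninit(::Type{A}, dims::Integer...) where {A<:AbstractArray} = A(undef, dims...)

w = uninit(Wrapped{Float64}, 3)
x = uninit(Array{Int}, 2, 3)
```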
What I see from this syntax is that whatever initializer is put here should be as fast as undef.
I don't understand why that should be the case. Yes, we want initializers to be as fast as we can make them, but some require more work than others. Why would we require that they all be as fast as doing nothing?
Since zeros is way sloweeer than undef
It is quite tricky to measure this since the OS can sometimes give out uninitialized memory "for free" and only commit to the actual allocation when the memory is used.
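One way to probe this, as a rough sketch rather than a rigorous benchmark: time the bare allocation, and separately time the same allocation followed by a write to every element, which forces the memory to actually be committed. The function name `timings` is made up for this example, and the numbers will vary with OS, allocator and Julia version.

```julia
# Rough timing sketch using only Base; `timings` is a hypothetical helper.
function timings(n)
    # Allocation only: the OS may defer committing pages until first write.
    t_alloc = @elapsed Array{Float64}(undef, n, n)
    # Allocation plus a write to every element.
    t_touched = @elapsed fill!(Array{Float64}(undef, n, n), 1.0)
    # Zero-initialized allocation for comparison.
    t_zeros = @elapsed zeros(Float64, n, n)
    return (alloc = t_alloc, touched = t_touched, zeros = t_zeros)
end

timings(10)          # warm-up call so compilation is not included below
t = timings(1000)
println(t)
```

Comparing `t.touched` with `t.zeros` is arguably fairer than comparing `t.alloc` with `t.zeros`, since an uninitialized array has to be written before it is useful.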
May I suggest that Array{T}(zero, dims...) might be a more natural spelling, rather than zeros? And perhaps any function, filled with f(T), which would allow Array{T}(randn, dims...).
I guess that clashes slightly with undef(T, dims...), which seems less obviously a good idea.
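A minimal sketch of the "any function, filled with f(T)" idea, written as a free function so that no `Array` constructor is added. The name `filled_with` is hypothetical, and the sketch assumes `f(T)` is called once per element, which is what makes `randn` give different values in each slot.

```julia
# `filled_with` is a hypothetical name; it calls f(T) once per element.
function filled_with(f, ::Type{T}, dims::Integer...) where {T}
    A = Array{T}(undef, dims...)
    for i in eachindex(A)
        A[i] = f(T)
    end
    return A
end

Z = filled_with(zero, Float64, 2, 3)    # same contents as zeros(Float64, 2, 3)
O = filled_with(one, Int, 4)            # same contents as ones(Int, 4)
R = filled_with(randn, Float64, 2, 3)   # independent normal samples
```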