We need a way to construct Array{Union{T, Missing}}
objects filled with missing
easily. Here are three proposals which can be seen either as complementary or as alternatives.
1) The Missings package provides the missings(::Type, dims...)
function for that, which is really convenient since it avoids the need to write Union
and Missing
. I could make sense to add it to Base.
2) More generally, if would be nice if Array{Union{T, Missing}}(uninitialized, dims...)
implemented a simpler behavior. Currently, it returns an array filled with missing
for isbits
element types, but an array full of #undef
for other types:
julia> Array{Union{Int, Missing}}(uninitialized, 2)
2-element Array{Union{Missing, Int64},1}:
missing
missing
julia> Array{Union{String, Missing}}(uninitialized, 2)
2-element Array{Union{Missing, String},1}:
#undef
#undef
There are technical reasons for the first case of course. But it would sound reasonable to apply the same rule for all types, if only to make it easier to explain and remember. For example, in the section of the manual about missing values, it would be annoying to have to present an example for isbits
types, and an example for other types.
A possible rule would be that uninitialized arrays are filled with the first singleton type, "first" being defined not by the user-specified order but by the internal sorting of types. In terms of performance, I'm not sure whether it would have an impact. Maybe the current behavior allows using zeroed memory pages provided by calloc
, which can be more efficient than filling them with a custom value?
Array{Union{T, Missing}}(uninitialized, dims...)
, it seems appealing to be able to write Array{Union{String, Missing}}(missing, dims...)
. If that was supported, it would be less of an issue that the rules determining the contents of the array created using Array{Union{T, Missing}}(uninitialized, dims...)
are complex: it would only be used when you really want an uninitialized array (as the syntax indeed indicates).PS: before somebody proposes it, using fill
is not really an option since fill(missing, 2)
gives an Array{Missing}
, and the only alternative is very long: fill!(Array{Union{T, Missing}}(uninitialized, dims...), missing)
.
I'm surprised to see that Array{Union{Int, Missing}}(uninitialized, 2)
is filled with missing
, is that documented or guaranteed behavior in general (e.g. also with Union{Int,Void}
)? What about Array{Union{Int, Missing, Void}}(uninitialized, 2)
? It seems missing
has the priority over nothing
, is that reliable? I thought the whole point of unitialized
is to not make guarantees about the content of the container (just what happens to be the most efficient), so it seems strange to have a consistent filling in some case.
AFAIK that behavior is needed because arrays of isbits
Union
s cannot be left completely uninitialized: at least you need to choose a type for each entry, i.e. you need to set the type tag to a valid type. So you may as well choose a type reliably, by setting the type tag to zero everywhere. I think the order corresponds to what is shown when printing a type, but we may not want to guarantee it.
The problem is always the same: unless you return random contents, people will eventually depend on the implemented behavior even if its not part of the public API. But returning random contents is not always the most efficient solution.
Ok thanks for the clarification! I'm fine then with having consistent filling for isbits Union
s with uninitialzied
, but it seems much clearer to have another official way to consistenly fill with missing
, even for non-isbits, so I support your options 1) or 3) over 2) by far.
Modulo a few details, option three seems mostly in keeping with the direction in #24595? :)
Agree that option 3 seems like the way to go.
https://github.com/JuliaLang/julia/pull/25054 implements option 3.
I like option 3 because it shares the API of what I feel to be a more general solution here (https://github.com/JuliaLang/julia/issues/24595#issuecomment-348146353), which works a bit like:
out = Array{T,N}(input, dims)
# is equivalent to
out = Array{T,N}(unintialized, dims)
out .= input
Replace input
with missing
, and viol脿.
Fixed by #25054
Most helpful comment
Modulo a few details, option three seems mostly in keeping with the direction in #24595? :)