Julia: broadcasting over arrays of small unions

Created on 15 May 2018 · 6 Comments · Source: JuliaLang/julia

@KristofferC noticed that when the number of elements in the union goes over two, inference falls back to Any.

julia> Base._return_type(square, Tuple{Union{Missing, Float64}})
Union{Missing, Float64}
julia> Base._return_type(square, Tuple{Union{Missing, Float64, Float64a}})
Any
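
A minimal sketch for reproducing this kind of query. `square` is not defined in the report, so the definition below is an assumption, as is the choice of Int64 as the third union member. It uses the reflection helper `Base.return_types` rather than the internal `Base._return_type`. The inferred types depend on the Julia version, so no output is shown.

```julia
# Assumption: stand-in for the `square` used in the report.
square(x) = x * x

# Base.return_types returns a vector with the inferred return type
# of each method matching the given argument types.
two   = Base.return_types(square, (Union{Missing, Float64},))
three = Base.return_types(square, (Union{Missing, Int64, Float64},))

println("2-element union: ", two)
println("3-element union: ", three)
```

If inference has widened the result, the second line prints a type broader than the input union, such as Any.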

Working with vectors, matrices, and arrays whose elements are typed as a Union of 2, 3, or 4 leaf types has become very fast, thanks to a lot of work. At the moment, however, broadcasting fails to automatically generate arrays that match the source's element type when that type is a Union of 3 or 4 leaf types, which leaves a lot of low-hanging fruit unpicked.
For example, a data vector designed to hold Float64 and Int64 values (perhaps with some sort of source) that also supports missing values requires Union{Missing, Int64, Float64}. All the machinery seems to be present.
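
To make the complaint concrete, here is a small sketch that broadcasts over such a vector and compares element types. The `square` definition and the sample values are assumptions chosen for illustration. On the Julia 1.x versions I am aware of, the result's element type is widened to Union{Missing, Real} instead of staying at the source's three-member union.

```julia
# Assumption: stand-in definition and sample data, not taken from the report.
square(x) = x * x

x = Union{Missing, Int64, Float64}[1, 2.5, missing]
y = square.(x)

println(eltype(x))  # the three-member union of the source
println(eltype(y))  # element type chosen by broadcast for the result
```

The values themselves are unaffected; only the container's element type is wider than the source's.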

Labels: broadcast, inference, missing data

All 6 comments

This mixes two different issues:

  • what's the element type of the result of broadcast
  • what's the inferred type of broadcasted functions

The former is chosen using Base.promote_typejoin, which special-cases Missing and Nothing (https://github.com/JuliaLang/julia/pull/25553). We have discussed using a more general system (see links in that PR), but it's hard to decide when a Union should be preserved and when it contains so many different types that typejoin should be used instead: should we stop at 3, 4, 5 types? You should be able to override promote_typejoin for very specific types, but it's not really recommended in general.
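
A quick way to see the special-casing is to call Base.promote_typejoin directly. This is only a sketch: the function is internal to Base, and the argument types here are chosen for illustration.

```julia
# Missing is preserved as a Union member rather than joined away.
a = Base.promote_typejoin(Int64, Missing)

# Two unrelated numeric leaf types are joined to their common supertype.
b = Base.promote_typejoin(Int64, Float64)

# Combining both effects: Missing is kept, the numeric part is joined.
c = Base.promote_typejoin(Union{Missing, Int64}, Float64)

println(a)
println(b)
println(c)
```

This is why a Union{Missing, Float64} result survives element by element, while Int64 and Float64 together collapse to Real.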

Whether inference should be more precise is a completely different question, which fortunately can be changed at any point without breaking code: it's just a matter of finding the best threshold for performance.

Isn't

https://github.com/JuliaLang/julia/blob/1b92f51ba3544edfcdb70a2f43ac1bb3a7bb5543/base/broadcast.jl#L643

determining the element type of the result?

Edit: Oh, that's only when the return type is concrete?

AFAIK inference is supposed to be used only when the result is empty. But indeed the result of combine_eltypes now seems to be used also when it isn't concrete, which I wouldn't have expected. I guess @mbauman can clarify.

@nalimilan is absolutely right! This is a perfect example of the problem with return_type. Inference needs to be able to widen unions at some point, or the compiler will be too slow.

"But indeed the result of combine_eltypes now seems to be used also when it isn't concrete, which I wouldn't have expected."

Where do you see or observe this? We only use the return type if it _is_ concrete or if the result is empty. We are still doing it poorly in the sparse broadcast code, but that's the only place I'm aware broadcast does this incorrectly.

https://github.com/JuliaLang/julia/blob/1b92f51ba3544edfcdb70a2f43ac1bb3a7bb5543/base/broadcast.jl#L762-L766
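
The rule described in this comment can be sketched roughly as follows. This is a simplified illustration, not the code at the link: `sketch_result_eltype` and `square` are hypothetical names, it handles only a one-argument function over a vector, and it relies on the internal functions Base.promote_op and Base.promote_typejoin.

```julia
# Assumption: stand-in for the function being broadcast.
square(x) = x * x

# Hypothetical helper: pick an element type for `f.(xs)`.
function sketch_result_eltype(f, xs::AbstractVector)
    inferred = Base.promote_op(f, eltype(xs))
    # Trust inference only when it is concrete or there are no values to inspect.
    if isconcretetype(inferred) || isempty(xs)
        return inferred
    end
    # Otherwise widen step by step from the types of the actual results.
    T = typeof(f(first(xs)))
    for x in Iterators.drop(xs, 1)
        T = Base.promote_typejoin(T, typeof(f(x)))
    end
    return T
end

println(sketch_result_eltype(square, Float64[1.0, 2.0]))
println(sketch_result_eltype(square, Union{Missing, Int64, Float64}[1, 2.5, missing]))
```

Under this rule, the precision of inference for small unions only matters for the empty case; non-empty results get their element type from promote_typejoin.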
