Given the importance of type stability, I would like to propose a new language construct that would allow semi-static checking of type-stable blocks. In the module I am developing, I frequently use this type of construct:
function f(a)
# Type stability
@stabilize m::Matrix{Float64} = a.m
# Compute things with m
end
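For illustration, here is a minimal sketch of what such a @stabilize macro could do (hypothetical; no such macro exists): it rewrites the binding into a type assertion, which documents intent, lets inference narrow the variable's type, and fails loudly at runtime on a mismatch.

```julia
# Hypothetical sketch: @stabilize as a thin wrapper around a type assertion.
macro stabilize(ex)
    ex.head === :(=) || error("@stabilize expects `var::T = value`")
    lhs, rhs = ex.args
    (lhs isa Expr && lhs.head === :(::)) || error("left-hand side must be `var::T`")
    var, T = lhs.args
    # Expands to `var = (value)::T`; throws a TypeError at runtime on mismatch.
    esc(:($var = ($rhs)::$T))
end
```

This gives only a runtime check plus an inference hint; the compile-time guarantee being proposed would need compiler support.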
...
I know there is @code_warntype, which is useful to see/check whether the written code is type stable, but I would really like to have something that prevents code from being unstable, e.g. using a "stable block" (I checked, and this gives a syntax error with Julia 0.3/0.4):
function f(a)
# Type stability
@stabilize m::Matrix{Float64} = a.m
stable begin
# Compute things - compilation error (or warning) if there is some type instability within the code
end
end
...
Another syntactic possibility would be to use pragmas #7449 (if they make it into the language).
A third (and I suspect preferred) syntactic possibility would be to add it as an annotation macro which would insert a :meta node in the AST, like @inline, @simd, and @fastmath:
@error_unstable begin
...
end
The hard part of this is the backend--detecting and acting on the type instability.
It would be a good idea to have annotations; this could be extended to enforce stability at the variable level:
@error_unstable a.m::Matrix{Float64}
would mean "from now on, a.m is a matrix of floats."
I have no idea what this means at the backend level, but I suspect that at some point it is possible to see the instability, since @code_warntype somehow has access to it.
@bpiwowar, if I were planning to implement this (to be clear: I'm not), I'd start by playing with the following strategy:
1) Create a macro so that
@leaftype a b c function myfunc(x,y)
...
end
calls pushmeta!(ex, :leaftype, :a, :b, :c) (see http://docs.julialang.org/en/latest/devdocs/meta/); this signals your intent that a, b, and c should all be leaf types. EDIT: without any variable names, you could require that all variables be leaf types.
2) Modify inference.jl to check, after inference is complete, that any leaftype constraints are satisfied. You can examine how inlining works (with its call to popmeta!) for inspiration.
If it helps, I added some comments to inference.jl in #10691 (which was not merged, but you can look over the PR). If you're hacking on inference.jl, emitting ccalls to jl_ is the best way I know of to figure out what's going on (see http://docs.julialang.org/en/latest/devdocs/debuggingtips/).
@bpiwowar you may have linked the wrong issue when you mentioned pragmas. Did you mean https://github.com/JuliaLang/julia/issues/7449?
@StefanKarpinski yes
I suspect that a better way to do this might be to add more features to Julia that allow you to introspect properties of functions like type stability and then add assertions after definitions that the properties you want to hold are true. So you'd do something like this:
function f(x,y)
...
end
TYPES = [Int, Float64, Rational{BigInt}]
for S in TYPES, T in TYPES
@assert istypestable(f,(S,T))
end
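No istypestable function exists in Base; a rough sketch of one (hypothetical name and semantics) could be built on Base.return_types, treating a method as stable when inference derives a single concrete return type for the given argument types:

```julia
# Hypothetical helper: "type stable" here means inference produces exactly
# one concrete return type for the given argument types.
function istypestable(f, argtypes::Tuple)
    rts = Base.return_types(f, argtypes)
    length(rts) == 1 && isconcretetype(first(rts))
end

stable(x, y) = x + y
unstable(x) = x > 0 ? 1 : "negative"   # inferred as Union{Int, String}

istypestable(stable, (Int, Float64))   # true
istypestable(unstable, (Int,))         # false
```

Note that this only inspects the return type; intermediate variables inside the method can still be type unstable without being caught here.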
Part of the issue is that it's not entirely meaningful to ask if a generic function is type stable. Which method? Given what argument types? Humans are pretty good at looking at code and guessing which argument types you might want to apply the function to, but machines aren't so good. Unless a function has concretely typed parameters, it's usually possible to add new types with new behaviors that would make the function type unstable, so the question of type stability without further elaboration is usually not meaningful.
I agree that type stability should not be checked through some static analysis; but neither do I want to hard-code, as you propose, all the cases that should be checked (although this could be a good option for ensuring that critical parts of libraries are as type stable as possible).
What I was proposing was rather that, when the function is compiled, a warning or an error is generated if some part of the function is not type stable. Maybe this is not the right way to program in Julia, but what I do in many functions is:
1) ensuring that the data structures I will be working with have some concrete types (I can only know those at compilation time since they are computed from parameters and/or arguments). This part is not usually type stable
2) then comes a block that should be type stable
At the moment, I am using @code_warntype and looking at the line numbers to track down all the type instability issues within the critical block until I no longer see any type-unsafe operations (as highlighted by the macro), but I would much prefer some way to tell the JIT compiler "everything here should be type stable", and to get more readable messages (saying which part of the code is not type stable).
Rather than focusing on which parts of the code aren't type-stable, it will be far easier (from an implementation perspective) to focus on which _variables_ are not leaf types. When it throws an error, then a human can look at the code to figure out why.
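To illustrate the variable-level approach, here is a rough sketch (hypothetical, written against the modern code_typed API rather than 2015-era internals) that reports which inferred local-variable types are not concrete ("leaf") types:

```julia
# Hypothetical sketch: report locals whose inferred type is not a concrete
# type, using the type-inferred CodeInfo returned by code_typed.
function nonleaf_locals(f, argtypes::Tuple)
    ci = first(code_typed(f, argtypes)).first   # CodeInfo of the matching method
    bad = Dict{Symbol,Any}()
    for (name, T) in zip(ci.slotnames, ci.slottypes)
        name === Symbol("#self#") && continue    # skip the function-object slot
        isconcretetype(T) || (bad[name] = T)
    end
    bad
end

function g()
    x = 1
    x = "one"     # x becomes Union{Int, String}
    x
end
```

Here nonleaf_locals(g, ()) should flag :x. Slot types can occasionally be compiler wrappers rather than plain types, so a robust version would widen them before checking.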
I guess it's a lost battle, but I'm not sure I like the idea of having those assertions in user code. I mean, it's probably good to have tools to solve these type uncertainties when they cause a performance problem, but let's just remember that return Any is a perfectly valid type inference algorithm, and sometimes it comes down to this due to heuristics, which could change on any commit to master.
I agree that it's not obvious we should have such things; at the same time, I understand where he's coming from.
Really, I think the best strategy is this: run your code. Happy with the performance? Great! (And move on.) Unhappy with the performance? Profile it. Functions that run more slowly than you expect are probably type-unstable. And there's no point worrying about functions that are type unstable but don't contribute to your run time.
Here I disagree a bit. When I am writing code, I know there are parts that have to be efficient and type stable, and I want to quickly see (when I am running experiments) that there are some problems that could affect speed. If I make a change to the code, I also want to see those warnings as soon as possible. I guess this is also due to the fact that (most) type instabilities are easy to fix, so (at least to me) it makes sense to track them down.
I would love, though, having a profiler that would point out directly which type instabilities are the most critical from a CPU/memory perspective.
@bpiwowar wrote:
I would like to propose a new language construct that would allow semi-static checking of type stable blocks.
and
What I was proposing was rather that when the function is compiled a warning, or an error, is generated if some part of the function is not type stable.
My point was that type stability cannot be checked statically; it's not even well-defined. Type stability can only be checked when a method is compiled, which can happen at any point in time. Note that this does _not_ happen when your source code is parsed; it happens when the function is called with a particular set of arguments.
The profiler already does do this: look at the results with ProfileView, and focus your attention on the red bars.
But if you really do want this: I gave you a roadmap for how to implement precisely what you're asking for.
The compiler does have the info of how many generic calls it had to emit and, e.g., how many jl_box_XXX calls it incurred in the generated code.
We could store this info somewhere and have a way to ask for some feedback, or "perf diagnostic", from the codegen.
The post-mortem diagnostic approach makes more sense since at that point it's meaningful to ask "what things that I ran were not type stable".
@timholy thanks, I will look at both the profiler (last time I tried it, it was all plain text, with no red bars) and the roadmap (how much time do you think it would take, let's say for you, and I will figure out my multiplicative factor).
@StefanKarpinski @carnaval Yes this would definitely be great - I was impressed by the impact it has on speed and memory, so that's why I am tracking those down in parts of code that are speed critical.
EDIT: If you have a roadmap and a time estimation, I will look at it (or it can be a useful reference for somebody wishing to do this)
EDIT: just to know, will this get better with further version of Julia (I mean - will type instability be less a concern) ?
Yeah, the generated code is pretty much "all or nothing" for now, and there is certainly much we can do to improve the performance of the various dynamic cases. There isn't much effort in this direction since you usually get excellent performance by simply rewriting the code in a way that inference understands.
When I was trying to get nice SIMD vectorised code, I also wanted a feature along those lines, one that would allow me to state my intentions to the compiler and have it tell me where my expectations broke down. I think there are more things than type stability that are of interest. One criterion that I found pretty crucial when writing an inner loop was whether the function became a leaf function, e.g. free of call instructions (at the IR/assembly level, at least).
Tim meant ProfileView.jl; check it out, it's much easier than just the line view of the profile.
It's nearly always impossible to provide an estimate of a coding project that's anywhere near realistic. Just as a joke, I'll say it would take me less than one day to implement that roadmap. It's really only two focused changes: add the @leaftype macro, and insert what might be just a few lines into inference.jl to loop over all variables and check them.
You want to do as much of the development outside of the Julia tree as possible, and then move everything in once you have it working. You can grab the type-inferred AST from code_typed and start playing with the representation. See also http://blog.leahhanson.us/julia-introspects.html. I think you can get 95% of this working outside Base before moving it in.
But again, no promises that your changes would be merged! Still, doing the actual implementation is often required to get things considered seriously; otherwise it's hard to guess what it will involve, and vaporware is rarely worth reviewers' time.
FWIW, type stability is also a useful tool for modeling and correctness analysis, not just performance.
Hi,
I tried to start the implementation, just to see whether it was an easy task or not. I don't know if this is a bug, but with
macro leaftype(ex...)
    # The last argument is the function definition; the rest are variable names
    esc(_leaftype(ex[end], ex[1:(end-1)]...))
end
_leaftype(ex::Expr, symbols::Symbol...) = pushmeta!(ex, :leaftypemeta, symbols...)

@leaftype function f1(x)
    for i in 1:1000 x += 0.5 end
    begin y = 1 end
end

@leaftype function f2(x)
    for i in 1:1000 x += 0.5 end
end

# Recursively search the lowered AST for the :leaftypemeta node
findleaftype(ex::Any) = false
function findleaftype(ex::Expr)
    leaftypemeta, args = popmeta!(ex, :leaftypemeta)
    if leaftypemeta return true end
    for subex in ex.args
        if findleaftype(subex) return true end
    end
    false
end

println("f1: ", findleaftype(@code_typed(f1(1))[1]))
println("f2: ", findleaftype(@code_typed(f2(1))[1]))
I get
f1: false
f2: true
This seems to be because the generated code adds a NewvarNode(:y) expression before the meta node. Is this specific to code_typed, or will I get the same kind of problem with inference.jl? In that case, is this a bug?
Yep, it's a bug. See discussion in #11595. You may just want to wait for it to be fixed before tackling this further, or you can locally back out the troublesome commit and then rebase when the fix arrives.
Thanks for looking into this!
I think the recent moves of the Julia language towards GPU and TPU programming make type stability assertions urgent. (#30007)
@StefanKarpinski, you have mentioned several interesting proposals, but I suspect they can hardly be helpful in detecting kernel type instabilities, since argument types of kernel functions on devices cannot be tested easily (e.g. CuDeviceArray in CUDAnative). But I agree with you that a type stability check can only be done at run time, and it is probably hard. I am not a PL theory specialist; could you please explain a bit why Julia is able to put a "Box" around a variable but is unable to detect box allocation inside a function?
I have just proposed something related: not a static type checking facility, but rather that, the first time some method gets compiled, the compiler could issue a warning about type instabilities (if that is enabled by the user). It would make type checking much more immediate than it currently is with @code_warntype.
https://github.com/JuliaLang/julia/issues/30155
Profiling is fine if you have a one-off performance issue you're trying to resolve. It's not practical for detecting regressions. In our system we generate code with inner loops that have to be tight, e.g. no type-unstable code or allocations. It's growing into a big system with lots of people hacking on it, and we need a way to automatically monitor for changes that introduce type instability or allocations in performance-critical regions. Something like @assert_type_stable and @assert_no_alloc macros that would trigger an assertion failure at runtime would be ideal. Then we could enable those assertions for our automated tests, and disable them for production. Automated monitoring of benchmark times is hard because the timings are noisy, and step detection is unreliable and delayed.
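Neither macro exists today, but for the allocation half, a rough sketch (hypothetical name) can already be built on Base's @allocated; the type-stability half would need the kind of compiler support discussed earlier in this thread.

```julia
# Hypothetical sketch of @assert_no_alloc: fail at runtime if the wrapped
# expression allocates. Run the code once beforehand so that compilation
# is not counted as allocation.
macro assert_no_alloc(ex)
    quote
        local bytes = @allocated $(esc(ex))
        bytes == 0 || error("@assert_no_alloc failed: ", bytes, " bytes allocated")
    end
end

# Example: guard a tight inner loop in a test.
function kernel!(out, a, b)
    @inbounds for i in eachindex(out)
        out[i] = a[i] * b[i]
    end
    return nothing
end

a, b, out = rand(100), rand(100), zeros(100)
kernel!(out, a, b)              # warm up: compile before measuring
@assert_no_alloc kernel!(out, a, b)
```

In a test suite, the assertion could be wrapped in a flag check so it is enabled under CI and compiled away in production builds.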
Makes sense. If this is super-important, one might consider sponsoring someone at JuliaComputing to make it happen faster. (Note: I do not work for JC, so I am not saying this out of self-interest.)