What is your question?
Is there a recommended way to precompile cudf kernels, ideally on app start / module import, such as a decorator?
The root issue is that we have many small methods, and because not every task triggers all of them, it's tricky to make sure no surprise runtime compilation happens, which kills interactivity and trips all sorts of QoS alerts and timeouts. We currently work around this by writing a def warm() method in each RAPIDS-using module that calls various methods to trigger compilation. We were going to look at adding some sort of @warm_rapids decorator, but thought we'd check here first for any guidance on if/where/how.
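One way the @warm_rapids idea could look, as a minimal sketch: a decorator (hypothetical, not part of cudf) that registers each function together with small sample arguments, plus a warm_all() helper to call once at app start or at the bottom of the module. The sample inputs are an assumption you would have to choose yourself. Because JIT-compiled kernels are typically specialized per dtype (and, for binops, likely per column vs. scalar operand), the samples should match the dtypes and call shapes used in production, or the warmup may not populate the cache you actually hit.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical registry: each entry pairs a function with sample arguments
# small enough to be cheap, but shaped like real calls so the same kernels compile.
_WARMUP_REGISTRY: List[Tuple[Callable[..., Any], Tuple[Any, ...], Dict[str, Any]]] = []


def warm_rapids(*sample_args: Any, **sample_kwargs: Any) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register the decorated function to be called once with the sample inputs by warm_all()."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        _WARMUP_REGISTRY.append((fn, sample_args, sample_kwargs))
        return fn  # returned unchanged, so normal calls behave exactly as before
    return decorator


def warm_all() -> int:
    """Call every registered function once with its sample inputs; return how many were warmed."""
    for fn, args, kwargs in _WARMUP_REGISTRY:
        fn(*args, **kwargs)
    return len(_WARMUP_REGISTRY)
```

In practice the sample arguments would be tiny cudf objects of the production dtypes, and warm_all() would run once at startup, before the service starts taking requests.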
We are investigating eliminating JIT for binary operations. If we do, then JIT will only be used for UDFs, which by their nature would require user-defined warming.
If there are other cases where you are experiencing poor JIT performance, please provide specifics.
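To gather that kind of specifics, a small timing helper can flag calls whose first run is disproportionately slow compared with an immediate rerun. This is a plain-Python sketch; the 10x ratio and 50 ms floor are arbitrary assumptions to tune, and a slow first call is only a hint of compilation, not proof. Since GPU work can be asynchronous, for cudf calls you would likely want the timed function to force a sync (for example via cupy.cuda.runtime.deviceSynchronize()) so kernel time is included.

```python
import time
from typing import Any, Callable, Tuple


def cold_vs_warm(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[float, float]:
    """Time the first call of fn and then an identical second call, in seconds."""
    start = time.perf_counter()
    fn(*args, **kwargs)
    cold = time.perf_counter() - start

    start = time.perf_counter()
    fn(*args, **kwargs)
    warm = time.perf_counter() - start
    return cold, warm


def looks_jitted(cold: float, warm: float, ratio: float = 10.0, floor_s: float = 0.05) -> bool:
    """Heuristic: first run is slow in absolute terms AND much slower than the rerun."""
    return cold >= floor_s and cold >= ratio * max(warm, 1e-9)
```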
Binops, stdlib operations (.merge(), ...), and UDFs are indeed top of mind for JIT warming, in roughly that order.
If we broaden the scope to JIT / interpreter overhead in general, what's less clear to me is something like deforestation (fusion). In a function like def f(df, y, z): df.merge(..) + y + z, I can imagine fusing the partial f(a, b, c) => a + b + c, and potentially going even deeper with some of the more popular standard ops like merge. That applies both to the first run and to re-runs (we're software, so there's heavy reuse). I already filed a separate thread on async support, which might give some breathing room here as well.
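To make the deforestation idea concrete, here is a conceptual plain-Python illustration (not a cudf API): the unfused version materializes an intermediate result for a + b and then makes a second pass, while the fused version computes a + b + c in a single pass with no intermediate. On a GPU the analogous win would be one kernel launch instead of two and no temporary column allocation.

```python
from typing import List


def add(xs: List[int], ys: List[int]) -> List[int]:
    # One "kernel": reads two inputs, writes a freshly allocated output.
    return [x + y for x, y in zip(xs, ys)]


def unfused(a: List[int], b: List[int], c: List[int]) -> List[int]:
    tmp = add(a, b)     # intermediate result is materialized in memory
    return add(tmp, c)  # second pass reads the intermediate back


def fused(a: List[int], b: List[int], c: List[int]) -> List[int]:
    # Deforested: one pass over the inputs, no intermediate list.
    return [x + y + z for x, y, z in zip(a, b, c)]
```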
Edit: Pragmatically, it sounds like "for the foreseeable future, keep writing your own warmup()". Should we do that just for UDFs, or for non-UDF code too?
There's no JITing happening in standard operations, so there's nothing to warm there. There are a few usages of cupy that do some JITing under the hood, which we're already working to replace with something that isn't JIT-based.
The biggest offender right now is definitely binops (especially with scalars). In the future these will move to an AST-based implementation instead of the current JIT-based approach, which will remove the JIT overhead.
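As a rough illustration of why an AST-based approach avoids JIT overhead: the expression is represented as a tree and evaluated by a fixed, already-compiled interpreter, so a new combination of operators, dtypes, or scalar operands needs no code generation. This is a toy plain-Python sketch under that assumption, not cudf's implementation; libcudf's AST evaluation is, roughly, the same idea applied row-wise inside a precompiled GPU kernel.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# Fixed operator table: evaluating a new expression only walks the tree,
# it never generates or compiles new code.
_OPS: Dict[str, Callable[[float, float], float]] = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}


@dataclass(frozen=True)
class Col:
    name: str  # reference to a column by name


@dataclass(frozen=True)
class Lit:
    value: float  # scalar literal


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Lit, BinOp]


def eval_row(expr: Expr, row: Dict[str, float]) -> float:
    """Recursively interpret the expression tree for a single row."""
    if isinstance(expr, Col):
        return row[expr.name]
    if isinstance(expr, Lit):
        return expr.value
    return _OPS[expr.op](eval_row(expr.left, row), eval_row(expr.right, row))


def eval_table(expr: Expr, table: Dict[str, List[float]]) -> List[float]:
    """Apply the same interpreter to every row of a column-oriented table."""
    n = len(next(iter(table.values())))
    return [eval_row(expr, {k: v[i] for k, v in table.items()}) for i in range(n)]
```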
UDFs are a black box to us, so you'll likely have to pre-warm them yourself for the foreseeable future.
Closing as this has been answered. Additionally we have removed more usage of cupy in #5974 to further avoid some JITing.