Numba: Why Julia? Will Python/Numba and Python/Cython lose to Julia?

Created on 28 Feb 2019 · 19 Comments · Source: numba/numba

Feature request


Yesterday I read an article: Why Numba and Cython are not substitutes for Julia. As I understood it, the main benefit of Julia is its ecosystem of packages and math algorithms that can be combined and reused. In Julia they are put together during JIT compilation, whereas in Python packages are put together during interpretation. So there are fewer bottlenecks, and it's actually much more convenient to combine math algorithms this way.

The convenience matters a lot.

So I'm curious whether I understood the main idea correctly, and whether combining packages this way instead of another is really a big deal. If so, why is there no Numba-specific package ecosystem in development whose packages can also be combined during JIT compilation? Or is such a Numba package ecosystem actually being built and I'm simply not aware of it?

question

All 19 comments

This is a good question. Numba is still maturing, so there is not really a Numba-specific package ecosystem, nor have we tried to encourage one yet. Whole-program (or at least inter-library) JIT compilation is a tricky thing that can lead to very long compilation times if not managed carefully. It raises hard problems, such as balancing compilation time, performance, and predictability for the user. Julia is obviously thinking about all these issues, but Numba isn't ready yet.

That said, people have lots of reasons to stay in the Python ecosystem, and Numba (or Cython) is there for applications and libraries that want to inject a bit of compiled code into a performance-critical section with minimal effort. For whole-program compilation with a tracing JIT, I am very interested in PyPy and in how PyPy and Numba might be used together.

If you have no other constraints, I think it would be interesting to try Julia. We certainly keep an eye on the project and have borrowed good ideas from it from time to time. (The automatic multithreading pass that Intel contributed to Numba was first built for Julia.)

@seibert Thanks for the response. The article also mentioned that Julia tries to decouple code and compile its parts independently where possible.

Do you have an opinion not just on whether it's a hard task, but on whether it's a worthwhile one for data scientists and programmers? Will putting packages together the Julia way be better for them than combining them in the Python interpreter (across a broad range of use cases)?

The gain from joining in a different place is still not clear enough to me, but if it's there and it significantly improves usability and reusability, then one can assume that Julia is way ahead of PyPy and Numba, and it would be almost impossible to catch up. The only feasible ways I can imagine are: 1) a centralized effort with financial support (though I cannot really estimate the amount of work, so this may not be feasible either); 2) a Python-to-Julia transpiler with a Numba-style decorator interface and good Python-Julia-Python integration. A small part of that integration already exists: PyCall.

I want to clarify that Julia is not a tracing JIT. In fact, Julia and Numba JIT code in the same fashion; in some ways it's like an on-the-fly AOT compiler. A major feature that makes Julia more flexible and dynamic (and thus makes it feel like a tracing JIT) is that its functions can be dynamically dispatched and it has dynamic types (its Union types). Numba does not have those yet. This flexibility has a trade-off: when code relies too heavily on dynamic types and dynamic dispatch, performance degrades to something like the Python interpreter, where everything is a dynamic type (PyObject) and everything relies on dynamic dispatch.

This kind of inter-library, whole-program optimization is available in PyPy, as it is a true tracing JIT. For Numba, inter-library optimization is possible and is already being leveraged: if a library exposes @numba.jit'ed functions, other libraries that call those functions inside @numba.jit code are able to optimize across library boundaries.

Inter-library optimization can only go so far, though. Even Julia uses pre-compiled BLAS implementations (e.g. OpenBLAS, MKL) for linear algebra routines, and its BigInt and BigFloat also rely on C libraries. Cross-language interop will always be needed because new languages will be invented and old libraries will be reused, and inter-library optimization will always be stopped by the inter-language barrier.

Going back to the ODESolver example in the article: it works great in Julia because someone has written DifferentialEquations.jl in Julia. If someone wrote an ODE solver in Numba, it would optimize nicely as well.

Besides rewriting the entire system in one language, another point for efficient language interop is the C level. scipy.integrate.quad can take a C callback to avoid the Python context-switch overhead. With Numba, one can compile a Python function into a C callable with @cfunc (see https://numba.pydata.org/numba-doc/dev/user/cfunc.html#example).

I guess dynamic dispatch is advisable for libraries. But if you want to write an optimized app, you should annotate types explicitly so that the choice is made at compile time. The compiler can also be made smart enough to resolve common cases automatically. I haven't used Julia much, but I guess they try to do something like this.

So if I understood correctly, whole-program optimization is a matter of:

  1. A single-language package ecosystem, in a language that is fast. Julia does great here; it's actually the best and far ahead of the others.
  2. Inevitable interoperation with BLAS libraries, i.e. how fast JIT-compiled code can interoperate with such libraries. That's the interesting point. If there's a way to make this significantly faster than interop through the interpreter, then Julia's claims seem well founded. If not, then...

But is there a way to make it faster? Did the Julia team succeed at this?

I'll add my two cents as someone who switched from Julia to Python (I build core infrastructure though, not numerical code).

  • Julia is a great research project that shows how far dynamically dispatched compilation can go. However, there is no reason these ideas can't be ported to Python to incrementally improve it, and Numba is a great vehicle for this effort.
  • Julia's long compilation times were a major issue for us in many cases. The power of Python's interpreter (which is often underestimated) was a relief!
  • It takes a very long time for a language to mature. My guess is that it will take Julia another decade to be reliable enough for wider adoption.
  • Julia does have great language features not available in Python (e.g. first-class array syntax). However, to me they don't seem worth switching for.

@sklam

Cross-language interop will always be needed because new languages will be invented and old libraries are reused. The inter-library optimization will always be stopped by the inter-language barrier.

Besides rewriting the entire system in one language, another point of efficient language interop is at the C level. The scipy.integrate.quad can take a C callback to avoid the Python context-switch overhead. With Numba, one can compile a Python function into a C callable with @cfunc

That's an interesting point. It may also shed some light on Julia's future (as a guess). If the best-tested math libraries will always be written in C++/C/Fortran, then the right strategy (even in research that heavily uses new algorithms) would be to use wrappers around these libraries rather than less well-tested implementations. In that case, the whole benefit of composing libraries in Julia falls apart and is not really relevant.

But if the quality of math libraries in Julia becomes comparable to that of C++/C/Fortran libraries, then the ease of using one single language would (or may?) still matter...

But I'm not really convinced that joining in the Python interpreter is that much worse than joining the Julia way. Are there really other use cases besides callbacks (or other cases where a library implementing some algorithm exposes an API that accepts a function)? If that's the only major use case, then it's certainly better to go the @cfunc way, create more interface conveniences, and encourage library and module developers to expose C callback APIs. If callbacks are the only use case, then joining in Numba is as good as joining in Julia.

Some relevant discussions here: https://news.ycombinator.com/item?id=17204750

For example this:

One of the main issues is that, because you compile things separately with a context-switch managed by python in the middle, if you pass a Cython compiled function to code that is calling compiled code (C/Fortran/etc.), then you still get a huge overhead. We tested this with ODE solvers, and things like Numba+SciPy odeint were still about 10x slower than they should be because of this phenomena ([1] mentions some of our tests). In the end, we found a very good reason to make sure the whole stack can compile together!
If I understood correctly, the problem is that lots of libraries expose a Python API instead of a Numba API. If the ODE solver exported a Numba function instead of a pure Python function, would it be fast? Or am I wrong and the situation is more complex? If I'm right, then it's a package ecosystem problem (one that is solved in Julia).

More to the picture: the obstacles to building a package ecosystem that can rival Julia's include the Cython vs. Numba battle, as in this issue. While Python is fragmented, Julia is unified and designed to be a convenient place for ecosystem contributors.

The Julia community continues to work towards making libraries written in Julia easily callable from Python. This requires a lot of effort today, but is already possible. Imagine having Julia libraries compile into a .so with C-callable entry points that can be loaded into Python (example: diffeqpy) or R (example: MixedModels) or just about anything. For libraries that we can statically analyze, it will even be possible to excise the JIT altogether. Perhaps there will be a "Juthon" that complements Cython.

With 1.0 released last year, the language is stable, and that makes it even easier to undertake such projects, with the knowledge that libraries written in Julia will continue to be supported for a long time.

While Julia is certainly not as widely used as python/numpy, it has grown a lot and the stats may speak for themselves.

@ViralBShah I guess the best user interface would be a transpiler from a subset of Python (plus a special ~pythonjulia module).

It should be both:

  • a standalone transpiler, so that the user can write a Juthon package and contribute it to the ecosystem by transpiling it;
  • and a runtime, decorator-style transpiler (with a Numba-like interface) that calls Julia from Python, so that a small piece of code can be added.

A good example of such a transpiler is Transcrypt, whose input is valid Python before transpiling.

@sklam Do you think it's possible to get a compatible ecosystem of Python packages that expose APIs at the C level (both exporting and accepting C functions), so that packages can be combined during JIT compilation? C is universal across Cython, Numba and C++/C backends.

I'm no expert, and I worry that combining already precompiled libraries could lead to instability, or simply to performance degradation, since the compiler cannot optimize them together (degradation compared with compiling the whole thing from source).

I understand that mass adoption of Numba would be better, but I guess Python ecosystem fragmentation won't allow it...

UPD: some Reddit discussion

Answering my own question. There is yet another example of using low level C interfaces in Python: https://stackoverflow.com/a/51157909/9071377

And a reminder of the Numba documentation: https://numba.pydata.org/numba-doc/dev/user/cfunc.html#example

So this is what performant combining of Python math libraries looks like. Alternatives are unlikely.

Hopefully, as Numba develops, dealing with such interfaces will become more elegant.

UPD:
importing-cython-functions
https://github.com/numba/numba/issues/3086

What about memory usage and garbage collection?
How do they compare when dealing with large datasets?

@sklam You wrote that Julia "is not a tracing JIT" but rather an "on-the-fly AOT" compiler. I am aware of the difference between JIT and AOT (the Numba documentation says it offers both), but not of the difference between a "tracing JIT" and "on-the-fly AOT". Can you elaborate?
