Circuitpython: nRF Port: Enable @micropython.native, @micropython.viper and @micropython.asm_thumb

Created on 7 Oct 2018 · 19 Comments · Source: adafruit/circuitpython

These decorators are immensely useful - they allow optimizing hot paths in the code and remove performance bottlenecks.

I conducted a little research yesterday, and it seems like these decorators work pretty well. I was able to speed up a piece of code by a factor of two just by decorating it with @micropython.native, and then with viper, I got it to run 10 times faster.

I also experimented with @micropython.asm_thumb, and after sending a PR to fix a few build issues (#1244), I was able to successfully run Thumb code inside Python that blinks an LED on the nRF52840 dongle.

advanced api enhancement

All 19 comments

Any idea what the code size impact of this is? I think the nRF has plenty of space, so it's probably fine regardless.

Which of these decorators lead to invalid Python? I'm wary of the asm and viper options because they can lead to code that is not CPython compatible and not portable. It's important to keep portability because we're using our libraries in CPython on the Raspberry Pi.

Our usual solution to speeding up hot paths is to introduce C modules to do the work instead. The iteration cycle for this is slower than these decorators and requires dedicated flash space. However, it ensures that the Python stays simple and that the API boundary to the time critical code is well defined and documented (in shared-bindings).

I experimented with both native and viper yesterday, because I had some code that generates patterns for neopixels, and it was running too slow.
The biggest leap in speed came from converting everything to integers. After that, neither native nor viper brought any further improvement, but they also didn't force me to write incompatible code.
But your code will surely look more C-like and less Pythonic.
I think that the bottleneck lies in the neopixel routines, and not in my code, hence the lack of speed gain.

I think that even Python programmers have to access the hardware directly here and there, especially on microcontrollers, and therefore I welcome asm_thumb. Sure, you can do everything in C, but not every Python programmer is willing to do that, only to manipulate some registers here and there.
And even in C, you have to use inline assembler occasionally, so why not in Python as well?

Our neopixel library is known to be a bit slow. We have partially complete PixelBuf work, a C helper that speeds up all RGB LED drivers.

I don't think many Python programmers will jump from Python to assembly without knowing C first. Just like supporting direct memory access isn't a priority for us, supporting assembly isn't either. It's a very advanced feature that is not useful to beginners. However, I'm always happy to help someone else add and support a feature like it. So, let me know if you'd like to help!

Thank you @tannewt !

A few notes:

@micropython.native gave me about a 2x performance improvement without any code changes. It doesn't allow invalid Python, but you are restricted to a subset of the language: with is not supported, generators are not supported, and raise must be used with an argument. You can, however, decorate only the specific hot paths, so it is opt-in at the function level.

@micropython.viper allows non-standard syntax that extends the Python syntax; some of it (the type annotations) is actually defined in a PEP.
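To make the two notes above concrete, here is a minimal sketch of what opting in at the function level could look like. The function names and the scaling arithmetic are made up for illustration, and the small stand-in class exists only so the same file also imports on CPython, where the decorators simply do nothing. On a CircuitPython build without the emitters enabled, I would expect the decorators to be rejected at compile time instead.

```python
try:
    import micropython  # present on MicroPython/CircuitPython builds
except ImportError:
    # Hypothetical CPython stand-in: the decorators become no-ops,
    # so the functions below run as ordinary Python.
    class micropython:
        @staticmethod
        def native(func):
            return func

        @staticmethod
        def viper(func):
            return func


@micropython.native
def scale_native(value, factor):
    # Ordinary Python source; only the code generator changes.
    return (value * factor) >> 8


@micropython.viper
def scale_viper(value: int, factor: int) -> int:
    # The annotations are regular function-annotation syntax (PEP 3107);
    # viper reads them as machine-word types.
    return (value * factor) >> 8
```

Both functions return the same result for small integers. Under viper the arithmetic is done on machine words, so very large values can behave differently than they do on CPython.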

As for code size, the final .uf2 file grew from 448 kB to 495 kB when enabling all three decorators (native, viper, and asm_thumb).

As for adding support: it is already there, and it is just a matter of enabling it in the port configuration file.

Enabling @micropython.native is a pure win: you get performance, it is opt-in, and users still write only standard Python code.

Personally, the use case that I see for viper and asm_thumb is a user who writes an optimized driver for a specific device and wants to share it with other users. In the current situation, they have to clone the CP repo, apply patches, build the relevant port and flash it to their board. This means most beginners can't simply download such a driver and use it. If we have viper/asm_thumb, device drivers could be written and distributed as Python files, so everyone can download and use the driver immediately.

@uhrheber Lovely, thanks for sharing!

Try to cache the ptr32(0x50000508) calls (save the result to a local variable); it can offer a nice performance gain. I'd also try doing the same with the uint(0x80000000) constant.
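A rough illustration of the caching idea, written so that it runs on plain CPython: FakePtr32, the ptr32 stand-in and the writes log are hypothetical helpers that only record what would be stored, and the function names are invented. On the board, the functions would carry @micropython.viper and use viper's real ptr32 and uint instead.

```python
# Log of (address, value) pairs so the two variants can be compared.
writes = []


class FakePtr32:
    """Hypothetical CPython stand-in for a viper ptr32 pointer."""

    def __init__(self, addr):
        self.addr = addr

    def __setitem__(self, index, value):
        # A 32-bit pointer advances 4 bytes per index.
        writes.append((self.addr + 4 * index, value))


def ptr32(addr):
    # Stand-in only; viper provides the real ptr32 inside decorated functions.
    return FakePtr32(addr)


def set_pin_uncached(count):
    for _ in range(count):
        # The cast and the constant are re-evaluated on every pass.
        ptr32(0x50000508)[0] = 0x80000000


def set_pin_cached(count):
    reg = ptr32(0x50000508)  # hoisted out of the loop
    mask = 0x80000000        # under viper this would be uint(0x80000000)
    for _ in range(count):
        reg[0] = mask
```

Both variants issue exactly the same stores; the cached one just does less work per iteration.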

You can probably try a similar optimization for the assembly loop - instead of loading the addresses into r1 every time, you can preload them to r0 and r1 and then just keep the str instructions inside the loop (you can also use a single base address and then just change the offset when calling str)

Done.
Doubled the speed.

@uhrheber awesome! I imagine you could squeeze even more speed by unrolling the loop (i.e. repeating the str instructions 4 or 8 times in each iteration)

That would be the way to go for device drivers written in assembler, like fast bitbanged SPI, or such.

@urish Thanks for the info! What happens when something like a with is used inside a native function? How does it error out?

Your argument for native makes a lot of sense. I'm worried about viper because we want to be a subset of CPython, since our libraries also run in CPython. That is my worry with the assembly version as well: it won't be portable. I'm ok adding the latter two as long as they error in a friendly way on Raspberry Pi in CPython and don't take too much code space.
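One way a driver could "error in a friendly way" on CPython is sketched below. This is only an assumption about how a library author might handle it, not something CircuitPython ships: the stand-in class and the error text are hypothetical, and asm_add is the small addition example used in the MicroPython inline assembler documentation.

```python
try:
    import micropython  # present on MicroPython/CircuitPython builds
except ImportError:
    # Hypothetical CPython stand-in: assembly functions can still be
    # defined, but calling one explains why it cannot run here.
    class micropython:
        @staticmethod
        def asm_thumb(func):
            def unavailable(*args, **kwargs):
                raise NotImplementedError(
                    func.__name__ + "() is written in Thumb assembly and "
                    "needs a MicroPython/CircuitPython build with the "
                    "inline assembler enabled"
                )
            return unavailable


@micropython.asm_thumb
def asm_add(r0, r1):
    # Thumb instruction; the result is returned in r0.
    add(r0, r0, r1)
```

On CPython the module still imports, and only the call to asm_add fails, with a message that names the function instead of a NameError from inside the body.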

When measuring additional code size can you use the number output by the build? (This is output on the atmel build. So it may be missing from the nrf build.) The uf2 has a bunch of metadata in it that makes it hard to tell exactly what the impact is. Here is an example:

26688 bytes free in flash out of 253696 bytes ( 247.75 kb ).
24456 bytes free in ram for stack out of 32768 bytes ( 32.0 kb ).
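If the nrf build prints the same summary, the code size impact could be computed from two build logs along these lines. This is a small sketch that assumes the exact line format shown above; the helper names are made up.

```python
import re

# Matches e.g. "26688 bytes free in flash out of 253696 bytes ( 247.75 kb )."
FLASH_LINE = re.compile(r"(\d+) bytes free in flash out of (\d+) bytes")


def flash_used(build_output):
    """Return the number of flash bytes in use according to the build summary."""
    match = FLASH_LINE.search(build_output)
    if match is None:
        raise ValueError("no flash summary line found")
    free, total = (int(group) for group in match.groups())
    return total - free


def flash_growth(before_output, after_output):
    """Return how many more flash bytes the second build uses."""
    return flash_used(after_output) - flash_used(before_output)


sample = "26688 bytes free in flash out of 253696 bytes ( 247.75 kb )."
print(flash_used(sample))  # 253696 - 26688 = 227008
```

Running the build once with the decorators disabled and once with them enabled, then passing both logs to flash_growth, gives the difference in bytes without the .uf2 overhead.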

Thanks!

This is done in #2271. Boards can now enable the decorator by adding CIRCUITPY_ENABLE_MPY_NATIVE to mpconfigboard.mk.

@dhalbert could we close this?

@theacodes I'll revise the title, since the other decorators are still not turned on.

viper is turned on as well.

I think asm_thumb should work, too, but I didn't test it.

ah, ok - could you do a quick test that it's on?

perhaps tonight. I'll report back.

For my own reference (and others, if you're curious):

CIRCUITPY_ENABLE_MPY_NATIVE turns on MICROPY_EMIT_THUMB and MICROPY_EMIT_INLINE_THUMB.

micropython.native needs MICROPY_EMIT_NATIVE.
micropython.viper needs MICROPY_EMIT_NATIVE.
micropython.asm_thumb needs MICROPY_EMIT_INLINE_ASM.

// Convenience definition for whether any inline assembler emitter is enabled
#define MICROPY_EMIT_INLINE_ASM (MICROPY_EMIT_INLINE_THUMB || MICROPY_EMIT_INLINE_XTENSA)

OK, so it's on, so this can be closed! I thought these were more separated than they are.

Confirmed, asm_thumb works fine.
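For anyone who wants to repeat the quick check on their own board, something like the following could be pasted into the REPL. It is a sketch that rests on a few assumptions: that exec is available in the build, that a disabled decorator is rejected when the snippet is compiled, and that an empty pass body is accepted for all three decorators. The helper name is made up.

```python
def decorator_enabled(name):
    """Return True if @micropython.<name> compiles on this interpreter."""
    source = (
        "import micropython\n"
        "@micropython.{}\n"
        "def _probe():\n"
        "    pass\n"
    ).format(name)
    try:
        # Compile and run the snippet in a throwaway namespace.
        exec(source, {})
    except Exception:
        # Import failure, rejected decorator, or missing attribute.
        return False
    return True


for decorator in ("native", "viper", "asm_thumb"):
    print(decorator, decorator_enabled(decorator))
```

On CPython the import fails, so every decorator is reported as unavailable.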
