The default ARMCC compiler optimization strategy (-Otime) is not suitable for everyone. Developers should have an easy way to select the desired optimization target, and even the optimization level. This could be done e.g. via "mbed compile -o opt_size" or a similar switch. Preferably the optimization would be selectable per component, but even a global flag would help a lot.
A similar option could also select the optimization level for debug builds: code compiled with "-O0" does not necessarily fit into flash, whereas code compiled with "-O1" may.
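To make the request concrete, here is a minimal sketch of how a build tool might map a requested optimization goal to a compiler flag. It is not how mbed tools work today: the function name, the goal names ("time", "size", "balanced") and the table layout are assumptions made for illustration. The flag spellings themselves are only the ones mentioned in this thread.

```python
# Hypothetical lookup table: toolchain -> optimization goal -> compiler flag.
# Only flags mentioned in this thread are listed; the goal names are made up.
OPT_FLAGS = {
    "ARM": {"time": "-Otime", "size": "-Ospace"},
    "GCC_ARM": {"size": "-Os"},
    "IAR": {"balanced": "-Oh", "size": "-Ohz"},
}


def select_opt_flag(toolchain, goal):
    """Return the flag for the given toolchain and goal, or raise ValueError."""
    try:
        return OPT_FLAGS[toolchain][goal]
    except KeyError:
        raise ValueError(
            "no %r flag known for toolchain %r" % (goal, toolchain)
        ) from None


def debug_opt_flag(level):
    """Return the debug optimization level flag ("-O0" or "-O1")."""
    if level not in (0, 1):
        raise ValueError("debug builds in this sketch only use -O0 or -O1")
    return "-O%d" % level


print(select_opt_flag("ARM", "size"))
print(debug_opt_flag(1))
```

A per-component variant could key the same table by component path, but that is beyond this sketch.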
Just for reference, our test application with default compiler flags produces this output:
+-----------------------------+--------+-------+-------+
| Module                      |  .text | .data |  .bss |
+-----------------------------+--------+-------+-------+
| Misc                        |  59006 |    40 |  4616 |
| features/FEATURE_CLIENT     |  76323 |    23 |    40 |
| features/FEATURE_COMMON_PAL |  28158 |   163 | 10392 |
| features/frameworks         |   4224 |   572 |   336 |
| features/mbedtls            |  97069 |    80 |     0 |
| features/net                |  39874 |   281 | 51040 |
| hal/common                  |   3828 |    36 |   148 |
| hal/targets                 |  17834 |    76 |   136 |
| rtos/rtos                   |    200 |     4 |     0 |
| rtos/rtx                    |   8376 |    92 |  4220 |
| Subtotals                   | 334892 |  1367 | 70928 |
+-----------------------------+--------+-------+-------+
And by changing "-Otime" to "-Ospace" at mbed-os/tools/toolchain/arm.py:37, the same code compiles into a binary whose .text is about 37 KB (37,844 bytes) smaller:
+-----------------------------+--------+-------+-------+
| Module                      |  .text | .data |  .bss |
+-----------------------------+--------+-------+-------+
| Misc                        |  56366 |    40 |  4616 |
| features/FEATURE_CLIENT     |  68903 |    23 |    40 |
| features/FEATURE_COMMON_PAL |  24196 |   163 | 10392 |
| features/frameworks         |   3500 |   572 |   336 |
| features/mbedtls            |  83941 |    80 |     0 |
| features/net                |  34578 |   281 | 51040 |
| hal/common                  |   3372 |    36 |   148 |
| hal/targets                 |  14696 |    76 |   136 |
| rtos/rtos                   |    194 |     4 |     0 |
| rtos/rtx                    |   7302 |    92 |  4220 |
| Subtotals                   | 297048 |  1367 | 70928 |
+-----------------------------+--------+-------+-------+
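For anyone who wants the per-module breakdown, the difference between the two tables can be computed with a few lines of Python. The .text numbers are copied from the tables above; the variable and function names are just illustrative.

```python
# .text size in bytes per module, copied from the two tables above.
TEXT_OTIME = {
    "Misc": 59006,
    "features/FEATURE_CLIENT": 76323,
    "features/FEATURE_COMMON_PAL": 28158,
    "features/frameworks": 4224,
    "features/mbedtls": 97069,
    "features/net": 39874,
    "hal/common": 3828,
    "hal/targets": 17834,
    "rtos/rtos": 200,
    "rtos/rtx": 8376,
}
TEXT_OSPACE = {
    "Misc": 56366,
    "features/FEATURE_CLIENT": 68903,
    "features/FEATURE_COMMON_PAL": 24196,
    "features/frameworks": 3500,
    "features/mbedtls": 83941,
    "features/net": 34578,
    "hal/common": 3372,
    "hal/targets": 14696,
    "rtos/rtos": 194,
    "rtos/rtx": 7302,
}


def text_savings(before, after):
    """Return {module: (bytes_saved, percent_saved)} for every module."""
    savings = {}
    for module, old in before.items():
        saved = old - after[module]
        savings[module] = (saved, 100.0 * saved / old)
    return savings


savings = text_savings(TEXT_OTIME, TEXT_OSPACE)
total_saved = sum(saved for saved, _ in savings.values())

# Print modules from largest to smallest absolute saving.
for module, (saved, pct) in sorted(savings.items(), key=lambda kv: -kv[1][0]):
    print("%-28s %6d bytes  %5.1f%%" % (module, saved, pct))
print("total: %d bytes" % total_saved)
```

The total comes to 37844 bytes, with features/mbedtls contributing the largest single share (13128 bytes).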
GCC was recently switched to -Os (space), so this setting is now inconsistent.
https://github.com/ARMmbed/mbed-os/pull/2185
I note also that IAR is set to "-Oh" (high, balanced) - it should presumably be "-Ohz" (high, favoring size).
This is a HUGE breaking change as it impacts the way inlining works in the optimizer. Modifying optimization build options for code size reduction is not a path we're currently considering.
Seems the largest size reduction is in TLS. Guessing performance would probably be a considerable hit due to reduction in inline optimization?!?
This type of change has previously broken user applications that bit-bang things too.
Well, yes, it could break stuff, but any such sensitive stuff is prone to being broken by all sorts of other things - turning on debug, changing toolchain, higher interrupt load. It's effectively unstable code. Objection wasn't raised on #2185.
I can see the counterargument that you may not want to always be going for "smallest possible on this compiler", but you'd rather aim for "best speed you can do for a standard code size".
And the current trio of "ArmCC optimises for time, IAR optimises balanced, GCC optimises for space" manages to end up with basically the same code size on all 3, but best performance on ArmCC.
GCC is optimising for space because it has to in order to squeeze into the same flash as IAR, and ArmCC is optimising for time because it's got spare space.
IMHO, if the code is so fragile that it works only in a certain optimization mode, then either the code or the compiler should be fixed. I am not suggesting changing the default, just making it possible for developers to use the option that suits their use case.
The performance of mbedtls might not be that bad, as it already has the assembler optimizations.
> And the current trio of "ArmCC optimises for time, IAR optimises balanced, GCC optimises for space" manages to end up with basically the same code size on all 3, but best performance on ArmCC.
Thanks for pointing this out. We should look at general common profiles that can be applied across these compilers.
> I am not suggesting changing the default, just making it possible for developers to use the option that suits their use case.
Perfect, that is what the exporting system is for. Developers can modify makefile and project files in any way that suits their needs for the time being.
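As an illustration of that workaround, a small script could swap the optimization flag in an exported Makefile. This is only a sketch under assumptions: `swap_flag` is a hypothetical helper, the sample `CFLAGS` line is made up, and real exported files may name their variables and spell their flags differently.

```python
import re


def swap_flag(text, old="-Otime", new="-Ospace"):
    """Replace standalone occurrences of one compiler flag with another.

    The lookarounds stop the pattern from matching inside a longer token,
    so "-Otimex" or "--Otime" would be left untouched.
    """
    pattern = r"(?<![\w-])" + re.escape(old) + r"(?![\w-])"
    return re.sub(pattern, lambda match: new, text)


# Hypothetical one-line excerpt standing in for an exported Makefile.
sample = "CFLAGS = -c -Otime\n"
print(swap_flag(sample), end="")
```

To apply it to a real file, read the file's text, pass it through `swap_flag`, and write the result back. Keep in mind that the edit is lost whenever the project is exported again.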
Referring to https://github.com/ARMmbed/mbed-os/issues/2591: I assume that once the exporting system is back on track, there will be regression tests for all of the exporters to keep them that way.
I opened a pull request that implements size optimization for ARMCC and IAR in a new profile (size.json). The same PR also adds size optimizations for the debug target (debug-size.json).
https://github.com/ARMmbed/mbed-os/pull/2973
Now that we have the PR, can we close this issue?