Riot: IOTA pkg implementation and trouble with stack_overflows

Created on 29 Sep 2018 · 25Comments · Source: RIOT-OS/RIOT

Hey,
I am implementing a IOTA light wallet for a while and since RIOT is my preferred OS, I used it as underlying RTOS.

Hardware:
STM32F1 aka bluepill

But I have a problem.
While this implementation works fine:
https://github.com/Citrullin/RIOT/blob/iota_old_implementation/examples/iota_transaction_light/main.c

This one doesn't and I get sometimes stack_overflows and sometimes it just stops without any error.
https://github.com/Citrullin/RIOT/blob/iota_new_implementation/examples/iota_transaction_light/main.c

The main difference between these two implementation is the usage of global variables as buffer instead of declaring it within the functions. But both should be on the stack, as understand it, so that shouldn't make a difference. But I am not sure how RIOT is handling this.

I also created a example for a Linux environment: https://github.com/Citrullin/iota_microwallet_debug_example
Just clone, update (init) the submodule, execute ./compile and ./main. This uses the new implementation and works fine.

So, how and where do I need to increase the stack size for my package?

I think the new implementation is way cleaner. So I prefer to going with that. But since my C knowledge is not really good I have still some trouble debugging in a good way. Something I need to work on.

core pkg question

Source

Citrullin

All 25 comments

You should of course first of all determine what is causing the stack overflow (GDB + compiling with CFLAGS_OPT = -O0 + whatever debug interface the bluepill offers [sorry really don't know; but usually the build system provides this and a GDB instance with make debug] and some metaphoric elbow grease might help with this). If that really is non-negotiable to optimize you can increase the stacksize by enlargen the stack array of your thread and setting the stacksize parameter for the thread_create() function accordingly. However, as far as I can see you are just using the main thread in your example application. In that case, adapt the THREAD_STACKSIZE_MAIN macro within your application's Makefile (for stm32 you should select something larger than 1024+512):

CFLAGS += -DTHREAD_STACKSIZE_MAIN=2048

You can use the ps command (given you have a shell, you include the ps module into your application and compile with DEVELHELP=1) to check the stack usage (alternatively use call ps() in GDB if you don't want to include a shell into your application). If the output is broken you most likely have already a stack overflow (because it overwrote the TCB of another thread, so setting the STACKSIZE macro to some insane value first might be helpful to get a good guesstimate via ps() for the actually required stack size.

miri64 on 29 Sep 2018

(for alignment reasons the stack size should be divisable by 4 [on Cortex-M])

miri64 on 29 Sep 2018

Another tip: most stack overflows are caused by either having a too deep function call hierarchy (e.g. by recursion, but it can also happen by carelessly dangling from one function to the next) or by putting really big objects onto your stack (i.e. within the scope of functions). I did not really look into your code, but if you allocate some large buffers on your stack, reconsider if you can't make them static global variables instead). If necessary you can protect them for multi-access with mutexes or the like.

miri64 on 29 Sep 2018

Another tip: most stack overflows are caused by either having a too deep function call hierarchy (e.g. by recursion, but it can also happen by carelessly dangling from one function to the next) or by putting _really big_ objects onto your stack (i.e. within the scope of functions). I did not really look into your code, but if you allocate some large buffers on your stack, reconsider if you can't make them static global variables instead). If necessary you can protect them for multi-access with mutexes or the like.

I would say that's the point here. The buffer for the tx alone is nearly 3kb. If I use a static variable instead, isn't it blocking this space in the memory with this global variable forever? If I declare it within the function, this buffer space is available after the function was executed. Do I get that correct?

Citrullin on 30 Sep 2018

Ok, now I lgot curious and looked at it ;-) Right in the main:

https://github.com/Citrullin/RIOT/blob/66b00dca66c08a855b959788a5969a983f4ffb30/examples/iota_transaction_light/main.c#L89-L99

All those char arrays are just used by the application so by moving them all to static (global) memory should be safe and you would save 210 bytes of stack space.

miri64 on 30 Sep 2018

Ok, now I lgot curious and looked at it ;-) Right in the main:

https://github.com/Citrullin/RIOT/blob/66b00dca66c08a855b959788a5969a983f4ffb30/examples/iota_transaction_light/main.c#L89-L99

All those char arrays are just used by the application so by moving them all to static (global) memory should be safe and you would save 210 bytes of stack space.

So I declared it outside of the function

char address_from[81];
char address_to[81];
unsigned char seedBytes[48];

int run_bundle_creation(void){ }

Now this other issue appears. It just stops after the log messages:

main(): This is RIOT! (Version: 2018.07-devel-1344-g66b00-citrullin-ThinkPad-X220-iota_new_implementation)
IOTA Wallet Application
=====================================
Create IOTA transactions bundle...
Prepare transfer...

Citrullin on 30 Sep 2018

I would say that's the point here. The buffer for the tx alone is nearly 3kb. If I use a static variable instead, isn't it blocking this space in the memory with this global variable forever? If I declare it within the function, this buffer space is available after the function was executed. Do I get that correct?

Yes, but the stack would also block the memory space forever if you increase its size. Worse even: if two threads want to use your lib they both would require extra large stack space, while with a global buffer (and some clever exclusion) you only would need the memory for the buffer once. In general for embedded programming: everything larger than a few (my rule of thumb everything that is a struct or array > 20 bytes, though that is already quite expensive) should go into global memory.

miri64 on 30 Sep 2018

Now this other issue appears. It just stops after the log messages:

I wrote this before you informed me that your stack is occupied with 3kb. So this is no surprise and most likely still a stack overflow ;-)

miri64 on 30 Sep 2018

😄1

I would say that's the point here. The buffer for the tx alone is nearly 3kb. If I use a static variable instead, isn't it blocking this space in the memory with this global variable forever? If I declare it within the function, this buffer space is available after the function was executed. Do I get that correct?

Yes, but the stack would also block the memory space forever if you increase its size. Worse even: if two threads want to use your lib they both would require extra large stack space, while with a global buffer (and some clever exclusion) you only would need the memory for the buffer once. In general for embedded programming: everything larger than a few (my rule of thumb everything that is a struct or array > 20 bytes, though that is already quite expensive) should go into global memory.

Okay, that's a valid point. But then it's kind of dangerous, since they both write the same buffer. So if I don't take care of this, they both override each other in worst case. Do you have any tips to avoid that? I would think maybe about a struct and using IDs and waiting until the buffer is free or something in this direction.

Citrullin on 30 Sep 2018

I would say that's the point here. The buffer for the tx alone is nearly 3kb. If I use a static variable instead, isn't it blocking this space in the memory with this global variable forever? If I declare it within the function, this buffer space is available after the function was executed. Do I get that correct?

Yes, but the stack would also block the memory space forever if you increase its size.

I don't understand why this is the point. I mean the stack is just blocked while the function is executed. After the execution, the stack should be free. I currently also don't get the benefit of filling the global memory with used space, even it is not needed for the whole time.

Citrullin on 30 Sep 2018

Okay, that's a valid point. But then it's kind of dangerous, since they both write the same buffer. So if I don't take care of this, they both override each other in worst case. Do you have any tips to avoid that? I would think maybe about a struct and using IDs and waiting until the buffer is free or something in this direction.

That's what I meant with "and some clever exclusion". In RIOT we have mutexes for that. They are pretty similar to Python's Locks (in case you are more familiar with them). Since you are working from a package you either need to ensure the package provides a similar mechanism itself, or some wrapping mechanism around the OS's functionality. lwIP e.g. does this by providing some prototypes the OS then needs to provide an implementation for:

https://github.com/RIOT-OS/RIOT/blob/b78f78d3a172957f7f57233fd1b984733e8c39a0/pkg/lwip/contrib/sys_arch.c#L44-L63

I would say that's the point here. The buffer for the tx alone is nearly 3kb. If I use a static variable instead, isn't it blocking this space in the memory with this global variable forever? If I declare it within the function, this buffer space is available after the function was executed. Do I get that correct?

Yes, but the stack would also block the memory space forever if you increase its size.

I don't understand why this is the point. I mean the stack is just blocked while the function is executed. After the execution, the stack should be free.

Nope. The memory space for a thread's stack in RIOT is allocated at link-time (actually already on compile time on compile target object level) within the .bss segment of the binary depending on the size of the stack. So that memory isn't available for other things. Remember, we are on embedded hardware here with constrained memory available, so "growing" and "shrinking" memory regions are a luxury we don't have and we need to be deterministic with our memory usage.

miri64 on 30 Sep 2018

👍1

Remember, we are on embedded hardware here with constrained memory available, so "growing" and "shrinking" memory regions are a luxury we don't have and we need to be deterministic with our memory usage.

(same reason why malloc() or any other form of dynamic allocation is discouraged for embedded programming).

miri64 on 30 Sep 2018

👍1

To quote Master Yoda roughly: "Segmentation leads to fragmentation, fragmentation leads to missing space, missing space leads to insufferable pain" ;-).

miri64 on 30 Sep 2018

😄1

Remember, we are on embedded hardware here with constrained memory available, so "growing" and "shrinking" memory regions are a luxury we don't have and we need to be deterministic with our memory usage.

(same reason why malloc() or any other form of dynamic allocation is discouraged for embedded programming).

Okay, I can imagine that point. The compiler cannot predict stuff which is dynamically happening. E.g. incoming packages from some network and reacting on this. Okay, now it makes also sense in my head. Thanks for the explanation :)

Citrullin on 30 Sep 2018

To quote Master Yoda roughly: "Segmentation leads to fragmentation, fragmentation leads to missing space, missing space leads to insufferable pain" ;-).

So that wouldn't happen with a MMU and malloc, right?

Citrullin on 30 Sep 2018

Is that even really possible on a MCU without MMU? Or is RIOT handling this fragmentation then?

No it doesn't and that's exactly the point!

miri64 on 30 Sep 2018

👍1

(in networking we e.g. just expect that packets won't stay in memory for long (go from device to app, done; or vice-versa), so fragmented space is free'd quickly.)

miri64 on 30 Sep 2018

👍1

Okay. So one question for Threading then. So, a Thread blocks a specific size of memory and after execution this space will be free'd?

Just have a look at the thread doc. You will notice that the stack is expected to be provided to the thread_create() function as an array. The usual approach is to use an array within static memory here. So the space won't be free'd after finishing the thread, but it could be re-used (just make sure the thread really finished, otherwise you would override a running thread's TCB which is located at the very bottom of the stack). I've ran into a similar problem with lwIP because lwIP assumes the stacks are allocated dynamically. RIOT basically assumes (and this is true for most applications), that threads have an endless loop that does things. For lwIP this is true as well btw, so the question stated in the TODO there is more academical in nature than actually a concern.

So every package get's it's own Thread stack? And this is a fixed size?

That really depends on the package. If the package is only providing a library supposed to be called by some application or module then the answer is most definitely "no". If it is some application running on its own or a system component that runs in the background or provides its own process hierarchy, as e.g. lwIP, then you can say "yes". However, especially for applications the "running in its own thread" part should still mostly be controlled by the RIOT application (so provide at least a run_wallet_app() function or something), so the user can decide in which thread to run and you don't have the hassle wrapping all of the OS's thread functionalities. An example application would be looking e.g. like:

```C

define WALLET_STACK_SIZE (3 * 1024)

...
*/

static wallet_stack[WALLET_STACK_SIZE];
/*

...
*/

void wallet_thread(void *arg)
{
(void)arg;
run_wallet_app();
/ not expected to be reached */
return NULL;
}

int main(void)
{
thread_create(wallet_stack, WALLET_STACK_SIZE, /* ... /, wallet_thread, / ... */);
return 0;
}

miri64 on 30 Sep 2018

A much simpler alternative with that approach which illustrates the user control I was talking about would be of course:

int main(void)
{
    run_wallet_app();
    /* never returns */
    return 0;
}

miri64 on 30 Sep 2018

@miri64 With your description, I think I go the just library way. That makes more sense in the context of the functionality. I have some ideas about having a layer on top on it, which provides a more module or application functionality. Thanks for answering all my questions! :) So, I close the ticket then, since you answered more than just this question. :) That's something I really like about this project. The amazing community & core contributors :)

Citrullin on 30 Sep 2018

Great! Glad I was able to help.

And oh, another way to handle such a situation I did not mention (because it only came to my mind now): have the buffer be a parameter to the called function. This way the allocation (and possibly also the protection of the buffer space) is up to the application and more portable to other (more promiscuous) OSs. Asymcute does exactly that with the buffer space for the MQTT-SN packets and their context.

miri64 on 30 Sep 2018

👍1

(also if the application developer knows they only have one calling thread to the library, they have the freedom to choose not to provide a synchronization mechanism, saving code size)

miri64 on 30 Sep 2018

👍1

Great! Glad I was able to help.

And oh, another way to handle such a situation I did not mention (because it only came to my mind now): have the buffer be a parameter to the called function.

I had this idea before, but haven't done it, because I thought after executing the function the space is free'd. But as you said in the embedded context not really a good idea. Yes, from this point of view, I maybe consider this. Ahh, yes, if I do so, I don't need the mechanism for managing the lock state etc., since the caller need to handle that. So, now I really think I go this way, since this is more the separation of concern design and should be handled by something like an OS. I could then write an API layer for RIOT in the package. I think that's a nice way.

Citrullin on 30 Sep 2018

@miri64 What do you think of providing two pkg then? One which has an abstraction logic with handling the lock state and another which is more lower level, where you can use your own buffer. Or is there a elegant way to do it just in one package without blocking the memory for applications using just the lower level API?

Citrullin on 30 Sep 2018

@miri64 What do you think of providing two pkg then? One which has an abstraction logic with handling the lock state and another which is more lower level, where you can use your own buffer. Or is there a elegant way to do it just in one package without blocking the memory for applications using just the lower level API?

I think that might be confusing. I propose to provide the lower-level package first, then you can think about maybe putting an abstraction layer around it (but I would rather think a module would be a better idea since this is very OS-specific). However, in my opinion that lock state logic should be something everyone using multi-threading should be able to know about, so I'm wondering if that layer (especially due to its specificality to one package) might not be overkill.

miri64 on 30 Sep 2018

Was this page helpful?

0 / 5 - 0 ratings