Lightning: Anormal CPU consumption

Created on 20 Aug 2019 · 28Comments · Source: ElementsProject/lightning

When I develop for BTCPayServer, I use a docker-compose which run all the dependencies needed for a comfortable dev environment on regtest. (on 0.7.2)

In this docker-compose I have two clightning: A and B where A open channel to B and send payment to B. (never the reverse)

I noticed that A was taking 100% of my CPU while being idle. Everything seemed to works fine, just that the CPU was strangely high.

I have not yet tried to update to 0.7.2 in production, so I don't know yet if this is a general problem, if this is specific to 0.7.2, or specific to regtest and my own setup.

I thought reporting it in case somebody could reproduce.

voodoo

Source

NicolasDorier

Most helpful comment

I finally took a look at the issue, and it seems like _SC_OPEN_MAX in docker images is huge:

#include <stdio.h>
#include <unistd.h>
int main() {
  int max = sysconf(_SC_OPEN_MAX);
  printf("_SC_OPEN_MAX %d\n", max);
  return 0;
}

This returns the following:

| Context | Value |
|---------------|---------|
| Host (User) | 1024 |
| Host (root) | 1024 |
| Docker (root) | 1048576 |
| Docker (user) | 1024 |

So no wonder that got a lot slower, we are issuing 1024x syscalls (close()) on each subprocess (in particular calling bitcoind for polling). So my suggested quickfix would be not to run as root in docker (might be a good idea anyway).

@NicolasDorier would that be an option?

cdecker on 24 Sep 2019

👍2

All 28 comments

Ok it did not happened to me only.

https://github.com/btcpayserver/btcpayserver/issues/978

I have good tracing tool, will eventually figure out the issue, but there is a bug somewhere in 0.7.2

NicolasDorier on 21 Aug 2019

@rustyrussell @cdecker @ZmnSCPxj here are some flamegraph I took:

https://aois.blob.core.windows.net/public/1.svg

https://aois.blob.core.windows.net/public/2.svg

NicolasDorier on 21 Aug 2019

I tested again with 0.7.1, no problem. In prod, 0.7.2 does not show this problem, seems to be only regtest 0.7.1.

NicolasDorier on 21 Aug 2019

Can you also make flamegraph for 0.7.1? Thanks.

The flamegraphs for 0.7.2 seem to blame something called _arch_local_irq_enable, I am uncertain if this is relevant or not.

ZmnSCPxj on 21 Aug 2019

@ZmnSCPxj it appears relevant https://aois.blob.core.windows.net/public/3.svg

Note that on 3.svg, most of the traces are inside bitcoin-cli, not inside lightningd, because it's mainly idle.

NicolasDorier on 21 Aug 2019

😕1

Ouch. I have no idea what _arch_local_irq_enable is, it is nowhere in our codebase. Looks like digging through the libc and/or kernel is in order ....

ZmnSCPxj on 21 Aug 2019

@ZmnSCPxj unsure if it can help you our dockerfile is the same as yours outside stuff for tracing https://github.com/btcpayserver/lightning/blob/master/Dockerfile

NicolasDorier on 21 Aug 2019

Might it be the fact we run on alpine... but strange the problem was not here before. And no idea about __clone ?

NicolasDorier on 21 Aug 2019

__clone either means fork or pthread_create, and since we mostly use fork (and we use it a lot) it is likely just fork. Unless some library we use actually uses pthread_create underneath. And since __clone is likely to reset the stack for the new process that is likely the reason why you do not see it attributed properly as to who called the fork, because the information is erased by __clone clearing the stack.

I think there was a recent thing where we had to close fds when starting plugins, lemme try to dig that...

ZmnSCPxj on 21 Aug 2019

Could you check before and after this commit to see if this introduced the slowdown? https://github.com/ElementsProject/lightning/commit/25ddb8082369451024f471bc2d77fa7e7207cc47

If not then maybe bisect from there, haha...

ZmnSCPxj on 21 Aug 2019

@ZmnSCPxj or maybe I can just remove stack clearing with some magic compile flag?

NicolasDorier on 21 Aug 2019

testing without the commit

NicolasDorier on 21 Aug 2019

@ZmnSCPxj that's an incredible nose for smelling bugs that you have. Reverting this commit fix the issue. But why I can't see a problem in prod? Maybe because my node has no connection.

NicolasDorier on 21 Aug 2019

@ZmnSCPxj or maybe I can just remove stack clearing with some magic compile flag?

No, clone is a Linux kernel thing. You have to modify the kernel, not just C-Lightning.

@ZmnSCPxj that's an incredible nose for smelling bugs that you have.

Thank you very much. Principle is new code is always suspicious.

The other clue is that the flamegraph partially blames close from within the __clone call. And the commit in question adds a loop of close calls after fork but before exec.

Reverting this commit fix the issue. But why I can't see a problem in prod? Maybe because my node has no connection.

I am unsure. The code is correct functionally. Time to bring in @rustyrussell .

It could be that we are now doing The Right Thing but we erroneously did not do this before. Maybe the extra CPU load is fine?

ZmnSCPxj on 21 Aug 2019

I have gone back to 0.7.1 for our tests. I keep 0.7.2 in production, will report if user complains.

NicolasDorier on 22 Aug 2019

Will check the value of _SC_OPEN_MAX

NicolasDorier on 22 Aug 2019

so I tried to debug with printf by logging before and after the call.

Sadly nothing get printed. :(
Let me know if I can do something to help debug this.

NicolasDorier on 22 Aug 2019

Does the wizard can take a look at the voodoo? or let me know anything I can do to help?

NicolasDorier on 26 Aug 2019

Get me a replacement for my day job?

Sorry, this is some rather low-level OS stuff. The code added in the commit is correct, from what I know of Linux and POSIX. The extra CPU load might be acceptable. We also do the same code when starting a subdaemon. The difference is now we do the Right Thing (TM) on a bunch of other calls, including bitcoin-cli, which I suspect is the main cause in your particular case. Ping @rustyrussell again as he has more OS-level knowledge I suspect.

ZmnSCPxj on 27 Aug 2019

mmh maybe another issue might be that the docker is on musl (fetching glib from another source).
I vaguely remember complains about musl on some low performance of some low level stuff like this.

Might try to switch to another base image.

NicolasDorier on 28 Aug 2019

The added loop is a loop on calls to close, and I find it hard to imagine that such a simple syscall could have additional overhead from the C library. Still, it does not hurt to try.

We also now call into bitcoin-cli more often, and I also noticed that we recently greatly increased our running time in our Travis tests, so it seems to me likely, that these changes in total greatly increased our run time for calling into bitcoin-cli (more calls into bitcoin-cli, and more overhead per call due to closeing unused fds before calling into bitcoin-cli). So it might not be just specific to your environment.

ZmnSCPxj on 28 Aug 2019

I tried moving the docker from alpine to stretch. Same issue.

NicolasDorier on 29 Aug 2019

I finally took a look at the issue, and it seems like _SC_OPEN_MAX in docker images is huge:

#include <stdio.h>
#include <unistd.h>
int main() {
  int max = sysconf(_SC_OPEN_MAX);
  printf("_SC_OPEN_MAX %d\n", max);
  return 0;
}

This returns the following:

| Context | Value |
|---------------|---------|
| Host (User) | 1024 |
| Host (root) | 1024 |
| Docker (root) | 1048576 |
| Docker (user) | 1024 |

@NicolasDorier would that be an option?

cdecker on 24 Sep 2019

👍2

mmh I don't know, I can try. The big problem docker and permission is that images which use non-root users can't share data via volume sharing with other containers.

The c-lightning container is sharing different directory:

The tor hidden_service directory to get the onion address
The bitcoin data directory to get the cookiefile for authenticating on RPC
The lightning-rpc that btcpay container is using

Tor and Bitcoin containers are using their own non root user with different UID, and btcpay container is using root user.

I don't want to make change to several different images to fix this situation, as this can be a can of worms. Fixing one image which also share data might make another image fail, which need me to fix etc... and I am myself not maintainer of some images.

I can probably work around 2. by not relying on cookie auth for rpc authentication, and use a fixed password to RPC (we do this when the user select LND because LND cache the cookiefile value, and end up with bad data in cache if bitcoind reboot), I don't like this because all bitcoin RPC in BTCPay will have the same password.... We don't expose it on internet, but still I don't like it.

I think 3. should not be a problem, because this btcpay container runs as root, it will still have access to lightning-rpc.

For 1. I can fix it by using the same UID between Tor container and c-lightning.

On top of this I will need to change the permissions of all files in the c-lightning data directory, because existing user will have the files owned by root. Maybe it is easy, maybe not. I can try...

NicolasDorier on 25 Sep 2019

Frankly for all those problem I wonder if it is not just easier to cap the max variable to max 1024

NicolasDorier on 25 Sep 2019

Frankly for all those problem I wonder if it is not just easier to cap the max variable to max 1024

That wouldn't really work: though unlikely, a process might open >1024 fds, since running under root allows us to do so, and then only closing the first 1024 would leave the remainder lingering.

So far I only found one way that this works in docker reliably: run the container with --ulimit nofile=1024. This is also supported in docker-compose files. I'm still trying to see if there is a way to build this limit into the docker image itself, since having everybody add that will lead to some confusion, but I think for your use-case that is a solution.

cdecker on 25 Sep 2019

Ok, I found another variant that might work: limit ourselves to only ever open up to 1024 fds, so we don't have to close() anything beyond that range.

cdecker on 25 Sep 2019

Thanks a lot!

NicolasDorier on 26 Sep 2019

Was this page helpful?

0 / 5 - 0 ratings