Actix-web: HTTP2 memory leak with many concurrent streams

Created on 28 Jul 2019 · 30 comments · Source: actix/actix-web

Take the static_index example from https://github.com/actix/examples

cd static_index
cargo run --release 2>/dev/null

And then run h2load -n 1000000 -c 10 -m 200000 http://127.0.0.1:8080/

When the benchmark is done, memory stays used in the server.

leo-lb

Most helpful comment

Taking a quick look at this, it seems that the issue you're seeing here is twofold:

1) Your system allocator is suboptimal, so even after h2load finishes a lot of memory is simply lost to fragmentation. If you switch to jemalloc your memory usage will drop by a lot.
2) There is some buffer/structure bloat in actix-web itself where even after the test finishes its internal structures (which were previously bloated due to a massive amount of requests) are not shrunk back to their normal size. I've attached a flamegraph (courtesy of my memory-profiler; GitHub unfortunately doesn't allow attaching raw svg so I had to gzip it - just unpack it and open in your web browser) where you can see around 140MB that was kept allocated long after h2load finished.

koute on 24 Aug 2019

👍3
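
The second point, structures that grow under load and are not shrunk afterwards, can be illustrated with a minimal sketch. This is not actix-web's actual code; `grow_and_drain` is a hypothetical helper, and a plain `Vec<u8>` stands in for whatever internal buffers the server keeps per connection or stream. It only shows the general pattern: clearing a buffer releases its contents but not its allocation.

```rust
/// Hypothetical stand-in for an internal buffer: grows to `peak` bytes under
/// load, then has its contents dropped once the load is gone.
fn grow_and_drain(peak: usize) -> Vec<u8> {
    let mut buf: Vec<u8> = Vec::new();
    // Under load: the buffer grows to hold `peak` bytes of pending data.
    buf.resize(peak, 0);
    // Load ends: the contents are dropped, but `clear` keeps the allocation.
    buf.clear();
    buf
}

fn main() {
    let mut buf = grow_and_drain(1024 * 1024);
    println!("after clear:         len = {}, capacity = {}", buf.len(), buf.capacity());

    // Explicitly returning the unused capacity to the allocator.
    buf.shrink_to_fit();
    println!("after shrink_to_fit: len = {}, capacity = {}", buf.len(), buf.capacity());
}
```

Memory held this way is still reachable and would be reused by the next burst of traffic, so it tends to show up as a plateau rather than as unbounded growth.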

All 30 comments

Does it continue to grow on multiple runs?

fafhrd91 on 28 Jul 2019

@fafhrd91 Yes

leo-lb on 28 Jul 2019

I ran this on an up-to-date Linux distro on Intel hardware a dozen times with the same h2load parameters. Memory usage grows to about 530MB, and subsequent runs don't increase it further.

rustrust on 31 Jul 2019

Here's a demonstration of the growing memory usage issue. (Fedora 30 with latest Linux kernel on IBM POWER9)

[Screen recording: Peek 2019-08-01 00-29]

leo-lb on 1 Aug 2019

@fafhrd91 I'm interested in helping out on this project. Can you assign me this issue?

schulace on 2 Aug 2019

@schulace sure. I looked into this; I am not sure where the bug is, or whether it is a bug at all.

fafhrd91 on 2 Aug 2019

@leo-lb can you please delete those comments and use something like a gist? This is unreadable.

0xpr03 on 11 Aug 2019

valgrind:

full: https://gist.github.com/leo-lb/8ec427c3978e74eea78423c6ea514a9b
narrow biggest: https://gist.github.com/leo-lb/86856462eecab9b3a106b874c3322567

@0xpr03

leo-lb on 11 Aug 2019

👍1

Thanks, that looks way better. If you delete those massive comments now, it'll be much easier for everyone to read and keep an overview of the thread.

0xpr03 on 11 Aug 2019

Taking a quick look at this, it seems that the issue you're seeing here is twofold:

koute on 24 Aug 2019

👍3

I came to a similar conclusion as @koute.

fafhrd91 on 3 Sep 2019

Taking a quick look at this, it seems that the issue you're seeing here is twofold:

Your system allocator is suboptimal, so even after h2load finishes a lot of memory is simply lost to fragmentation. If you switch to jemalloc your memory usage will drop by a lot.

There is some buffer/structure bloat in actix-web itself where even after the test finishes its internal structures (which were previously bloated due to a massive amount of requests) are not shrunk back to their normal size. I've attached a flamegraph (courtesy of my memory-profiler; GitHub unfortunately doesn't allow attaching raw svg so I had to gzip it - just unpack it and open in your web browser) where you can see around 140MB that was kept allocated long after h2load finished.

BTW that can't be true. I'm having an unlimited amount of leaks here. My system has 64GB of memory. If I let it run for longer, it can start taking more than 60GB of memory and cause the system to swap thrash, so no, it's not being lost to fragmentation... and my system allocator is fine; the rest of my system runs fine.

@fafhrd91
And if there's some structure bloat to be solved, why would you close this issue? That's definitely an issue to be resolved.

Please re-open the issue.

leo-lb on 4 Sep 2019

It is not possible to fix without a reproducible example.

fafhrd91 on 4 Sep 2019

It is not possible to fix without a reproducible example.

I am confused, I posted the reproduction scenario in the first post, it reproduces 100% of the time for me, whether on x86 machines or my POWER9 machine.

This is a Resource Exhaustion DoS vulnerability.

leo-lb on 4 Sep 2019

I think @fafhrd91 means that your example currently relies on a machine with 64GB of RAM and on spamming connections. That makes it hard to reproduce (hardware- and time-wise) and far from a minimal, verifiable test case. If you could break this down a bit more, that would probably help a lot.

0xpr03 on 4 Sep 2019

BTW that can't be true. I'm having an unlimited amount of leaks here. My system has 64GB of memory. If I let it run for longer, it can start taking more than 60GB of memory

When I was testing this on my system, that's not the behavior I got: memory usage grew up to a certain point, then stopped growing and stabilized. I'm guessing that's also what would have happened on your system, except that you have a lot more cores than I do, so the server is able to do more concurrent work, hence the higher maximum memory usage.

and cause the system to swap thrash, so no, it's not being lost to fragmentation...

How did you come to this conclusion? (: Whether your system starts swapping or not has nothing to do with this. The term "lost to fragmentation" has a very specific meaning in the context of memory allocators, and in this case you can basically think of it as memory which is unused by the application but is still being kept allocated by the memory allocator itself, either because of the allocator's limitations or due to bookkeeping constraints, so from the perspective of the operating system that memory is treated as allocated even though it's in reality unused.

and my system allocator is fine, the rest of my system runs fine.

That's not what I meant. If you're on Linux you're most likely using glibc, and glibc uses ptmalloc as its allocator, which uses sbrk to allocate memory, which makes it exhibit very pathological behavior in certain cases. Cases which usually look like the scenario you're testing here.

I recommend you do a couple of retests on your machine to investigate this further:

1) Limit actix-web so that it can use at most one or two cores simultaneously. (It'd probably be most effective to patch actix-web or something like that so that it thinks that there are only two cores, instead of e.g. setting the affinity and just simply not letting it get scheduled on those cores.)
2) Use jemalloc instead of the system allocator.
3) Limit the cores and use jemalloc simultaneously.

Does your issue reproduce in all of these three cases? What's each case's maximum memory usage?

koute on 4 Sep 2019
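
To compare the maximum memory usage across those three cases on Linux, one option is to read the process's resident set size from `/proc/<pid>/status`. The sketch below is an assumption about how one might record it, not something used in this thread: `status_field_kb` is a hypothetical helper, and it reads the calling process's own status file, so for the server you would point it at `/proc/<server pid>/status` instead. `VmRSS` is the current resident set size and `VmHWM` is its peak.

```rust
use std::fs;

/// Extracts a kB-valued field such as "VmRSS" or "VmHWM" from the text of a
/// Linux /proc/<pid>/status file. Returns None if the field is missing or
/// its value is not a number.
fn status_field_kb(status: &str, field: &str) -> Option<u64> {
    status.lines().find_map(|line| {
        // Lines look like "VmRSS:\t  102400 kB".
        let rest = line.strip_prefix(field)?.strip_prefix(':')?;
        rest.split_whitespace().next()?.parse().ok()
    })
}

fn main() {
    // Linux-only: on other platforms this file does not exist.
    match fs::read_to_string("/proc/self/status") {
        Ok(status) => {
            println!("VmRSS (current): {:?} kB", status_field_kb(&status, "VmRSS"));
            println!("VmHWM (peak):    {:?} kB", status_field_kb(&status, "VmHWM"));
        }
        Err(err) => eprintln!("could not read /proc/self/status: {}", err),
    }
}
```

Sampling `VmRSS` after each h2load run shows whether usage plateaus or keeps growing, which is the distinction the rest of this thread argues about.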

Closing, as a proper reproducible example has not been provided.

fafhrd91 on 20 Dec 2019

What the... the example is provided in the first post, and it cannot get any more basic than this.
The issue is not related to high core count or large amounts of RAM; it also happens on an x86 machine that has 4 cores and 16GB of RAM.

leo-lb on 20 Dec 2019

I ran this on an up-to-date Linux distro on Intel hardware a dozen times with the same h2load parameters. Memory usage grows to about 530MB, and subsequent runs don't increase it further.

And in ANY case, 530MB of idle usage is not acceptable.

leo-lb on 20 Dec 2019

I can not reproduce on my Mac.

fafhrd91 on 20 Dec 2019

I can not reproduce on my Mac.

And that means the issue must be non-existent?

leo-lb on 20 Dec 2019

@koute When memory is "lost to fragmentation", it eventually gets reused by future allocations. Fragmentation does not cause an endless increase in memory usage. If a system starts swap thrashing, it means that the memory isn't being reused and the usage is increasing indefinitely. Glibc's allocator does not cause memory leaks such as this.

leo-lb on 20 Dec 2019

You could do the open source community a favor and debug the problem, especially because you can easily reproduce it.

fafhrd91 on 20 Dec 2019

You could do the open source community a favor and debug the problem, especially because you can easily reproduce it.

Yes, I regularly do so, and I have already tried, but I haven't had enough time to reach a positive result. However, that doesn't warrant closing the issue.

leo-lb on 20 Dec 2019

@leo-lb the issue is now 5 months old, actix has had a major rewrite due to async/await, and apparently no one in the core team can reproduce this.
I'd suggest toning it down a little and investigating yourself a bit, or trying to find someone who can help you with that.

0xpr03 on 20 Dec 2019

@leo-lb the issue is now 5 months old, actix has had a major rewrite due to async/await, and apparently no one in the core team can reproduce this.
I'd suggest toning it down a little and investigating yourself a bit, or trying to find someone who can help you with that.

I'm not sure I understand what you're suggesting here.
I assure you I do not go and invent issues. This issue isn't a request for anyone to work on anything; it is simply a log, a todo list, a tool to keep track of things.
So please re-open it?

leo-lb on 20 Dec 2019

Try running the app with MALLOC_ARENA_MAX=1

fafhrd91 on 20 Dec 2019

Try running the app with MALLOC_ARENA_MAX=1

I am getting similar results: the same kind of leak, and it continues leaking over time with repeated h2load runs.

leo-lb on 20 Dec 2019

It seems that the latest examples do not support HTTP/2. At least not without TLS, maybe?
Just tried with jemalloc: same leak.
I pushed my modifications here: https://github.com/leo-lb/examples/tree/http2-memleak/static_index

leo-lb on 21 Dec 2019
