Caddy: Enabling basicauth causes CPU usage to spike

Created on 30 May 2020 · 14 comments · Source: caddyserver/caddy

Enabling basicauth for a route causes CPU usage to spike significantly at very modest request rates. On a tiny Linode instance (1 vCPU and 1 GB of RAM, admittedly modest hardware), 10–20 requests per second are enough to occupy 60–80% of the single available CPU.

Here's a profile:

  • Generated using /debug/pprof/
  • Caddy was under modest load. I suspect one could reproduce this with any synthetic load, but I noticed it while browsing an authenticated reverse_proxy route to a netdata instance. So this was a single web browser using JavaScript to fetch metrics maybe 10-ish times per second for in-browser display.
  • The profile shows that >19s of CPU time was used during the 30s sampling period, from this single user's traffic.
  • The profile shows that over 92% of that CPU time was spent in blowfish/encryptBlock, presumably from repeatedly performing the expensive bcrypt hashing of the user's password on every request:

[Image: profile001 — pprof CPU profile graph]

Perhaps caddyauth/BcryptHash or some similar function could be memoized to cache the result for a given password input, reducing the number of bcrypt calculations that must be performed for a logged-in user submitting multiple requests with the same username/password.
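
For illustration, here is a minimal sketch of what such memoization could look like — this is not Caddy's actual code, and the `comparisonCache` type is invented for the example:

```go
// Hypothetical sketch of memoizing bcrypt comparisons; not Caddy's real code.
package main

import (
	"fmt"
	"sync"

	"golang.org/x/crypto/bcrypt"
)

// comparisonCache remembers the outcome of bcrypt comparisons so that repeat
// requests with the same credentials skip the expensive hashing.
type comparisonCache struct {
	mu      sync.RWMutex
	results map[string]bool // key: stored hash + NUL + supplied password
}

func newComparisonCache() *comparisonCache {
	return &comparisonCache{results: make(map[string]bool)}
}

// Compare reports whether plaintext matches hashed, consulting the cache first.
func (c *comparisonCache) Compare(hashed, plaintext []byte) bool {
	key := string(hashed) + "\x00" + string(plaintext)

	c.mu.RLock()
	match, found := c.results[key]
	c.mu.RUnlock()
	if found {
		return match
	}

	match = bcrypt.CompareHashAndPassword(hashed, plaintext) == nil

	c.mu.Lock()
	c.results[key] = match // unbounded here; a real cache would cap and evict
	c.mu.Unlock()
	return match
}

func main() {
	hash, _ := bcrypt.GenerateFromPassword([]byte("hunter2"), bcrypt.DefaultCost)
	cache := newComparisonCache()
	fmt.Println(cache.Compare(hash, []byte("hunter2"))) // slow: computes bcrypt
	fmt.Println(cache.Compare(hash, []byte("hunter2"))) // fast: cache hit
}
```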

optimization


All 14 comments

Thank you for profiling!! (As opposed to just complaining that basicauth is "slow" like so many people would usually do.)

So yeah, that's expected with a password hash like bcrypt. Caching the results is a good idea -- if we assume that the program memory is absolutely a safe place in the long term. (In general, we do assume this with Caddy since if your server is popped, it's game over already.)

Should caching it be opt-in or opt-out?

Thank you for profiling!! (As opposed to just complaining that basicauth is "slow" like so many people would usually do.)

👍 The debug endpoint makes profiling pretty straightforward, and I had no idea what was slow until I profiled; I didn't realize it was auth out of the gate.

Caching the results is a good idea -- if we assume that the program memory is absolutely a safe place in the long term.

I'm no expert in modern AppSec, but it seems to me that unless we have an encrypted in-memory path for the basicauth header, that memory is already "tainted" and caching doesn't really change the game. I get that variable lifetime counts for something, but I'm not sure that it's a qualitative change.

Should caching it be opt-in or opt-out?

Having a single user occupy most of an entire CPU seems like an untenable default condition, so I think I'd recommend opt-out... but maybe basicauth is niche enough these days that opt-in is viable. In my mind the bigger concern would be controlling the tradeoff between cache size and hit rate, either with a global or directive-scoped config that exposes whatever knobs the underlying cache implementation has (a bounded LRU, sketched below, is one common shape).

Though, to be clear, my own hobbyist use-case is the most trivial possible and would be improved by any caching at all regardless of configurability.
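
To illustrate the size-vs-hit-rate knob mentioned above, here is a purely hypothetical sketch using an LRU cache; the `cacheSize` setting and the library choice are assumptions, not existing Caddy options:

```go
// Illustrative only: cap the credential cache with an LRU so the size/hit-rate
// tradeoff becomes a single tunable knob. Neither the knob nor the library
// choice is anything Caddy currently exposes.
package main

import (
	"fmt"

	lru "github.com/hashicorp/golang-lru"
)

func main() {
	// Hypothetical directive-scoped setting, e.g. "cache_size 1024".
	const cacheSize = 1024

	cache, err := lru.New(cacheSize)
	if err != nil {
		panic(err)
	}

	// The basicauth handler would Add() verified credentials and Get() them on
	// later requests; once cacheSize distinct credentials have been seen, the
	// least recently used entry is evicted, bounding memory use.
	cache.Add("user:hash-of-credentials", true)
	if match, ok := cache.Get("user:hash-of-credentials"); ok {
		fmt.Println("cache hit:", match)
	}
}
```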

Also, if folks are worried about plaintext passwords hanging around forever in memory... perhaps a salted fast hash like SHA-512 could be the cache key used to look up the bcrypt value. Obviously fast hashes are susceptible to brute-force attacks if the cache memory is dumped, but at least one would have to conduct an attack at all... you wouldn't just see "my-aw3some-password!" sitting in plaintext in a memory dump.

It's not obvious to me that the complexity is worth the protection it gives, but it's something to consider. I'll pipe down now; designing in-memory password handling is well outside my expertise.
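
A rough sketch of that idea, assuming a per-process random salt and an invented sha512CacheKey helper (nothing like this exists in caddyauth today):

```go
// Hypothetical: derive the cache key from a salted SHA-512 of the credentials
// so a memory dump of the cache never contains the plaintext password.
package main

import (
	"crypto/rand"
	"crypto/sha512"
	"encoding/hex"
	"fmt"
)

// salt is generated once at startup so cached keys are not portable across
// processes or memory dumps taken from other runs.
var salt = func() []byte {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return b
}()

// sha512CacheKey is an invented helper: it maps username/password to an opaque
// key used to look up the cached bcrypt verification result.
func sha512CacheKey(username, password string) string {
	h := sha512.New()
	h.Write(salt)
	h.Write([]byte(username))
	h.Write([]byte{0}) // separator so "ab"+"c" != "a"+"bc"
	h.Write([]byte(password))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	fmt.Println(sha512CacheKey("alice", "my-aw3some-password!"))
}
```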

Also, if folks are worried about plaintext passwords hanging around forever in memory... perhaps a salted fast hash like SHA-512 could be the cache key used to look up the bcrypt value. Obviously fast hashes are susceptible to brute-force attacks if the cache memory is dumped, but at least one would have to conduct an attack at all... you wouldn't just see "my-aw3some-password!" sitting in plaintext in a memory dump.

Funny, I thought of the same thing but didn't even bother suggesting it for the same reason. It makes attacks _harder_ but the point of using expensive hashes in the first place is to... keep them expensive...

Go doesn't give us a lot of control over what stays in memory and what doesn't, except that not caching them is certainly more likely to allow them to be GC'ed sooner rather than later, whereas if we cache them they will absolutely stay in memory (either in plaintext or a weakly-hashed form) for a long time.

Meh, again, if the machine is so compromised that arbitrary memory can be read by the attacker, all bets are off anyway -- since the password is always plaintext coming in after TLS termination.

I just stumbled upon the same issue when switching some of my personal sites to Caddy 2 from v1. I was wondering why the new Caddy was slower and more resource hungry than expected.

I have a few personal services behind basic auth and they all return content from reverse proxies. Sites with lots of thumbnails would take quite a while to load and I would see a big CPU spike. I use some low-cost VMs for some of my stuff, and there the impact is very noticeable.

Disabling basic auth makes the problem go away. I had already figured it might be due to the use of bcrypt/scrypt.

Personally I think caching would be fine, but I would also be totally happy with the ability to choose a fast hash method. Before this I stored the password in plaintext, since it's really not that critical a password.

Anyway, thanks to @mikelococo for confirming my suspicion and doing all the legwork 😅

Oh, and one more thought: since the hash function is called for each request that carries a password, the current implementation could easily be misused to perform a simple denial-of-service attack.

Yeah, okay, so probably opt-in caching is a good idea. It's a little more conservative to be opt-in and it's easier to transition from opt-in to opt-out than the other way around, I think.

@href

the current implementation could easily be misused to perform a simple denial-of-service attack.

It's no different than any other "simple" DoS attack -- and caching hashes won't prevent that as an attacker would just use a different password each time. As with other DoS attacks, if you're worried about this, implement some rate limiting or firewall rules, or put a CDN up in front.

It's true that caching won't help, but a fast hash function would. Fair enough, though: a regular login site would in principle experience the same issue.

It's true that caching won't help, but a fast hash function would.

As mentioned above, this would leave passwords more vulnerable to brute-force attacks.

So, really, DoS is not a particularly relevant concern here -- if you need to defend against that, there are rate limiting, firewalling, and CDN rules you can establish to do that properly.

@mikelococo Can you please try the latest at #3465? It works for me and offers considerable speedup (well, unnoticeable on my laptop tbh) but I haven't done the fancy profiling on it... just used the time command.

Looks good. I didn't profile, but I can confirm that my test use-case went from generating >60% CPU load from caddy alone to effectively 0% CPU load from caddy.

Uh... maybe I spoke too soon. Profiling...

Looks good after all. I had accidentally built and tested master instead of hash-cache; my initial result must have been some further misinterpretation... not sure how I momentarily saw a reduction from the master build... must have been a slow metrics response plus confirmation bias. Since I bothered to profile again, here it is (my Caddyfile has evolved a bit since the last test, but not in ways that matter here):

[Image: profile_cached — pprof CPU profile graph with hash caching enabled]

It's hard to find caddyauth in there, but it's hanging off of the big caddyhttp/wrapMiddleware box on the right, representing 30ms of the 370ms of processing, down from 19.28s of 19.62s in the original profile.

Great!! Small boxes, that's what we love to see. Thank you! (Had me worried for a hot second.)

Now, I will point to this in the future when people complain about Caddy's performance, as a constructive way to improve it. Thank you for the example.
