Even mostly idle instances accumulate hundreds of open sockets that will eventually hit the ulimit and cause errors. I restarted my instance a few hours ago, and in the meantime it has accumulated 158 log lines (plus 1568 that access /latest_version) and 284 open sockets. I don't know what the sockets are being used for, but this smells like a bug to me. One user on the #invidious IRC channel mentioned that they had to raise the open file descriptor limit to 16k for an instance that only they use.
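For anyone wanting to reproduce the socket count above, here is a minimal sketch that counts the socket file descriptors of a process and prints the descriptor limit next to it. It assumes Linux (it reads `/proc/<pid>/fd`), and the function name `count_open_sockets` is made up for illustration. Note that `resource.getrlimit` reports the limit of the Python process itself, not of the inspected process.

```python
import os
import resource


def count_open_sockets(pid="self"):
    """Count file descriptors of `pid` that refer to sockets (Linux /proc only)."""
    fd_dir = f"/proc/{pid}/fd"
    count = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        # Socket descriptors show up as links of the form "socket:[inode]".
        if target.startswith("socket:"):
            count += 1
    return count


if __name__ == "__main__":
    # Soft limit is the one that triggers "too many open files" errors.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open sockets: {count_open_sockets()}, fd soft limit: {soft}, hard limit: {hard}")
```

Passing the PID of the running instance instead of `"self"` (with sufficient permissions) gives the number to watch over time; a count that only ever grows on an idle instance points at a leak rather than at load.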
This has been a known issue for years.
It's not that big of a problem since the Linux default value is far too low.
Is Invidious supposed to use that much though? Of course not.
Duplicate of https://github.com/iv-org/invidious/issues/578
I've at least partially mitigated the problem by removing the `proxy_http_version 1.1` and `proxy_set_header Connection ""` lines from the Nginx config. It has now been running for 2 days without hitting the default socket limit, whereas previously it would typically run out in ~12 hours despite very light use (and it would then spam the logs and fill up the disk to make it even more fun).
The problem seems to be in the lsquic.cr library, where there's a socket connection that is held indefinitely. I created a PR to close the socket.
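To illustrate the kind of leak described here, the following is a rough sketch in Python, not the actual lsquic.cr code. The `Client` class, `leaky_request`, and `clean_request` are hypothetical names; the only assumption carried over from the thread is that each client holds a socket (UDP here, since QUIC runs over UDP) that stays open until something explicitly closes it.

```python
import socket


class Client:
    """Hypothetical client that owns one UDP socket for its lifetime."""

    def __init__(self):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def close(self):
        self.sock.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


def leaky_request(registry):
    """Leak pattern: the client stays referenced and close() is never called."""
    client = Client()
    registry.append(client)  # a long-lived reference keeps the descriptor open
    return client.sock.fileno()


def clean_request():
    """Fixed pattern: the socket is closed as soon as the request is done."""
    with Client() as client:
        pass  # the request would be performed here
    # fileno() returns -1 once the socket has been closed.
    return client.sock.fileno()
```

Each call to `leaky_request` costs one descriptor that is never returned, which matches the slow, load-independent growth reported above; `clean_request` releases it deterministically.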
Good catch. Unfortunately we also have the problem without QUIC enabled here, so there's probably more to it :(
We seem to hit it reliably after a bunch of broken pipe errors (see the crystal ticket I referred to), might be related. Don't have much more time to dig atm.
Ah sorry, I take that back. I'm not sure it's the main reason we quickly run out of fds during load bursts, but it's definitely a problem we run into. I didn't realize it's a QUIC client lib, and we are indeed leaking such QUIC handles to Google servers on our instance as well.
This issue should be fixed by https://github.com/iv-org/lsquic.cr/pull/2
@martinetd can you open a new issue if you still have this problem?
Will do, thanks!