Hello,
When using Locust with 1 master and 4 slaves, running 50,000 users hatched at 200 per second, I'm receiving the following error:
'ConnectionError(MaxRetryError("HTTPConnectionPool(host=\'rewresnwww6ld\', port=80): Max retries exceeded with url: /api/activities (Caused by
This seems to be coming from the requests library. My ulimit is unlimited and I've applied the other settings below from this post:
echo "10152 65535" > /proc/sys/net/ipv4/ip_local_port_range
sysctl -w fs.file-max=128000
sysctl -w net.ipv4.tcp_keepalive_time=300
sysctl -w net.core.somaxconn=250000
sysctl -w net.ipv4.tcp_max_syn_backlog=2500
sysctl -w net.core.netdev_max_backlog=2500
ulimit -n 10240
Any ideas? I can't effectively loadtest at this point as the error rate climbs after ~5000 users have been generated.
It is likely that your sockets are ending up in the TIME_WAIT state, which blocks them from being reused for a period of time.
See http://serverfault.com/questions/212093/how-to-reduce-number-of-sockets-in-time-wait and http://www.lognormal.com/blog/2012/09/27/linux-tcpip-tuning/ for more info.
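To check whether TIME_WAIT is really piling up on a load generator, something like the sketch below could be used. It assumes a Linux host, where /proc/net/tcp lists IPv4 sockets with the state as a hex code in the fourth column (06 is TIME_WAIT). The function name count_time_wait is made up for this example, and IPv6 sockets (/proc/net/tcp6) are not counted.

```python
import os

TIME_WAIT = "06"  # hex state code Linux uses for TIME_WAIT in /proc/net/tcp


def count_time_wait(proc_net_tcp_text):
    """Count sockets in TIME_WAIT given the text of /proc/net/tcp."""
    count = 0
    # The first line is a column header; the 4th column ("st") is the state.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) > 3 and fields[3] == TIME_WAIT:
            count += 1
    return count


if os.path.exists("/proc/net/tcp"):  # Linux only
    with open("/proc/net/tcp") as f:
        print("IPv4 sockets in TIME_WAIT:", count_time_wait(f.read()))
```

Running this on a slave while the error rate climbs should show whether the count approaches the size of the local port range.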
One could argue that Locust should re-use sockets when doing big tests. We've been thinking about that for when testing Battlelog (multi-million user tests) to reduce this behavior, since it isn't optimal to re-use sockets too quickly (hence the default TIME_WAIT timeout). However, reusing sockets won't test the actual TCP accept handshake which also puts stress on your system. But in most cases, this isn't your actual bottleneck anyway.
This should not be a case of TCP port exhaustion as that would not generate that error. (Rather it would generate EAGAIN on connect())
I think it's more likely that your Python processes don't actually have the intended resource limit. You could confirm this by printing it out in your locustfile.
import resource
# Print the (soft, hard) open-file limits this process actually has
print(resource.getrlimit(resource.RLIMIT_NOFILE))
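A slightly fuller version of that check might also report how many file descriptors the process currently holds, since each open socket uses one. This is only a sketch: the helper name describe_nofile_limit is made up here, and the open-descriptor count relies on /proc/self/fd, which is assumed to exist (Linux); elsewhere it is reported as None.

```python
import os
import resource


def describe_nofile_limit():
    """Return the soft/hard open-file limits and the current open fd count."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    fd_dir = "/proc/self/fd"  # Linux-specific; one entry per open descriptor
    open_fds = len(os.listdir(fd_dir)) if os.path.isdir(fd_dir) else None
    return {"soft": soft, "hard": hard, "open_fds": open_fds}


print(describe_nofile_limit())
```

If the soft value printed on a slave is much lower than what was configured in the shell, the limit is not reaching the Python process.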
However, reusing sockets won't test the actual TCP accept handshake which also puts stress on your system. But in most cases, this isn't your actual bottleneck anyway.
I think it would actually. The peer would most likely be gone and it would be required to reestablish that connection from scratch. The only thing that would be reused is probably the kernel resources allocated for that socket. That said, I'd imagine reusing the sockets could create quite strange errors on a shaky network.
That appears to have been the issue. Once I added this call to my locustfile I was able to bring my servers down. Thanks for the help. Still unsure why the python process wasn't respecting my ulimit settings, but able to work around it for now.
resource.setrlimit(resource.RLIMIT_NOFILE, (999999, 999999))
Thanks guys.
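For anyone copying the setrlimit workaround: raising the hard limit normally needs root, so a more cautious variant could raise only the soft limit and cap it at the existing hard limit. The sketch below is one way to do that; raise_nofile_limit and its default target are made up for this example. One possible reason the shell setting was not picked up, which this thread does not confirm, is that ulimit -n only applies to the shell it is run in and to processes started from that shell.

```python
import resource


def raise_nofile_limit(target=999999):
    """Raise the soft open-file limit toward `target`, capped at the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    # Never ask for more than the hard limit allows.
    new_soft = target if hard == unlimited else min(target, hard)
    if soft != unlimited and new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            pass  # the OS refused the new value; keep the existing limit
    # Report the limits actually in effect after the attempt.
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Calling raise_nofile_limit() near the top of the locustfile and printing the result shows the limit each Locust process ends up with.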
First, check whether the sockets are actually being closed. A Python socket should call socket.close() after socket.shutdown(2), i.e. SHUT_RDWR; the connection is then torn down and its file descriptor released.
Then raise the maximum number of open files in /etc/security/limits.conf.
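The shutdown-then-close order could look roughly like this for a raw socket. It is a minimal sketch, not something Locust or requests needs you to call (they manage their own sockets); close_cleanly is a made-up helper name.

```python
import socket


def close_cleanly(sock):
    """Shut down both directions, then release the file descriptor."""
    try:
        # SHUT_RDWR has the value 2, matching socket.shutdown(2) above.
        sock.shutdown(socket.SHUT_RDWR)
    except OSError:
        pass  # socket was never connected, or the peer already went away
    finally:
        sock.close()  # always free the descriptor, even if shutdown failed
```

Putting close() in the finally clause means the descriptor is returned to the process even when shutdown() raises.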