After starting a test from the UI, the state shows as STOPPED even though the test is running.
If I click start again, the status is updated to SPAWNING correctly.

Status should show as "SPAWNING" I believe.
Just start test from UI.
I can't really reproduce this reliably though 🤔
bash-5.0$ curl http://locust-crs:8089/swarm --data-raw 'user_count=20000&spawn_rate=20'
{
"host": "https://01.ldt.xxxx-yyyy.com",
"message": "Swarming started",
"success": true
}
# Wait 10 seconds
bash-5.0$ curl -s http://locust-crs:8089/stats/requests | grep -m 1 state
"state": "stopped",
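A single check after 10 seconds can miss when the state flips, so it may help to poll the same endpoint and record every change. The following is only a rough sketch: it assumes the master is reachable at the address used in the curl session above and that /stats/requests returns JSON with a top-level "state" key, as the grep output suggests. The function names (extract_state, watch_state) are made up for illustration.

```python
import json
import time
import urllib.request

# Master address taken from the curl session above; adjust for your setup.
BASE_URL = "http://locust-crs:8089"


def extract_state(payload: str) -> str:
    """Return the runner state from a /stats/requests JSON body."""
    return json.loads(payload)["state"]


def watch_state(base_url: str = BASE_URL, seconds: int = 30, interval: float = 1.0) -> list:
    """Poll the state and return each state change observed, in order."""
    seen = []
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{base_url}/stats/requests") as resp:
            state = extract_state(resp.read().decode("utf-8"))
        # Only record and print when the state differs from the last one seen.
        if not seen or seen[-1] != state:
            seen.append(state)
            print(time.strftime("%H:%M:%S"), state)
        time.sleep(interval)
    return seen
```

Calling watch_state() right after the /swarm request would show whether the master ever reports "spawning" before falling back to "stopped".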
Are you using load shapes? (Just guessing here, I don't know this part of the code base so you're on your own :)
Are you using load shapes?
Nope
I'm in a good position to debug it myself but thought I'd post an issue in case others had the same problem.
I also experienced this a few times and came to the conclusion that it happens when the workers are overloaded (high CPU usage). In my case, the workers were performing some blocking tasks on start (random text generation). To prevent this, I limited the number of users per worker and also sprinkled some gevent.sleep(0) calls in the blocking code so that the event loop is not completely blocked.
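For anyone wanting to try the gevent.sleep(0) workaround, here is a minimal sketch of the idea. It is not the code from the comment above: generate_texts and yield_every are hypothetical names, and the time.sleep fallback is only there so the snippet also runs where gevent is not installed.

```python
import random
import string

try:
    # gevent is installed alongside Locust; sleep(0) yields to other greenlets.
    from gevent import sleep as cooperative_sleep
except ImportError:
    # Fallback so the sketch still runs without gevent (no cooperative yield then).
    from time import sleep as cooperative_sleep


def generate_texts(count, length=32, yield_every=100):
    """Build `count` random strings, yielding control every `yield_every` items."""
    texts = []
    for i in range(count):
        texts.append("".join(random.choices(string.ascii_letters, k=length)))
        if (i + 1) % yield_every == 0:
            # Give the event loop a chance to run other greenlets,
            # such as the worker's communication with the master.
            cooperative_sleep(0)
    return texts
```

A smaller yield_every yields more often at the cost of some throughput; what value works depends on how heavy each iteration is.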
@max-rocket-internet did you manage to figure it out?
did you manage to figure it out?
Not yet! But we are seeing this every day. We might need to roll back to a previous version. I'm still debugging it.
Perhaps it only happens with a large number of workers? Were you using an on_stop method? (in that case, try running latest master with the above mentioned fix)
Without more details I think I'll have to close this.
Were you using an on_stop method?
I just checked all our tests, none of them use on_stop.
I'm still looking into it. I can reproduce it only once in a while 😐
I'm not sure if there is any debug logging surrounding this logic, but I'd recommend running locust with -L DEBUG and checking the log output when the issue occurs (possibly adding some log statements if the existing ones are not enough).
Yeah I added --loglevel DEBUG but it's not enough. I then added my own debug logging but still can't reproduce it reliably.
sneaky...
Hmm, got debug logs from today, but it doesn't look like they help much:
[2020-10-14 11:58:10,144] locust-xxx-master-547544d45d-cq2gb/INFO/locust.main: Starting web interface at http://0.0.0.0:8089 (accepting connections from all network interfaces)
[2020-10-14 11:58:10,152] locust-xxx-master-547544d45d-cq2gb/INFO/locust.main: Starting Locust 1.2.3
[2020-10-14 11:58:13,443] locust-xxx-master-547544d45d-cq2gb/INFO/locust.runners: Client 'locust-xxx-worker-5b949b4c64-s2ngr_d1b34715c5054995a007b260f1b7db52' reported as ready. Currently 1 clients ready to swarm.
...
[2020-10-14 11:58:14,470] locust-xxx-master-547544d45d-cq2gb/INFO/locust.runners: Client 'locust-xxx-worker-5b949b4c64-n66h7_54b1cb9170134ab4a7282763ccaa782c' reported as ready. Currently 75 clients ready to swarm.
[2020-10-14 11:58:47,101] locust-xxx-master-547544d45d-cq2gb/INFO/locust.runners: Shape test starting. User count and spawn rate are ignored for this type of load test
[2020-10-14 11:58:47,101] locust-xxx-master-547544d45d-cq2gb/DEBUG/locust.runners: Updating state to 'ready', old state was 'ready'
[2020-10-14 11:58:47,102] locust-xxx-master-547544d45d-cq2gb/INFO/locust.runners: Shape worker starting
[2020-10-14 11:58:47,102] locust-xxx-master-547544d45d-cq2gb/INFO/locust.runners: Shape test updating to 25000 users at 13.00 spawn rate
[2020-10-14 11:58:47,102] locust-xxx-master-547544d45d-cq2gb/INFO/locust.runners: Sending spawn jobs of 333 users and 0.17 spawn rate to 75 ready clients
[2020-10-14 11:58:47,102] locust-xxx-master-547544d45d-cq2gb/DEBUG/locust.runners: Sending spawn message to client locust-xxx-worker-5b949b4c64-s2ngr_d1b34715c5054995a007b260f1b7db52
...
[2020-10-14 11:58:47,109] locust-xxx-master-547544d45d-cq2gb/DEBUG/locust.runners: Sending spawn message to client locust-xxx-worker-5b949b4c64-n66h7_54b1cb9170134ab4a7282763ccaa782c
[2020-10-14 11:58:47,110] locust-xxx-master-547544d45d-cq2gb/DEBUG/locust.runners: Updating state to 'spawning', old state was 'ready'
[2020-10-14 11:58:47,110] locust-xxx-master-547544d45d-cq2gb/DEBUG/locust.runners: Updating state to 'stopped', old state was 'spawning'
🤔
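The last two log lines show the state going from 'spawning' to 'stopped' within the same millisecond. To spot this in longer logs, something like the sketch below could filter the DEBUG output. It assumes the "Updating state to ..., old state was ..." message format shown above; find_transitions and suspicious are hypothetical helper names.

```python
import re

# Matches the DEBUG lines shown above, e.g.
# [2020-10-14 11:58:47,110] host/DEBUG/locust.runners: Updating state to 'stopped', old state was 'spawning'
STATE_CHANGE = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*Updating state to '(?P<new>\w+)', old state was '(?P<old>\w+)'"
)


def find_transitions(lines):
    """Return (timestamp, old_state, new_state) for every state-change log line."""
    transitions = []
    for line in lines:
        match = STATE_CHANGE.search(line)
        if match:
            transitions.append((match["ts"], match["old"], match["new"]))
    return transitions


def suspicious(transitions):
    """Keep only the transitions that go straight from 'spawning' to 'stopped'."""
    return [t for t in transitions if t[1] == "spawning" and t[2] == "stopped"]
```

Feeding the master log through find_transitions and then suspicious gives the timestamps to compare against the worker logs.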