Locust: Median response times off

Created on 21 Mar 2017 · 8 Comments · Source: locustio/locust

 Name                                                          # reqs      # fails     Avg     Min     Max  |  Median   req/s
----------------------------------------------------------------------------------------------------
 POST long                                                        1     0(0.00%)    6067    6067    6067  |    6100    0.00
 POST short                                                       1     0(0.00%)    1239    1239    1239  |    1200    0.20
----------------------------------------------------------------------------------------------------
 Total                                                           2     0(0.00%)                                       0.20

 Name                                                          # reqs      # fails     Avg     Min     Max  |  Median   req/s
----------------------------------------------------------------------------------------------------
 POST medium                                                        7     0(0.00%)     468     199    1006  |     360
 POST short                                                       2     0(0.00%)    1063     574    1553  |     570    0.33
----------------------------------------------------------------------------------------------------
 Total                                                            9     0(0.00%)                                       1.00

This seems to be an intermittent issue, but notice that the median response times are not within the bounds of min and max.
Has anyone else run into a similar issue?

steventang2013

Most helpful comment

In order to calculate the median response time, as well as response times for specific percentiles, without storing the response time for every single request, we keep a dict of the following format: {response_time: number_of_requests}. To save memory, we round the response time to two significant digits (so that 6067 becomes 6100, 1239 -> 1200, 574 -> 570, and so on). Since that dict is used to calculate the median response times, while the exact response times are used when calculating min/max, the median can end up outside the min/max boundaries (especially with few requests).
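The mechanism described above can be sketched roughly as follows. This is an illustration only, not Locust's actual code: the helper names `round_two_digits` and `median_from_dict` are hypothetical, and the rounding rule is an assumption chosen to reproduce the values quoted in this thread. It uses the two `POST short` samples from the second table (574 and 1553).

```python
from collections import defaultdict


def round_two_digits(ms):
    """Round a non-negative response time (ms) to two significant digits.

    Hypothetical helper; assumed to mirror the rounding described above.
    """
    if ms < 100:
        return int(ms)
    digits = len(str(int(ms)))
    return int(round(ms, 2 - digits))


def median_from_dict(counts):
    """Lower median from a {rounded_response_time: number_of_requests} dict."""
    total = sum(counts.values())
    pos = (total - 1) // 2  # index of the lower median in sorted order
    for value in sorted(counts):
        if pos < counts[value]:
            return value
        pos -= counts[value]
    raise ValueError("no requests recorded")


times = [574, 1553]  # exact response times, used for min/max
counts = defaultdict(int)
for t in times:
    counts[round_two_digits(t)] += 1  # rounded times, used for the median

median = median_from_dict(counts)
print(min(times), max(times), median)  # 574 1553 570
```

The median (570) comes from the rounded bucket, while the min (574) comes from the exact value, which is how the median can fall below the min.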

I guess it would probably be good to mention, in the docs and the web UI, that we only use two digits of precision for median & percentile response times.

heyman on 23 Mar 2017

👍4

All 8 comments

Yes! Someone at the office showed me a situation where they ran 10 requests and the average was way outside the bounds of what the logs showed. So it's probably not just an issue with the median; it seems to be an issue with reporting. It also seems to be intermittent for us, so I haven't really gone too deep on it yet.

giantryansaul on 22 Mar 2017

I would like to know the status of this issue. I understand it is a precision problem because only 2 digits are kept, but I don't understand why this has not been improved. Having values of median over max and min is not what is expected in a tool that's used to measure performance. Wouldn't it be possible to use more precision or, at least, round the median to the max or min value in these cases?
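The second suggestion (rounding the median to the max or min when it falls outside them) could look something like the sketch below. The function name `clamp_median` is hypothetical, and this is not taken from any Locust patch; it only illustrates the idea using numbers from the tables at the top of the thread.

```python
def clamp_median(median, min_time, max_time):
    """Force a rounded median back into the exact [min, max] range.

    Hypothetical helper illustrating the suggestion; not Locust code.
    """
    return max(min_time, min(median, max_time))


print(clamp_median(6100, 6067, 6067))  # 6067 (median was above max)
print(clamp_median(570, 574, 1553))    # 574  (median was below min)
print(clamp_median(360, 199, 1006))    # 360  (already in range, unchanged)
```

Note that clamping only guarantees the reported value lies within the bounds. With a single request it happens to give the exact value, but in general it does not recover the precision lost by rounding.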

jredrejo on 10 May 2018

> Having values of median over max and min is not what is expected in a tool that's used to measure performance

Agreed, but is #790 really any better? It just masks the issue rather than adding precision. Why don't you submit a PR that uses 3-digit precision and see how that works? (We would get a more precise result at the expense of memory used.)
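To get a feel for the precision/memory trade-off mentioned here, a rough sketch follows. The helper `round_sig` is hypothetical (not a Locust function), and the 1 ms to 99,999 ms range is an arbitrary assumption; the number of distinct keys is only a proxy for the worst-case size of the response-time dict.

```python
def round_sig(ms, digits):
    """Round a non-negative integer response time (ms) to `digits` significant digits.

    Hypothetical helper for illustration only.
    """
    ms = int(ms)
    exponent = len(str(ms)) - digits
    if exponent <= 0:
        return ms  # already has `digits` digits or fewer
    return int(round(ms, -exponent))


# Values from the thread at 2 vs. 3 significant digits.
for t in (6067, 1239, 574):
    print(t, round_sig(t, 2), round_sig(t, 3))

# Worst-case number of distinct dict keys for response times of 1..99999 ms.
buckets_2 = len({round_sig(t, 2) for t in range(1, 100_000)})
buckets_3 = len({round_sig(t, 3) for t in range(1, 100_000)})
print(buckets_2, buckets_3)
```

Under these assumptions, three digits would keep 6067 as 6070 and 1239 as 1240, at the cost of several times more possible keys per stats entry.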

cgoldberg on 26 May 2018

Yes, I think #790 is better than the current situation. Of course using 3-digit precision would be better, but I am not involved in locustio development and I didn't want to go that far with the changes.

#790 gives something that's logical from a stats point of view instead of something that's an impossible value.

jredrejo on 28 May 2018

> #790 gives something that's logical from a stats point of view instead of something that's an impossible value.

It might make it look logical, but the stats are not correct. That's slightly better than the current situation, I suppose, but I'd rather someone fix the actual issue.

cgoldberg on 28 May 2018

Well, they are correct when you have only one hit, which is better.
I agree it's better to fix the actual issue, but this is also a slight improvement. Currently you can't show the stats to anyone because they will immediately tell you the numbers are wrong.
So, until someone else has the time or resources to provide an exact solution, I think #790 can save Locust users some red faces.

jredrejo on 28 May 2018