Django-rest-framework: Possible race condition in throttling

Created on 26 May 2017 · 15 comments · Source: encode/django-rest-framework

The throttle_success method of SimpleRateThrottle updates the cache with a new value of history. Because the read of history and the subsequent write are not atomic, a number of concurrent requests can race: one request's write overwrites another's, so some entries of history are replaced by stale data.
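The lost update can be demonstrated with a minimal sketch of the read-then-write pattern (this paraphrases the throttle's get/set pattern against a plain dict standing in for the cache backend; it is not the actual DRF code). Two "requests" are interleaved deterministically: both read the history first, then both write their own copy back.

```python
import time

cache = {}  # stand-in for Django's cache backend

KEY = "throttle_user_1"

# Step 1: two concurrent requests both read the (empty) history,
# as the throttle's allow_request step does.
history_a = list(cache.get(KEY, []))
history_b = list(cache.get(KEY, []))

# Step 2: each inserts its own timestamp and writes its copy back,
# as the throttle_success step does. B's write clobbers A's.
history_a.insert(0, time.time())
cache[KEY] = history_a

history_b.insert(0, time.time())
cache[KEY] = history_b

# Two requests were served, but only one timestamp survives:
print(len(cache[KEY]))  # 1, not 2 -> the throttle undercounts
```

With 20 concurrent workers (as in the `ab -c 20` setup below), each such collision lets an extra request slip past the limit.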

Checklist

  • [x] I have verified that the issue exists against the master branch of Django REST framework.
  • [x] I have searched for similar issues in both open and closed tickets and cannot find a duplicate.
  • [x] This is not a usage question. (Those should be directed to the discussion group instead.)
  • [x] This cannot be dealt with as a third party library. (We prefer new functionality to be in the form of third party libraries where possible.)
  • [x] I have reduced the issue to the simplest possible case.
  • [ ] I have included a failing test as a pull request. (If you are unable to do so we can still accept the issue.)

Steps to reproduce

  • Able to reproduce the issue with LocMemCache and memcached
  • Enable throttling on a simple view & set the rate to, say, 500/hour
  • Send concurrent requests with ab, e.g.
    Setup 1 : ab -l -c 20 -n 500 http://127.0.0.1:8000
    Setup 2 : ab -l -c 1 -n 500 http://127.0.0.1:8000

Expected behavior

  • In both setups (1 & 2), the view should throttle any further requests, because ab has already sent the full 500 (once in batches of 20, once one at a time).

Actual behavior

  • Setup 1 : The view is not throttled yet (roughly another 20-50 requests can still be served; the number varies each run).
  • Setup 2 : The view throttles any subsequent request. Concurrency is 1, so history is recorded correctly in the cache.

It appears as though some concurrent requests are being ignored by the cache.

help wanted

All 15 comments

Okay, I would consider fixes for this on their own merits, but I'm not going to consider it problematic otherwise, given the transient nature of throttling.

At the moment I couldn't arrive at a quick fix. However, I feel this might be a good place for people to discuss and propose possible solutions. Wouldn't that be facilitated by keeping it open?

We'll redirect people willing to work on it here.
We tend to close issues because otherwise they usually stay open forever and only add noise to our maintenance work.
Given that you are the first to mention it, and given our experience with issues, it's unfortunately unlikely that others will work on it.

@bitcity you're probably going to have to fix this by writing your own throttle using a cache backend that allows you to do atomic list commands (push/append), something like Redis would probably work.

@jturmel Thanks for the tip. I ended up writing a custom throttle class that takes care of the race condition (I use the atomic incr operation and keep counters instead of timestamps). The solution works for my use case. Unfortunately, it's not generic enough to be used as a patch, though.
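A counter-based throttle along those lines might look like the following sketch. The names (`AtomicCache`, `CounterThrottle`) are illustrative, not @bitcity's actual code; the stand-in cache mimics the atomic add/incr operations that both memcached and Redis provide, and the throttle counts requests per fixed time window instead of keeping a timestamp list.

```python
import threading
import time


class AtomicCache:
    """Minimal stand-in for a cache backend offering atomic add/incr
    (memcached and Redis both expose equivalent operations)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def add(self, key, value):
        # Set the key only if it is absent.
        with self._lock:
            self._data.setdefault(key, value)

    def incr(self, key):
        # Atomically increment and return the new value.
        with self._lock:
            self._data[key] += 1
            return self._data[key]


class CounterThrottle:
    """Hypothetical fixed-window throttle: one counter per (ident, window)."""

    def __init__(self, cache, rate, window_seconds):
        self.cache = cache
        self.rate = rate
        self.window = window_seconds

    def allow_request(self, ident, now=None):
        now = time.time() if now is None else now
        window_id = int(now // self.window)
        key = f"throttle:{ident}:{window_id}"
        self.cache.add(key, 0)                    # create counter if missing
        return self.cache.incr(key) <= self.rate  # atomic: no lost updates


throttle = CounterThrottle(AtomicCache(), rate=500, window_seconds=3600)
now = time.time()
results = [throttle.allow_request("user1", now=now) for _ in range(502)]
print(results.count(True))  # 500: exactly the configured limit
```

The trade-off versus the timestamp-list approach is a coarser fixed window rather than a sliding one, which is presumably why it isn't generic enough to land as a patch.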

We've seen this problem during the last 3 days. After inspecting the throttling code quickly, I reached the same conclusions already discussed above, and I'll probably implement my own throttling using a Redis distributed lock. Now, one question: I'd like to contribute to this, but I'm not sure there is a "common" way of solving the problem, since we are using the cache abstraction layer and every backend could implement a specific method to get around this issue. Maybe an additional call to the cache before inserting the data could mitigate this a bit, but that doesn't sound like a proper fix. Does anybody have ideas on the best way to fix this?
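As a sketch of the lock idea: within a single process, holding a plain `threading.Lock` across the whole read-modify-write closes the race window. A multi-process or multi-host deployment would need a distributed lock (e.g. one built on Redis, as suggested above) in place of the in-process lock; the structure is the same.

```python
import threading
import time

cache = {}
cache_lock = threading.Lock()  # in-process stand-in for a distributed lock


def record_request(key, now):
    # Holding the lock across the whole read-modify-write closes the
    # race window that a bare get()-then-set() leaves open.
    with cache_lock:
        history = cache.get(key, [])
        history.insert(0, now)
        cache[key] = history


threads = [
    threading.Thread(target=record_request, args=("user1", time.time()))
    for _ in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(cache["user1"]))  # 50: no updates were lost
```

The cost is serializing all throttle updates for a key behind the lock, which is exactly the kind of backend-specific trade-off that makes a generic fix across Django's cache abstraction hard.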

@tomchristie this is ridiculous.
Nobody will want to submit any more issues with this approach:
"if there's no PR, this is not an issue, and it doesn't happen often on my projects anyway"

@dmitry-mukhin there's no point in being aggressive.
Besides, we didn't say this was not an issue.
We said we'll consider PRs about it. This is a community project. Core team members contribute on a regular basis, but they also manage hopefully increasing community contributions.
If this is a serious issue, take some time and write a PR. You can also make it a third party package, so anyone is free to drive it on their own.

@xordoquy
I understand all of this except the part where closing an issue is acknowledging it rather than sweeping it under the rug.
It seems like keeping the issue count low for stats' sake.

Community member @bitcity did a great job of filing and describing the issue.
And all he got was "we'll not consider this, please do more work".

If the issue (which is real, by the way) stays open, somebody in the community can take it from here and prepare a PR.
If the issue is closed, the project misleads the community:

  • there are no race condition issues in throttling
  • anybody who stumbles on it again will most likely have to file it again
  • there is no point in submitting issues if you don't have a solution for them

I'm sorry if I come across as aggressive. I'm trying to point out that this will decrease community contributions and/or waste other people's time doing duplicate work.

I've added the "help wanted" label to this issue. This is useful for tracking issues that are currently out of scope for maintainers, but in which contributions would be welcome.

@rpkilby I believe labels on closed issues are ignored by the vast majority of GitHub users ¯\_(ツ)_/¯

Agreed, we should probably add a section to the contributing docs about the label.

We already have about 100 open issues. Anyone interested in contributing already has plenty of choice.
This issue isn't easy, otherwise someone would already have submitted a PR.
Don't forget that http://www.commitstrip.com/en/2014/05/07/the-truth-behind-open-source-apps/ is pretty accurate.

Honestly, I don't especially consider it an outstanding, addressable bug. Application-level throttling will only get you so far, and throttle rates that are slightly off are just a constraint of the system. Fuzziness in distributed systems can be okay in some cases. We'll point at third party packages if anyone wants to implement something stricter.

I partially agree with that, but "slightly off" really depends on the network, which is by nature unreliable. Nothing can prevent the first request of a series from getting in yet being the last to update the cache, precisely because of the network itself. I agree, though, that a proper fix is beyond the scope of the framework, but we could add a line to the docs describing the limits of the standard implementation.
