Sentry: Rate Limiting seemingly disabled as of 8.13.0

Created on 12 Apr 2017  ·  42 comments  ·  Source: getsentry/sentry

Details are here: https://forum.sentry.io/t/no-rate-limiting-after-upgrade/1114

Highlights:

  • Two confirmed cases where upgrading to 8.13.0 or later left rate limiting with no effect.
  • Tried setting it in the UI and forcing it in the conf.
  • Confirmed in 8.14.1, haven't had a chance to try 8.15.0 yet.

Most helpful comment

I've upgraded Sentry from 8.6.0 to 8.22.0, and the rate limit no longer seems to work. The code here returns the minimum of system_limit and account_limit. In single-organization mode, account_limit is zero (which means no limit), so this method always returns zero, i.e. no limit.

After running the code below in a Python console to set the quota manually, rate limiting works.

from sentry.models import Organization, OrganizationOption

OrganizationOption.objects.set_value(
    organization=Organization.objects.first(),
    key='sentry:account-rate-limit',
    value=50000)
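To illustrate the failure mode this comment describes, here is a simplified, hypothetical sketch (not Sentry's actual code): when 0 is used to mean "unlimited", taking the plain minimum of the system limit and the account limit collapses everything to 0.

```python
def effective_rate_limit(system_limit, account_limit):
    """Buggy: treats 0 ("unlimited") as if it were the smallest limit."""
    return min(system_limit, account_limit)

def effective_rate_limit_fixed(system_limit, account_limit):
    """Sketch of a fix: ignore zero ("unlimited") values when picking
    the most restrictive limit."""
    limits = [l for l in (system_limit, account_limit) if l > 0]
    return min(limits) if limits else 0  # 0 still means "no limit"

# In single-organization mode account_limit is 0, so the buggy version
# disables rate limiting entirely:
print(effective_rate_limit(3000, 0))        # 0 -> no limit is applied
print(effective_rate_limit_fixed(3000, 0))  # 3000 -> system limit wins
```

This also explains why the workaround above works: setting `sentry:account-rate-limit` to a large nonzero value means the minimum is no longer zero.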

All 42 comments

I'm on 8.16.0 and have tried tweaking various configurations according to the docs; rate limiting is clearly not working. @mattrobenolt @dcramer

Also seeing it on 8.14.1. This bug is causing us to lag hours behind on new events during spikes of high error rates, as our Sentry servers can't keep up with the firehose of inbound events. This is really frustrating, as it is during these times that visibility into the latest errors is critical to reducing downtime. Can we please get this prioritized?

@boboli we will try to look at this, but at the end of the day, if you're using Sentry for free (open source) you have to realize that these requests may slip for us. There are only so many resources, and if we don't see issues on our SaaS service, things become a lower priority (unless it's a core breaking issue). In this case we use different rate limiting code in production, so it's likely there's a flaw in the built-in implementation that the tests aren't catching.

We have been experiencing this issue as well since we upgraded Sentry from v8.10 to v8.16.1. The bug is definitely annoying because it causes hours of lag behind on new events from time to time.

I think that as people upgrade to newer versions, more and more will realize they have lost the rate limiting function and find this issue via Google. Or, if they see this issue first, they may decide to delay the upgrade (I know I would have delayed if I had known).

Hopefully the bug can be fixed soon. We have already done the database migration, so I don't think we can downgrade at this point.

I'm planning to look into this soon (hopefully this week), just have been spending my free cycles on random smaller features.

I actually looked at this recently and couldn't find a bug, so maybe I was just overlooking things, and people were asking for a feature that we had simply removed.

Yeah, the DSN rate limiting works, but I still don't see the "organization level project percentage" working at all.

@HakShak will take a look -- we're likely going to redo those features, so I would suggest not relying on the current behavior too heavily, but they SHOULD still work.

FYI investigation (via tests) is here https://github.com/getsentry/sentry/pull/5688/files

@dcramer / @HakShak

Is the takeaway here that, with version 8.17, rate limiting per key is working but the "organization level project percentage" is not?

It looks like you can set the system wide rate limit via the UI or via config. Does one value supersede the other?

Generally speaking, it looks like rate limiting per project isn't working for me. Could I be missing something? Thanks in advance.

@JarenGlover I've not identified anything as not working, but if something truly is, it would help to have more detail on what isn't working and what configuration is set up to cause it.

@dcramer: We notice 2 things that are not working.

Global Rate Limit

We've set the global rate limit to 3000 using the UI under the admin settings:
[screenshot: admin settings page with the global rate limit set to 3000]

But this is clearly not working just by doing some simple math from our stats page:
[screenshot: stats page, 2017-07-21]
2.6E6 events in 1 hour ~= 43K events/minute

We've tried setting this in the conf too, but not sure if we're doing it right:

SENTRY_QUOTAS_OPTIONS = {
    "cluster": "quotas",
    "system.rate-limit": 3000
}

This setting definitely used to work in the past (I believe in version 8.12), as the stats page used to show red bars for dropped events.

Per-DSN Rate Limit

We then tried setting the per-DSN rate limiting (set to 1000/min) in the UI, and we can see from this screenshot that this isn't working either:
[screenshot: project stats after setting a 1000/min per-DSN limit, 2017-07-21]
Simple math: 9.7E6 events in 1 day ~= 6736 events/minute
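The back-of-the-envelope arithmetic quoted in both cases above checks out:

```python
# Global limit case: 2.6e6 events in 1 hour vs a 3000/min limit.
events_per_hour = 2.6e6
print(round(events_per_hour / 60))        # ~43333 events/minute

# Per-DSN case: 9.7e6 events in 1 day vs a 1000/min limit.
events_per_day = 9.7e6
print(round(events_per_day / (24 * 60)))  # ~6736 events/minute
```

Both observed rates are an order of magnitude above the configured limits, so neither limit can be taking effect.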

Setup

We're using Sentry 8.17.0 on Python 2.7.6/Ubuntu 14.04 in AWS EC2 (not using Docker), with Redis for our buffer, ratelimiter store, quota store, TSDB, digests; rabbitmq for our queue; memcached for cache; and AWS RDS PostgreSQL for node storage. I can give you more details on our setup if needed.

@boboli are you using single organization mode (the default)?

@dcramer: yup, have had it on since we first provisioned sentry (v8.7.0).

Can you confirm what SENTRY_QUOTAS is set to?

SENTRY_QUOTAS = 'sentry.quotas.redis.RedisQuota'
SENTRY_QUOTAS_OPTIONS = {
    "cluster": "quotas",
    "system.rate-limit": 1000
}

@boboli and quotas exists in the output of:

$ sentry config get redis.clusters
        type: DICTIONARY
 from config: {'default': {'hosts': {0: {'host': '127.0.0.1', 'port': 6379}}}}
     current: {'default': {'hosts': {0: {'host': '127.0.0.1', 'port': 6379}}}}

?

Yup:

(sentry_venv) robinhood@ip-10-1-30-102:~/sentry$ sentry config get redis.clusters
        type: DICTIONARY
 from config: {'default': {'hosts': {0: {'host': 'sentry-buffer.uon5mu.ng.0001.use1.cache.amazonaws.com', 'db': 5, 'port': 6379}}}, 'ratelimiter': {'hosts': {0: {'host': 'sentry-buffer.uon5mu.ng.0001.use1.cache.amazonaws.com', 'db': 1, 'port': 6379}}}, 'buffer': {'hosts': {0: {'host': 'sentry-buffer.uon5mu.ng.0001.use1.cache.amazonaws.com', 'db': 2, 'port': 6379}}}, 'cache': {'hosts': {0: {'host': 'sentry-buffer.uon5mu.ng.0001.use1.cache.amazonaws.com', 'db': 0, 'port': 6379}}}, 'digests': {'hosts': {0: {'host': 'sentry-buffer.uon5mu.ng.0001.use1.cache.amazonaws.com', 'db': 4, 'port': 6379}}}, 'quotas': {'hosts': {0: {'host': 'sentry-buffer.uon5mu.ng.0001.use1.cache.amazonaws.com', 'db': 3, 'port': 6379}}}, 'tsdb': {'hosts': {0: {'host': 'sentry-tsdb.uon5mu.ng.0001.use1.cache.amazonaws.com', 'db': 0, 'port': 6379}}}}
     current: {'default': {'hosts': {0: {'host': 'sentry-buffer.uon5mu.ng.0001.use1.cache.amazonaws.com', 'db': 5, 'port': 6379}}}, 'ratelimiter': {'hosts': {0: {'host': 'sentry-buffer.uon5mu.ng.0001.use1.cache.amazonaws.com', 'db': 1, 'port': 6379}}}, 'buffer': {'hosts': {0: {'host': 'sentry-buffer.uon5mu.ng.0001.use1.cache.amazonaws.com', 'db': 2, 'port': 6379}}}, 'cache': {'hosts': {0: {'host': 'sentry-buffer.uon5mu.ng.0001.use1.cache.amazonaws.com', 'db': 0, 'port': 6379}}}, 'digests': {'hosts': {0: {'host': 'sentry-buffer.uon5mu.ng.0001.use1.cache.amazonaws.com', 'db': 4, 'port': 6379}}}, 'quotas': {'hosts': {0: {'host': 'sentry-buffer.uon5mu.ng.0001.use1.cache.amazonaws.com', 'db': 3, 'port': 6379}}}, 'tsdb': {'hosts': {0: {'host': 'sentry-tsdb.uon5mu.ng.0001.use1.cache.amazonaws.com', 'db': 0, 'port': 6379}}}}

Lastly, can you confirm the version of Redis? Given that no rate limits are working at all, I suspect something outside the core app code is causing the issue.

sentry-buffer.uon5mu.ng.0001.use1.cache.amazonaws.com:6379> INFO
# Server
redis_version:2.8.24

For the record, we are using AWS ElastiCache for this Redis cluster, so it might be an Amazon-flavored version of Redis rather than the open-source one.

Maybe this helps for debugging: I ran KEYS * on the database number we use for quotas (#3), and it came up with nothing but:

sentry-buffer.uon5mu.ng.0001.use1.cache.amazonaws.com:6379> KEYS *
1) "ElastiCacheMasterReplicationTimestamp"

Can anyone else here also confirm their Redis version? I'm wondering if somehow the Lua isn't working as expected for a specific runtime.

We use AWS Elasticache too, and our redis version is v3.2.4.

$ echo INFO | nc $REDIS 6379 | egrep redis_version
redis_version:3.2.4

Using this docker image for our redis: redis:3.2-alpine

Oh, sorry, a correction: our prod environment, which is experiencing the no-rate-limiting issue, uses Redis v2.8.23. Our stage environment uses Redis v3.2.4. However, we can't tell whether the latter has the same issue, because it has almost no incoming traffic.

@dcramer: any updates on the investigation? I'd be willing to provide additional details if it helps to debug.

@boboli I haven't had time. We'd like to resolve this, but it's not affecting everyone, and it's definitely not affecting us, so it'll happen when there's an obvious reason it's broken or someone has spare cycles to look into it.

@HakShak are you saying the issue no longer exists when you use Redis 3.2?

No, it still exists. The update which added per-DSN limits seemed to add another method of rate limiting. It isn't very accurate, but at least it triggers, while the installation-wide setting and the relative project settings don't seem to.


@HakShak what do you mean by it's not very accurate? It's using the same system as the core rate limits.

We have a DSN right now set to 100 per minute; that project gets about 7k events an hour on the low end and up to 15k an hour at peak. Checking it now, I actually don't see any events being throttled by it.
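For context on what "using the same system" means here: per-key limits of this kind are commonly implemented as a fixed-window counter in Redis (INCR a per-key, per-window counter and reject once it exceeds the limit). A minimal in-memory sketch of that pattern, purely illustrative and not Sentry's actual implementation:

```python
import time

class FixedWindowLimiter:
    """Illustrative fixed-window counter: allow at most `limit`
    events per `window` seconds for each key."""

    def __init__(self, limit, window=60):
        self.limit = limit
        self.window = window
        self.counts = {}  # (key, window index) -> count so far

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        bucket = (key, int(now // self.window))
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return self.counts[bucket] <= self.limit

limiter = FixedWindowLimiter(limit=100, window=60)
accepted = sum(limiter.allow("dsn-key", now=0) for _ in range(250))
print(accepted)  # 100 of the 250 events in the same window are accepted
```

A DSN limited to 100/min that still accepts 7k-15k events an hour, as reported above, is therefore behaving as if the counter is never incremented or never checked.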

I still subscribe to your theory @dcramer that something outside the Sentry codebase is causing this.

I saw DSN throttling working when we first tried it, but something has either lingered around in the DB or built up in the TSDB.

One thing which might be "unique" about our setup is that we use different Redis instances for the TSDB and the queue.

Here are details on the issues with the DSN rate limits: GH-6007

@tkaemming: Does the bug affect the global rate limit logic as well?

The specific issue in the ticket linked above wouldn't affect global rate limits (the code involved is isolated to the rate limits associated with per-project keys.) We haven't ruled out any other issues related to global limits though — we just haven't been able to reproduce them ourselves.

Following up: the issue with per-project key rate limits has been fixed on sentry.io and will be part of the 8.20 release.

I've upgraded to 8.20 and rate limits aren't working. The settings page confirms that my account is limited to a maximum of 20 events per 60 seconds, but Sentry's stats page shows 1191 EVENTS PER MINUTE and none are dropped. @tkaemming

I've upgraded Sentry from 8.6.0 to 8.22.0, and the rate limit no longer seems to work. The code here returns the minimum of system_limit and account_limit. In single-organization mode, account_limit is zero (which means no limit), so this method always returns zero, i.e. no limit.

After running the code below in a Python console to set the quota manually, rate limiting works.

from sentry.models import Organization, OrganizationOption

OrganizationOption.objects.set_value(
    organization=Organization.objects.first(),
    key='sentry:account-rate-limit',
    value=50000)

I can confirm the same behaviour as reported by @jinzhao1994, also running Sentry 8.22.0.

Just want to chime in and add that system.rate-limit does not work on my install of 8.20

@tkaemming : it seems like @jinzhao1994's investigation found the problem. Can we take another look at solving this issue for the next release?

@boboli et al. -- we'd like to fix this, but it's not a high priority for us. You have to realize that most people contributing to Sentry work for Sentry, and Sentry's goal is our SaaS. If you're using Sentry, especially for free (which is likely true if you're not using sentry.io), you might consider asking your employer for some time to help accomplish these goals, and we're happy to provide guidance.

Rate limits have been entirely pushed towards per-key limits at this point. I'm closing this out, as we don't intend to continue investing in other kinds of rate limits. I would suggest using nginx or similar for more granular control over load-based rate limiting concerns. You can find examples of this in our docs: https://docs.sentry.io/server/throttling/
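For readers landing here later, the proxy-level approach suggested above can look roughly like this. This is a hedged sketch, not taken from the linked docs: the zone name, rate, burst values, and the `sentry-web` upstream name are all placeholders to adapt to your setup.

```nginx
# Hypothetical example: throttle event submissions at the proxy layer.
# 100 requests/second per client IP, with a burst allowance of 200.
limit_req_zone $binary_remote_addr zone=sentry_store:10m rate=100r/s;

server {
    listen 80;

    # Sentry's event-ingestion endpoint.
    location ~ ^/api/\d+/store/ {
        limit_req zone=sentry_store burst=200 nodelay;
        proxy_pass http://sentry-web;
    }

    location / {
        proxy_pass http://sentry-web;
    }
}
```

Rate limiting only the store endpoint keeps the web UI responsive during event floods while shedding excess ingestion load before it reaches Sentry's workers.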

We should confirm this is disabled in future versions of Sentry, but I believe it already should be.
