@mattrobenolt @JTCunning @wedamija any reason why we have this hard-coded value? Can we merge #21581?
Ok, this probably explains the issues we have. But this is scary.
This was introduced, seemingly erroneously, with https://github.com/getsentry/sentry/commit/0f3386f929557d409e4713aa9a9d20e9f0a1354e#diff-6b365c5584bcacefd93ae4c7025171734bf2cf0e64fa9f46a507e0255e4fa10
Now, while I agree. I'm actually a bit concerned with infinite retries. So kinda glad this has been incorrect. The assumption in the comment isn't really true. Historically, we've had deletion tasks fail for many reasons that aren't corrected by just a retry. In which case, we're going to accumulate infinite deletion tasks that never succeed without human intervention.
I'm not sure the correct answer here. Obviously it'd be preferably if we didn't have deletion tasks fail. But that's just not reality. I kinda like it being hardcore at 1, but I'd be open to it being set to 10 or something in production so we can reattempt temporary issues but not spiral out of control for issues that can't be fixed without human interaction.
@mattrobenolt how about we set it to something like 5, add some back off and if it still fails, raise an exception?
You're overthinking it. Just set it to 5 and it'll work itself out. We don't need to get into exponential backoff. That doesn't really work well in Celery and I'd rather not. Celery will raise the exception when it fails after 5 times. So just setting this simply to 5 will be sufficient.
One thought I have is that if we have infinite retries at least we're aware that something is wrong and have a chance to fix it without celery just throwing the task away. But maybe there are cascading effects of these retries that could cause us issues. I'm good with the 5 retries if we think that is safest
It is safest. tbh we need a dead letter queue for stuff like this.
Most helpful comment
You're overthinking it. Just set it to 5 and it'll work itself out. We don't need to get into exponential backoff. That doesn't really work well in Celery and I'd rather not. Celery will raise the exception when it fails after 5 times. So just setting this simply to 5 will be sufficient.