Resilience4j version: 1.3.1
Java version: 1.8.0_241
Hey guys, I would like to know whether it is possible to implement my own backoff logic. For example: while I keep receiving a 500 error, I want my retry interval to increase until it reaches a given maximum value; once it reaches that value, I want to keep retrying without increasing the interval any further, and without a maximum number of attempts. Is that possible?
Hi,
you can set the intervalFunction in your RetryConfig, and you can set a very high max attempts value.
You can also set a retryOnResult predicate so that you only retry on HTTP status code 5xx.
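To make the idea concrete, here is a minimal plain-JDK sketch of the two pieces mentioned above: an interval function that grows exponentially but is capped, and a predicate that retries only on 5xx. It deliberately does not use the resilience4j API; the class and method names (CappedRetrySketch, intervalMillis, run) are hypothetical, and the loop only records the waits instead of sleeping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.IntSupplier;

// Hypothetical illustration, not resilience4j code.
final class CappedRetrySketch {

    // Retry only when the HTTP status is a server error (5xx).
    public static final IntPredicate IS_SERVER_ERROR = status -> status >= 500 && status <= 599;

    // Wait before retry number `attempt` (1-based):
    // initialMillis * multiplier^(attempt - 1), never more than maxMillis.
    public static long intervalMillis(int attempt, long initialMillis, double multiplier, long maxMillis) {
        double raw = initialMillis * Math.pow(multiplier, attempt - 1);
        return (long) Math.min(raw, (double) maxMillis);
    }

    // Invokes `call` until it returns a non-5xx status, with no attempt limit.
    // Returns the waits that a real caller would have slept between attempts.
    public static List<Long> run(IntSupplier call, long initialMillis, double multiplier, long maxMillis) {
        List<Long> waits = new ArrayList<>();
        int attempt = 1;
        while (IS_SERVER_ERROR.test(call.getAsInt())) {
            waits.add(intervalMillis(attempt, initialMillis, multiplier, maxMillis));
            attempt++;
            // A real implementation would sleep for the computed interval here.
        }
        return waits;
    }
}
```

With an initial interval of 1000 ms, a multiplier of 2.0 and a cap of 5000 ms, the waits are 1000, 2000, 4000 and then 5000 ms for every later retry. In resilience4j the same two ideas would be plugged in through the intervalFunction and retryOnResult settings named above.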
It worked, thank you very much for your help.
I was very surprised that the default IntervalFunction.ofExponentialBackoff doesn't support setting a maximum interval.
I think a cap on the interval is a very useful and important feature.
E.g. I want my service to try to connect to an external service every N seconds (with exponential backoff) until it succeeds. But if this external service is unavailable for a long time (say 21h), I don't want the interval between attempts to grow to 10h+. If the external service becomes available a few minutes after the last attempt, it's unreasonable to wait another 10h before trying again. So the maximum interval should be bounded to, say, 5m (or 30m or whatever, depending on requirements).
The only case where a maximum exponential backoff interval isn't important is when the maximum retry count is very low.
I calculated the times for just 27 attempts:
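import java.time.Duration;
import java.util.LongSummaryStatistics;
import java.util.stream.LongStream;

// Note: DurationFormatter is a formatting helper that is not shown in this thread.
public class BackoffInterval {
    public static void main(String[] args) {
        int n = 27;
        // Intervals in ms: start at 1000 and multiply by 1.5 each time.
        LongSummaryStatistics stats = LongStream.iterate(1000, prev -> (long) (prev * 1.5))
                .limit(n)
                .summaryStatistics();
        // The largest (n-th) interval is the wait between attempt n and attempt n + 1.
        long intervalAfterNAttempts = stats.getMax();
        // The remaining n - 1 intervals are the waits that happen before attempt n.
        long totalTimeForNAttempts = stats.getSum() - intervalAfterNAttempts;
        String intervalAfterNAttemptsFormatted = new DurationFormatter(Duration.ofMillis(intervalAfterNAttempts)).toString();
        String totalTimeFormatted = new DurationFormatter(Duration.ofMillis(totalTimeForNAttempts)).toString();
        System.out.println("Interval between " + n + " and " + (n + 1) + " attempts: " + intervalAfterNAttemptsFormatted);
        System.out.println("Total time for " + n + " attempts: " + totalTimeFormatted);
    }
}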
So that means that after about 21h of retrying, the next attempt will only happen another 10.5h later.
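As a rough cross-check of those figures (ignoring the truncation to long that the code above performs at each step), the intervals form a geometric series with first term 1000 ms and ratio 1.5:

```latex
\begin{align*}
d_k &= 1000 \cdot 1.5^{\,k-1}\ \text{ms} \\
d_{27} &= 1000 \cdot 1.5^{26} \approx 3.79 \times 10^{7}\ \text{ms} \approx 10.5\ \text{h} \\
\sum_{k=1}^{26} d_k &= 1000 \cdot \frac{1.5^{26} - 1}{1.5 - 1} \approx 7.58 \times 10^{7}\ \text{ms} \approx 21.0\ \text{h}
\end{align*}
```

Here d_k is the wait after attempt k, so the 26 waits before attempt 27 add up to about 21h, and the wait after attempt 27 is about 10.5h.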
If you really need very long retry wait intervals, I suggest using a cluster scheduler like Quartz and a persistent queue to handle tasks which must be retried.
Think of Little's Law. What is your mean arrival rate of new tasks?
Just imagine your external system is down and you don't use a CircuitBreaker. The mean service time of your tasks increases to hours. The mean number of waiting tasks in your system grows very quickly, the capacity of the rest of your system shrinks, and other parts of the system might be negatively affected.
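Little's Law relates exactly these quantities. A rough illustration with made-up numbers (the arrival rate and the times in the system below are hypothetical, not taken from this thread):

```latex
\begin{align*}
L &= \lambda W \\
\lambda = 10\ \text{tasks/s},\quad W = 0.2\ \text{s} &\;\Rightarrow\; L = 2\ \text{tasks in the system} \\
\lambda = 10\ \text{tasks/s},\quad W = 3600\ \text{s} &\;\Rightarrow\; L = 36000\ \text{tasks in the system}
\end{align*}
```

L is the mean number of tasks in the system, λ the mean arrival rate and W the mean time a task spends in the system. With an unchanged arrival rate, the number of tasks held in the system grows in proportion to the time each one spends waiting to be retried.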
If the tasks are important enough that processing can wait for hours, you are not allowed to lose them, right? Can you restart your system without losing them?
It should already be supported in the upcoming version.
https://github.com/resilience4j/resilience4j/blob/master/resilience4j-core/src/main/java/io/github/resilience4j/core/IntervalFunction.java#L97
"If you really need very long retry wait intervals"
No, I don't need long wait intervals. I need the exact opposite: to keep the intervals from getting too long after some number of attempts (a very small number with the default backoff strategy, I would say).
Thank you! I see it's implemented in master as Math.min(interval, maxIntervalMillis). This is exactly what I just did in my custom IntervalFunction, and exactly what I was asking about.
If you are really interested in why I need it, I can explain:
I have an XMPP client component (class) that should start at application startup. The application should wait up to 10s for the XMPP client to start; after that, it will occasionally send messages through this client. But if the XMPP client is unable to start within 10s, that's still OK: the application must start anyway, even without the XMPP client. The XMPP client "service" is not that important for the application: XMPP notifications will not work, but the main function of the application will. It's much more important that this application is always running, and an inability to connect to the XMPP server should never cause the application to fail to start.
Then, in the background, the XMPP client must keep trying to connect indefinitely until an attempt succeeds. As soon as the XMPP client has connected and authenticated at least once, the retry cycle can stop, because this XMPP client already supports automatic reconnection internally. So the trick is to let it successfully connect and authenticate once. This is where I need resilience4j and its Retry implementation. I could easily implement this simple retry logic with a do-while loop, of course, but that is tedious and brittle, and I'd rather follow a single pattern everywhere in the application where retry logic is required.
But, of course, I don't want the attempts to become too infrequent. Every 5 minutes is OK for this application.
Still, I want the first attempts to happen more often than later ones.
This is what an exponential backoff strategy is for, right?
This is my final implementation:
private Retry createStartupRetry() {
    long maxIntervalMs = STARTUP_MAX_RETRY_INTERVAL.toMillis();
    // Start at 1000 ms and multiply by 1.5 each time, but never exceed maxIntervalMs.
    IntervalFunction intervalFunction =
            IntervalFunction.of(1000, (x) -> Math.min((long) (x * 1.5), maxIntervalMs));
    RetryConfig retryConfig = RetryConfig.custom()
            .intervalFunction(intervalFunction)
            .maxAttempts(STARTUP_MAX_RETRY_ATTEMPTS)
            .build();
    Retry retry = Retry.of(SimpleXmppClient.class.getSimpleName(), retryConfig);
    // Log every failed attempt together with the wait before the next one.
    retry.getEventPublisher()
            .onRetry(event -> {
                String waitInterval = DurationFormatUtils.formatDurationHMS(event.getWaitInterval().toMillis());
                Throwable lastThrowable = event.getLastThrowable();
                String lastErrorMessage = lastThrowable != null ? lastThrowable.getMessage() : null;
                logger.info("XmppClient to '{}' has failed to start (attempt #{}). Will retry in {}. Last error was: {}",
                        xmppServer, event.getNumberOfRetryAttempts(), waitInterval, lastErrorMessage);
            });
    return retry;
}
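For comparison, the startup flow described earlier in this post (wait a bounded time for the first connection, then keep retrying in the background with a capped, growing interval until one attempt succeeds) can be sketched with plain JDK classes. This is only an illustration under assumptions, not the code from this post and not resilience4j: the class name BackgroundConnectSketch, the connectOnce callback and all parameters are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of "wait briefly at startup, keep retrying in the background".
final class BackgroundConnectSketch {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "connect-retry");
                t.setDaemon(true); // retries must never keep the JVM alive
                return t;
            });
    private final CountDownLatch connected = new CountDownLatch(1);
    private final Callable<Boolean> connectOnce; // returns true once connected and authenticated
    private final long maxIntervalMs;

    public BackgroundConnectSketch(Callable<Boolean> connectOnce, long maxIntervalMs) {
        this.connectOnce = connectOnce;
        this.maxIntervalMs = maxIntervalMs;
    }

    // Starts the first attempt immediately in the background and waits at most
    // startupTimeoutMs for success. Returns false on timeout; retries continue anyway.
    public boolean start(long initialIntervalMs, long startupTimeoutMs) {
        scheduler.execute(() -> attempt(initialIntervalMs));
        try {
            return connected.await(startupTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    private void attempt(long intervalMs) {
        boolean ok;
        try {
            ok = Boolean.TRUE.equals(connectOnce.call());
        } catch (Exception e) {
            ok = false; // any failure simply leads to another attempt
        }
        if (ok) {
            connected.countDown();
            scheduler.shutdown(); // one success is enough, stop the retry cycle
            return;
        }
        long delay = Math.min(intervalMs, maxIntervalMs);
        long next = Math.min((long) (intervalMs * 1.5), maxIntervalMs); // capped growth
        scheduler.schedule(() -> attempt(next), delay, TimeUnit.MILLISECONDS);
    }

    public boolean isConnected() {
        return connected.getCount() == 0;
    }
}
```

A caller would invoke something like start(1000, 10_000) during startup and continue regardless of the result. The attempt limit is left out here on purpose, whereas the resilience4j version above expresses the same intent with a very high maxAttempts value.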
To be honest, I have never seen a case where anybody wants the interval between attempts to increase indefinitely. If someone needs a finite number of attempts, they will set a maximum number of attempts. But the maximum interval between attempts should usually be restricted to some reasonable value: 5 min, 30 min, maybe 24h; it all depends on business and technical requirements. Not restricting it at all means that it will very soon grow far beyond that.
This usually doesn't apply to HTTP requests (or other kinds of RPC calls). We usually want them to finish in a finite time, so we restrict the max attempt count to 3 or so. In that case the max interval doesn't matter, because it is never reached anyway. Maybe that is what you meant.
But RPC is not the only thing that requires retry logic. There are cases beyond that where a max attempt count much bigger than 3 is desired, and in those cases the max interval does matter.
Anyway, #1044 solves this problem. Thanks!
Thank you for your explanation.
I understand your use case now. 👍