Circuit breakers prevent thundering herds, and improve resiliency against intermittent errors. Every client-side endpoint should be wrapped in a circuit breaker.
Some of Hystrix's features could be ported to Kong: https://github.com/Netflix/Hystrix/wiki
Hi, I'm also in favour of developing this plugin for Kong. Does anyone know if that's already in progress?
We are using Hystrix on the service side, but Java is not what we need on Kong; we need something more like a Lua module.
@t1tcrucible as far as I know no one is developing this yet. Yes, all Netflix tools are mainly Java. Would you like to PR?
Yes, it would be nice; we are figuring out if we can do this with a Lua script or the C API.
This would be very easy to implement: it would consist of asking the user what threshold of 5xx errors the system should accept before shutting down the service. The plugin should also provide an API to re-enable the circuit breaker.
A later iteration on the first version could also consider timeouts or response time.
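A minimal sketch of that idea, in Python rather than as a real Kong plugin. The class name `ThresholdBreaker` and its fields are hypothetical, and the choice to trip once failures exceed the threshold is an assumption. Slow responses are counted as failures only when `slow_after` is set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdBreaker:
    """Hypothetical breaker: trips after too many failures, reset by hand."""

    threshold: int                      # failures tolerated before tripping
    slow_after: Optional[float] = None  # seconds; slower responses count as failures
    failures: int = 0
    closed: bool = True                 # closed = traffic is allowed through

    def record(self, status: int, elapsed: float = 0.0) -> None:
        """Record one upstream response and trip the breaker if needed."""
        is_5xx = 500 <= status <= 599
        is_slow = self.slow_after is not None and elapsed > self.slow_after
        if is_5xx or is_slow:
            self.failures += 1
        # Assumption: trip only once the count exceeds the threshold.
        if self.failures > self.threshold:
            self.closed = False

    def allow(self) -> bool:
        """Return True while requests may still be proxied."""
        return self.closed

    def reset(self) -> None:
        """Manual re-enable, the role the proposed API would play."""
        self.failures = 0
        self.closed = True
```

Nothing here resets the counter over time, so it only illustrates the threshold and the manual re-enable.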
A plugin for this functionality would be great (instead of people developing their own ad hoc solution)
I was thinking about having a Circuit Breaker plugin with the following configuration template:
{
    "config": {
        "statuses": [
            500,
            501,
            502
        ],
        "minute": 20
    }
}
This would block the API if more than 20 responses with status 500, 501 or 502 are returned within a minute. We could support second, minute, hour, day and month timespans.
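As a rough illustration of the counting rule, here is a fixed-window sketch in Python. `WindowedBreaker`, the `clock` parameter and the `WINDOWS` table are hypothetical names. Treating a month as 30 days is an assumption, and a real plugin might prefer a sliding window.

```python
import time

# Window lengths in seconds. "month" is approximated as 30 days (an assumption).
WINDOWS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400, "month": 30 * 86400}


class WindowedBreaker:
    """Hypothetical breaker that counts configured statuses per fixed window."""

    def __init__(self, statuses, clock=time.time, **limits):
        # limits mirrors the config keys, for example minute=20
        unknown = set(limits) - set(WINDOWS)
        if unknown:
            raise ValueError(f"unknown timespan(s): {sorted(unknown)}")
        self.statuses = set(statuses)
        self.limits = limits
        self.clock = clock
        self.counts = {}  # (timespan, window index) -> matching responses seen
        self.closed = True

    def record(self, status):
        """Count a response; block the API once any limit is exceeded."""
        if status not in self.statuses:
            return
        now = self.clock()
        for span, limit in self.limits.items():
            key = (span, int(now // WINDOWS[span]))
            # Old windows are never evicted in this sketch.
            self.counts[key] = self.counts.get(key, 0) + 1
            if self.counts[key] > limit:
                self.closed = False
```

With `WindowedBreaker([500, 501, 502], minute=20)`, the 21st matching response inside one clock minute sets `closed` to `False`.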
To enable the circuit again, the plugin would provide an API endpoint like:
curl -X POST -d "closed=true" http://127.0.0.1:8001/apis/{api}/circuit
To get the status of a circuit:
curl -X GET http://127.0.0.1:8001/apis/{api}/circuit
{
    "closed": true
}
Thoughts?
Sounds like a nice first step. I think it would be interesting to explore how the plugin could re-enable the route automatically:
- A flag to turn automatic re-enabling on or off (auto=true, auto=false).
- A health check endpoint (health=http://upstream.somewhere/api/health_check) queried using "GET". This endpoint is expected to be idempotent (that is, has no side-effects). If it returns 200, then re-enable the circuit.
Expanding a little bit on the above, you could have the plugin expose the following configuration:
{
    "config": {
        "statuses": [
            500,
            501,
            502
        ],
        "minute": 20,
        "health": {
            "endpoint": "http://upstream.somewhere/api/health_check",
            "method": "GET",
            "expected": "200",
            "wait_after_close": 60,
            "period": 10
        }
    }
}
Where the health section is optional, and if set it means that you want to automatically check for health and re-enable the circuit. endpoint is the URL to query, method is the method to use, and expected is the expected HTTP status code. wait_after_close is the number of seconds to wait after the API is closed in order to start querying the health API, and period is how many seconds to wait between queries. The health check task could be run on an nginx timer.
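The timing described above could look roughly like the following Python sketch. It is a blocking loop, not an nginx timer. The function names `probe` and `wait_until_healthy` and the `max_checks` parameter are hypothetical. The `health` dict is assumed to have the same keys as the config section.

```python
import time
import urllib.error
import urllib.request


def probe(endpoint, method="GET", timeout=5.0):
    """Return the HTTP status of the health endpoint, or None if unreachable."""
    req = urllib.request.Request(endpoint, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # non-2xx responses still carry a status code
    except OSError:
        return None  # connection refused, DNS failure, timeout, ...


def wait_until_healthy(health, probe=probe, sleep=time.sleep, max_checks=None):
    """Wait, then poll until the expected status is seen; True means re-enable."""
    sleep(health["wait_after_close"])  # initial delay after the API was blocked
    checks = 0
    while max_checks is None or checks < max_checks:
        status = probe(health["endpoint"], health.get("method", "GET"))
        # "expected" is a string in the proposed config, so compare as strings.
        if str(status) == str(health["expected"]):
            return True
        checks += 1
        sleep(health["period"])  # pause between health queries
    return False
```

Passing `probe` and `sleep` as parameters is only there so the loop can be exercised without a network or real delays.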
Seems like the enterprise edition supports Circuit Breaker out of the box.
@alexforever86 Could you point me to the documentation for the circuit breaker feature of the enterprise edition? I couldn't find it.
This is a very useful plugin to have.
This would be similar to rate-limiting plugins with support for different datastores for storing counters.
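One way to picture the pluggable-datastore idea is a small counter interface that the breaker logic depends on. This is a sketch only: `CounterStore`, `MemoryStore` and `should_trip` are hypothetical names and are not taken from Kong's rate-limiting plugins.

```python
from typing import Protocol


class CounterStore(Protocol):
    """Hypothetical interface any datastore backend would implement."""

    def increment(self, key: str) -> int: ...

    def reset(self, key: str) -> None: ...


class MemoryStore:
    """In-process backend; a shared datastore could expose the same methods."""

    def __init__(self):
        self._counts = {}

    def increment(self, key):
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]

    def reset(self, key):
        self._counts.pop(key, None)


def should_trip(store: CounterStore, api: str, window: int, limit: int) -> bool:
    """Count one error for this API and window; True once the limit is exceeded."""
    return store.increment(f"{api}:{window}") > limit
```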
@sonicaghi @thibaultcha @thefosk Would you accept a PR for a simple implementation as discussed in the comments (without the health check part)?
@hbagdi Hi,
This feature is already in the works internally. It will be available for testing when it is ready, and we'll be happy to receive some feedback!
@thibaultcha Sounds good. Thanks!
Good job!
Health checks and circuit breakers have been available since 0.12! :tada: