I'm getting a "failure to get a peer from the ring-balancer" error when calling the APIs of my service after upgrading Kong to 1.2.1 (also tried 1.2.0). The exact same config works fine in 1.1.2.
After some troubleshooting, I realized the problem has something to do with the following in my Kong yaml:
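(Simplified sketch; the names, IPs, and ports below are placeholders, since the real yaml is auto-generated and much larger.)

```yaml
_format_version: "1.1"

services:
- name: service1
  host: upstream1            # resolves to the upstream of the same name below
  routes:
  - name: route1
    paths:
    - /route1
- name: service2
  host: upstream2
  routes:
  - name: route2
    paths:
    - /route2

upstreams:
- name: upstream1
  targets:
  - target: 10.0.0.10:8080   # same IP and port...
- name: upstream2
  targets:
  - target: 10.0.0.10:8080   # ...as this target; two upstreams share the same target
```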
I have this setup because my own plugin needs different settings for Route1 and Route2 respectively; under the hood they should both go to the same microservice on the same port. I'm also using a library to auto-generate this complex Kong yaml.
Apparently the cause of the error is having multiple Upstreams sharing the same Target (same IP and same port). If I remove one of the Routes, or simply point Route2 to Service1 and get rid of the now-redundant Upstream2 and Service2, the error goes away.
We ran into this exact bug, and it's completely unclear what's breaking. Downgrading from 1.2.1 to 1.1.2 works fine.
Hi @imsky and @22vincetsang, thanks for reporting. We've started triage internally.
I believe we are experiencing the same issue. We just rolled back to 1.1.2 and are monitoring whether that fixes it.
Also running db-less on k8s with the kong-ingress-controller.
@22vincetsang Could you post Kong's error logs from around this event?
Also, do you see any errors when you (or the Kong Ingress Controller, automatically) load the configuration into Kong via /config?
ping @Tieske
@hbagdi Only these:
2019/07/23 01:59:46 [debug] 33#0: *1197 [lua] certificate.lua:22: log(): [ssl] no SNI provided by client, serving default SSL certificate
2019/07/23 01:59:46 [debug] 33#0: *1196 [lua] base_plugin.lua:43: log(): executing plugin "prometheus": log
172.20.2.1 - - [23/Jul/2019:01:59:46 +0000] "GET /my-api/v1/my_endpoint HTTP/1.0" 503 58 "-" "PostmanRuntime/7.6.1"
And yes, the "failure to get a peer from the ring-balancer" error (with an HTTP 503 status code) can be reproduced if I manually load the Kong config via the /config endpoint.
By the way, to share with those hitting the same issue: I'm currently working around it by post-processing my Kong yaml so that no two upstreams share the same target (i.e. pointing Route2 to Service1 and removing the now-redundant Upstream2 and Service2, as described above).
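In terms of the sketch above, the post-processed yaml ends up looking something like this (again, names and addresses are placeholders):

```yaml
_format_version: "1.1"

services:
- name: service1
  host: upstream1            # the only remaining upstream for this target
  routes:
  - name: route1             # my plugin config stays attached per-route
    paths:
    - /route1
  - name: route2             # Route2 is re-pointed at Service1
    paths:
    - /route2

upstreams:
- name: upstream1
  targets:
  - target: 10.0.0.10:8080
```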
Encountering the same problem on 1.2.1 in db-less mode.
Same problem on 1.2.1 in db-less mode (Kong Ingress Controller).
Hi @22vincetsang @imsky @HsinHeng @Renz2018
We have a fix for this bug and it has been released with 1.3 rc1; feel free to try it out. The final version should be released soon, stay tuned.
Same problem here on 1.2.x.
I'm stuck with the same error on 1.2.x.
Kong 1.3 has been released; please upgrade to Kong 1.3 to resolve this issue.
The fixes here are:
https://github.com/Kong/kong/pull/4817
https://github.com/Kong/kong/pull/4810
I'm facing the same issue with Kong 1.2.1. I was planning to upgrade to 1.4 in our production environment (especially for all the bug fixes and new features), but according to the 1.2.x-to-1.4 upgrade guide there are some breaking changes in version 1.3, so the upgrade will take longer than expected (testing, validation, etc.).
Whenever I restart my upstream back end, I start receiving the "failure to get a peer from the ring-balancer" error.
For us, the workaround is to restart Kong whenever we restart our upstream back end, which avoids the error mentioned above. Not ideal at all, but it keeps things running until we can find a maintenance window with our customers.
@Edenshaw not sure that would be the same thing.
Most of the earlier reports are based on identical targets, which get an identical uuid, so only one is created and the second overwrites the first. That is related to the Kong configuration.
In your case the backend changes, which has nothing to do with the Kong configuration. My suggestion would be to have a look at DNS records, changing IPs, and possibly DNS record TTL settings. And definitely check your logs, because there will likely be errors in there pointing to the probable cause.
Hi @Tieske, thanks a lot for your insight. I'll check my configuration and logs again to try to spot the problem.