It seems that Kong's startup/migration uses a different DNS resolution method than the actual runtime (or the Lua DNS resolution; I'm not sure which component does the resolving). Example: I have an AWS internal hosted zone named qa2. If the Kong config doesn't include the hosted zone suffix, e.g.:
contact_points:
- "cassandra-01:9042"
Kong will start without any errors, but when asked to answer a simple request it will fail:
$ curl http://localhost:8001/
"{message:An unexpected error occurred}"
The error log will show that it can't find the DB instance:
2016/03/28 11:18:47 [error] 114#0: [lua] cluster.lua:38: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-04 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
Nevertheless, if I use a "proper" DNS name, everything works:
contact_points:
- "cassandra-01.qa2:9042"
As I mentioned before, it seems that different parts of Kong are using different DNS resolution methods, which produces different behaviour. This is fairly confusing, since you expect Kong to fail from the beginning if the Cassandra DNS name is bad.
Kong on docker: v0.7.0
Cassandra: v2.2.4
Without investigating, the reason that seems obvious at first sight would be dnsmasq. Kong manages dnsmasq alongside Nginx. In the CLI, dnsmasq is not started, but once Kong is started, dnsmasq is used as the resolver. From there, I suggest two approaches to try to fix this:
- configure dnsmasq so that it can also resolve your AWS zone
- disable dnsmasq and enter your resolver's address in kong.yml (probably easier; see the sketch below)
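For the second option, the relevant part of kong.yml would look something like this (the resolver address is just an example; use your VPC / AWS internal DNS server):

dns_resolver: server
dns_resolvers_available:
  server:
    address: "10.0.0.2:53"
cassandra:
  contact_points:
    - "cassandra-01:9042"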
To better investigate this, you can use the following command to check how an address is resolved by a given DNS server:
$ nslookup -port={dns_server_port} {address_to_resolve} {dns_server_address}
like
$ nslookup -port=53 google.com 8.8.8.8
Now, you can start kong/dnsmasq and try to see what's the output of:
$ nslookup -port=8053 cassandra-01 127.0.0.1
Port 8053 is the default port that Kong uses for dnsmasq. Dnsmasq should follow the local resolution settings, so the output of these two commands should be the same:
$ nslookup -port=8053 cassandra-01 127.0.0.1 # Using dnsmasq
$ nslookup cassandra-01 # Using system settings
It seems like in your environment this is not the case, so we can see the output of these commands and take it from there.
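If it helps to script that comparison, here is a quick sketch of a helper (my own, not part of Kong; it assumes nslookup is available in the container, e.g. from bind-utils):

#!/usr/bin/env python
# Compare how a name resolves via Kong's dnsmasq (127.0.0.1:8053) versus the system resolver.
# Raises CalledProcessError if nslookup itself fails (e.g. NXDOMAIN).
import subprocess
import sys

def lookup(args):
    out = subprocess.check_output(["nslookup"] + args).decode()
    # nslookup prints "Address: x.x.x.x" lines; the first one is the DNS server itself.
    addrs = [line.split(":", 1)[1].strip() for line in out.splitlines()
             if line.strip().startswith("Address")]
    return addrs[1:]

name = sys.argv[1] if len(sys.argv) > 1 else "cassandra-01"
via_dnsmasq = lookup(["-port=8053", name, "127.0.0.1"])
via_system = lookup([name])
print("dnsmasq:", via_dnsmasq)
print("system :", via_system)
sys.exit(0 if via_dnsmasq == via_system else 1)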
OK, so I'm running the following commands from within the Kong Docker container, and surprisingly they both show the correct address:
$ yum install bind-utils
....
$ nslookup cassandra-01
Server: 10.0.0.2
Address: 10.0.0.2#53
Non-authoritative answer:
Name: cassandra-01.qa2
Address: 10.0.1.56
$ nslookup -port=8053 cassandra-01 127.0.0.1
Server: 127.0.0.1
Address: 127.0.0.1#8053
Non-authoritative answer:
Name: cassandra-01.qa2
Address: 10.0.1.56
But Kong still doesn't resolve it correctly. error.log shows:
2016/03/30 12:55:09 [error] 115#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
So, two notes:
- the errors above come from cluster.lua when pinging Cassandra, which shouldn't affect loading 127.0.0.1:8001, unless the problem is the same
- can you see if there are more errors in the error.log file after invoking http://127.0.0.1:8001/?
Sure.
Just launching the container, without invoking any request (but waiting a few minutes) shows the following errors:
$ cat /usr/local/kong/logs/error.log
2016/03/31 20:50:35 [error] 114#0: *1 [lua] hooks.lua:102: member_join(): Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., client: 127.0.0.1, server: , request: "POST /cluster/events/ HTTP/1.1", host: "0.0.0.0:8001"
2016/03/31 20:50:37 [error] 114#0: [lua] cluster.lua:38: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
2016/03/31 20:50:45 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:50:53 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:01 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:05 [error] 114#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:51:07 [error] 114#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:51:15 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:23 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:31 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:35 [error] 114#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:51:37 [error] 114#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:51:45 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:53 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:01 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:05 [error] 114#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:52:07 [error] 114#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:52:15 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:23 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:31 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:35 [error] 114#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:52:37 [error] 114#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:52:45 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:53 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:53:01 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:53:05 [error] 114#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
2016/03/31 20:53:35 [error] 114#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
2016/03/31 20:54:05 [error] 114#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
2016/03/31 20:54:35 [error] 114#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
After making a request to http://127.0.0.1:8001/, the failed request also appears:
2016/03/31 20:56:41 [error] 114#0: *3 [lua] responses.lua:97: All hosts tried for query failed. cassandra-01:9042: Host considered DOWN., client: 127.0.0.1, server: , request: "GET / HTTP/1.1", host: "127.0.0.1:8001"
See this gist for the full error.log.
What happens if you make a request to http://127.0.0.1:8001/apis ?
I receive an error message, and the log shows:
2016/04/04 08:37:15 [error] 114#0: *6 [lua] responses.lua:97: handler(): Cassandra error: All hosts tried for query failed. cassandra-01:9042: Host considered DOWN., client: 172.17.0.1, server: , request: "GET /apis HTTP/1.1", host: "localhost:8001"
I am seeing the exact same behavior, also running mashape/kong:0.7.0
From inside the container, I see
sh-4.2# nslookup cassandra
Server: 10.3.240.10
Address: 10.3.240.10#53
Name: cassandra.default.svc.cluster.local
Address: 10.3.247.225
Kong starts up fine, correctly finding cassandra, but then the error log fills with errors similar to @noamelf's:
2016/04/05 21:09:21 [error] 113#0: *1 [lua] hooks.lua:102: member_join(): Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., client: 127.0.0.1, server: , request: "POST /cluster/events/ HTTP/1.1", host: "0.0.0.0:8001"
2016/04/05 21:09:23 [error] 113#0: [lua] cluster.lua:38: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
2016/04/05 21:09:31 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/05 21:09:39 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/05 21:09:47 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/05 21:09:51 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/05 21:09:53 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/05 21:10:01 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
- configure dnsmasq so that it can also resolve your AWS zone
- disable dnsmasq and enter your resolver's address in kong.yml (probably easier)
I should note that I am trying to do the second thing, with the following config (full kong.yml at https://gist.github.com/rileylark/a22e2146b85dec3faa9a7c82162b3b29 )
dns_resolver: server
dns_resolvers_available:
server:
address: "10.0.0.10:53"
cassandra:
contact_points:
- "cassandra:9042"
With these settings, Kong can find cassandra while it's joining the node, but cluster.lua and hooks.lua cannot find cassandra at runtime.
@rileylark in your case, what's the output of:
$ nslookup -port=8053 cassandra 127.0.0.1
and is it the same as:
$ nslookup cassandra
?
@thefosk Ah, of course I should have thought of that before. Sorry! So my case is different after all:
sh-4.2# nslookup -port=8053 cassandra 127.0.0.1
;; connection timed out; trying next origin
;; connection timed out; trying next origin
;; connection timed out; trying next origin
;; connection timed out; trying next origin
^C
sh-4.2# nslookup cassandra
Server: 10.3.240.10
Address: 10.3.240.10#53
Name: cassandra.default.svc.cluster.local
Address: 10.3.249.185
sh-4.2#
So it feels like the kong.yml file is being respected at some levels of Kong, but deep down somewhere Kong is still trying to resolve names at 127.0.0.1:8053.
So, maybe to progress at this point, I should focus on configuring the local dnsmasq instead of trying to disable it and provide my own nameserver?
Ok, I tried NOT providing a custom nameserver, and instead letting dnsmasq do the trick, and now I get results exactly like @noamelf.
kong.yml is an exact copy/paste from https://github.com/Mashape/docker-kong/blob/master/config.docker/kong.yml :
cassandra:
contact_points:
- "cassandra:9042"
nginx: |
[...snip...]
Docker container output:
[INFO] Kong 0.7.0
[INFO] Using configuration: /etc/kong/kong.yml
[INFO] Setting working directory to /usr/local/kong
[INFO] database...........cassandra keyspace=kong ssl=verify=false enabled=false replication_factor=1 contact_points=cassandra:9042 replication_strategy=SimpleStrategy timeout=5000 data_centers=
[INFO] dnsmasq............address=127.0.0.1:8053 dnsmasq=true port=8053
[INFO] serf ..............-profile=wan -rpc-addr=127.0.0.1:7373 -event-handler=member-join,member-leave,member-failed,member-update,member-reap,user:kong=/usr/local/kong/serf_event.sh -bind=0.0.0.0:7946 -node=kong-3591152686-oc2ik_0.0.0.0:7946 -log-level=err
[INFO] Trying to auto-join Kong nodes, please wait..
[INFO] Successfully auto-joined 10.0.1.4:7946
sh-4.2# nslookup cassandra
Server: 10.3.240.10
Address: 10.3.240.10#53
Name: cassandra.default.svc.cluster.local
Address: 10.3.249.185
sh-4.2# nslookup -port=8053 cassandra 127.0.0.1
Server: 127.0.0.1
Address: 127.0.0.1#8053
Name: cassandra.default.svc.cluster.local
Address: 10.3.249.185
sh-4.2#
sh-4.2# cat /usr/local/kong/logs/error.log
2016/04/06 17:55:00 [error] 113#0: *1 [lua] hooks.lua:102: member_join(): Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., client: 127.0.0.1, server: , request: "POST /cluster/events/ HTTP/1.1", host: "0.0.0.0:8001"
2016/04/06 17:55:02 [error] 113#0: [lua] cluster.lua:38: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
2016/04/06 17:55:10 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:18 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:26 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:30 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:55:32 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:55:40 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:48 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:56 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:00 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:56:02 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:56:10 [error] 113#0: *3 [lua] hooks.lua:75: member_update(): Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, client: 127.0.0.1, server: , request: "POST /cluster/events/ HTTP/1.1", host: "0.0.0.0:8001"
2016/04/06 17:56:10 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:18 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:26 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:30 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:56:32 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:56:40 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:48 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:54 [error] 113#0: *4 [lua] responses.lua:97: handler(): Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., client: 10.0.2.4, server: , request: "GET /apis HTTP/1.1", host: "kong:8001"
2016/04/06 17:56:56 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:57:00 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:57:02 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:57:10 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:57:18 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:57:26 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:57:30 [error] 113#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
2016/04/06 17:58:00 [error] 113#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
2016/04/06 17:58:30 [error] 113#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
sh-4.2#
This is weird; I will need more time to replicate the issue and try to figure it out.
@thefosk Yesterday I got a vanilla nginx setup working in the exact same kubernetes cluster, which I mention just to confirm that nginx can work in this environment w/o any extra config.
If you have a Kubernetes cluster running already (easy to set up on Google Container Engine), you can get an exact repro of my situation from my yaml files at https://github.com/peardeck/kubernetes-boilerplate/tree/more-kong/kong
First, kubectl apply -f cassandra.yaml which will start a cassandra db and expose it at cassandra:9042. Wait for that to start successfully (~1m for me) and then run kubectl apply -f kong.yaml. From there I think you'll be able to see the behavior that Noam & I are seeing.
I'm happy to help however I can - let me know if you can think of ways for me to provide more info or debugging context!
I got the same problem when running Kong with a Postgres backend on Kubernetes.
My DNS resolution is OK:
[root@kong-n6t3q /]# nslookup -port=8053 kong-database 127.0.0.1
Server: 127.0.0.1
Address: 127.0.0.1#8053
Name: kong-database.mailship.svc.cluster.local
Address: 10.233.30.140
[root@kong-n6t3q /]# nslookup kong-database
Server: 10.233.0.2
Address: 10.233.0.2#53
Non-authoritative answer:
Name: kong-database.mailship.svc.cluster.local
Address: 10.233.30.140
but I still get errors:
[root@kong-n6t3q /]# tail -f /usr/local/kong/logs/error.log
2016/06/06 16:22:58 [error] 271#0: [lua] cluster.lua:79: kong-database could not be resolved (3: Host not found), context: ngx.timer
2016/06/06 16:23:28 [error] 271#0: [lua] cluster.lua:79: kong-database could not be resolved (3: Host not found), context: ngx.timer
2016/06/06 16:23:58 [error] 271#0: [lua] cluster.lua:79: kong-database could not be resolved (3: Host not found), context: ngx.timer
2016/06/06 16:24:28 [error] 271#0: [lua] cluster.lua:79: kong-database could not be resolved (3: Host not found), context: ngx.timer
...
2016/06/06 16:28:59 [error] 270#0: *10 [lua] responses.lua:99: execute(): kong-database could not be resolved (3: Host not found), client: 10.3.18.104, server: _, request: "GET / HTTP/1.1", host: "10.3.18.104:32723"
Kong: v0.8.0
Postgres: v9.4
I ran into this problem as well trying to deploy Kong's official Docker image onto a Kubernetes cluster. I worked around the issue by using the fully-qualified Kubernetes service name in /etc/kong/kong.yml. For example, I set the environment in my Kubernetes deployment yml:
env:
- name: DATABASE
value: postgres
- name: DATABASE_HOST
value: kong-database.zymbit.svc.cluster.local
And I have modified setup.sh to update kong.yml:
https://github.com/zymbit/docker-kong/commit/514a0a2df27772d2d46c4908885710db4da91b40
The Automated Build image with this change is at:
https://hub.docker.com/r/zymbit/kong/
Please let me know if this is useful and I will open a PR.
Thanks!
By the way, there is an issue specific to Ruby that is exposed when resolv.conf sets ndots > 1.
Kubernetes sets options ndots:5
Perhaps this is going on in the bowels of dnsmasq / lua?
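For context, the /etc/resolv.conf that Kubernetes injects into pods typically looks something like this (addresses and namespace are just examples):

search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.3.240.10
options ndots:5

A resolver that honours the search list expands the short name cassandra into cassandra.default.svc.cluster.local; a resolver that only reads the nameserver line (which, as far as I can tell, is effectively what the nginx/Lua side does) queries the literal short name and fails.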
Thanks @rca, I can confirm that your changes fix the issue we were having on Kubernetes.
Encountered this issue in my Kubernetes clusters today as well.
After running in circles chasing dnsmasq, I finally gave up on trying to use the shorter DNS entry for my service my-service, and just ended up using the full thing my-service.default.svc.cluster.local
Nothing else I tried worked including pointing the resolver at kube-dns.kube-system.svc.cluster.local and a bunch of other things that I can't remember ;)
It's worth mentioning that Kong was running in the same namespace as the service I was trying to proxy to as well (default).
@thenayr
I switched to go-dnsmasq in k8s; maybe the nginx resolver is the problem.
You can try it:
spec:
  containers:
    ### https://github.com/kubernetes/kubernetes/issues/26309
    ### use go-dnsmasq in pod
    - name: go-dnsmasq
      image: "janeczku/go-dnsmasq:release-1.0.5"
      args:
        - --listen
        - "127.0.0.1:53"
        - --default-resolver
        - --append-search-domains
        - --hostsfile=/etc/hosts
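If I understand it correctly, the sidecar listens on 127.0.0.1:53 inside the pod and appends the Kubernetes search domains itself, so short names come back resolved. You would then point Kong at it; the kong.yml fragment below is what I would expect (0.7/0.8-style config, and the exact wiring is an assumption on my part):

dns_resolver: server
dns_resolvers_available:
  server:
    address: "127.0.0.1:53"   # the go-dnsmasq sidecar in the same pod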
I'm running another Nginx proxy (with open-resty / Lua modules etc) and have 0 issues with Nginx DNS resolution inside Kubernetes. All of my proxy_pass definitions are to the short DNS names. It's definitely specific to something with Kong.
Just came across this while looking into Kong as a possible solution for something. This sounds oddly familiar to a problem we had running an open-resty api-proxy in a Kubernetes cluster. We ended up solving it by having nginx.conf be a jinja2 template that was parsed at start time (ugly, but it solved other issues as well) and that contains:
resolver {{ nameserver }} valid=300s ipv6=off;
resolver_timeout 5s;
in the http block, and the Python start script that launches everything did:
# Pull the first nameserver entry out of the pod's /etc/resolv.conf
f = open('/etc/resolv.conf', 'r')
resolv = f.read()
f.close()

nameserver = None
for r in resolv.split('\n'):
    if r.startswith('nameserver'):
        nameserver = r.split(' ')[1]
        break
to set the nameserver in the config (along with a bunch of other environment-specific vars), then used the output as the live nginx.conf at runtime. This made short-name lookups like http://appname work fine, with the open-resty api-proxy running in different namespaces and talking to the various microservice pods in the same namespace via their short names.
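The rendering step was roughly the following (paths and file names here are illustrative, not the real ones):

# Render the jinja2 template with the nameserver found above.
from jinja2 import Template

with open('/etc/nginx/nginx.conf.j2') as tmpl:
    rendered = Template(tmpl.read()).render(nameserver=nameserver)

with open('/etc/nginx/nginx.conf', 'w') as out:
    out.write(rendered)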
Not sure if that will help at all, but it might be a path to head down.
In 0.10 we'll have our own internal DNS resolution (see #1587), used both from the CLI and in nginx, so the inconsistencies should be resolved.
That branch is just up for review now and all tests pass. So if some of you are willing to give it a try, please do.
NOTE: see this code; the config read from resolv.conf is limited to the nameservers, attempts, and timeout options, so shorter names relying on the ndots/search options are not supported for now.
Really looking for feedback on this.
PS. to run the branch you'd need to manually install dns.lua which hasn't been published yet. See https://github.com/Mashape/dns.lua
@tieske, this sounds great. Any chance you can roll this up into a test image that I can docker pull?
@rca I just pushed it at docker pull mashape/kong:dns
Anyone else having an issue running the kong:dns branch?
Just tried it on docker-cloud, got:
2016-10-18T09:33:28.296046329Z /docker-entrypoint.sh: line 9: exec: kong: not found
2016-10-18T09:35:53.260789876Z /docker-entrypoint.sh: line 9: exec: kong: not found
It's mashape/kong:dns.
We use mashape/kong for our tests, while keeping the official kong image clean.
Am I missing something? This is the command I run:
$ docker run mashape/kong:dns
Unable to find image 'mashape/kong:dns' locally
dns: Pulling from mashape/kong
8d30e94188e7: Already exists
73cc12610219: Already exists
77d6169dca74: Already exists
587d89d67788: Already exists
Digest: sha256:0dea70bb87e4c51581874a4659f397c9cda2a0e4e4214f9e202ea6fbd7d90031
Status: Downloaded newer image for mashape/kong:dns
/docker-entrypoint.sh: line 9: exec: kong: not found
I am experiencing that same issue running it with docker-compose locally.
docker-compose up
Recreating dev_kong_1
Attaching to dev_kong_1
kong_1 | /docker-entrypoint.sh: line 9: exec: kong: not found
dev_kong_1 exited with code 127
Same issue, running mashape/kong:dns in kubernetes:
2016-10-28T00:43:45.958914991Z /docker-entrypoint.sh: line 9: exec: kong: not found
Pulling and running mashape/kong seems to have solved this problem.
... almost.
Now I have no problem connecting to the Postgres database, but upstream forwarding doesn't work.
My client:
$ http http://192.168.99.100:32057/sbex1/v1/
HTTP/1.1 502 Bad Gateway
Connection: keep-alive
Content-Type: text/plain; charset=UTF-8
Date: Fri, 28 Oct 2016 00:56:04 GMT
Server: kong/0.9.0
Transfer-Encoding: chunked
An invalid response was received from the upstream server
In the Kong logs:
==> logs/error.log <==
2016/10/28 00:56:04 [error] 87#0: *43 sbex1-app could not be resolved (2: Server failure), client: 172.17.0.1, server: kong, request: "GET /sbex1/v1/ HTTP/1.1", host: "192.168.99.100:32057"
2016/10/28 00:56:04 [info] 87#0: *43 client 172.17.0.1 closed keepalive connection
But dnsmasq can see it --
[dwlo kong]# nslookup -port=8053 sbex1-app localhost
Server: localhost
Address: 127.0.0.1#8053
Name: sbex1-app.test.svc.cluster.local
Address: 10.0.0.78
I suspect this one's harder to get around -- does nginx itself need to be told to use dnsmasq?
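For reference, the nginx directive I'd expect to be involved looks something like the lines below. I believe Kong's generated nginx.conf already injects an equivalent when dnsmasq is enabled, so this is only a sketch for debugging by hand:

resolver 127.0.0.1:8053 ipv6=off;
resolver_timeout 5s;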
We have updated how we resolve DNS hostnames in > 0.10.x (and removed the dnsmasq dependency) - Can you try to replicate with 0.10.x and see if the problem has been fixed?
Any updates here, following the changes in 0.10.x?
We resolve DNS externally and only ever pass IPs to Kong, because of the various DNS issues we encountered previously. I'm not in a great position to test the fix right now, but we are not blocked by the issue that I reported back then.
@derrley understood - if you want to give it another try please let us know the outcome. Closing this issue for now, but happy to re-open it if Kong >= 0.10 still doesn't work.