It seems that Kong's startup/migration uses a different DNS resolution method than the actual runtime (or the Lua DNS resolution; I'm not sure which component does the resolving). Example: I have an AWS internal hosted zone named qa2. If the Kong config doesn't include the hosted zone suffix, e.g.:
contact_points:
- "cassandra-01:9042"
Kong will start without any errors, but when asked to answer a simple request it will fail:
$ curl http://localhost:8001/
"{message:An unexpected error occurred}"
The error log will show that it can't find the DB instance:
2016/03/28 11:18:47 [error] 114#0: [lua] cluster.lua:38: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-04 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
Nevertheless, if I use a "proper" DNS name, everything works:
contact_points:
- "cassandra-01.qa2:9042"
As I mentioned before, it seems that different parts of Kong are using different DNS resolution methods, which produces different behaviour. This is fairly confusing, since you expect Kong to fail from the beginning if the Cassandra DNS name is bad.
Kong on docker: v0.7.0
Cassandra: v2.2.4
Without investigating, the reason that seems obvious at first sight would be dnsmasq. Kong manages dnsmasq alongside Nginx. In the CLI, dnsmasq is not started, but once Kong is started, dnsmasq is used as the resolver. From there, I suggest two approaches to try to fix this:
- configure dnsmasq so that it can also resolve your AWS zone
- disable dnsmasq and enter your resolver's address in kong.yml (probably easier; see the sketch below)
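For the second option, the relevant part of kong.yml would look something like this (the resolver address is just an example; use your VPC / AWS internal DNS server):

dns_resolver: server
dns_resolvers_available:
  server:
    address: "10.0.0.2:53"
cassandra:
  contact_points:
    - "cassandra-01:9042"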
To better investigate this, you can use the following command to check how an address is resolved by a given DNS server:
$ nslookup -port={dns_server_port} {address_to_resolve} {dns_server_address}
like
$ nslookup -port=53 google.com 8.8.8.8
Now, you can start kong/dnsmasq and try to see what's the output of:
$ nslookup -port=8053 cassandra-01 127.0.0.1
Port 8053 is the default port that Kong uses for dnsmasq. Dnsmasq should follow the local resolution settings, so the output of these two commands should be the same:
$ nslookup -port=8053 cassandra-01 127.0.0.1 # Using dnsmasq
$ nslookup cassandra-01 # Using system settings
It seems like in your environment this is not the case, so we can see the output of these commands and take it from there.
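If it helps to script that comparison, here is a quick sketch of a helper (my own, not part of Kong; it assumes nslookup is available in the container, e.g. from bind-utils):

#!/usr/bin/env python
# Compare how a name resolves via Kong's dnsmasq (127.0.0.1:8053) versus the system resolver.
# Raises CalledProcessError if nslookup itself fails (e.g. NXDOMAIN).
import subprocess
import sys

def lookup(args):
    out = subprocess.check_output(["nslookup"] + args).decode()
    # nslookup prints "Address: x.x.x.x" lines; the first one is the DNS server itself.
    addrs = [line.split(":", 1)[1].strip() for line in out.splitlines()
             if line.strip().startswith("Address")]
    return addrs[1:]

name = sys.argv[1] if len(sys.argv) > 1 else "cassandra-01"
via_dnsmasq = lookup(["-port=8053", name, "127.0.0.1"])
via_system = lookup([name])
print("dnsmasq:", via_dnsmasq)
print("system :", via_system)
sys.exit(0 if via_dnsmasq == via_system else 1)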
OK, so I'm running the following commands from within the Kong Docker container, and surprisingly they both show the correct address:
$ yum install bind-utils
....
$ nslookup cassandra-01
Server: 10.0.0.2
Address: 10.0.0.2#53
Non-authoritative answer:
Name: cassandra-01.qa2
Address: 10.0.1.56
$ nslookup -port=8053 cassandra-01 127.0.0.1
Server: 127.0.0.1
Address: 127.0.0.1#8053
Non-authoritative answer:
Name: cassandra-01.qa2
Address: 10.0.1.56
But Kong still doesn't resolve it correctly. error.log shows:
2016/03/30 12:55:09 [error] 115#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
So, two notes:
- the errors above come from cluster.lua when pinging Cassandra, which shouldn't affect loading 127.0.0.1:8001, unless the problem is the same
- can you see if there are more errors in the error.log file after invoking http://127.0.0.1:8001/?
Sure.
Just launching the container, without invoking any request (but waiting a few minutes) shows the following errors:
$ cat /usr/local/kong/logs/error.log
2016/03/31 20:50:35 [error] 114#0: *1 [lua] hooks.lua:102: member_join(): Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., client: 127.0.0.1, server: , request: "POST /cluster/events/ HTTP/1.1", host: "0.0.0.0:8001"
2016/03/31 20:50:37 [error] 114#0: [lua] cluster.lua:38: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
2016/03/31 20:50:45 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:50:53 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:01 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:05 [error] 114#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:51:07 [error] 114#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:51:15 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:23 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:31 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:35 [error] 114#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:51:37 [error] 114#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:51:45 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:51:53 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:01 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:05 [error] 114#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:52:07 [error] 114#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:52:15 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:23 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:31 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:35 [error] 114#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:52:37 [error] 114#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/03/31 20:52:45 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:52:53 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:53:01 [error] 114#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/03/31 20:53:05 [error] 114#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
2016/03/31 20:53:35 [error] 114#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
2016/03/31 20:54:05 [error] 114#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
2016/03/31 20:54:35 [error] 114#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra-01:9042: cassandra-01 could not be resolved (3: Host not found) for socket with peer cassandra-01:9042., context: ngx.timer
After making a request to http://127.0.0.1:8001/, the failed request also appears:
2016/03/31 20:56:41 [error] 114#0: *3 [lua] responses.lua:97: All hosts tried for query failed. cassandra-01:9042: Host considered DOWN., client: 127.0.0.1, server: , request: "GET / HTTP/1.1", host: "127.0.0.1:8001"
See this gist for the full error.log.
What happens if you make a request to http://127.0.0.1:8001/apis ?
I receive an error message, and the log shows:
2016/04/04 08:37:15 [error] 114#0: *6 [lua] responses.lua:97: handler(): Cassandra error: All hosts tried for query failed. cassandra-01:9042: Host considered DOWN., client: 172.17.0.1, server: , request: "GET /apis HTTP/1.1", host: "localhost:8001"
I am seeing the exact same behavior, also running mashape/kong:0.7.0
From inside the container, I see
sh-4.2# nslookup cassandra
Server: 10.3.240.10
Address: 10.3.240.10#53
Name: cassandra.default.svc.cluster.local
Address: 10.3.247.225
Kong starts up fine, correctly finding cassandra, but then the error log fills with errors similar to @noamelf's:
2016/04/05 21:09:21 [error] 113#0: *1 [lua] hooks.lua:102: member_join(): Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., client: 127.0.0.1, server: , request: "POST /cluster/events/ HTTP/1.1", host: "0.0.0.0:8001"
2016/04/05 21:09:23 [error] 113#0: [lua] cluster.lua:38: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
2016/04/05 21:09:31 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/05 21:09:39 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/05 21:09:47 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/05 21:09:51 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/05 21:09:53 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/05 21:10:01 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
- configure dnsmasq so that it can also resolve your AWS zone
- disable dnsmasq and enter your resolver's address in kong.yml (probably easier)
I should note that I am trying to do the second thing, with the following config (full kong.yml at https://gist.github.com/rileylark/a22e2146b85dec3faa9a7c82162b3b29 )
dns_resolver: server
dns_resolvers_available:
server:
address: "10.0.0.10:53"
cassandra:
contact_points:
- "cassandra:9042"
With these settings, Kong can find cassandra while it's joining the node, but cluster.lua and hooks.lua cannot find cassandra at runtime.
@rileylark in your case, what's the output of:
$ nslookup -port=8053 cassandra 127.0.0.1
and is it the same as:
$ nslookup cassandra
?
@thefosk Ah, of course I should have thought of that before. Sorry! So my case is different after all:
sh-4.2# nslookup -port=8053 cassandra 127.0.0.1
;; connection timed out; trying next origin
;; connection timed out; trying next origin
;; connection timed out; trying next origin
;; connection timed out; trying next origin
^C
sh-4.2# nslookup cassandra
Server: 10.3.240.10
Address: 10.3.240.10#53
Name: cassandra.default.svc.cluster.local
Address: 10.3.249.185
sh-4.2#
So it feels like the kong.yml file is being respected at some levels of Kong, but deep down somewhere Kong is still trying to resolve names at 127.0.0.1:8053.
So, maybe to progress at this point, I should focus on configuring the local dnsmasq instead of trying to disable it and provide my own nameserver?
Ok, I tried NOT providing a custom nameserver, and instead letting dnsmasq do the trick, and now I get results exactly like @noamelf.
kong.yml is an exact copy/paste from https://github.com/Mashape/docker-kong/blob/master/config.docker/kong.yml :
cassandra:
contact_points:
- "cassandra:9042"
nginx: |
[...snip...]
Docker container output:
[INFO] Kong 0.7.0
[INFO] Using configuration: /etc/kong/kong.yml
[INFO] Setting working directory to /usr/local/kong
[INFO] database...........cassandra keyspace=kong ssl=verify=false enabled=false replication_factor=1 contact_points=cassandra:9042 replication_strategy=SimpleStrategy timeout=5000 data_centers=
[INFO] dnsmasq............address=127.0.0.1:8053 dnsmasq=true port=8053
[INFO] serf ..............-profile=wan -rpc-addr=127.0.0.1:7373 -event-handler=member-join,member-leave,member-failed,member-update,member-reap,user:kong=/usr/local/kong/serf_event.sh -bind=0.0.0.0:7946 -node=kong-3591152686-oc2ik_0.0.0.0:7946 -log-level=err
[INFO] Trying to auto-join Kong nodes, please wait..
[INFO] Successfully auto-joined 10.0.1.4:7946
sh-4.2# nslookup cassandra
Server: 10.3.240.10
Address: 10.3.240.10#53
Name: cassandra.default.svc.cluster.local
Address: 10.3.249.185
sh-4.2# nslookup -port=8053 cassandra 127.0.0.1
Server: 127.0.0.1
Address: 127.0.0.1#8053
Name: cassandra.default.svc.cluster.local
Address: 10.3.249.185
sh-4.2#
sh-4.2# cat /usr/local/kong/logs/error.log
2016/04/06 17:55:00 [error] 113#0: *1 [lua] hooks.lua:102: member_join(): Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., client: 127.0.0.1, server: , request: "POST /cluster/events/ HTTP/1.1", host: "0.0.0.0:8001"
2016/04/06 17:55:02 [error] 113#0: [lua] cluster.lua:38: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
2016/04/06 17:55:10 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:18 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:26 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:30 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:55:32 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:55:40 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:48 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:55:56 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:00 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:56:02 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:56:10 [error] 113#0: *3 [lua] hooks.lua:75: member_update(): Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, client: 127.0.0.1, server: , request: "POST /cluster/events/ HTTP/1.1", host: "0.0.0.0:8001"
2016/04/06 17:56:10 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:18 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:26 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:30 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:56:32 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:56:40 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:48 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:56:54 [error] 113#0: *4 [lua] responses.lua:97: handler(): Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., client: 10.0.2.4, server: , request: "GET /apis HTTP/1.1", host: "kong:8001"
2016/04/06 17:56:56 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:57:00 [error] 113#0: [lua] cluster.lua:80: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:57:02 [error] 113#0: [lua] cluster.lua:38: Cassandra error: No query id found in cache for prepared query, context: ngx.timer
2016/04/06 17:57:10 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:57:18 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:57:26 [error] 113#0: [lua] cluster.lua:38: Cassandra error: Could not create lock for prepare request: Error locking mutex: timeout, context: ngx.timer
2016/04/06 17:57:30 [error] 113#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
2016/04/06 17:58:00 [error] 113#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
2016/04/06 17:58:30 [error] 113#0: [lua] cluster.lua:80: Cassandra error: All hosts tried for query failed. cassandra:9042: cassandra could not be resolved (2: Server failure) for socket with peer cassandra:9042., context: ngx.timer
sh-4.2#
This is weird; I will need more time to replicate the issue and try to figure it out.
@thefosk Yesterday I got a vanilla nginx setup working in the exact same kubernetes cluster, which I mention just to confirm that nginx can work in this environment w/o any extra config.
If you have a Kubernetes cluster running already (easy to set up on Google Container Engine), you can get an exact repro of my situation from my yaml files at https://github.com/peardeck/kubernetes-boilerplate/tree/more-kong/kong
First, kubectl apply -f cassandra.yaml which will start a cassandra db and expose it at cassandra:9042. Wait for that to start successfully (~1m for me) and then run kubectl apply -f kong.yaml. From there I think you'll be able to see the behavior that Noam & I are seeing.
I'm happy to help however I can - let me know if you can think of ways for me to provide more info or debugging context!
I got the same problem when running Kong with a Postgres backend on Kubernetes.
My DNS resolution is OK:
[root@kong-n6t3q /]# nslookup -port=8053 kong-database 127.0.0.1
Server: 127.0.0.1
Address: 127.0.0.1#8053
Name: kong-database.mailship.svc.cluster.local
Address: 10.233.30.140
[root@kong-n6t3q /]# nslookup kong-database
Server: 10.233.0.2
Address: 10.233.0.2#53
Non-authoritative answer:
Name: kong-database.mailship.svc.cluster.local
Address: 10.233.30.140
but I still get errors:
[root@kong-n6t3q /]# tail -f /usr/local/kong/logs/error.log
2016/06/06 16:22:58 [error] 271#0: [lua] cluster.lua:79: kong-database could not be resolved (3: Host not found), context: ngx.timer
2016/06/06 16:23:28 [error] 271#0: [lua] cluster.lua:79: kong-database could not be resolved (3: Host not found), context: ngx.timer
2016/06/06 16:23:58 [error] 271#0: [lua] cluster.lua:79: kong-database could not be resolved (3: Host not found), context: ngx.timer
2016/06/06 16:24:28 [error] 271#0: [lua] cluster.lua:79: kong-database could not be resolved (3: Host not found), context: ngx.timer
...
2016/06/06 16:28:59 [error] 270#0: *10 [lua] responses.lua:99: execute(): kong-database could not be resolved (3: Host not found), client: 10.3.18.104, server: _, request: "GET / HTTP/1.1", host: "10.3.18.104:32723"
Kong: v0.8.0
Postgres: v9.4
I ran into this problem as well trying to deploy Kong's official Docker image onto a Kubernetes cluster. I worked around the issue by using the fully-qualified Kubernetes service name in /etc/kong/kong.yml. For example, I set the environment in my Kubernetes deployment yml:
env:
- name: DATABASE
value: postgres
- name: DATABASE_HOST
value: kong-database.zymbit.svc.cluster.local
And I have modified setup.sh to update kong.yml:
https://github.com/zymbit/docker-kong/commit/514a0a2df27772d2d46c4908885710db4da91b40
The Automated Build image with this change is at:
https://hub.docker.com/r/zymbit/kong/
Please let me know if this is useful and I will open a PR.
Thanks!
By the way, there is an issue specific to Ruby that is exposed when resolv.conf sets ndots > 1.
Kubernetes sets options ndots:5
Perhaps this is going on in the bowels of dnsmasq / lua?
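For context, the /etc/resolv.conf that Kubernetes injects into pods typically looks something like this (addresses and namespace are just examples):

search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.3.240.10
options ndots:5

A resolver that honours the search list expands the short name cassandra into cassandra.default.svc.cluster.local; a resolver that only reads the nameserver line (which, as far as I can tell, is effectively what the nginx/Lua side does) queries the literal short name and fails.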
Thanks @rca, I can confirm that your changes fix the issue we were having on Kubernetes.
Encountered this issue in my Kubernetes clusters today as well.
After running in circles chasing dnsmasq, I finally gave up on trying to use the shorter DNS entry for my service my-service, and just ended up using the full thing my-service.default.svc.cluster.local
Nothing else I tried worked including pointing the resolver at kube-dns.kube-system.svc.cluster.local and a bunch of other things that I can't remember ;)
It's worth mentioning that Kong was running in the same namespace as the service I was trying to proxy to as well (default).
@thenayr
I switched to go-dnsmasq in k8s; maybe the nginx resolver is the problem.
You can try it:
spec:
  containers:
    ### https://github.com/kubernetes/kubernetes/issues/26309
    ### use go-dnsmasq in pod
    - name: go-dnsmasq
      image: "janeczku/go-dnsmasq:release-1.0.5"
      args:
        - --listen
        - "127.0.0.1:53"
        - --default-resolver
        - --append-search-domains
        - --hostsfile=/etc/hosts
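If I understand it correctly, the sidecar listens on 127.0.0.1:53 inside the pod and appends the Kubernetes search domains itself, so short names come back resolved. You would then point Kong at it; the kong.yml fragment below is what I would expect (0.7/0.8-style config, and the exact wiring is an assumption on my part):

dns_resolver: server
dns_resolvers_available:
  server:
    address: "127.0.0.1:53"   # the go-dnsmasq sidecar in the same pod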
I'm running another Nginx proxy (with open-resty / Lua modules etc) and have 0 issues with Nginx DNS resolution inside Kubernetes. All of my proxy_pass definitions are to the short DNS names. It's definitely specific to something with Kong.
Just came across this while looking into Kong as a possible solution for something. This sounds oddly familiar to a problem we had running an open-resty api-proxy in a Kubernetes cluster. We ended up solving it by having nginx.conf be a jinja2 template that was parsed at start time (ugly, but it solved other issues as well) and that contains:
resolver {{ nameserver }} valid=300s ipv6=off;
resolver_timeout 5s;
in the http block, and the Python start script that launches everything did:
# Pull the first nameserver entry out of the pod's /etc/resolv.conf
f = open('/etc/resolv.conf', 'r')
resolv = f.read()
f.close()

nameserver = None
for r in resolv.split('\n'):
    if r.startswith('nameserver'):
        nameserver = r.split(' ')[1]
        break
to set the nameserver in the config (along with a bunch of other environment-specific vars), then used the output as the live nginx.conf at runtime. This made short-name lookups like http://appname work fine, with the open-resty api-proxy running in different namespaces and talking to the various microservice pods in the same namespace via their short names.
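The rendering step was roughly the following (paths and file names here are illustrative, not the real ones):

# Render the jinja2 template with the nameserver found above.
from jinja2 import Template

with open('/etc/nginx/nginx.conf.j2') as tmpl:
    rendered = Template(tmpl.read()).render(nameserver=nameserver)

with open('/etc/nginx/nginx.conf', 'w') as out:
    out.write(rendered)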
Not sure if that will help at all, but it might be a path to head down.
In 0.10 we'll have our own internal DNS resolution (see #1587), used both from the CLI and in nginx, so the inconsistencies should be resolved.
That branch is just up for review now and all tests pass. So if some of you are willing to give it a try, please do.
NOTE: see this code; the config read from resolv.conf is limited to the nameservers, attempts, and timeout options, so shorter names relying on the ndots/search options are not supported for now.
Really looking for feedback on this.
PS. to run the branch you'd need to manually install dns.lua which hasn't been published yet. See https://github.com/Mashape/dns.lua
@tieske, this sounds great. Any chance you can roll this up into a test image that I can docker pull?
@rca I just pushed it at docker pull mashape/kong:dns
Anyone else having an issue running the kong:dns branch?
Just tried it on docker-cloud, got:
2016-10-18T09:33:28.296046329Z /docker-entrypoint.sh: line 9: exec: kong: not found
2016-10-18T09:35:53.260789876Z /docker-entrypoint.sh: line 9: exec: kong: not found
It's mashape/kong:dns.
We use mashape/kong for our tests, while keeping the official kong image clean.
Am I missing something? This is the command I run:
$ docker run mashape/kong:dns
Unable to find image 'mashape/kong:dns' locally
dns: Pulling from mashape/kong
8d30e94188e7: Already exists
73cc12610219: Already exists
77d6169dca74: Already exists
587d89d67788: Already exists
Digest: sha256:0dea70bb87e4c51581874a4659f397c9cda2a0e4e4214f9e202ea6fbd7d90031
Status: Downloaded newer image for mashape/kong:dns
/docker-entrypoint.sh: line 9: exec: kong: not found
I am experiencing that same issue running it with docker-compose locally.
docker-compose up
Recreating dev_kong_1
Attaching to dev_kong_1
kong_1 | /docker-entrypoint.sh: line 9: exec: kong: not found
dev_kong_1 exited with code 127
Same issue, running mashape/kong:dns in kubernetes:
2016-10-28T00:43:45.958914991Z /docker-entrypoint.sh: line 9: exec: kong: not found
Pulling and running mashape/kong seems to have solved this problem.
... almost.
Now I have no problem connecting to the Postgres database, but upstream forwarding doesn't work.
My client:
$ http http://192.168.99.100:32057/sbex1/v1/
HTTP/1.1 502 Bad Gateway
Connection: keep-alive
Content-Type: text/plain; charset=UTF-8
Date: Fri, 28 Oct 2016 00:56:04 GMT
Server: kong/0.9.0
Transfer-Encoding: chunked
An invalid response was received from the upstream server
In the Kong logs:
==> logs/error.log <==
2016/10/28 00:56:04 [error] 87#0: *43 sbex1-app could not be resolved (2: Server failure), client: 172.17.0.1, server: kong, request: "GET /sbex1/v1/ HTTP/1.1", host: "192.168.99.100:32057"
2016/10/28 00:56:04 [info] 87#0: *43 client 172.17.0.1 closed keepalive connection
But dnsmasq can see it --
[dwlo kong]# nslookup -port=8053 sbex1-app localhost
Server: localhost
Address: 127.0.0.1#8053
Name: sbex1-app.test.svc.cluster.local
Address: 10.0.0.78
I suspect this one's harder to get around -- does nginx itself need to be told to use dnsmasq?
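For reference, the nginx directive I'd expect to be involved looks something like the lines below. I believe Kong's generated nginx.conf already injects an equivalent when dnsmasq is enabled, so this is only a sketch for debugging by hand:

resolver 127.0.0.1:8053 ipv6=off;
resolver_timeout 5s;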
We have updated how we resolve DNS hostnames in > 0.10.x (and removed the dnsmasq dependency) - Can you try to replicate with 0.10.x and see if the problem has been fixed?
Any updates here, following the changes in 0.10.x?
We resolve DNS externally and only ever pass IPs to Kong, because of the various DNS issues we encountered previously. I'm not in a great position to test the fix right now, but we are not blocked by the issue that I reported back then.
@derrley understood - if you want to give it another try please let us know the outcome. Closing this issue for now, but happy to re-open it if Kong >= 0.10 still doesn't work.