Linkerd2: Issue with ExternalName services

Created on 12 Apr 2018 · 19Comments · Source: linkerd/linkerd2

I recently tried running conduit against our kubernetes cluster running locally on minikube and ran into an issue where the conduit proxy is returning a 500 for requests that succeed without the proxy in place. I've spent some time coming up with a minimal repro case and I believe I've determined that the conduit proxy doesn't correctly handle services that use an externalName to point to a service in a different namespace.

To start, this is the error I'm seeing in the logs for the conduit-proxy container that sits alongside my app:

INFO conduit_proxy using controller at HostAndPort { host: DnsName("proxy-api.conduit.svc.cluster.local"), port: 8086 }
INFO conduit_proxy routing on V4(127.0.0.1:4140)
INFO conduit_proxy proxying on V4(0.0.0.0:4143) to None
INFO conduit_proxy::transport::connect "controller-client", DNS resolved DnsName("proxy-api.conduit.svc.cluster.local") to 10.109.40.159
ERR! conduit_proxy turning operation timed out after Duration { secs: 10, nanos: 0 } into 500

Versions

| thing | version |
| -- | -- |
| conduit client | v0.3.1 |
| conduit server | v0.3.1 |
| kubernetes | v1.9.0 |
| minikube | v0.25.0 |

Minimal repro

I start by applying the following resources to my cluster. Basically, I've got app1, which lives in namespace, that will read the APP2_HOST environment variable to determine where to make calls under the hood to app2 which lives in namespace2.

# ------------------------------------------------------------------------------
# Namespaces
# ------------------------------------------------------------------------------
---
apiVersion: v1
kind: Namespace
metadata:
  name: namespace1

---
apiVersion: v1
kind: Namespace
metadata:
  name: namespace2

# ------------------------------------------------------------------------------
# Services
# ------------------------------------------------------------------------------
---
apiVersion: v1
kind: Service
metadata:
  name: app1
  namespace: namespace1
spec:
  selector:
    app: app1
  ports:
  - name: http
    port: 8000

---
apiVersion: v1
kind: Service
metadata:
  name: app2
  namespace: namespace1
spec:
  type: ExternalName
  externalName: app2.namespace2.svc.cluster.local

---
apiVersion: v1
kind: Service
metadata:
  name: app2
  namespace: namespace2
spec:
  ports:
  - port: 8000
  selector:
    app: app2

# ------------------------------------------------------------------------------
# Deployments
# ------------------------------------------------------------------------------
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app1
  namespace: namespace1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app1
  template:
    metadata:
      namespace: namespace1
      labels:
        app: app1
    spec:
      containers:
      - name: name
        image: jondlm/app1:latest
        env:
        - name: APP2_HOST
          value: app2:8000
        ports:
        - containerPort: 8000

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app2
  namespace: namespace2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app2
  template:
    metadata:
      namespace: namespace2
      labels:
        app: app2
    spec:
      containers:
      - name: name
        image: citizenstig/httpbin
        ports:
        - containerPort: 8000

Below is the source code for app1. It basically just calls app2 and returns that response.

package main

import (
    "fmt"
    "io/ioutil"
    "net/http"
    "os"
)

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, req *http.Request) {
        r, err := http.Get(fmt.Sprintf("http://%s%s", os.Getenv("APP2_HOST"), req.URL.Path))
        if err != nil {
            return
        }
        defer r.Body.Close()

        body, _ := ioutil.ReadAll(r.Body)
        w.WriteHeader(r.StatusCode)
        w.Write(body)
    })
    fmt.Println("listening on port 8000")
    http.ListenAndServe("0.0.0.0:8000", nil)
}

app2 is the same thing that runs http://httpbin.org. It just returns some JSON.

So, after all that stuff is running in my cluster, I run the following to see the request succeeds:

$ kubectl -n namespace1 port-forward $(kubectl -n namespace1 get pod -l app=app1 -o jsonpath='{.items[0].metadata.name}') 8000:8000
Forwarding from 127.0.0.1:8000 -> 8000

$ time curl -v localhost:8000/get
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /get HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Wed, 11 Apr 2018 23:19:22 GMT
< Content-Length: 197
< Content-Type: text/plain; charset=utf-8
<
{
  "args": {},
  "headers": {
    "Accept-Encoding": "gzip",
    "Host": "app2:8000",
    "User-Agent": "Go-http-client/1.1"
  },
  "origin": "172.17.0.25",
  "url": "http://app2:8000/get"
}
* Connection #0 to host localhost left intact
curl -v localhost:8000/get  0.00s user 0.00s system 29% cpu 0.026 total

As you can see, that call works. Next I inject the conduit proxy (lines 57-81 is the app1 deployment):

$ cat kubernetes.yaml | sed -n 57,81p | conduit inject - | kubectl apply -f -
deployment "app1" configured

After that proxy has joined the mesh successfully, I again forward the same port and attempt the call again:

$ time curl -v localhost:8000/get
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /get HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Date: Wed, 11 Apr 2018 23:13:23 GMT
< Content-Length: 0
<
* Connection #0 to host localhost left intact
curl -v localhost:8000/get  0.00s user 0.01s system 0% cpu 10.038 total

This time you can see the call fails after about 10 seconds. If I make the following change to the app1 deployment to go directly to app2.namespace2 the call will again succeed.

diff --git a/kubernetes.yaml b/kubernetes.yaml
index c1592ca..e595d0a 100644
--- a/kubernetes.yaml
+++ b/kubernetes.yaml
@@ -76,7 +76,7 @@ spec:
         image: jondlm/app1:latest
         env:
         - name: APP2_HOST
-          value: app2:8000
+          value: app2.namespace2:8000
         ports:

So, this leads me to the conclusion that conduit is choking on the app2 service that lives in namespace1 that is an externalName to app2.namespace2.svc.cluster.local.

Thank you for you time! Looking forward to being able to really kick the tires on conduit.

Source

jondlm

All 19 comments

@jondlm thank you for this outstanding report!

To confirm that I understand the issue: your application, in _namespace1_, is sending a request with Host: app2; and app2.namespace1 is an ExternalName refering app2.namespace2.svc.cluster.local. And, when the proxy tries to route such a request, it fails with a 500 after 10 seconds.

This sounds like a problem with our service discovery service. There are a number of fixes on master (such as for #155) that improve service discovery behavior. If you're feeling especially adventurous, you can try to run master off of a docker tag like git-259fdcd1.

For example:

conduit install --conduit-version=git-259fdcd1 | ...
conduit inject --conduit-version=git-259fdcd1 k8s.yml | ...

In the meantime, we can try to reproduce this behavior ourselves to debug further.

Thanks again for trying Conduit and especially for taking the time to write a detailed issue!

olix0r on 12 Apr 2018

It does appear that this issue is resolved on master. Let me describe how I tested this:

start docker-for-mac with kubernetes enabled
Install conduit: conduit install -v git-259fdcd1
Install the emojivoto app in a test namespace, with conduit: curl https://raw.githubusercontent.com/runconduit/conduit-examples/master/emojivoto/emojivoto.yml | conduit inject - | kubectl apply -f -.
Once that starts, disable the traffic generator so we know what we're looking at: kubectl delete deploy/vote-bot -n emojivoto

At this point, the emojivoto namespace is populated with several services:

NAME            DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/emoji    1         1         1            1           5d
deploy/voting   1         1         1            1           5d
deploy/web      1         1         1            1           5d

NAME                   DESIRED   CURRENT   READY     AGE
rs/emoji-6676787fcd    0         0         0         5d
rs/emoji-6c694694bb    0         0         0         5d
rs/emoji-cb55477c4     1         1         1         13m
rs/voting-56c55b5c7b   0         0         0         5d
rs/voting-7c4948f5cb   1         1         1         13m
rs/voting-7f6d846b56   0         0         0         5d
rs/web-59956dc84d      0         0         0         5d
rs/web-5fcbcd8dd       1         1         1         13m
rs/web-794d7676d4      0         0         0         5d

NAME            AGE
deploy/emoji    5d
deploy/voting   5d
deploy/web      5d

NAME                   AGE
rs/emoji-6676787fcd    5d
rs/emoji-6c694694bb    5d
rs/emoji-cb55477c4     13m
rs/voting-56c55b5c7b   5d
rs/voting-7c4948f5cb   13m
rs/voting-7f6d846b56   5d
rs/web-59956dc84d      5d
rs/web-5fcbcd8dd       13m
rs/web-794d7676d4      5d

NAME                         READY     STATUS    RESTARTS   AGE
po/emoji-cb55477c4-8gfgb     2/2       Running   0          13m
po/voting-7c4948f5cb-x4rqr   2/2       Running   1          13m
po/web-5fcbcd8dd-626fn       2/2       Running   0          13m

NAME             TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
svc/emoji-svc    ClusterIP      None            <none>        8080/TCP       5d
svc/voting-svc   ClusterIP      None            <none>        8080/TCP       5d
svc/web-svc      LoadBalancer   10.99.131.238   localhost     80:30994/TCP   5d

Then, I created an application instance in the bot namespace. This namespace includes the emojivoto ExternName Service.

NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/voter   1         1         1            1           9m

NAME                  DESIRED   CURRENT   READY     AGE
rs/voter-694667bb85   0         0         0         9m
rs/voter-cfdfc8859    1         1         1         5m

NAME           AGE
deploy/voter   9m

NAME                  AGE
rs/voter-694667bb85   9m
rs/voter-cfdfc8859    5m

NAME                       READY     STATUS    RESTARTS   AGE
po/voter-cfdfc8859-qj59s   2/2       Running   0          5m

NAME            TYPE           CLUSTER-IP   EXTERNAL-IP                           PORT(S)   AGE
svc/emojivoto   ExternalName   <none>       web-svc.emojivoto.svc.cluster.local   <none>    9m

I see stats on the dashboard at this point, but to confirm it manually, I logged into the pod:

:; kubectl -n bot exec -c vote-bot $(kubectl get pods --no-headers -n bot |awk '{print $1}') -i -t bash
root@voter-cfdfc8859-qj59s:/# curl -sv http://emojivoto/ -o /dev/null
* Hostname was NOT found in DNS cache
*   Trying 10.99.131.238...
* Connected to emojivoto (10.99.131.238) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: emojivoto
> Accept: */*
> 
< HTTP/1.1 200 OK
< content-type: text/html
< date: Thu, 12 Apr 2018 00:31:12 GMT
< content-length: 560
< 
{ [data not shown]
* Connection #0 to host emojivoto left intact

🎉 🎉 🎉

So, it seems to me that we've fixed this on master. It would be helpful, though, to get confirmation that my reproduction is accurate.

We're just wrapping up the 0.4 release, so I'd expect a fix out in the coming days.

Appendix:

bot.yml:

---
apiVersion: v1
kind: Namespace
metadata:
  name: bot

---
apiVersion: v1
kind: Service
metadata:
  name: emojivoto
  namespace: bot
spec:
  type: ExternalName
  externalName: web-svc.emojivoto.svc.cluster.local

---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  name: voter
  namespace: bot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: voter
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: voter
    spec:
      containers:
      - command:
        - emojivoto-vote-bot
        env:
        - name: WEB_HOST
          value: emojivoto
        image: buoyantio/emojivoto-web:v3
        name: vote-bot

olix0r on 12 Apr 2018

Oh wow, thank you for such a fast and detailed response! That's awesome to hear. I will try out master tomorrow and report back with the results!

jondlm on 12 Apr 2018

Oddly enough, I was able to run the emojivoto stuff as your described and also found that it was working. But when I applied _git-259fdcd1_ to my cluster, it didn't fix the problem. After much head scratching and digging, I think the problem only shows up when talking over an explicitly declared port. I was able to reproduce the problem on your example by modifying the web-svc service in the emojivoto namespace with:

...
    port: 8000
    targetPort: 80
...

And then modifying the voter deployment in the bot namespace with:

...
        env:
        - name: WEB_HOST
          value: emojivoto:8000
...

On the vote-bot container it's showing the following every 10 seconds ish:

unexpected end of JSON input

I also cranked up the logging on the conduit proxy with CONDUIT_PROXY_LOG=trace and here is what I saw:

TRCE conduit_proxy::control::discovery "controller-client", checking DNS for DnsNameAndPort { host: Name(Name("emojivoto")), port: 8000 }
TRCE tower_reconnect "controller-client", poll response
DBUG tokio_core::reactor::timeout_token "controller-client", cancel timeout 4
DBUG tokio_core::reactor "controller-client", cancel a timeout: 4
TRCE h2::proto::streams::recv "controller-client", release_capacity; size=5
TRCE h2::proto::streams::recv "controller-client", release_connection_capacity; size=5
TRCE tower_h2::recv_body "controller-client", Data::drop: releasing capacity: 0
TRCE h2::proto::streams::recv "controller-client", release_capacity; size=0
TRCE h2::proto::streams::recv "controller-client", release_connection_capacity; size=0
TRCE conduit_proxy::control::telemetry "controller-client", report sent to controller in Duration { secs: 0, nanos: 4034853 }
TRCE tower_reconnect "controller-client", poll_ready; connected
TRCE tower_reconnect "controller-client", poll_ready; ready
TRCE conduit_proxy::telemetry::control "controller-client", poll
TRCE conduit_proxy::telemetry::control "controller-client", recv.poll(Receiver { capacity: AtomicUsize(10000) })
DBUG tokio_core::reactor "controller-client", updating a timeout: 0
DBUG tokio_core::reactor "controller-client", scheduling direction for: 0
DBUG tokio_core::reactor "controller-client", blocking
DBUG tokio_core::reactor "controller-client", scheduling direction for: 1
DBUG tokio_core::reactor "controller-client", blocking
DBUG tokio_core::reactor loop process - 1 events, Duration { secs: 0, nanos: 120471 }
DBUG tokio_core::reactor loop poll - Duration { secs: 0, nanos: 1077 }
DBUG tokio_core::reactor loop time - Instant { tv_sec: 179147, tv_nsec: 737635118 }
TRCE tokio_core::reactor event Readable Token(13)
TRCE h2::proto::settings ("controller-client", "HostAndPort { host: DnsName(Name(Name(\"proxy-api.conduit.svc.cluster.local\"))), port: 8086 }"), send_pending_ack; pending=None
TRCE h2::codec::framed_read ("controller-client", "HostAndPort { host: DnsName(Name(Name(\"proxy-api.conduit.svc.cluster.local\"))), port: 8086 }"), poll
DBUG tokio_core::reactor ("controller-client", "HostAndPort { host: DnsName(Name(Name(\"proxy-api.conduit.svc.cluster.local\"))), port: 8086 }"), scheduling direction for: 3
DBUG tokio_core::reactor ("controller-client", "HostAndPort { host: DnsName(Name(Name(\"proxy-api.conduit.svc.cluster.local\"))), port: 8086 }"), blocking
TRCE h2::proto::streams::prioritize ("controller-client", "HostAndPort { host: DnsName(Name(Name(\"proxy-api.conduit.svc.cluster.local\"))), port: 8086 }"), try reclaim frame
TRCE h2::proto::streams::prioritize ("controller-client", "HostAndPort { host: DnsName(Name(Name(\"proxy-api.conduit.svc.cluster.local\"))), port: 8086 }"), poll_complete
TRCE h2::proto::streams::prioritize ("controller-client", "HostAndPort { host: DnsName(Name(Name(\"proxy-api.conduit.svc.cluster.local\"))), port: 8086 }"), schedule_pending_open
TRCE h2::proto::streams::prioritize ("controller-client", "HostAndPort { host: DnsName(Name(Name(\"proxy-api.conduit.svc.cluster.local\"))), port: 8086 }"), pop_frame
TRCE h2::codec::framed_write ("controller-client", "HostAndPort { host: DnsName(Name(Name(\"proxy-api.conduit.svc.cluster.local\"))), port: 8086 }"), flush
TRCE h2::codec::framed_write ("controller-client", "HostAndPort { host: DnsName(Name(Name(\"proxy-api.conduit.svc.cluster.local\"))), port: 8086 }"), flushing buffer
TRCE h2::proto::streams::prioritize ("controller-client", "HostAndPort { host: DnsName(Name(Name(\"proxy-api.conduit.svc.cluster.local\"))), port: 8086 }"), try reclaim frame
DBUG tokio_core::reactor loop process - 1 events, Duration { secs: 0, nanos: 80527 }
DBUG tokio_core::reactor loop poll - Duration { secs: 0, nanos: 692 }
DBUG tokio_core::reactor loop time - Instant { tv_sec: 179147, tv_nsec: 737717817 }
DBUG tokio_core::reactor loop process - 0 events, Duration { secs: 0, nanos: 2113 }
DBUG tokio_core::reactor loop poll - Duration { secs: 9, nanos: 997487644 }
DBUG tokio_core::reactor loop time - Instant { tv_sec: 179148, tv_nsec: 611266861 }
TRCE tokio_core::reactor firing timeout: 0
DBUG tokio_core::reactor notifying a task handle
DBUG tokio_core::reactor loop process - 0 events, Duration { secs: 0, nanos: 44294 }
DBUG tokio_core::reactor loop poll - Duration { secs: 0, nanos: 5692 }
DBUG tokio_core::reactor loop time - Instant { tv_sec: 179148, tv_nsec: 611318738 }
TRCE tokio_core::reactor event Readable Token(7)
TRCE hyper::proto::h1::dispatch Dispatcher::poll
TRCE hyper::proto::h1::conn read_keep_alive; is_mid_message=true
TRCE tower_reconnect poll response
DBUG tokio_core::reactor scheduling direction for: 4
DBUG tokio_core::reactor blocking
DBUG tokio_core::reactor dropping I/O source: 4
DBUG tokio_core::reactor::timeout_token cancel timeout 0
DBUG tokio_core::reactor cancel a timeout: 0
ERR! conduit_proxy turning operation timed out after 10s into 500

Kinda makes me think there's another bug with DnsNameAndPort resolution.

jondlm on 12 Apr 2018

Excellent find, @jondlm! I'm doubtful that we'll be able to resolve this in time for the upcoming release, but we should be able to fix this shortly thereafter.

olix0r on 12 Apr 2018

👍1

The problem is as follows:

When we resolve the name _emojivoto_ in the namespace _bot_, the proxy resolves _emojivoto.bot.svc.cluster.local_ through the controller.
The controller resolves the external name through DNS: https://github.com/runconduit/conduit/blob/624b87f743bae53ed16385e6979d8daf1096f783/controller/destination/k8s_resolver.go#L83-L91
This bypasses kubernetes Service resolution, which is aware of port mappings: https://github.com/runconduit/conduit/blob/624b87f743bae53ed16385e6979d8daf1096f783/controller/destination/k8s_resolver.go#L73-L81

To resolve this, we must try to handle ExternalName services recursively to detect whether the named destination is actually another local kubernetes service.

olix0r on 12 Apr 2018

👍1

@olix0r Please see https://github.com/runconduit/conduit/issues/571#issuecomment-379356516. I suspect that if we simply remove all the code in the Destination service server that deals with ExternalNames, then everything will work correctly, where "correctly" means "like it did when conduit wasn't injected."

In particular, I guess kube-proxy/kube-dns is doing the port translation, which kind of surprises me, but it makes sense.

briansmith on 12 Apr 2018

@briansmith You're right that removing the external name handling in the controller is the shortest path to fixing this (i.e. making behavior no worse than it is today). Though, I'll note that we should ideally resolve this as a kubernetes service so that we can do load balancing and proper dst metric labeling for stats.

I propose the following:

implement an integration test to capture the broken case.
Remove the current ExternalName support to fix the test.
After that lands on master, we can follow up with a more thorough solution (i.e. along the lines of recursive resolution).

olix0r on 12 Apr 2018

👍1

I agree with your proposed resolution. Definitely if there's any significant benefit to doing things differently than I suggested, then we should do so. I assume you'll file an issue for #3 and we can flesh out the details there.

Note that we had some difficulty implementing the integration test in the test suite initially which is why there isn't one right now.

briansmith on 12 Apr 2018

I'm not sure that it is actually valid to specify port mappings for ExternalName services. The documentation says that ExternalName services are name aliases only and translate directly to CNAME which doesn't allow port mapping.

Two examples:

https://kubernetes.io/docs/concepts/services-networking/service/ says (emphasis mine):

An ExternalName service is a special case of service that does not have selectors. It does not define any ports or Endpoints. Rather, it serves as a way to return an alias to an external service residing outside the cluster.

https://github.com/kubernetes/dns/blob/master/docs/specification.md#25---records-for-external-name-services also says that ExternalNames get translated directly to CNAME.

So, we should figure out if Kubernetes actually intends to support port mapping for ExternalName services. If not, then load balancing, at least, will work perfectly without special support on Conduit's behalf.

briansmith on 12 Apr 2018

To be clear, I'm not specifying a port mapping on the ExternalName service. The ExternalName service just happens to resolve to another ClusterIP service that does the port mapping.

I suspect, and I will try to confirm this, that even explicitly specifying http://emojivoto:80 will have the same issue with no port mappings involved.

jondlm on 12 Apr 2018

👍1

To be clear, I'm not specifying a port mapping on the ExternalName service. The ExternalName service just happens to resolve to another ClusterIP service that does the port mapping.

Thanks for clarifying that.

I agree that when an ExternalName is actually an alias for a k8s service, we'd need some custom recursion feature to do full load balancing, and full metrics, etc.

briansmith on 12 Apr 2018

My suspicion about the explicit port :80 hasn't really panned out; I can only reproduce the issue when switching to a different port. I suspect the root issue has more to do with the recursive resolution you were mentioning.

jondlm on 12 Apr 2018

@jondlm i just spent some time trying to reproduce the issue you as you described it, but I think I'm still missing something in my setup:

I updated to svc/web-svc definition to do a port mapping:

---
apiVersion: v1
kind: Service
metadata:
  name: web-svc
  namespace: emojivoto
spec:
  type: LoadBalancer
  selector:
    app: web-svc
  ports:
  - name: http
    port: 80
    targetPort: 8000

Updated deploy/web-svc to listen on 8000

  template:
    metadata:
      creationTimestamp: null
      labels:
        app: web-svc
    spec:
      containers:
      - env:
        - name: WEB_PORT
          value: "8000"
...

Ensure that deploy/voter is configured with emojivoto:80:

    spec:
      containers:
      - command:
        - emojivoto-vote-bot
        env:
        - name: WEB_HOST
          value: emojivoto:80
        image: buoyantio/emojivoto-web:v3
        name: vote-bot

Deploy the emojivoto and bot namespaces as before:

#; doe ~/b/conduit  (ver/external-name-disco-test ✔)
:; kubectl describe svc/web-svc -n emojivoto
Name:                     web-svc
Namespace:                emojivoto
Labels:                   <none>
Annotations:              kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"web-svc","namespace":"emojivoto"},"spec":{"ports":[{"name":"http","port":80,"t...
Selector:                 app=web-svc
Type:                     LoadBalancer
IP:                       10.110.36.52
LoadBalancer Ingress:     localhost
Port:                     http  80/TCP
TargetPort:               8000/TCP
NodePort:                 http  31376/TCP
Endpoints:                10.1.2.102:8000
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
#; doe ~/b/conduit  (ver/external-name-disco-test ✔)
:; kubectl describe svc/emojivoto -n bot
Name:              emojivoto
Namespace:         bot
Labels:            <none>
Annotations:       kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"emojivoto","namespace":"bot"},"spec":{"externalName":"web-svc.emojivoto.svc.cl...
Selector:          <none>
Type:              ExternalName
IP:
External Name:     web-svc.emojivoto.svc.cluster.local
Session Affinity:  None
Events:            <none>

I'm able to make calls to emojivoto from the bot namespace, without issue:

:; kubectl -n bot exec -c vote-bot $(kubectl get pods --no-headers -n bot |awk '{print $1}') -i -t sh
# curl -v emojivoto
* Rebuilt URL to: emojivoto/
* Hostname was NOT found in DNS cache
*   Trying 10.110.36.52...
* Connected to emojivoto (10.110.36.52) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: emojivoto
> Accept: */*
> 
< HTTP/1.1 200 OK
< content-type: text/html
< date: Thu, 12 Apr 2018 23:16:04 GMT
< content-length: 560
< 

    <!DOCTYPE html>
    <html>
        <head>
            <meta charset="UTF-8">
            <title>Emoji Vote</title>
            <link rel="icon" href="/img/favicon.ico">

            <script async src="https://www.googletagmanager.com/gtag/js?id=UA-60040560-4"></script>
            <script>
              window.dataLayer = window.dataLayer || [];
              function gtag(){dataLayer.push(arguments);}
              gtag('js', new Date());
              gtag('config', 'UA-60040560-4');
            </script>
        </head>
        <body>
            <div id="main" class="main"></div>
        </body>

            <script type="text/javascript" src="/js" async></script>

* Connection #0 to host emojivoto left intact
    </html>

In the proxy's logs, I see that we are indeed falling back to DNS for emojivoto and we're sending traffic to web-svc.emojivoto.svc.cluster.local:80, which is mapped by kubeproxy... As mentioned before, this behavior should be improved so that the proxy does its own balancing; but that's somewhat unrelated to the issue you reported.

So, I'm not sure that the targetPort is whole story of the issue you're seeing...

olix0r on 13 Apr 2018

Note that you can get some more detailed logging by adding --proxy-log-level=conduit_proxy::control::discovery=trace to the inject command.

olix0r on 13 Apr 2018

Hmm, that's odd. I noticed in your step 1 you changed targetPort to 8000 (and subsequently the receiving deployment) whereas in my example I changed the port to 8000 and then the WEB_HOST on the bot to target emojivoto:8000.

I'll also extract more detailed logs...

jondlm on 13 Apr 2018

Closing this issue since it appears to be resolved with v0.4.1 :tada:

jondlm on 27 Apr 2018

W00t!