While I would like to route traffic between my applications (HTTP, gRPC, TCP) through the Istio/Envoy service mesh, some applications also need to reach core TCP services like Zookeeper or Kafka.
I would like to be able to reach those core services using the regular K8s service endpoints.
app -> envoy proxy -> k8s service (by DNS name)
As far as I've found, it does not seem possible to route traffic out of the mesh, except using an Istio egress, which is HTTP(S) only and is not meant to talk to k8s services.
Do you have any solution or plan for that?
Thanks.
If you don't use the auth feature, you should be able to reach non-Istio pods from Istio pods and vice versa in the normal way.
@kyessenov should I be able to reach pods or services?
Actually I can reach neither...
~ # nslookup kafka-zk-broker-kafka.dev
Name: kafka-zk-broker-kafka.dev
Address 1: 10.33.0.11 kafka-zk-kafka-0.kafka-zk-broker-kafka.dev.svc.cluster.local
Address 2: 10.38.96.16 kafka-zk-kafka-2.kafka-zk-broker-kafka.dev.svc.cluster.local
Address 3: 10.40.128.13 kafka-zk-kafka-1.kafka-zk-broker-kafka.dev.svc.cluster.local
~ # ping kafka-zk-broker-kafka.dev
PING kafka-zk-broker-kafka.dev (10.33.0.11): 56 data bytes
64 bytes from 10.33.0.11: seq=0 ttl=64 time=0.376 ms
64 bytes from 10.33.0.11: seq=1 ttl=64 time=0.348 ms
--- kafka-zk-broker-kafka.dev ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.348/0.362/0.376 ms
~ # telnet kafka-zk-broker-kafka.dev 9092
Connection closed by foreign host
Everything works if I don't deploy the pod with the Istio sidecar.
Pods can only be reached through service names in istio (we don't program all individual pod routes).
This is likely due to a namespacing issue (Istio only cares about the namespace it's deployed in, cc @andraxylia). I'd hope that if you deploy Istio in the "dev" namespace, it would work (at least we test for that case).
In fact, everything is deployed in the "dev" namespace in my test.
My Kafka service is a headless one, and clients try to reach each pod individually, as the nslookup output above suggests.
So, if Istio can't route to the pods' IPs:
1) I'm screwed
2) I will have to use a ClusterIP service
Any chance of having Istio route to the pods' IPs? I think my use case is fairly common, especially when you have a "service" like Kafka or MongoDB with a rich client, where you want your client to know about all the existing server endpoints.
Thanks for the suggestion, we'll consider adding explicit network endpoints for headless TCP services. We were trying to preserve the service abstraction and reduce configuration load, but a rich client trying to address endpoints directly is a legitimate use-case. This is even more true for headless services.
At some point, we would want Envoy to take over some of the rich client functionality by adding a Kafka filter and delegating load balancing and other features from the rich client to Envoy. This would require a ClusterIP service. Would that make sense to you?
cc @rshriram @louiscryan
@kyessenov I'm not sure it's a good idea to go the way of filters in Envoy. You will end up re-writing application client logic in filters for a whole bunch of applications (Kafka, ZK, Mongo, Cassandra...).
The cool thing with rich clients is that they take care of the connection/disconnection/rebalance logic. There is no point in going through another tool to gain nothing.
My suggestion would be enabling Istio/Envoy to route traffic to headless services, maybe by using a command-line option like --includeHeadlessServices (similar to includeIPRanges), or simply by discovering the headless services in the current namespace and maintaining a route for them...
Sadly, this discussion means I can't use Istio for now... at least with Envoy... How would it be with Linkerd?
We'll add an option to route directly to endpoints for headless services in the next release.
I'm not sure about the state of TCP load balancing for linkerd.
Can't wait for the next release then!
Any thoughts on the release date? I will be testing right away!
It's not the proxy that's the issue. It's Pilot. We don't configure Envoy or Linkerd with pod IPs due to the potentially large number of listener blocks or configs.
The fix for headless services would be a hack at most. It will face issues as more pods are added to or removed from the headless service (if it's a StatefulSet there might be less churn). The sensible option is to have passthrough mode support implemented in Envoy and then add a generic TCP proxy listener in Envoy that matches traffic for the kube-internal subnet range (e.g. 10.0..) and passes traffic through to the original destination IP and port. Then one would be able to talk to pods directly, irrespective of headless services or normal TCP services. We would probably even eliminate TCP proxy configuration completely.
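To sketch the idea (illustrative only; this uses today's Envoy YAML config syntax rather than what the PR below actually implements, and the capture port and names are assumptions):

static_resources:
  listeners:
  - name: tcp_passthrough                      # illustrative name
    address:
      socket_address: { address: 0.0.0.0, port_value: 15001 }   # assumed port where iptables redirects outbound traffic
    listener_filters:
    - name: envoy.filters.listener.original_dst   # recovers the original destination IP:port
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: passthrough
          cluster: passthrough_cluster
  clusters:
  - name: passthrough_cluster
    type: ORIGINAL_DST                          # forwards to whatever address the client originally dialed
    lb_policy: CLUSTER_PROVIDED
    connect_timeout: 5s

With something like this in place, a connection to a pod IP such as 10.33.0.11:9092 would be proxied straight to that address, whether or not Pilot knows about a service for it.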
Here is the issue in Envoy that is attempting to add this support. https://github.com/lyft/envoy/pull/1246
@kyessenov @rshriram How about enabling external traffic for TCP? Then headless services can be defined as external services (they are external to Istio).
A related question - does Istio handle TCP traffic (non-HTTP/HTTPS) for headful services?
K8s external services don't support TCP. Secondly, the user wants to talk directly to pod IPs. We need to process pods in a StatefulSet or headless service like any other pod, thereby providing the ability to dynamically add or remove pods from an upstream cluster.
As for the related question (TCP traffic for headful services): we set up a TCP proxy in Envoy.
How would transparent tcp proxying work with mTLS?
I opened an issue as well on the other repo. Just gathering them into one.
This would be very useful. I would like to use Istio but currently cannot because I need my services within the mesh to be able to access DBs and other services that are non-Istiofied.
I have the same problem; I can't access a StatefulSet.
Same as https://github.com/istio/pilot/issues/1015
(not sure we have time to address the full solution, but the minimal thing is k8s API server access)
@ldemailly those are not the same. This particular issue is about accessing headless services via pod IPs. It's a bug in Pilot. @ijsnellf is working on it.
AFAIK the bug is that we intercept everything, but as long as we fix it, that's great.
@wattli can you add the details you explained this morning to this bug?
Do you have any news on this?
It likely won't be fixed in the very first 0.2 release but should be soon after, depending on your exact case (for instance, access to the k8s API server and HTTPS services should be the first to get working).
@prune998 we have support for headless services in master. If you are feeling a bit adventurous, we would appreciate some feedback if you could try out the istio.yaml from the istio/istio master branch. You need to make sure that you name the ports of headless services, and that the port on which the headless service is listening does not collide with Istio-fied service ports (e.g., both a headless service and an Istio service on port 80). You can find an example of a headless service in istio/pilot/test/integration/testdata/
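For reference, a minimal headless service along those lines could look like this (a sketch only; names, ports, and the selector are illustrative, modeled on the Kafka setup discussed above):

apiVersion: v1
kind: Service
metadata:
  name: kafka-zk-broker-kafka
  namespace: dev
spec:
  clusterIP: None          # headless: DNS returns the individual pod IPs
  ports:
  - name: tcp              # the port must be named for Istio to route it
    port: 9092
    targetPort: 9092
  selector:
    app: kafka-zk          # illustrative selector

The important points are clusterIP: None, the named port, and choosing a port that no Istio-fied service in the mesh already uses.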
That's good news. Will try it today.
Thanks.
Building the whole stack from master seems to be a mess... I think I'll wait for a release (or a nightly?).
@rshriram do you have a pointer to the commit that added headless service support?
Thanks
https://github.com/istio/pilot/commit/29f0f191c7989648d47979e582c22acdd8d7311f from two weeks ago.
In 0.2.4, I added Zookeeper into the service mesh but I cannot access Zookeeper, even from inside the Zookeeper cluster.
Workaround documented in the FAQ.
@sakshigoel12 @wattli where exactly? I don't see it in https://istio.io/docs/tasks/security/faq.html
@ldemailly, a headless service is one kind of non-Istio service, right? The first item of the FAQ mentions:
Can a service with Istio Auth enabled communicate with a service without Istio?
Currently it is not well supported. But we do have plan to support this in the near future.
Maybe this is on the wrong issue (apologies), but I thought you mentioned we documented the healthcheck workaround somewhere - I do see curl mentioned, but it's probably not enough for a user to solve the problem.
@ldemailly it is also documented here: https://istio.io/docs/setup/kubernetes/quick-start.html#installation-steps step 4.
@prune998, could you take a look to see if the doc is satisfactory?
@ldemailly it sounds good, even if the explanation is really minimal.
I'll try to take a look at it ASAP and see how it behaves.
Thanks
@linsun my point is that there is a way to make the liveness check work - we should not scare our users into turning off auth because of liveness - but for that to be practical we need more detailed documentation of "how".
I see. My understanding is that it is not planned for this month's 0.2 release.
The workaround works today with 0.2.4; it just needs documentation.
I will add more details to the FAQ.
@wattli could you please add an ETA?
@prune998 / @FuzzOli87 / @immortalHuang / @hollinwilkins can you confirm whether support for headless services works without Istio auth?
@rshriram it will take a little more time for me to be able to test... sorry. Will let you know ASAP.
To be clear, try the 0.2.6 release from GitHub.
@rshriram Is the 0.2.7 release good to test this feature with? You mention the 0.2.6 release; however, the release notes specifically say that talking to non-Istio services is not yet part of the release.
Edit: Never mind. I see the docs. I have to disable mutual TLS authentication.
Yes.. with mutual TLS auth, it won't work.
@rshriram will it work with SNI https://github.com/envoyproxy/envoy/issues/1843 ?
@rshriram Will get a chance to test this this week, excited to use Istio for our service mesh :)
I would say it's OK for me when NOT using TLS.
I think you can close the issue, unless you want to dig more into supporting TLS...
Since 0.3.0 you can mix and match mTLS and non-mTLS on a per-service basis.
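If I remember the 0.3 mechanism correctly (worth double-checking against the 0.3 docs, this is from memory), the per-service opt-out was a port-keyed service annotation, roughly:

apiVersion: v1
kind: Service
metadata:
  name: postgres               # illustrative service
  annotations:
    auth.istio.io/5432: NONE   # assumed 0.3-style annotation: disable mTLS for port 5432
spec:
  ports:
  - name: tcp
    port: 5432
  selector:
    app: postgres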
@rshriram I can access headless services now, but I think the connection gets interrupted. With an Istio-injected container accessing a non-Istio, headless Postgres service, I get connection errors after a while. Does Istio time out TCP connections that don't show traffic for long periods of time?
That might be the case.. Would need more info. It might require configuring Envoy correctly to keep connections alive.
Which version are you using @hollinwilkins? Try 0.4.0 if possible...
@prune998 I'm using 0.4.0
I will run some more experiments throughout the day and try to figure out how long it takes to time out and I'll copy/paste a report of the exact error from postgres. It seems like an inactivity thing to me, because as soon as I start hitting the database again the error goes away after one attempt.
Ok, here is some more information:
E, [2018-01-03T17:35:20.079557 #1] ERROR -- : PG::ConnectionBad: PQconsumeInput() server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Which I'm pretty sure indicates that the underlying TCP stream is being closed from the server end (probably Envoy closing the connection).
@prune998 I have a solution for the time being, I'm expiring connections to the database every 10 minutes. Would still like to figure out the underlying cause. Happy to provide any debugging information needed.
I checked in Envoy. There is nothing that terminates connections every 10 minutes. Is there some other OS setting or middlebox that is terminating idle connections? Is your issue a case of idle connections being terminated, or is there data in the connection as well?
Well, there must be some timeout somewhere... do you see a consistent timeout?
What if you connect to your pod (kubectl exec -ti <pod> sh) and run a Postgres command by hand, like psql?
If it's a defined timeout, it should always close the connection after a fixed time...
I'm not sure how connections to headless services are made... are you still going through the Envoy proxy or is it just iptables magic? Anyone with the answer is welcome to comment :)
It still goes through envoy.
So maybe check the Envoy /stats endpoint (on port 15000) for the various timeout counters; if it's Envoy, it should show up there...
If it's not... well, maybe it's the TCP stack?
@rshriram I only see this issue from istiofied pods (I have about 13 pods that are not istiofied, running the same stack, that don't see this connection issue). I have one test pod right now, running one of the istiofied services, to work through these connection issues. It could be some other middleware ultimately responsible for the issue, but I think Istio has something to do with it because of the setup I just described. I am on GKE, and I don't think they impose any limits like this within their VMs. Also, I am not 100% sure whether there is data across the wire or not. I have a connection pool, and I run health checks to the db fairly frequently (several times a minute). I saw this issue even with health checks running, but that could be because only one connection from the pool was being used by the health checks, and when I go to do another query, a different connection gets pulled.
@prune998 The pod consistently times out if I let it sit without making any database calls. I will try the psql client and see if it has the same issue, that will rule out any library issues I may be running into. Also, I am running a new test without the connection expiration so I can check on the stats endpoint from the istio-proxy container.
@rshriram @prune998 Thank you both for the help! Thinking maybe we should put this into a separate issue?
Yes. Do you have Istio CA enabled, i.e. mTLS enabled? Can you try with Istio auth disabled and istio-ca not deployed? I have a feeling that we are recycling Envoy every 10-15 minutes to refresh certificates. As part of the recycle, old connections are being terminated.
@rshriram I didn't install Istio CA intentionally, I really don't want to add that layer yet. I used this install command: kubectl apply -f install/kubernetes/istio.yaml
However, I get this back when I run kubectl get po -n istio-system:
➜ istio kubectl get po -n istio-system
NAME READY STATUS RESTARTS AGE
istio-ca-55b954ff7-mdgvz 1/1 Running 0 1d
istio-ingress-948b746cb-7nm75 1/1 Running 0 1d
istio-mixer-59cc756b48-n67mc 3/3 Running 0 1d
istio-pilot-55bb7f5d9d-7ss2l 2/2 Running 0 1d
Is that istio-ca doing what I think it may be doing? How do I disable it?
@prune998 I just ran the test using the psql command-line utility, and I got this error after waiting 30 minutes:
> select * from service_accounts limit 1;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
I think it is safe to assume it is a connection issue and not a library issue.
@hollinwilkins try stopping (scaling to 0) the istio-ca pod.
I had an issue where Envoy was "restarted" to sync the CA every 30 mins. It's a bug which may not be resolved yet.
You can do this safely if you're not using mTLS.
Will link the bug once I'm at the airport, I'm on the bus right now :)
@prune998 Can I just delete the deployment for istio-ca or should I scale?
Delete it if you're sure you're not using it...
@prune998 Hehe, gotcha. Also, there is too much information to sift through when I collect stats from istio-proxy. Is there a grep I can use to get you the useful stuff in regards to disconnects?
You don't need to touch the CA to turn mTLS on/off; you just need to (from the security FAQ):
kubectl edit configmap -n istio-system istio
comment out or uncomment authPolicy: MUTUAL_TLS to toggle mTLS, and then
kubectl delete pods -n istio-system -l istio=pilot
to restart Pilot. After a few seconds (depending on your *RefreshDelay) your Envoy proxies will have picked up the change from Pilot. During that time your services may be unavailable.
@ldemailly I just edited the config, that line was already commented.
I did see a previous revision of the config had MUTUAL_TLS enabled, but that must have been from a long time ago.
# Uncomment the following line to enable mutual TLS between proxies
# authPolicy: MUTUAL_TLS
Actually, reviewing this:
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","data":{"mesh":"# Uncomment the following line to enable mutual TLS between proxies\n# authPolicy: MUTUAL_TLS\n#\n# Edit this list t...
Coming from the config map, it looks like it was never enabled even in the previous revision.
@rshriram @prune998 Disabling istio-ca worked! I am no longer getting the disconnects after waiting 30 minutes.
Well, the bug is still here then... I think this will change with the new gRPC API in Envoy...
I can't find the issue related to this. Maybe there is no issue and I got that from someone else's comment... can't remember. Glad you got it working finally!
@prune998 Added a PR for troubleshooting in Istio documentation: https://github.com/istio/istio.github.io/pull/835
@hollinwilkins @prune998 thanks for your patience in troubleshooting this and for writing the troubleshooting guide.
It appears there is still a minor bug (with an easy workaround), since pilot-agent should not restart Envoy if encryption is disabled, even if istio-ca is present and certificates are refreshed. Until support for SDS (istio/istio#2120) makes Envoy restarts completely unnecessary, we need to allow both encrypted and unencrypted services to co-exist in the same cluster, and we need istio-ca in general. I opened istio/istio#2427 for not having to disable istio-ca. We can close this issue.
@prune998 Starting to see another issue with this. Not sure if it is related to headless services, but it seems istio has an effect here. After deploying a certain number of pods in a namespace, connections to headless services stop working for some reason.
I deploy 6 services injected with istio sidecar, and they connect to my database fine. When I deploy the 7th and 8th one, they cannot connect. Deploying all 8 without istio offers no issue connecting to the database.