While I would like to route traffic between my applications (HTTP, gRPC, TCP) through the Istio/Envoy service mesh, some applications also need to reach core TCP services like Zookeeper or Kafka.
I would like to be able to reach those core services using the regular K8s service endpoints.
app -> envoy proxy -> k8s service (by DNS name)
As far as I've found, it does not seem possible to route traffic out of the mesh, except using an Istio egress, which is HTTP(S) only and is not meant to talk to k8s services.
Do you have any solution or plan for that?
Thanks.
If you don't use the auth feature, you should be able to reach non-Istio pods from Istio pods and vice versa in the normal way.
@kyessenov should I be able to reach pods or services?
Actually I can reach neither...
~ # nslookup kafka-zk-broker-kafka.dev
Name: kafka-zk-broker-kafka.dev
Address 1: 10.33.0.11 kafka-zk-kafka-0.kafka-zk-broker-kafka.dev.svc.cluster.local
Address 2: 10.38.96.16 kafka-zk-kafka-2.kafka-zk-broker-kafka.dev.svc.cluster.local
Address 3: 10.40.128.13 kafka-zk-kafka-1.kafka-zk-broker-kafka.dev.svc.cluster.local
~ # ping kafka-zk-broker-kafka.dev
PING kafka-zk-broker-kafka.dev (10.33.0.11): 56 data bytes
64 bytes from 10.33.0.11: seq=0 ttl=64 time=0.376 ms
64 bytes from 10.33.0.11: seq=1 ttl=64 time=0.348 ms
--- kafka-zk-broker-kafka.dev ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.348/0.362/0.376 ms
~ # telnet kafka-zk-broker-kafka.dev 9092
Connection closed by foreign host
Everything works if I don't deploy the pod with the Istio sidecar.
Pods can only be reached through service names in istio (we don't program all individual pod routes).
This is likely due to a namespacing issue (Istio only cares about the namespace it's deployed in, cc @andraxylia). I'd hope that if you deploy Istio in the "dev" namespace, it would work (at least we test for that case).
In fact, everything is deployed in the "dev" namespace in my test.
My Kafka service is a headless one, and clients try to reach each pod individually, as the nslookup output above suggests.
So, if Istio can't route to the pods' IPs:
1) I'm screwed
2) I will have to use a ClusterIP service
Any chance of having Istio route to the pods' IPs? I think my use case is fairly common, especially when you have a "service" like Kafka or MongoDB with a rich client, where you want your client to know about all the existing server endpoints.
Thanks for the suggestion, we'll consider adding explicit network endpoints for headless TCP services. We were trying to preserve the service abstraction and reduce configuration load, but a rich client trying to address endpoints directly is a legitimate use-case. This is even more true for headless services.
At some point, we would want Envoy to take over some of the rich client functionality by adding a Kafka filter and delegating load balancing and other features from the rich client to Envoy. This would require a ClusterIP service. Would that make sense to you?
cc @rshriram @louiscryan
@kyessenov I'm not sure it's a good idea to go the way of filters in Envoy. You will end up re-writing application client logic in filters for a whole bunch of applications (Kafka, ZK, Mongo, Cassandra...).
The cool thing with rich clients is that they take care of the connection/disconnection/rebalance logic. There is no point in going through another tool to gain nothing.
My suggestion would be enabling Istio/Envoy to route traffic to headless services, maybe by using a command-line option like --includeHeadlessServices (similar to includeIPRanges), or simply by discovering the headless services in the current namespace and maintaining a route for them...
Sadly, this discussion means I can't use Istio for now... at least with Envoy... How would it be with Linkerd?
We'll add an option to route directly to endpoints for headless services in the next release.
I'm not sure about the state of TCP load balancing for linkerd.
Can't wait for the next release then!
Any thoughts on the release date? I will be testing right away!
It's not the proxy that's the issue. It's Pilot. We don't configure Envoy or Linkerd with pod IPs due to the potentially large number of listener blocks or configs.
The fix for headless services would be a hack at most. It will face issues as more pods are added to or removed from the headless service (if it's a StatefulSet there might be less churn). The sensible option is to have passthrough mode support implemented in Envoy and then add a generic TCP proxy listener in Envoy that matches traffic for the kube-internal subnet range (e.g. 10.0..) and passes traffic through to the original destination IP and port. Then one would be able to talk to pods directly, irrespective of headless services or normal TCP services. We would probably even eliminate TCP proxy configuration completely.
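To sketch the idea (illustrative only; this uses today's Envoy YAML config syntax rather than what the PR below actually implements, and the capture port and names are assumptions):

static_resources:
  listeners:
  - name: tcp_passthrough                      # illustrative name
    address:
      socket_address: { address: 0.0.0.0, port_value: 15001 }   # assumed port where iptables redirects outbound traffic
    listener_filters:
    - name: envoy.filters.listener.original_dst   # recovers the original destination IP:port
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: passthrough
          cluster: passthrough_cluster
  clusters:
  - name: passthrough_cluster
    type: ORIGINAL_DST                          # forwards to whatever address the client originally dialed
    lb_policy: CLUSTER_PROVIDED
    connect_timeout: 5s

With something like this in place, a connection to a pod IP such as 10.33.0.11:9092 would be proxied straight to that address, whether or not Pilot knows about a service for it.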
Here is the issue in Envoy that is attempting to add this support. https://github.com/lyft/envoy/pull/1246
@kyessenov @rshriram How about enabling external traffic for TCP? Then headless services can be defined as external services (they are external to Istio).
A related question - does Istio handle TCP traffic (non-HTTP/HTTPS) for headful services?
K8s external services don't support TCP. Secondly, the user wants to talk directly to pod IPs. We need to process pods in a StatefulSet or headless service like any other pod, thereby providing the ability to dynamically add or remove pods from an upstream cluster.
As for the related question (TCP traffic for headful services): we set up a TCP proxy in Envoy.
How would transparent tcp proxying work with mTLS?
I opened an issue as well on the other repo. Just gathering them into one.
This would be very useful. I would like to use Istio but currently cannot because I need my services within the mesh to be able to access DBs and other services that are non-Istiofied.
I have the same problem; I can't access a StatefulSet.
Same as https://github.com/istio/pilot/issues/1015
(not sure we have time to address the full solution, but the minimal thing is k8s API server access)
@ldemailly those are not the same. This particular issue is about accessing headless services via pod IPs. It's a bug in Pilot. @ijsnellf is working on it.
AFAIK the bug is that we intercept everything, but as long as we fix it, that's great.
@wattli can you add the details you explained this morning to this bug?
Do you have any news on this?
It likely won't be fixed in the very first 0.2 release but should be soon after, depending on your exact case (for instance, access to the k8s API server and HTTPS services should be the first to get working).
@prune998 we have support for headless services in master. If you are feeling a bit adventurous, we would appreciate some feedback if you could try out the istio.yaml from the istio/istio master branch. You need to make sure that you name the ports of headless services, and that the port on which the headless service is listening does not collide with Istio-fied service ports (e.g., both a headless service and an Istio service on port 80). You can find an example of a headless service in istio/pilot/test/integration/testdata/
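For reference, a minimal headless service along those lines could look like this (a sketch only; names, ports, and the selector are illustrative, modeled on the Kafka setup discussed above):

apiVersion: v1
kind: Service
metadata:
  name: kafka-zk-broker-kafka
  namespace: dev
spec:
  clusterIP: None          # headless: DNS returns the individual pod IPs
  ports:
  - name: tcp              # the port must be named for Istio to route it
    port: 9092
    targetPort: 9092
  selector:
    app: kafka-zk          # illustrative selector

The important points are clusterIP: None, the named port, and choosing a port that no Istio-fied service in the mesh already uses.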
That's good news. Will try it today.
Thanks.
Building the whole stack from master seems to be a mess... I think I'll wait for a release (or a nightly?).
@rshriram do you have a pointer to the commit that added headless service support?
Thanks
https://github.com/istio/pilot/commit/29f0f191c7989648d47979e582c22acdd8d7311f from two weeks ago.
In 0.2.4, I added Zookeeper into the service mesh but I cannot access Zookeeper, even from inside the Zookeeper cluster.
Workaround documented in the FAQ.
@sakshigoel12 @wattli where exactly? I don't see it in https://istio.io/docs/tasks/security/faq.html
@ldemailly, a headless service is one kind of non-Istio service, right? The first item of the FAQ mentions:
Can a service with Istio Auth enabled communicate with a service without Istio?
Currently it is not well supported. But we do have plan to support this in the near future.
Maybe this is on the wrong issue (apologies), but I thought you mentioned we documented the healthcheck workaround somewhere - I do see curl mentioned, but it's probably not enough for a user to solve the problem.
@ldemailly it is also documented here: https://istio.io/docs/setup/kubernetes/quick-start.html#installation-steps step 4.
@prune998, could you take a look to see if the doc is satisfactory?
@ldemailly it sounds good, even if the explanation is really minimal.
I'll try to take a look at it ASAP and see how it behaves.
Thanks
@linsun my point is that there is a way to make the liveness check work - we should not scare our users into turning off auth because of liveness - but for that to be practical we need more detailed documentation of "how".
I see. My understanding is that it is not planned for this month's 0.2 release.
The workaround works today with 0.2.4; it just needs documentation.
I will add more details to the FAQ.
@wattli could you please add an ETA?
@prune998 / @FuzzOli87 / @immortalHuang / @hollinwilkins can you confirm whether support for headless services works without Istio auth?
@rshriram it will take a little more time for me to be able to test... sorry. Will let you know ASAP.
To be clear, try the 0.2.6 release from GitHub.
@rshriram Is the 0.2.7 release good to test this feature with? You mention the 0.2.6 release; however, the release notes specifically say that talking to non-Istio services is not yet part of the release.
Edit: Never mind. I see the docs. I have to disable mutual TLS authentication.
Yes.. with mutual TLS auth, it won't work.
@rshriram will it work with SNI https://github.com/envoyproxy/envoy/issues/1843 ?
@rshriram Will get a chance to test this this week, excited to use Istio for our service mesh :)
I would say it's OK for me when NOT using TLS.
I think you can close the issue, unless you want to dig more into supporting TLS...
Since 0.3.0 you can mix and match mTLS and non-mTLS on a per-service basis.
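If I remember the 0.3 mechanism correctly (worth double-checking against the 0.3 docs, this is from memory), the per-service opt-out was a port-keyed service annotation, roughly:

apiVersion: v1
kind: Service
metadata:
  name: postgres               # illustrative service
  annotations:
    auth.istio.io/5432: NONE   # assumed 0.3-style annotation: disable mTLS for port 5432
spec:
  ports:
  - name: tcp
    port: 5432
  selector:
    app: postgres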
@rshriram I can access headless services now, but I think the connection gets interrupted. With an Istio-injected container accessing a non-Istio, headless Postgres service, I get connection errors after a while. Does Istio time out TCP connections that don't show traffic for long periods of time?
That might be the case.. Would need more info. It might require configuring Envoy correctly to keep connections alive.
Which version are you using @hollinwilkins? Try 0.4.0 if possible...
@prune998 I'm using 0.4.0
I will run some more experiments throughout the day and try to figure out how long it takes to time out and I'll copy/paste a report of the exact error from postgres. It seems like an inactivity thing to me, because as soon as I start hitting the database again the error goes away after one attempt.
Ok, here is some more information:
E, [2018-01-03T17:35:20.079557 #1] ERROR -- : PG::ConnectionBad: PQconsumeInput() server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Which I'm pretty sure indicates that the underlying TCP stream is being closed from the server end (probably Envoy closing the connection).
@prune998 I have a solution for the time being, I'm expiring connections to the database every 10 minutes. Would still like to figure out the underlying cause. Happy to provide any debugging information needed.
I checked in Envoy. There is nothing that terminates connections every 10 minutes. Is there some other OS setting or middlebox that is terminating idle connections? Is your issue a case of idle connections being terminated, or is there data in the connection as well?
Well, there must be some timeout somewhere... do you see a consistent timeout?
What if you connect to your pod (kubectl exec -ti <pod> sh) and run a Postgres command by hand, like psql?
If it's a defined timeout, it should always close the connection after a fixed time...
I'm not sure how connections to headless services are made... are you still going through the Envoy proxy or is it just iptables magic? Anyone with the answer is welcome to comment :)
It still goes through envoy.
So maybe check the Envoy /stats endpoint (on port 15000) for the various timeout counters; if it's Envoy, it should show up there...
If it's not... well, maybe it's the TCP stack?
@rshriram I only see this issue from istiofied pods (I have about 13 pods that are not istiofied, running the same stack, that don't see this connection issue). I have one test pod right now, running one of the istiofied services, to work through these connection issues. It could be some other middleware ultimately responsible for the issue, but I think Istio has something to do with it because of the setup I just described. I am on GKE, and I don't think they impose any limits like this within their VMs. Also, I am not 100% sure whether there is data across the wire or not. I have a connection pool, and I run health checks to the db fairly frequently (several times a minute). I saw this issue even with health checks running, but that could be because only one connection from the pool was being used by the health checks, and when I go to do another query, a different connection gets pulled.
@prune998 The pod consistently times out if I let it sit without making any database calls. I will try the psql client and see if it has the same issue, that will rule out any library issues I may be running into. Also, I am running a new test without the connection expiration so I can check on the stats endpoint from the istio-proxy container.
@rshriram @prune998 Thank you both for the help! Thinking maybe we should put this into a separate issue?
Yes. Do you have Istio CA enabled, i.e. mTLS enabled? Can you try with Istio auth disabled and istio-ca not deployed? I have a feeling that we are recycling Envoy every 10-15 minutes to refresh certificates. As part of the recycle, old connections are being terminated.
@rshriram I didn't install Istio CA intentionally, I really don't want to add that layer yet. I used this install command: kubectl apply -f install/kubernetes/istio.yaml
However, I get this back when I run kubectl get po -n istio-system:
➜ istio kubectl get po -n istio-system
NAME READY STATUS RESTARTS AGE
istio-ca-55b954ff7-mdgvz 1/1 Running 0 1d
istio-ingress-948b746cb-7nm75 1/1 Running 0 1d
istio-mixer-59cc756b48-n67mc 3/3 Running 0 1d
istio-pilot-55bb7f5d9d-7ss2l 2/2 Running 0 1d
Is that istio-ca doing what I think it may be doing? How do I disable it?
@prune998 I just ran the test using the psql command-line utility, and I got this error after waiting 30 minutes:
> select * from service_accounts limit 1;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
I think it is safe to assume it is a connection issue and not a library issue.
@hollinwilkins try stopping (scaling to 0) the istio-ca pod.
I had an issue where Envoy was "restarted" to sync the CA every 30 mins. It's a bug which may not be resolved yet.
You can do this safely if you're not using mTLS.
Will link the bug once I'm at the airport, I'm on the bus right now :)
@prune998 Can I just delete the deployment for istio-ca or should I scale?
Delete it if you're sure you're not using it...
@prune998 Hehe, gotcha. Also, there is too much information to sift through when I collect stats from istio-proxy. Is there a grep I can use to get you the useful stuff in regards to disconnects?
You don't need to touch the CA to turn mTLS on/off; you just need to (from the security FAQ):
kubectl edit configmap -n istio-system istio
comment out or uncomment authPolicy: MUTUAL_TLS to toggle mTLS, and then
kubectl delete pods -n istio-system -l istio=pilot
to restart Pilot. After a few seconds (depending on your *RefreshDelay) your Envoy proxies will have picked up the change from Pilot. During that time your services may be unavailable.
@ldemailly I just edited the config, that line was already commented.
I did see a previous revision of the config had MUTUAL_TLS enabled, but that must have been from a long time ago.
# Uncomment the following line to enable mutual TLS between proxies
# authPolicy: MUTUAL_TLS
Actually, reviewing this:
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","data":{"mesh":"# Uncomment the following line to enable mutual TLS between proxies\n# authPolicy: MUTUAL_TLS\n#\n# Edit this list t...
Coming from the config map, it looks like it was never enabled even in the previous revision.
@rshriram @prune998 Disabling istio-ca worked! I am no longer getting the disconnects after waiting 30 minutes.
Well, the bug is still here then... I think this will change with the new gRPC API in Envoy...
I can't find the issue related to this. Maybe there is no issue and I got that from someone else's comment... can't remember. Glad you got it working finally!
@prune998 Added a PR for troubleshooting in Istio documentation: https://github.com/istio/istio.github.io/pull/835
@hollinwilkins @prune998 thanks for your patience in troubleshooting this and for writing the troubleshooting guide.
It appears there is still a minor bug (with an easy workaround), since pilot-agent should not restart Envoy if encryption is disabled, even if istio-ca is present and certificates are refreshed. Until support for SDS (istio/istio#2120) makes Envoy restarts completely unnecessary, we need to allow both encrypted and unencrypted services to co-exist in the same cluster, and we need istio-ca in general. I opened istio/istio#2427 for not having to disable istio-ca. We can close this issue.
@prune998 Starting to see another issue with this. Not sure if it is related to headless services, but it seems istio has an effect here. After deploying a certain number of pods in a namespace, connections to headless services stop working for some reason.
I deploy 6 services injected with istio sidecar, and they connect to my database fine. When I deploy the 7th and 8th one, they cannot connect. Deploying all 8 without istio offers no issue connecting to the database.