Istio: Failed to load certificate chain from <inline>

Created on 26 Mar 2020  Â·  3Comments  Â·  Source: istio/istio

Hello all,

We have been struggling to have Isito operational within our K8’s environment running within a VMWare PKS private cloud platform. Specifically, we are required, per our organization’s security policy and posture, to use our PKI’s keys.

A quick summary of our environment:
We have a 2-tier PKI environment (Root -> Issuing CA). We cut/signed a Subordinate CA certificate for the Istio CA (to sign on behalf of our PKI Issuing CA) from our issuing CA. In the end, here is what we are expecting the PKI to look like in context to Istio:
Root CA -> Issuing CA -> Istio CA -> Leafs (i.e. Envoy, etc.)

We have created all proper certificate files for Istio to use, following this document:
We have created a K8’s secret:

kubectl create secret generic cacerts -n istio-system --from-file=ca-cert.pem --from-file=ca-key.pem --from-file=root-cert.pem --from-file=cert-chain.pem

We are also installing Istio using istioctl; moved away from helm and ported our values over. In addition, we are applying the default profile:

istioctl manifest apply --set profile=default --set values.global.mtls.enabled=true

We started with v.1.46, and consistently we would identify within the logs our certificate chain would be loaded properly just after Citadel signed its certs.
Example:

validationServer x509 cert [0] - Issuer: (this is the Istio generated cert signed by the Istio CA as the Subject is blank) Subject: ""
validationServer x509 cert [1] - Issuer: (this is the subordinate CA public key/cert that our internal issuing CA signed)
validationServer x509 cert [2] - Issuer: (this is our Issuing CA’s public key/cert from our internal PKI)

All looks normal until after the ‘validationController’ completes, during discovery with Pilot, when Istiod is attempting to reach out to each [and every] service, it fails with a [generic] handshake error:
Example:

grpc: Server.Serve failed to complete security handshake from "###.###.###.3:48326": EOF

The ingress controller fails to start and its log states ‘unable to communicate with Envoy (Is Pilot running?)‘ type messages. The Istio-proxy container, within the Prometheus pod, states the same thing too. All makes sense given the condition.

We tried different angles, and nothing would work for us. On the cusp of seeing 1.51 being released in just a matter of a few days, we waited for that to be released and attempt with the new version.

We went ahead and tried on 1.51 with no success still; however, the outcome behavior is different. This time, the ingress controller came ‘up’ as healthy, but its log again barks about the same issue – unable to communicate with Envoy. The same behavior, like with 1.46, occurs within istio-proxy too (doesn’t start, same log entries). The kicker, we get a completely different error within Istiod’s logs. Instead of a [generic] security handshake message, we receive this (seems to be at the same procedure time when each service is being connected to initially like in 1.46):
> warn ads ADS:CDS: ACK ERROR 10.12.13.3:40606 router~###.###.###.3~istio-ingressgateway-75877dc5bf-jrnrs.istio-system~istio-system.svc.cluster.local-1 Internal:Error adding/updating cluster(s) outbound|15020||istio-ingressgateway.istio-system.svc.cluster.local: Failed to load certificate chain from <inline>, outbound|80||istio-ingressgateway.istio-system.svc.cluster.local: Failed to load certificate chain from <inline>, outbound|443||istio-ingressgateway.istio-system.svc.cluster.local: Failed to load certificate chain from <inline>, outbound|15029||istio-ingressgateway.istio-system.svc.cluster.local: Failed to load certificate chain from <inline>, outbound|15030||istio-ingressgateway.istio-system.svc.cluster.local: Failed to load certificate chain from <inline>, outbound|15031||istio-ingressgateway.istio-system.svc.cluster.local: Failed to load certificate chain from <inline>, outbound|15032||istio-ingressgateway.istio-system.svc.cluster.local: Failed to load certificate chain from <inline>, outbound|31400||istio-ingressgateway.istio-system.svc.cluster.local: Failed to load certificate chain from <inline>, outbound|15443||istio-ingressgateway.istio-system.svc.cluster.local: Failed to load certificate chain from <inline>

How can it state this?: Failed to load certificate chain from <inline>

Here is what I mean by that question, I exported the cert that was generated/signed (from: /var/run/secrets/istio-dns), grab the chain, subordinate cert, and root cert, feed it into OpenSSL – it’s good.

Openssl verify -CAfile root-cert.crt -untrusted intCA.crt -untrusted istioca.[internal_FQDN].crt new-chain.pem
new-chain.pem: OK

We went ahead and turned on 'debug' based logging (wow, there's a lot to comb through!). All we were able to distinguish/confirm was our root-CA's public key does get passed down to each namespace; presuming this is to add it to a "trust store" to trust it. Which is good to see.

Help, please? :)

Thank you!

  • Eric Buono

Expected behavior
Istio to start properly; Envoy and pilot to use the certificate chain we have defined.

Steps to reproduce the bug
This occurs on every installation using our signed subordinate CA certificate. Using the demo certificates, Istio installs and starts up without any issues.

Version (include the output of istioctl version --remote and kubectl version and helm version if you used Helm)
istioctl version:

  • client version: 1.5.1
  • control plane version: 1.5.1
  • data plane version: 1.5.1 (5 proxies), 1.4.3 (1 proxies)

Kubectl version:

  • Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:16:51Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"windows/amd64"} Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:07:57Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

How was Istio installed?
istioctl

areconfig areenvironments aresecurity kindocs

Most helpful comment

Thanks again Howard for the pointer! We were able to extract the Envoy config and after decrypting the encoded chain, it was obvious what was happening. To provide insight to someone else down the road, here is what we saw:

-----BEGIN CERTIFICATE-----
[Leaf]
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
[SubCA-Cert]
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
[Issuing CA Cert]
-----END CERTIFICATE----------BEGIN CERTIFICATE-----
[Root CA Cert]
-----END CERTIFICATE-----

We just needed to add a carriage return at the end of the cert-chain file. The EOF was at the pipe (below):

-----END CERTIFICATE-----|

It needs to be like this (obviously!):

-----END CERTIFICATE-----
|

Got to love simple mistakes that make you chase your tail.

I hope this helps someone else down the road.

Thanks all!
-Eric

All 3 comments

That log comes from Envoy rejecting the config. For whatever reason it isn't loading the certs correctly. I would suggest getting a config dump from one of the pods that rejects this: https://github.com/istio/istio/wiki/Troubleshooting-Istio#collecting-information-2 and see what the cert config looks like, and additionally maybe turning envoy debug logging on to see if it explains why it failed to load

Thanks for the reply! Sounds good, that will be our next steps tomorrow then. We were out of ideas on where to go from here. I'll report back - thanks!

Thanks again Howard for the pointer! We were able to extract the Envoy config and after decrypting the encoded chain, it was obvious what was happening. To provide insight to someone else down the road, here is what we saw:

-----BEGIN CERTIFICATE-----
[Leaf]
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
[SubCA-Cert]
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
[Issuing CA Cert]
-----END CERTIFICATE----------BEGIN CERTIFICATE-----
[Root CA Cert]
-----END CERTIFICATE-----

We just needed to add a carriage return at the end of the cert-chain file. The EOF was at the pipe (below):

-----END CERTIFICATE-----|

It needs to be like this (obviously!):

-----END CERTIFICATE-----
|

Got to love simple mistakes that make you chase your tail.

I hope this helps someone else down the road.

Thanks all!
-Eric

Was this page helpful?
0 / 5 - 0 ratings