Serving: Need to recreate knative-ingress-gateway every few minutes

Created on 16 Aug 2019 · 25 Comments · Source: knative/serving

In what area(s)?

/area networking

What version of Knative?

0.8.x

Expected Behavior

Once I configure my cluster with knative-ingress-gateway, it should remain as is, and I should not have to keep reconfiguring it every few minutes!

Actual Behavior

I configured Static IP, Custom Domain and TLS on my cluster by following Knative docs.
Everything works fine for a while. HTTP traffic gets redirected to https properly, I can load my services using https://...

However, after a few minutes, I can no longer access any of my services using https. I can still access them using http.

Then, I manually apply the following once again, and everything is back to as expected for another few minutes. Then the behavior repeats once again.

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: knative-ingress-gateway
  namespace: knative-serving
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
    tls:
      # Sends 301 redirect for all http requests.
      # Omit to allow http and https.
      httpsRedirect: true
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - "*"
    tls:
      mode: SIMPLE
      privateKey: /etc/istio/ingressgateway-certs/tls.key
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt

It almost seems like knative-ingress-gateway gets removed after a while and I have to keep adding it back to the cluster.

area/networking kind/bug

All 25 comments

It looks like the Gateway is being reconciled, resulting in your changes being overwritten.
This can happen if you have either:

Can you confirm this is the case?

@JRBANCEL yes, I have enabled AutoTLS.

/assign @ZhiminXiang

You manually configured TLS, right?
Do you need AutoTLS? If not, disable AutoTLS.

Today, you can't have both at the same time because AutoTLS will edit (overwrite) the Gateway.
We are working on enabling this.
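For reference, a minimal sketch of what disabling AutoTLS might look like in the config-network ConfigMap. It assumes the autoTLS key described in the Knative docs for this release; only the relevant key is shown, so edit the existing ConfigMap rather than replacing it, and check the key and its accepted values against your installed version.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-network
  namespace: knative-serving
data:
  # Assumption: AutoTLS is toggled by this key ("Enabled" turns it on).
  # With it disabled, Knative should stop rewriting the HTTPS servers
  # in knative-ingress-gateway, so a manually applied Gateway persists.
  autoTLS: "Disabled"
```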

@JRBANCEL I used Knative's docs to configure Let's Encrypt SSL with Google Cloud DNS. Isn't AutoTLS required to have those certs re-issued every 3 months?

Ok. I thought you configured TLS manually. My bad.
You are using AutoTLS, therefore Knative is going to modify the Gateway.

Can you provide a dump of the Gateway, when HTTPS works and when it doesn't work?
kubectl get gateway -n knative-serving knative-ingress-gateway -o yaml

Sure @JRBANCEL

Here is the dump when it is not working:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"networking.istio.io/v1alpha3","kind":"Gateway","metadata":{"annotations":{},"name":"knative-ingress-gateway","namespace":"knative-serving"},"spec":{"selector":{"istio":"ingressgateway"},"servers":[{"hosts":["*"],"port":{"name":"http","number":80,"protocol":"HTTP"},"tls":{"httpsRedirect":true}},{"hosts":["*"],"port":{"name":"https","number":443,"protocol":"HTTPS"},"tls":{"mode":"SIMPLE","privateKey":"/etc/istio/ingressgateway-certs/tls.key","serverCertificate":"/etc/istio/ingressgateway-certs/tls.crt"}}]}}
  creationTimestamp: "2019-08-15T13:53:35Z"
  generation: 19
  name: knative-ingress-gateway
  namespace: knative-serving
  resourceVersion: "434598"
  selfLink: /apis/networking.istio.io/v1alpha3/namespaces/knative-serving/gateways/knative-ingress-gateway
  uid: 136d4c03-bf64-11e9-a864-42010a8a023e
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - authentication.security.staging.company.com
    port:
      name: authentication:0
      number: 443
      protocol: HTTPS
    tls:
      caCertificates: ""
      credentialName: authentication-93bab3b6-c046-11e9-bf83-42010a8a009f
      httpsRedirect: false
      mode: SIMPLE
      privateKey: tls.key
      serverCertificate: tls.crt
      subjectAltNames: null
  - hosts:
    - '*'
    port:
      name: http-server
      number: 80
      protocol: HTTP

And when it is working:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"networking.istio.io/v1alpha3","kind":"Gateway","metadata":{"annotations":{},"name":"knative-ingress-gateway","namespace":"knative-serving"},"spec":{"selector":{"istio":"ingressgateway"},"servers":[{"hosts":["*"],"port":{"name":"http","number":80,"protocol":"HTTP"},"tls":{"httpsRedirect":true}},{"hosts":["*"],"port":{"name":"https","number":443,"protocol":"HTTPS"},"tls":{"mode":"SIMPLE","privateKey":"/etc/istio/ingressgateway-certs/tls.key","serverCertificate":"/etc/istio/ingressgateway-certs/tls.crt"}}]}}
  creationTimestamp: "2019-08-15T13:53:35Z"
  generation: 20
  name: knative-ingress-gateway
  namespace: knative-serving
  resourceVersion: "465132"
  selfLink: /apis/networking.istio.io/v1alpha3/namespaces/knative-serving/gateways/knative-ingress-gateway
  uid: 136d4c03-bf64-11e9-a864-42010a8a023e
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - '*'
    port:
      name: http
      number: 80
      protocol: HTTP
    tls:
      httpsRedirect: true
  - hosts:
    - '*'
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      mode: SIMPLE
      privateKey: /etc/istio/ingressgateway-certs/tls.key
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt

The Gateway you apply looks to be correctly configured and therefore it works as expected.
As I explained earlier, the issue is that you are both manually setting up TLS and also enabling AutoTLS which is going to overwrite the Gateway. And AutoTLS seems to be failing somewhere.

@ZhiminXiang, can you help root cause what's the issue with AutoTLS?

Sorry for interrupting this thread. I agree with @JRBANCEL: using both manual TLS and AutoTLS would cause the issue.

@munjal-patel let me confirm one thing: when the problem happened, did only the redirection from HTTP to HTTPS stop working, or did direct HTTPS access also stop working? In the former case, can you please confirm that httpProtocol: "Redirected" is configured in config-network?

$ kubectl edit cm -n knative-serving config-network
  httpProtocol: "Redirected"

From your first description and the config dump, I suspect that you manually configured httpsRedirect: true in the gateway. (Sorry if you already configured it properly, but just in case.)

@nak3 I manually configured httpsRedirect: true as I don't want any non-https traffic coming into the cluster.

@munjal-patel Then you should configure it in the config-network ConfigMap as:

$ kubectl edit cm -n knative-serving config-network
  httpProtocol: "Redirected"

You can also refer to the docs. With this setting, httpsRedirect: true is automatically configured in the gateway and it will persist.
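As a rough sketch of where that setting lives, the fragment below shows the httpProtocol key under data in the config-network ConfigMap. Only this key is shown; any other keys already present in the ConfigMap are omitted here and should be left in place.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-network
  namespace: knative-serving
data:
  # Per this thread, "Redirected" makes Knative add httpsRedirect: true
  # to the HTTP server of the Gateway it reconciles, so the redirect
  # does not need to be set by hand on knative-ingress-gateway.
  httpProtocol: "Redirected"
```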

@nak3 if I remember correctly, I did try that option a few days ago and was running into some other complications. However, I will give it a try again. If I set httpProtocol: "Redirected", I don't have to make any changes in my gateway and can enable AutoTLS once again, right?

@nak3 I just enabled AutoTLS again and configured httpProtocol: "Redirected" and now I can't access my service on https anymore (unless I reapply ingress gateway). http does properly redirect to https as expected though.

Thank you for sharing. As httpProtocol: "Redirected" just adds httpsRedirect: true to the gateway, it should not cause any complications. So the AutoTLS config is not working properly.

For further debugging of AutoTLS, I think the first step would be to investigate the logs from kubectl logs -n knative-serving networking-istio-xxxx and the output of curl -vvvv -k https://DOMAIN.

@munjal-patel AutoTLS will not only provide you TLS certificates but also configure/reconcile your Gateway to enable TLS termination with the certs.

It seems that in your case AutoTLS does not work properly. Could you please run the command below and share the result?

kubectl get Certificate --all-namespaces -oyaml

In addition, it looks like you want to get a certificate for the domain authentication.security.staging.company.com. I just want to confirm with you that you followed the steps to configure cert-manager as well as the config-certmanager ConfigMap.

@nak3 I have attached logs for networking-istio-6d86d69644-69cql.
networking-istio-6d86d69644-69cql.log

Here is the log from curl -vvvv:

* Rebuilt URL to: https://authentication.security.staging.company.com/
*   Trying 35.197.69.80...
* TCP_NODELAY set
* Connected to authentication.security.staging.company.com (35.197.69.80) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-CHACHA20-POLY1305
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=*.services.staging.company.com
*  start date: Aug 16 14:21:55 2019 GMT
*  expire date: Nov 14 14:21:55 2019 GMT
*  issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fe006800400)
> GET / HTTP/2
> Host: authentication.security.staging.company.com
> User-Agent: curl/7.54.0
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 404
< date: Sat, 17 Aug 2019 06:09:32 GMT
< server: istio-envoy
<
* Connection #0 to host authentication.security.staging.company.com left intact

@ZhiminXiang here is the output of all certificates:

apiVersion: v1
items:
- apiVersion: certmanager.k8s.io/v1alpha1
  kind: Certificate
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"certmanager.k8s.io/v1alpha1","kind":"Certificate","metadata":{"annotations":{},"labels":{"app":"webhook","chart":"webhook-v0.6.3","heritage":"Tiller","release":"cert-manager"},"name":"cert-manager-webhook-ca","namespace":"cert-manager"},"spec":{"commonName":"ca.webhook.cert-manager","duration":"43800h","isCA":true,"issuerRef":{"name":"cert-manager-webhook-selfsign"},"secretName":"cert-manager-webhook-ca"}}
    creationTimestamp: "2019-08-15T13:57:00Z"
    generation: 3
    labels:
      app: webhook
      chart: webhook-v0.6.3
      heritage: Tiller
      release: cert-manager
    name: cert-manager-webhook-ca
    namespace: cert-manager
    resourceVersion: "3608"
    selfLink: /apis/certmanager.k8s.io/v1alpha1/namespaces/cert-manager/certificates/cert-manager-webhook-ca
    uid: 8d6cbf35-bf64-11e9-a864-42010a8a023e
  spec:
    commonName: ca.webhook.cert-manager
    duration: 43800h0m0s
    isCA: true
    issuerRef:
      name: cert-manager-webhook-selfsign
    secretName: cert-manager-webhook-ca
  status:
    conditions:
    - lastTransitionTime: "2019-08-15T13:57:00Z"
      message: Certificate is up to date and has not expired
      reason: Ready
      status: "True"
      type: Ready
    notAfter: "2024-08-13T13:57:00Z"
- apiVersion: certmanager.k8s.io/v1alpha1
  kind: Certificate
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"certmanager.k8s.io/v1alpha1","kind":"Certificate","metadata":{"annotations":{},"labels":{"app":"webhook","chart":"webhook-v0.6.3","heritage":"Tiller","release":"cert-manager"},"name":"cert-manager-webhook-webhook-tls","namespace":"cert-manager"},"spec":{"dnsNames":["cert-manager-webhook","cert-manager-webhook.cert-manager","cert-manager-webhook.cert-manager.svc"],"duration":"8760h","issuerRef":{"name":"cert-manager-webhook-ca"},"secretName":"cert-manager-webhook-webhook-tls"}}
    creationTimestamp: "2019-08-15T13:57:01Z"
    generation: 3
    labels:
      app: webhook
      chart: webhook-v0.6.3
      heritage: Tiller
      release: cert-manager
    name: cert-manager-webhook-webhook-tls
    namespace: cert-manager
    resourceVersion: "3620"
    selfLink: /apis/certmanager.k8s.io/v1alpha1/namespaces/cert-manager/certificates/cert-manager-webhook-webhook-tls
    uid: 8df9ee6c-bf64-11e9-a864-42010a8a023e
  spec:
    dnsNames:
    - cert-manager-webhook
    - cert-manager-webhook.cert-manager
    - cert-manager-webhook.cert-manager.svc
    duration: 8760h0m0s
    issuerRef:
      name: cert-manager-webhook-ca
    secretName: cert-manager-webhook-webhook-tls
  status:
    conditions:
    - lastTransitionTime: "2019-08-15T13:57:02Z"
      message: Certificate is up to date and has not expired
      reason: Ready
      status: "True"
      type: Ready
    notAfter: "2020-08-14T13:57:01Z"
- apiVersion: certmanager.k8s.io/v1alpha1
  kind: Certificate
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"certmanager.k8s.io/v1alpha1","kind":"Certificate","metadata":{"annotations":{},"name":"company-staging","namespace":"istio-system"},"spec":{"acme":{"config":[{"dns01":{"provider":"cloud-dns-provider"},"domains":["*.default.staging.company.com","*.security.staging.company.com","*.services.staging.company.com"]}]},"commonName":"*.services.staging.company.com","dnsNames":["*.default.staging.company.com","*.security.staging.company.com","*.services.staging.company.com"],"issuerRef":{"kind":"ClusterIssuer","name":"letsencrypt-issuer"},"secretName":"istio-ingressgateway-certs"}}
    creationTimestamp: "2019-08-15T14:01:08Z"
    generation: 6
    name: company-staging
    namespace: istio-system
    resourceVersion: "404139"
    selfLink: /apis/certmanager.k8s.io/v1alpha1/namespaces/istio-system/certificates/company-staging
    uid: 2162553c-bf65-11e9-a864-42010a8a023e
  spec:
    acme:
      config:
      - dns01:
          provider: cloud-dns-provider
        domains:
        - '*.default.staging.company.com'
        - '*.security.staging.company.com'
        - '*.services.staging.company.com'
    commonName: '*.services.staging.company.com'
    dnsNames:
    - '*.default.staging.company.com'
    - '*.security.staging.company.com'
    - '*.services.staging.company.com'
    issuerRef:
      kind: ClusterIssuer
      name: letsencrypt-issuer
    secretName: istio-ingressgateway-certs
  status:
    conditions:
    - lastTransitionTime: "2019-08-16T15:21:56Z"
      message: Certificate is up to date and has not expired
      reason: Ready
      status: "True"
      type: Ready
    notAfter: "2019-11-14T14:21:55Z"
- apiVersion: certmanager.k8s.io/v1alpha1
  kind: Certificate
  metadata:
    creationTimestamp: "2019-08-16T16:54:56Z"
    generation: 3
    name: route-9054a31e-c046-11e9-bf83-42010a8a009f
    namespace: security
    ownerReferences:
    - apiVersion: networking.internal.knative.dev/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: Certificate
      name: route-9054a31e-c046-11e9-bf83-42010a8a009f
      uid: 937f1fdd-c046-11e9-bf83-42010a8a009f
    resourceVersion: "429705"
    selfLink: /apis/certmanager.k8s.io/v1alpha1/namespaces/security/certificates/route-9054a31e-c046-11e9-bf83-42010a8a009f
    uid: 93812243-c046-11e9-bf83-42010a8a009f
  spec:
    acme:
      config:
      - dns01:
          provider: cloud-dns-provider
        domains:
        - authentication.security.staging.company.com
    dnsNames:
    - authentication.security.staging.company.com
    issuerRef:
      kind: ClusterIssuer
      name: letsencrypt-issuer
    secretName: route-9054a31e-c046-11e9-bf83-42010a8a009f
  status:
    conditions:
    - lastTransitionTime: "2019-08-16T16:54:59Z"
      message: Certificate is up to date and has not expired
      reason: Ready
      status: "True"
      type: Ready
    notAfter: "2019-11-14T15:54:58Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Yes, I have followed all the steps as per Knative's docs.

As httpProtocol: "Redirected" just adds httpsRedirect: true into gateway, it should not make any complications.

Oops... I'm sorry, I now understand what the "complications" were. 0.8.x has an issue, https://github.com/knative/serving/issues/5129. Your log shows "Probing of 10.0.0.16 failed: ready: false, error: <nil>". Probing fails with error: <nil> when "Redirected" is enabled.
The issue was fixed in https://github.com/knative/serving/pull/5149 (a few days ago), so you can deploy serving from the latest code.

Sorry, https://github.com/knative/serving/pull/5149 did not fix the issue for the Redirected option (it only fixes the Disabled option). I will file an issue.

Oops... I'm sorry, I now understand what the "complications" were. 0.8.x has an issue, #5129. Your log shows "Probing of 10.0.0.16 failed: ready: false, error: <nil>". Probing fails with error: <nil> when "Redirected" is enabled.

Yes, that's exactly what I ran into and then stopped using Redirected.

Thanks @nak3 for taking a look at this issue.
According to the certificate result in https://github.com/knative/serving/issues/5181#issuecomment-522209196, the certificates were successfully provisioned.

As we already identified the root cause and used the issue https://github.com/knative/serving/issues/5192 to track it, I am gonna close this issue.

Feel free to reopen it if anyone has questions.

@ZhiminXiang I don't think we can use #5192 to track the issue I am reporting. The issue reported here is that the knative-ingress-gateway configuration gets overwritten because of reconciliation.

For now I have disabled AutoTLS but this would invalidate all our certificates in about 3 months.

@munjal-patel It is expected that the wildcard HTTPS server in knative-ingress-gateway gets overwritten when auto TLS is enabled because the wildcard HTTPS server conflicts with the HTTPS server generated by auto TLS feature.

In your case, if you still need your HTTPS configuration when auto TLS is enabled, you can 1) change the port name of the default HTTPS server from https to something else, AND 2) replace the wildcard host with the specific hosts that you want to cover.

@ZhiminXiang I understand that they both conflict. However, it does not make sense to _not_ fix that behavior. Both wildcard certificates _and_ Auto TLS are equally important in a lot of deployment scenarios.

2) replace the wildcard host with the specific hosts that you want to cover

This is neither practical nor scalable! There will be lots and lots of services running in a cluster. It will be very expensive, and a maintenance hell, to configure domains for each of them manually.

@munjal-patel We have a tracking issue to make manual TLS and auto TLS work together https://github.com/knative/serving/issues/4631. In general, with this feature, you can provide a namespace level certificate (e.g. certificate for domain *.default.example.com). And it will be picked up by Knative, and applied to Routes of the namespace.

Currently, if you already have the namespace level certificate (e.g. certificate for domain *.default.example.com), you can modify the Gateway with 1) changing the port name of the default HTTPS server from https to something else, AND 2) replacing the wildcard host (*) with the namespace-level hosts *.default.example.com.
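A minimal sketch of that workaround, based on the Gateway posted at the top of this issue. The port name https-manual is a hypothetical placeholder (any name other than https), and *.default.example.com stands in for your own namespace-level domain. Whether the reconciler leaves this server untouched is an assumption worth verifying on your cluster; the HTTP redirect is left out here, since it can be handled through httpProtocol in config-network.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: knative-ingress-gateway
  namespace: knative-serving
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
  - port:
      number: 443
      # 1) Renamed from "https"; "https-manual" is a hypothetical name.
      name: https-manual
      protocol: HTTPS
    hosts:
    # 2) Namespace-level wildcard instead of the global "*".
    - "*.default.example.com"
    tls:
      mode: SIMPLE
      privateKey: /etc/istio/ingressgateway-certs/tls.key
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
```

One host entry per namespace-level wildcard can be listed under hosts if the certificate covers several namespaces.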

I am not sure what you expect when keeping both auto TLS and global wildcard host.

OK. Thanks for the workaround.
