/area networking
/kind bug
tl;dr: a deployed KService stops working after a while and starts returning 503 from the gateway when invoked externally. The cluster-local domain works fine. Redeploying fixes the issue, only for it to break again N hours later.
v0.9.0-gke.3, istio no-mesh mode
I deploy a very basic KService like this
apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: hello
spec:
  template:
    spec:
      containers:
      - image: gcr.io/google-samples/hello-app:1.0
I query it like
curl -vH "Host: hello.default.example.com" 34.68.171.188
Nothing interesting.
## Actual Behavior
After some time (~6-12 hours based on my estimates) this service starts returning HTTP 503 Service Unavailable from the gateway. **However,** the cluster-local route continues to work fine.
curl -vH "Host: hello.default.example.com" 34.68.171.188
< HTTP/1.1 503 Service Unavailable
< date: Tue, 05 Nov 2019 22:49:04 GMT
< server: istio-envoy
< content-length: 0
<
* Connection #0 to host 34.68.171.188 left intact
Traffic isn't even making it to the activator or the pod: the pod doesn't scale up from 0 to 1 when this happens.
If I take the same YAML, change `name: hello` to `name: hello2`, and redeploy it as a new KService, **it works just fine**.
I've been observing this for several days. Deleting and redeploying seems to work, but I am not able to explain why.
Here are some outputs:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"serving.knative.dev/v1alpha1","kind":"Service","metadata":{"annotations":{},"name":"hello","namespace":"default"},"spec":{"template":{"spec":{"containers":[{"image":"gcr.io/google-samples/hello-app:1.0"}]}}}}
    serving.knative.dev/creator: [email protected]
    serving.knative.dev/lastModifier: [email protected]
  creationTimestamp: "2019-11-05T05:28:51Z"
  generation: 1
  name: hello
  namespace: default
  resourceVersion: "43474394"
  selfLink: /apis/serving.knative.dev/v1/namespaces/default/services/hello
  uid: 26e02c24-ff8d-11e9-a378-42010a80012d
spec:
  template:
    metadata:
      creationTimestamp: null
    spec:
      containerConcurrency: 0
      containers:
      - image: gcr.io/google-samples/hello-app:1.0
        name: user-container
        readinessProbe:
          successThreshold: 1
          tcpSocket:
            port: 0
        resources: {}
      timeoutSeconds: 300
  traffic:
  - latestRevision: true
    percent: 100
status:
  address:
    url: http://hello.default.svc.cluster.local
  conditions:
  - lastTransitionTime: "2019-11-05T05:28:54Z"
    status: "True"
    type: ConfigurationsReady
  - lastTransitionTime: "2019-11-05T05:28:56Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2019-11-05T05:28:56Z"
    status: "True"
    type: RoutesReady
  latestCreatedRevisionName: hello-kmjgg
  latestReadyRevisionName: hello-kmjgg
  observedGeneration: 1
  traffic:
  - latestRevision: true
    percent: 100
    revisionName: hello-kmjgg
  url: http://hello.default.example.com
kubectl get revision
apiVersion: serving.knative.dev/v1
kind: Revision
metadata:
  annotations:
    serving.knative.dev/creator: [email protected]
    serving.knative.dev/lastPinned: "1572991246"
  creationTimestamp: "2019-11-05T05:28:51Z"
  generateName: hello-
  generation: 1
  labels:
    serving.knative.dev/configuration: hello
    serving.knative.dev/configurationGeneration: "1"
    serving.knative.dev/route: hello
    serving.knative.dev/service: hello
  name: hello-kmjgg
  namespace: default
  ownerReferences:
  - apiVersion: serving.knative.dev/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Configuration
    name: hello
    uid: 26e19591-ff8d-11e9-a378-42010a80012d
  resourceVersion: "43690190"
  selfLink: /apis/serving.knative.dev/v1/namespaces/default/revisions/hello-kmjgg
  uid: 26e3420f-ff8d-11e9-a378-42010a80012d
spec:
  containerConcurrency: 0
  containers:
  - image: gcr.io/google-samples/hello-app:1.0
    name: user-container
    readinessProbe:
      successThreshold: 1
      tcpSocket:
        port: 0
    resources: {}
  timeoutSeconds: 300
status:
  conditions:
  - lastTransitionTime: "2019-11-05T05:29:54Z"
    message: The target is not receiving traffic.
    reason: NoTraffic
    severity: Info
    status: "False"
    type: Active
  - lastTransitionTime: "2019-11-05T05:28:54Z"
    status: "True"
    type: ContainerHealthy
  - lastTransitionTime: "2019-11-05T05:28:54Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2019-11-05T05:28:54Z"
    status: "True"
    type: ResourcesAvailable
  imageDigest: gcr.io/google-samples/hello-app@sha256:c62ead5b8c15c231f9e786250b07909daf6c266d0fcddd93fea882eb722c3be4
  logUrl: https://console.cloud.google.com/logs/viewer?advancedFilter=resource.type%3D%22k8s_container%22%0Aresource.labels.container_name%3D%22user-container%22%0Alabels.%22k8s-pod%2Fserving_knative_dev%2FrevisionUID%22%3D%2226e3420f-ff8d-11e9-a378-42010a80012d%22
  observedGeneration: 1
  serviceName: hello-kmjgg
kubectl get route
apiVersion: serving.knative.dev/v1
kind: Route
metadata:
  annotations:
    serving.knative.dev/creator: [email protected]
    serving.knative.dev/lastModifier: [email protected]
  creationTimestamp: "2019-11-05T05:28:51Z"
  finalizers:
  - routes.serving.knative.dev
  generation: 1
  labels:
    serving.knative.dev/service: hello
  name: hello
  namespace: default
  ownerReferences:
  - apiVersion: serving.knative.dev/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Service
    name: hello
    uid: 26e02c24-ff8d-11e9-a378-42010a80012d
  resourceVersion: "43474393"
  selfLink: /apis/serving.knative.dev/v1/namespaces/default/routes/hello
  uid: 26e3fdf6-ff8d-11e9-a378-42010a80012d
spec:
  traffic:
  - configurationName: hello
    latestRevision: true
    percent: 100
status:
  address:
    url: http://hello.default.svc.cluster.local
  conditions:
  - lastTransitionTime: "2019-11-05T05:28:54Z"
    status: "True"
    type: AllTrafficAssigned
  - lastTransitionTime: "2019-11-05T05:28:56Z"
    status: "True"
    type: IngressReady
  - lastTransitionTime: "2019-11-05T05:28:56Z"
    status: "True"
    type: Ready
  observedGeneration: 1
  traffic:
  - latestRevision: true
    percent: 100
    revisionName: hello-kmjgg
  url: http://hello.default.example.com
kubectl get ingress.networking
apiVersion: networking.internal.knative.dev/v1alpha1
kind: Ingress
metadata:
  annotations:
    networking.knative.dev/ingress.class: istio.ingress.networking.knative.dev
    serving.knative.dev/creator: [email protected]
    serving.knative.dev/lastModifier: [email protected]
  creationTimestamp: "2019-11-05T05:28:54Z"
  generation: 1
  labels:
    serving.knative.dev/route: hello
    serving.knative.dev/routeNamespace: default
    serving.knative.dev/service: hello
  name: hello
  namespace: default
  ownerReferences:
  - apiVersion: serving.knative.dev/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Route
    name: hello
    uid: 26e3fdf6-ff8d-11e9-a378-42010a80012d
  resourceVersion: "43690196"
  selfLink: /apis/networking.internal.knative.dev/v1alpha1/namespaces/default/ingresses/hello
  uid: 28a0037b-ff8d-11e9-a378-42010a80012d
spec:
  rules:
  - hosts:
    - hello.default.svc.cluster.local
    - hello.default.example.com
    http:
      paths:
      - retries:
          attempts: 3
          perTryTimeout: 15m0s
        splits:
        - appendHeaders:
            Knative-Serving-Namespace: default
            Knative-Serving-Revision: hello-kmjgg
          percent: 100
          serviceName: hello-kmjgg
          serviceNamespace: default
          servicePort: 80
        timeout: 15m0s
    visibility: ExternalIP
  visibility: ExternalIP
status:
  conditions:
  - lastTransitionTime: "2019-11-05T22:00:46Z"
    status: "True"
    type: LoadBalancerReady
  - lastTransitionTime: "2019-11-05T05:28:54Z"
    status: "True"
    type: NetworkConfigured
  - lastTransitionTime: "2019-11-05T22:00:46Z"
    status: "True"
    type: Ready
  loadBalancer:
    ingress:
    - domainInternal: istio-ingress.gke-system.svc.cluster.local
  observedGeneration: 1
  privateLoadBalancer:
    ingress:
    - domainInternal: cluster-local-gateway.gke-system.svc.cluster.local
  publicLoadBalancer:
    ingress:
    - domainInternal: istio-ingress.gke-system.svc.cluster.local
kubectl get virtualservice
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  annotations:
    networking.knative.dev/ingress.class: istio.ingress.networking.knative.dev
    serving.knative.dev/creator: [email protected]
    serving.knative.dev/lastModifier: [email protected]
  creationTimestamp: "2019-11-05T05:28:54Z"
  generation: 1
  labels:
    serving.knative.dev/route: hello
    serving.knative.dev/routeNamespace: default
  name: hello
  namespace: default
  ownerReferences:
  - apiVersion: networking.internal.knative.dev/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Ingress
    name: hello
    uid: 28a0037b-ff8d-11e9-a378-42010a80012d
  resourceVersion: "43474383"
  selfLink: /apis/networking.istio.io/v1alpha3/namespaces/default/virtualservices/hello
  uid: 28ae02b1-ff8d-11e9-a378-42010a80012d
spec:
  gateways:
  - knative-serving/cluster-local-gateway
  - knative-serving/gke-system-gateway
  - knative-serving/knative-ingress-gateway
  hosts:
  - hello.default
  - hello.default.example.com
  - hello.default.svc
  - hello.default.svc.cluster.local
  - c0d2f6b75318fcbab3006314bec06026.probe.invalid
  http:
  - match:
    - authority:
        regex: ^hello\.default\.example\.com(?::\d{1,5})?$
      gateways:
      - knative-serving/gke-system-gateway
      - knative-serving/knative-ingress-gateway
    - authority:
        regex: ^hello\.default(\.svc(\.cluster\.local)?)?(?::\d{1,5})?$
      gateways:
      - knative-serving/cluster-local-gateway
    retries:
      attempts: 3
      perTryTimeout: 15m0s
    route:
    - destination:
        host: hello-kmjgg.default.svc.cluster.local
        port:
          number: 80
      headers:
        request:
          add:
            Knative-Serving-Namespace: default
            Knative-Serving-Revision: hello-kmjgg
      weight: 100
    timeout: 15m0s
    websocketUpgrade: true
  - fault:
      abort:
        httpStatus: 200
        percent: 100
    match:
    - authority:
        exact: c0d2f6b75318fcbab3006314bec06026.probe.invalid
    route:
    - destination:
        host: null.invalid
        port:
          number: 80
      weight: 0
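As a sanity check on the VirtualService above, here is a minimal Python sketch that runs the two `authority` regexes against a few Host headers. The patterns are copied from the dump; the helper name `matched_gateways` and the group labels are made up for illustration. Python's `re` is only a stand-in for Envoy's regex engine, although these patterns use nothing beyond syntax the common engines share.

```python
import re

# Authority regexes copied verbatim from the VirtualService dump.
EXTERNAL = re.compile(r"^hello\.default\.example\.com(?::\d{1,5})?$")
CLUSTER_LOCAL = re.compile(
    r"^hello\.default(\.svc(\.cluster\.local)?)?(?::\d{1,5})?$"
)


def matched_gateways(host_header):
    """Return which match clause(s) of the first HTTP route accept this Host.

    "external" stands for gke-system-gateway / knative-ingress-gateway,
    "cluster-local" for cluster-local-gateway (labels are illustrative).
    """
    groups = []
    if EXTERNAL.match(host_header):
        groups.append("external")
    if CLUSTER_LOCAL.match(host_header):
        groups.append("cluster-local")
    return groups


if __name__ == "__main__":
    for host in (
        "hello.default.example.com",
        "hello.default.svc.cluster.local:80",
        "hello.default.example.org",
    ):
        print(host, matched_gateways(host))
```

The external hostname from the `curl` command matches the external gateways' clause, so this object looks correct on its own, which points at something outside it.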
(no logs on istio-ingressgateway-* pod while I query the ksvc)
(no logs on activator* pod while I query the ksvc)
Use the YAML above and wait for several hours. Then query the service and observe the 503.
Use the same YAML, change the name, redeploy, and query. Observe HTTP 200.
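To pin down when the 503 starts rather than estimating it, the reproduction could be scripted roughly like this. This is a sketch, not something that was run for this report: `fetch_status` and `watch` are hypothetical helper names, and the polling interval is arbitrary. Note that polling keeps sending traffic to the revision, so it may not reproduce the scaled-to-zero state described above.

```python
import time
import urllib.error
import urllib.request


def fetch_status(ip, host, timeout=10):
    """GET http://<ip>/ with an explicit Host header; return the HTTP status."""
    req = urllib.request.Request(f"http://{ip}/", headers={"Host": host})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is what we want here.
        return err.code


def watch(fetch, interval_s, max_checks, sleep=time.sleep, clock=time.time):
    """Poll until a non-200 status appears.

    Returns (seconds_elapsed, status) for the first failure, or None if
    every check returned 200.
    """
    start = clock()
    for i in range(max_checks):
        status = fetch()
        if status != 200:
            return clock() - start, status
        if i < max_checks - 1:
            sleep(interval_s)
    return None


if __name__ == "__main__":
    # IP and Host header taken from the curl command in this report.
    result = watch(
        lambda: fetch_status("34.68.171.188", "hello.default.example.com"),
        interval_s=1800,
        max_checks=48,
    )
    print(result)
```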
How do you check that traffic doesn't make it to the activator?
Are there any relevant logs in the activator?
Because of an upgrade from 0.6 to 0.9, some orphaned `VirtualService`s were left in `knative-serving`, leading to an invalid Envoy config (non-existent backend).
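For anyone hitting the same thing, one way to look for such leftovers is to compare every VirtualService route destination against the Services that actually exist. The sketch below is an illustration only: `orphaned_destinations` is a hypothetical helper, and it assumes you have already loaded the `items` of `kubectl get virtualservice --all-namespaces -o json` and built a set of Service FQDNs (`<name>.<namespace>.svc.cluster.local`) from `kubectl get service --all-namespaces -o json`. It also assumes destinations are written as FQDNs, as in the dump above, and it skips the `null.invalid` probe destination.

```python
def orphaned_destinations(virtual_services, existing_hosts, ignore=("null.invalid",)):
    """List (namespace, name, host) for route destinations with no backing Service.

    virtual_services: list of VirtualService objects as plain dicts.
    existing_hosts:   set of Service FQDNs known to exist in the cluster.
    ignore:           destination hosts that are intentionally unresolvable.
    """
    findings = []
    for vs in virtual_services:
        meta = vs.get("metadata") or {}
        spec = vs.get("spec") or {}
        for http_route in spec.get("http") or []:
            for route in http_route.get("route") or []:
                host = (route.get("destination") or {}).get("host")
                if host is None or host in ignore:
                    continue
                if host not in existing_hosts:
                    findings.append((meta.get("namespace"), meta.get("name"), host))
    return findings


if __name__ == "__main__":
    # The first object mirrors the healthy VirtualService from this report;
    # the second ("stale", "old-rev") is a made-up orphan for illustration.
    vss = [
        {
            "metadata": {"namespace": "default", "name": "hello"},
            "spec": {"http": [
                {"route": [{"destination": {"host": "hello-kmjgg.default.svc.cluster.local"}}]},
                {"route": [{"destination": {"host": "null.invalid"}}]},
            ]},
        },
        {
            "metadata": {"namespace": "knative-serving", "name": "stale"},
            "spec": {"http": [
                {"route": [{"destination": {"host": "old-rev.default.svc.cluster.local"}}]},
            ]},
        },
    ]
    services = {"hello-kmjgg.default.svc.cluster.local"}
    print(orphaned_destinations(vss, services))
```

Anything this reports is only a candidate; check the object's owner references before deleting it.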
Indeed that was the problem. We can /close if direct upgrades are not supported (at least for now).
/close
@JRBANCEL: Closing this issue.