Serving: Magic DNS Pod Error on GKE

Created on 7 Mar 2020 · 11 Comments · Source: knative/serving

In what area(s)?

/area networking

What version of Knative?

0.13.x

Actual Behavior

kubectl get pods -n knative-serving

NAME READY STATUS RESTARTS AGE
activator-869f6d4f9f-fttmj 2/2 Running 0 62m
autoscaler-78994c9fdf-fhdnw 2/2 Running 0 62m
controller-b94c5b667-n5llq 2/2 Running 0 62m
default-domain-sd6sk 1/2 Error 0 72s
networking-istio-5847754959-tlhtx 1/1 Running 0 61m
webhook-7cdb467d79-45pzq 2/2 Running 2 62m

kubectl logs -n knative-serving default-domain-sd6sk default-domain

W0307 04:32:57.965250 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
{"level":"fatal","ts":1583555577.9773126,"logger":"fallback.default-domain","caller":"default-domain/main.go:173","msg":"Error getting ConfigMap","error":"Get https://10.0.0.1:443/api/v1/namespaces/knative-serving/configmaps/config-domain: dial tcp 10.0.0.1:443: connect: connection refused","stacktrace":"main.main\n\tknative.dev/serving/cmd/default-domain/main.go:173\nruntime.main\n\truntime/proc.go:203"}

Steps to Reproduce the Problem

Follow https://knative.dev/docs/install/any-kubernetes-cluster/ on a new GKE cluster with Istio preinstalled.

kind/bug

All 11 comments

Issues go stale after 90 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle stale.
Stale issues rot after an additional 30 days of inactivity and eventually close.
If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle stale

I hit a similar issue on the default-domain Pod.

✗ k logs -f default-domain-zf8wk -n knative-serving
W0619 03:18:16.158822       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
{"level":"fatal","ts":1592536696.2584615,"logger":"fallback.default-domain","caller":"default-domain/main.go:197","msg":"Error finding gateway address","error":"the server could not find the requested resource (post ingresses.networking.internal.knative.dev)","stacktrace":"main.main\n\tknative.dev/serving/cmd/default-domain/main.go:197\nruntime.main\n\truntime/proc.go:203"}

I think there are two separate issues here.

I suspect that the ORIGINAL issue is that the default-domain Job was injected with a sidecar, which blocked communication with the API server: Get https://10.0.0.1:443/api/v1/namespaces/knative-serving/configmaps/config-domain: dial tcp 10.0.0.1:443: connect: connection refused

The other issue looks like the default-domain Job's first replica came up before the kingress CRD was registered, leading to: the server could not find the requested resource (post ingresses.networking.internal.knative.dev)

/assign @ZhiminXiang @tcnghia

as our local resident GKE networking gurus.

I think we already configured a retry for the Job, so the second issue should be resolved in a few retries.

I hit a similar issue, and it eventually ends in a panic. I'm using Ambassador instead of Istio:

kubectl -n knative-serving logs default-domain-k25pv 
W1022 08:30:21.290614       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x158b7ff]

goroutine 1 [running]:
main.findGatewayAddress(0x1cbc7c0, 0xc00042a000, 0xc0003a6160, 0xc000242ee0, 0x0, 0x0, 0x0)
    knative.dev/serving/cmd/default-domain/main.go:135 +0x75f
main.main()
    knative.dev/serving/cmd/default-domain/main.go:197 +0x994

This means that ing.Status.PublicLoadBalancer is nil. I guess it's easy to add a check, but it looks more like a bug in Ambassador that it doesn't _set_ this field.
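The check mentioned above could look roughly like the following Go sketch. The types here are simplified local stand-ins that only mirror the field name from the comment (`PublicLoadBalancer`); they are assumptions for illustration, not the actual Knative API types, and `gatewayIP` is a hypothetical helper rather than the real `findGatewayAddress`.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified, hypothetical stand-ins for the ingress status types.
type LoadBalancerIngress struct {
	IP string
}

type LoadBalancerStatus struct {
	Ingress []LoadBalancerIngress
}

type IngressStatus struct {
	// PublicLoadBalancer stays nil when the ingress implementation
	// never sets it.
	PublicLoadBalancer *LoadBalancerStatus
}

// gatewayIP returns the first public load balancer IP, or an error
// instead of dereferencing a nil pointer when the field is missing.
func gatewayIP(st IngressStatus) (string, error) {
	if st.PublicLoadBalancer == nil {
		return "", errors.New("ingress status has no PublicLoadBalancer")
	}
	if len(st.PublicLoadBalancer.Ingress) == 0 {
		return "", errors.New("PublicLoadBalancer has no ingress entries")
	}
	return st.PublicLoadBalancer.Ingress[0].IP, nil
}

func main() {
	// Status as left by an implementation that does not set the field.
	if _, err := gatewayIP(IngressStatus{}); err != nil {
		fmt.Println("error:", err)
	}
}
```

Returning an error here would turn the SIGSEGV into a readable fatal log line, but it would not make the Job succeed; the field still has to be populated by the ingress implementation.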

Thanks for the reply, @vagababov. I didn't add a load balancer when I installed Ambassador; the load balancer was added afterwards. I'll test whether adding a load balancer before installing Ambassador avoids this issue.

@vagababov I've reinstalled Ambassador and still get the same nil pointer issue.
Should I open an issue with Ambassador? Or who from Ambassador should I @?

Thanks
Ben

@benjaminhuo Yeah, could you please open an issue against Ambassador? I still don't know who the best person to @ is (@alexgervais maybe?), but let's just report it.

The culprit is that Ambassador still uses the deprecated ingress.status.LoadBalancer instead of ingress.status.PublicLoadBalancer.

https://github.com/datawire/ambassador/blob/0fcc86d0ea7557245d3e31eb2ae21c9129d43922/python/ambassador/fetch/knative.py#L129-L130

Sure, it's amazing that you already found the root cause in Ambassador's code...
