Hi All,
I need your help as I am new to Kubeflow. To understand Kubeflow, I followed the steps to deploy an example that summarizes GitHub issues using a trained model. I deployed this on Kubeflow version 0.4.0-rc.2. Some of the pods are stuck in the ContainerCreating state with the error below, and I have also included the relevant events from kubectl describe. Please help me with this.
NAME READY STATUS RESTARTS AGE
ambassador-5cf8cd97d5-4tz5h 1/1 Running 0 10h
ambassador-5cf8cd97d5-fr2bt 1/1 Running 0 10h
ambassador-5cf8cd97d5-k8r75 1/1 Running 0 10h
argo-ui-7c9c69d464-hx654 1/1 Running 0 9h
backend-updater-0 0/1 ContainerCreating 0 9h
centraldashboard-6f47d694bd-qtlwg 1/1 Running 0 10h
cert-manager-5cb7b9fb67-7ckfk 1/1 Running 0 9h
cloud-endpoints-controller-5888c755cb-bkvfz 0/1 ContainerCreating 0 9h
cm-acme-http-solver-prjrf 1/1 Running 0 9h
envoy-69bf97959c-gbnhd 0/2 ContainerCreating 0 9h
envoy-69bf97959c-sm5fx 0/2 ContainerCreating 0 9h
envoy-69bf97959c-vxfhz 0/2 ContainerCreating 0 9h
iap-enabler-6df4f5bcd9-4v5mh 0/1 ContainerCreating 0 9h
ingress-bootstrap-wm4lx 1/1 Running 0 9h
issue-summarization-issue-summarization-issue-summarizatiokvflq 1/1 Running 1 9h
issue-summarization-issue-summarization-issue-summarizatioqg4cg 1/1 Running 6 9h
kubectl describe pod/envoy-69bf97959c-gbnhd -n kubeflow:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 8m (x257 over 9h) kubelet, gke-kubeflow-qwiklab-default-pool-abefdec2-q2jp Unable to mount volumes for pod "envoy-69bf97959c-gbnhd_kubeflow(8e383413-8494-11e9-b62f-42010a800051)": timeout expired waiting for volumes to attach or mount for pod "kubeflow"/"envoy-69bf97959c-gbnhd". list of unmounted volumes=[sa-key]. list of unattached volumes=[config-volume shared sa-key envoy-token-8s65v]
Warning FailedMount 3s (x298 over 9h) kubelet, gke-kubeflow-qwiklab-default-pool-abefdec2-q2jp MountVolume.SetUp failed for volume "sa-key" : secrets "admin-gcp-sa" not found
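To double-check what the pod is waiting on, the following commands should show whether the secret exists and which secret the sa-key volume actually references (the secret, volume, and pod names are taken from the events above):

# Does the secret the FailedMount event complains about exist in the kubeflow namespace?
kubectl get secret admin-gcp-sa -n kubeflow

# Which secret does the pod's sa-key volume point at?
kubectl get pod envoy-69bf97959c-gbnhd -n kubeflow \
  -o jsonpath='{.spec.volumes[?(@.name=="sa-key")].secret.secretName}'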
Is there a particular reason you are using an RC of 0.4 as opposed to 0.5, which is our most recent stable release?
https://v0-5.kubeflow.org/docs/gke/deploy/
The error indicates the k8s secret admin-gcp-sa was not created.
I would suggest trying to follow the 0.5 instructions and letting us know if you still have problems.
If you followed a doc/tutorial telling you to use 0.4, could you please let us know so we can get it updated?
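For anyone hitting the same error: the secret can usually be recreated from the deployment's admin GCP service account. A minimal sketch, assuming a GKE deployment whose admin service account follows the usual ${KFAPP}-admin@${PROJECT}.iam.gserviceaccount.com naming (the account name and key file path here are illustrative, not taken from this issue):

# Download a key for the admin service account (the account name pattern is an assumption).
gcloud iam service-accounts keys create admin-gcp-sa.json \
  --iam-account=${KFAPP}-admin@${PROJECT}.iam.gserviceaccount.com

# Recreate the secret the envoy/backend-updater pods are waiting to mount.
kubectl create secret generic admin-gcp-sa -n kubeflow \
  --from-file=admin-gcp-sa.json=./admin-gcp-sa.json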
Thanks @jlewi. This is resolved now. I created the secret in 0.5 and that fixed it.
I used this to resolve the issue:
kubectl get secret admin-gcp-sa --export -o yaml | kubectl apply --namespace=istio-system -f -
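In case it helps someone else, the same copy can be done with the source namespace made explicit; a sketch assuming the secret lives in the kubeflow namespace (note that --export is deprecated in newer kubectl releases):

kubectl get secret admin-gcp-sa -n kubeflow --export -o yaml | kubectl apply -n istio-system -f -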