Today, when I rolled out a new version to one of our deployments, the pod got stuck in ContainerCreating with these error events:
1h 1m 37 some-api-2275263275-01pq7 Pod Warning FailedMount {kubelet gke-cluster-1-default-pool-4399eaa3-os4v} Unable to mount volumes for pod "some-api-2275263275-01pq7_default(afc5ae68-5b5e-11e6-afbb-42010a800105)": timeout expired waiting for volumes to attach/mount for pod "some-api-2275263275-01pq7"/"default". list of unattached/unmounted volumes=[default-token-880jy]
1h 1m 37 some-api-2275263275-01pq7 Pod Warning FailedSync {kubelet gke-cluster-1-default-pool-4399eaa3-os4v} Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "some-api-2275263275-01pq7"/"default". list of unattached/unmounted volumes=[default-token-880jy]
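For reference, this is how I pulled those events (pod name and namespace are taken from the output above):

```sh
# Show the pod's status and the FailedMount/FailedSync events:
kubectl describe pod some-api-2275263275-01pq7 --namespace default

# List recent events for the namespace, including the mount timeouts:
kubectl get events --namespace default
```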
I then attempted to scale the cluster, and more than 75% of the previously running pods switched to ContainerCreating and also got stuck there. This caused a widespread outage in our system, and I had to quickly create a new cluster.
We're using Google Cloud Platform's Container Engine (GKE), and the cluster version is 1.3.2.
@montanaflynn There were a number of storage-related issues with v1.3.2 that were fixed in v1.3.4. You probably hit one of those.
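If upgrading is an option, something like the following should move the cluster onto a patched release. The cluster name is inferred from the node name in your events, `ZONE` is a placeholder for your zone, and exact version availability may vary:

```sh
# Upgrade the nodes to a release with the storage fixes:
gcloud container clusters upgrade cluster-1 --zone ZONE --cluster-version 1.3.4

# Upgrade the master as well:
gcloud container clusters upgrade cluster-1 --zone ZONE --master
```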
If you share the complete /var/log/kubelet log from a node with a stuck deployment, I can take a look and confirm whether it's a known issue. I'd also need your GKE project name, cluster name, and zone to grab your master logs. Feel free to email me if you don't want to share publicly.
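For reference, one way to grab that log, assuming you have gcloud access to the project. The node name is taken from the events above, `ZONE` is a placeholder, and the exact log location depends on the node image:

```sh
# SSH to the node that reported the FailedMount events:
gcloud compute ssh gke-cluster-1-default-pool-4399eaa3-os4v --zone ZONE

# On images where the kubelet runs under systemd:
sudo journalctl -u kubelet

# On older images it logs straight to a file:
sudo cat /var/log/kubelet.log
```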
I saw a similar issue with v1.3.3, but in my case the root cause was a lot more pedestrian: my deployment requires a secrets volume, and I had forgotten to create the associated secret in the cluster I was deploying to. I saw no errors from `kubectl describe` or `kubectl logs`, but eventually realized that a deployment stays stuck in the ContainerCreating state (without logs, as far as I can tell) if a volume it depends on is missing.
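For anyone hitting the same thing: a quick sanity check is to confirm that every secret the deployment mounts actually exists in the target namespace. A minimal sketch, where the secret name `api-secrets` and its key are hypothetical:

```sh
# Verify the secret referenced by the deployment's volume exists:
kubectl get secret api-secrets --namespace default

# If it's missing, create it before the rollout (keys are illustrative):
kubectl create secret generic api-secrets --from-literal=api-key=changeme
```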
This issue is stale. Closing.