skaffold ignores init-containers status

Created on 12 Apr 2020 · 20 comments · Source: GoogleContainerTools/skaffold

Expected behavior

skaffold should wait for init-containers to finish running.
I use init-containers to perform setup actions (creating config files, waiting for DBs to load, etc.).
When bootstrapping an entire local dev environment this is a must, especially when using multiple DBs and frameworks. Some take a while to load in a container (Kafka, Aerospike, etc.).

Actual behavior

When using the new skaffold version, it ignores init-containers.
(In previous versions of skaffold, I would see the logs of these init-containers printed to the logger when using the skaffold dev command.)
Also, after 2 minutes skaffold exits with an error:
could not stabilize within 2m0s: context deadline exceeded

Information

  • Skaffold version: 1.7.0
  • Operating system: macOS
area/errors area/status-check kind/bug priority/p2

Most helpful comment

I was facing a different issue in minikube. I manually deleted the cluster. Once I created a new cluster, I started facing the same issue, could not stabilize within 2m0s: context deadline exceeded, because I had forgotten to create a secret in the cluster.
Next, I created the secret using the command below, and it started working fine.
kubectl create secret generic jwt-secret --from-literal=JWT_KEY=asdf

All 20 comments

Would it be possible to provide reproduction instructions so that I could replicate this locally?

@tstromberg consider the following example:

skaffold.yaml:

```yaml
apiVersion: skaffold/v2beta2
kind: Config
build:
  local:
    push: false
deploy:
  kubectl:
    manifests:
      - apps/**
```

And in the local apps folder, the following manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app2
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app2
  template:
    metadata:
      labels:
        app: app2
    spec:
      restartPolicy: Always
      containers:
        - name: app
          image: nginx:latest
          imagePullPolicy: IfNotPresent
          ports:
            - name: app-port
              containerPort: 3000
          command: ["/bin/sh"]
          args: ["-c", "while true; do echo sleeping in app2; sleep 2;done"]
---
apiVersion: v1
kind: Service
metadata:
  name: app2
spec:
  selector:
    app: app2
  type: NodePort
  ports:
  - name: app2
    port: 3000
    targetPort: 3000
    nodePort: 30300
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      restartPolicy: Always
      initContainers:
        - name: check-app2
          image: busybox:1.30.0
          command:
            [
              "sh",
              "-c",
              'while ! nc -zv -w 1 app2 3000 > /dev/null 2>&1; do echo "waiting for app2 to be available"; sleep 3; done;',
            ]
      containers:
        - name: app
          image: nginx:latest
          imagePullPolicy: IfNotPresent
          ports:
            - name: app-port
              containerPort: 8080
```
(Consider app2 to be a DB that takes a little while to load.)

When running the skaffold dev command, it fails after 2 minutes:

```console
➜ skaffold dev
Listing files to watch...
Generating tags...
Checking cache...
Tags used in deployment:
Starting deploy...
 - deployment.apps/app2 created
 - service/app2 created
 - deployment.apps/app created
Waiting for deployments to stabilize...
 - deployment/app: waiting for rollout to finish: 0 of 1 updated replicas are available...
 - deployment/app2: waiting for rollout to finish: 0 of 1 updated replicas are available...
 - deployment/app2 is ready. [1/2 deployment(s) still pending]
 - deployment/app failed. Error: could not stabilize within 2m0s: context deadline exceeded.
Cleaning up...
 - deployment.apps "app2" deleted
 - service "app2" deleted
 - deployment.apps "app" deleted
FATA[0120] exiting dev mode because first deploy failed: 1/2 deployment(s) failed
```

I consider this a regression, but in the meantime you should be able to work around it using: --status-check=false

@tstromberg thank you for the workaround tip.
Is there also an option to stream the init-container logs to the console?
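In the meantime, a manual stopgap (plain kubectl, not a skaffold feature; the `app` label and `check-app2` container name are taken from the example manifest above) is to follow the init container's logs from another terminal:

```shell
# Look up the pod created by the "app" deployment and follow the
# logs of its init container directly (workaround, not a skaffold flag).
pod="$(kubectl get pods -l app=app -o 'jsonpath={.items[0].metadata.name}')"
kubectl logs -f "$pod" -c check-app2
```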

We don't use initContainers but still fail with the same error. I think that if container initialization is slow for any reason, it will get killed after 2 minutes. I am updating our server over a very flaky internet connection; I wonder if the cause of the slowness is communication with the local skaffold CLI client.

@giladsh1 - not that I am aware of (yet)

I'm experiencing the same, and thought an additional MRE (minimal reproducible example) would be helpful. In my case, running postgres:

skaffold.yaml:

```yaml
apiVersion: skaffold/v2beta2
kind: Config
profiles:
  - name: dev
    activation:
      - command: dev
    deploy:
      kubectl:
        manifests:
          - k8s/postgres.yaml
```

k8s/postgres.yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:10.4
          ports:
            - containerPort: 5432
          envFrom:
            - configMapRef:
                name: postgres-config
          volumeMounts:
            - mountPath: "/var/lib/postgresql/data"
              name: postgresdb
      volumes:
        - name: postgresdb
          persistentVolumeClaim:
            claimName: postgres-persistent-volume-claim
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  type: NodePort
  ports:
    - port: 5432
  selector:
    app: postgres
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
  labels:
    app: postgres
data:
  POSTGRES_DB: postgres_db
  POSTGRES_USER: postgres_user
  POSTGRES_PASSWORD: postgres_password
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-persistent-volume
  labels:
    type: local
    app: postgres
spec:
  storageClassName: manual
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "./data"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: postgres-persistent-volume-claim
  labels:
    app: postgres
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
```

Running without the --status-check=false workaround, the logs say:

```console
$ skaffold dev
Listing files to watch...
Generating tags...
Checking cache...
Tags used in deployment:
Starting deploy...
 - deployment.apps/postgres created
 - service/postgres created
 - configmap/postgres-config created
 - persistentvolume/postgres-persistent-volume created
 - persistentvolumeclaim/postgres-persistent-volume-claim created
Waiting for deployments to stabilize...
 - deployment/postgres: waiting for rollout to finish: 0 of 1 updated replicas are available...
 - deployment/postgres failed. Error: could not stabilize within 2m0s: context deadline exceeded.
Cleaning up...
 - deployment.apps "postgres" deleted
 - service "postgres" deleted
 - configmap "postgres-config" deleted
 - persistentvolume "postgres-persistent-volume" deleted
 - persistentvolumeclaim "postgres-persistent-volume-claim" deleted
FATA[0123] exiting dev mode because first deploy failed: 1/1 deployment(s) failed
```

I had the same problem. For me, it turned out my secret pod wasn't running. I created a secret object again and everything worked fine. I hope this helps in some way.

@shashang29 thanks for the input!
can you please explain WDYM by secret pod?

We recently changed the default status check deadline from 10 minutes to 2 minutes when we turned status checking on by default.

This helped users with small applications get early feedback.
Can you please update your skaffold.yaml and add statusCheckDeadlineSeconds to the deploy config section to increase the status check deadline?
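A sketch of that change against the skaffold.yaml from earlier in this thread (assuming the v2beta2 schema; 600 seconds would restore the old 10-minute deadline):

```yaml
apiVersion: skaffold/v2beta2
kind: Config
build:
  local:
    push: false
deploy:
  # Raise the status check deadline from the new 2-minute default.
  statusCheckDeadlineSeconds: 600
  kubectl:
    manifests:
      - apps/**
```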

@tejal29 thank you for the proposal.
I assume this will work; however, it doesn't fix the streaming of init-container logs to the console, which is unfortunate.
I will test and reply soon.

@tejal29 I may be wrong, but I think this flag is only relevant when using kustomize, which I'm not...

I was facing a different issue in minikube. I manually deleted the cluster. Once I created a new cluster, I started facing the same issue, could not stabilize within 2m0s: context deadline exceeded, because I had forgotten to create a secret in the cluster.
Next, I created the secret using the command below, and it started working fine.
kubectl create secret generic jwt-secret --from-literal=JWT_KEY=asdf
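A pre-flight check along those lines (a sketch; `jwt-secret` is the name from this comment):

```shell
# Fail fast if the secret the deployment depends on is missing,
# instead of waiting out the 2-minute status check deadline.
if ! kubectl get secret jwt-secret >/dev/null 2>&1; then
  echo "jwt-secret is missing; create it before running 'skaffold dev'" >&2
  exit 1
fi
```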

@giladsh1, we have now integrated pod health checks with --status-check.

On latest master, you can now see why a deployment is pending.

On your example:

```console
$ ~/workspace/skaffold/out/skaffold dev --status-check=true
Listing files to watch...
Generating tags...
Checking cache...
Tags used in deployment:
Starting deploy...
 - deployment.apps/app2 created
 - service/app2 created
 - deployment.apps/app created
Waiting for deployments to stabilize...
 - deployment/app: waiting for rollout to finish: 0 of 1 updated replicas are available...
    - pod/app-6759645cc4-vwldw: container check-app2 in error:
 - deployment/app2: waiting for rollout to finish: 0 of 1 updated replicas are available...
    - pod/app2-85d4d7cdf5-rd7rg: creating container app
 - deployment/app2 is ready. [1/2 deployment(s) still pending]
```

I will look into why the init container logs are not present.

@thclark in your example, we can see that the persistent volume claim is not present.

```console
$ ~/workspace/skaffold/out/skaffold dev -d gcr.io/tejal --status-check=true
Listing files to watch...
Generating tags...
Checking cache...
Tags used in deployment:
Starting deploy...
 - deployment.apps/postgres created
 - service/postgres created
 - configmap/postgres-config created
 - persistentvolume/postgres-persistent-volume created
 - persistentvolumeclaim/postgres-persistent-volume-claim created
Waiting for deployments to stabilize...
 - deployment/postgres: waiting for rollout to finish: 0 of 1 updated replicas are available...
    - pod/postgres-7bdc7fd9dc-8fhcz: Unschedulable: persistentvolumeclaim "postgres-persistent-volume-claim" not found
```

This experience is better in v1.11.0 and we will continue to improve this.

Changing the priority from p1 to p2 since we are now surfacing errors to the users.

@giladsh1 Skaffold currently only logs images that it has built. Your initContainer is using busybox and so its logs aren't shown. This is filed as issue #3712.
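A workaround implied by that limitation would be to have skaffold build the init-container image itself, so that its logs are picked up. A sketch against the earlier example (the `check-app2` image name and `init/` build context are hypothetical; the manifest's initContainer would then reference `check-app2` instead of `busybox:1.30.0`):

```yaml
apiVersion: skaffold/v2beta2
kind: Config
build:
  artifacts:
    # Hypothetical artifact wrapping the init-container image so that
    # skaffold builds it and therefore streams its logs.
    - image: check-app2
      context: init  # assumes a Dockerfile in ./init based on busybox
  local:
    push: false
deploy:
  kubectl:
    manifests:
      - apps/**
```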

Closing this issue as I believe the original issue is now fixed.

@briandealwis I built my initContainers, and neither _logs_ nor _status_ are shown:

```console
$ skaffold version
v1.12.2-k3dv3-patched  # 1.12.1 for k3d v3
```

```console
$ skaffold dev --status-check=true
[...]
Waiting for deployments to stabilize...
  • deployment/aedir-provider: waiting for rollout to finish: 0 of 1 updated replicas are available...
  • deployment/aedir-provider: waiting for rollout to finish: 0 of 1 updated replicas are available...
    • pod/aedir-provider-645b58dbf8-gvb8c: BackOff: Back-off restarting failed container
  • deployment/aedir-provider failed. Error: could not stabilize within 2m0s: context deadline exceeded.
    [...]
```

with:

```yaml
[...]
      initContainers:
        - name: config-initializer
          image: ae-dir/config-initializer
[...]
        - name: config-checker
          image: ae-dir/openldap-ms
[...]
      containers:
        - name: ldap
          image: ae-dir/openldap-ms
```

If I'm quick (and lucky) enough, from a different terminal, I can get:

```console
$ kubectl logs aedir-provider-645b58dbf8-gvb8c -c config-checker
5f1ce14e lt_dlopenext failed: (back_mdb) file not found
slaptest: bad configuration file!
```

Only if I'm quick enough → unusable.

Hi @blaggacao, I'm going to try to fix your issue.

For now, what I can say is that when the application is fine, I can see the logs for both initContainers and normal containers. Is it the same for you?

This could need some debugging in the diag package.

Is it the same for you?

:smile: I haven't got there yet, where the application is fine. I assume it is.

I'm going to try to fix your issue.

Thank you, much appreciated. Then I'll be able to answer the above question, for sure.
