Skaffold should wait for the init-containers to finish running.
I use init-containers to perform some actions (create config files, wait for DBs to load, etc.).
When bootstrapping an entire local dev environment this is a must, especially when using multiple DBs and frameworks. Some take a while to load in a container (like Kafka, Aerospike, etc.).
When using the new Skaffold version, it ignores the init-containers.
(In previous versions of Skaffold, I would see the logs of these init-containers printed to the logger when using the skaffold dev command.)
Also, after 2 minutes Skaffold exits with an error:
could not stabilize within 2m0s: context deadline exceeded.
Would it be possible to provide reproduction instructions so that I could replicate this locally?
@tstromberg consider the following example:
apiVersion: skaffold/v2beta2
kind: Config
build:
  local:
    push: false
deploy:
  kubectl:
    manifests:
      - apps/**
And in the local apps folder, the following manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app2
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app2
  template:
    metadata:
      labels:
        app: app2
    spec:
      restartPolicy: Always
      containers:
        - name: app
          image: nginx:latest
          imagePullPolicy: IfNotPresent
          ports:
            - name: app-port
              containerPort: 3000
          command: ["/bin/sh"]
          args: ["-c", "while true; do echo sleeping in app2; sleep 2;done"]
---
apiVersion: v1
kind: Service
metadata:
  name: app2
spec:
  selector:
    app: app2
  type: NodePort
  ports:
    - name: app2
      port: 3000
      targetPort: 3000
      nodePort: 30300
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      restartPolicy: Always
      initContainers:
        - name: check-app2
          image: busybox:1.30.0
          command:
            [
              "sh",
              "-c",
              'while ! nc -zv -w 1 app2 3000 > /dev/null 2>&1; do echo "waiting for app2 to be available"; sleep 3; done;',
            ]
      containers:
        - name: app
          image: nginx:latest
          imagePullPolicy: IfNotPresent
          ports:
            - name: app-port
              containerPort: 8080
(Consider app2 to be a DB that takes a little while to load.)
When running the skaffold dev command, it fails after 2 minutes:
➜ skaffold dev
Listing files to watch...
Generating tags...
Checking cache...
Tags used in deployment:
Starting deploy...
- deployment.apps/app2 created
- service/app2 created
- deployment.apps/app created
Waiting for deployments to stabilize...
- deployment/app: waiting for rollout to finish: 0 of 1 updated replicas are available...
- deployment/app2: waiting for rollout to finish: 0 of 1 updated replicas are available...
- deployment/app2 is ready. [1/2 deployment(s) still pending]
- deployment/app failed. Error: could not stabilize within 2m0s: context deadline exceeded.
Cleaning up...
- deployment.apps "app2" deleted
- service "app2" deleted
- deployment.apps "app" deleted
FATA[0120] exiting dev mode because first deploy failed: 1/2 deployment(s) failed
I consider this a regression, but in the meantime, you should be able to work around this using: --status-check=false
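For example:

```console
skaffold dev --status-check=false
```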
@tstromberg thank you for the workaround tip.
Is there also an option to stream the init-container logs to the console?
We don't use initContainers, but it still fails with the same error. I think if the container initialization is slow for some reason, it gets killed after 2 minutes. I am updating our server over a very flaky internet connection. I wonder if the cause of the slowness is communication with the local Skaffold CLI client.
@giladsh1 - not that I am aware of (yet)
I'm experiencing the same, and thought an additional MRE would be helpful. In my case running postgres:
apiVersion: skaffold/v2beta2
kind: Config
profiles:
  - name: dev
    activation:
      - command: dev
    deploy:
      kubectl:
        manifests:
          - k8s/postgres.yaml
And the referenced k8s/postgres.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:10.4
          ports:
            - containerPort: 5432
          envFrom:
            - configMapRef:
                name: postgres-config
          volumeMounts:
            - mountPath: "/var/lib/postgresql/data"
              name: postgresdb
      volumes:
        - name: postgresdb
          persistentVolumeClaim:
            claimName: postgres-persistent-volume-claim
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  type: NodePort
  ports:
    - port: 5432
  selector:
    app: postgres
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
  labels:
    app: postgres
data:
  POSTGRES_DB: postgres_db
  POSTGRES_USER: postgres_user
  POSTGRES_PASSWORD: postgres_password
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-persistent-volume
  labels:
    type: local
    app: postgres
spec:
  storageClassName: manual
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "./data"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: postgres-persistent-volume-claim
  labels:
    app: postgres
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
When run without the --status-check=false workaround, the logs say:
$ skaffold dev
Listing files to watch...
Generating tags...
Checking cache...
Tags used in deployment:
Starting deploy...
- deployment.apps/postgres created
- service/postgres created
- configmap/postgres-config created
- persistentvolume/postgres-persistent-volume created
- persistentvolumeclaim/postgres-persistent-volume-claim created
Waiting for deployments to stabilize...
- deployment/postgres: waiting for rollout to finish: 0 of 1 updated replicas are available...
- deployment/postgres failed. Error: could not stabilize within 2m0s: context deadline exceeded.
Cleaning up...
- deployment.apps "postgres" deleted
- service "postgres" deleted
- configmap "postgres-config" deleted
- persistentvolume "postgres-persistent-volume" deleted
- persistentvolumeclaim "postgres-persistent-volume-claim" deleted
FATA[0123] exiting dev mode because first deploy failed: 1/1 deployment(s) failed
I had the same problem. For me, it turned out my secret pod wasn't running. I created a secret object again and everything worked fine. I hope this helps in some way.
@shashang29 thanks for the input!
Can you please explain what you mean by "secret pod"?
We recently changed the default status check deadline from 10 minutes to 2 minutes when we turned the status check on by default.
This helped users whose applications were small get early feedback.
Can you please update your skaffold.yaml and add statusCheckDeadlineSeconds to the deploy config section to increase the status check deadline?
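For example, a minimal sketch based on the skaffold.yaml from the repro above (the 600-second value is only an illustration):

```yaml
apiVersion: skaffold/v2beta2
kind: Config
build:
  local:
    push: false
deploy:
  # raise the status check deadline from the default 2 minutes to 10 minutes
  statusCheckDeadlineSeconds: 600
  kubectl:
    manifests:
      - apps/**
```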
@tejal29 thank you for the proposal.
I assume this will work; however, it doesn't address streaming the init-container logs to the console, which is unfortunate.
I will test and reply soon
@tejal29 I may be wrong, but I think this flag is only relevant when using kustomize, which I'm not...
I was facing some other issue in minikube, so I manually deleted the cluster. Once I created a new cluster, I started facing the same issue, could not stabilize within 2m0s: context deadline exceeded, because I forgot to create a secret in the cluster.
Next, I created a secret using the command below, and it started working fine.
kubectl create secret generic jwt-secret --from-literal=JWT_KEY=asdf
@giladsh1, we have now integrated pod health checks with --status-check.
On the latest master, you can now see why a deployment is pending.
On your example:
tejaldesai@tejaldesai-macbookpro2 init_container ~/workspace/skaffold/out/skaffold dev --status-check=true
Listing files to watch...
Generating tags...
Checking cache...
Tags used in deployment:
Starting deploy...
- deployment.apps/app2 created
- service/app2 created
- deployment.apps/app created
Waiting for deployments to stabilize...
- deployment/app: waiting for rollout to finish: 0 of 1 updated replicas are available...
- pod/app-6759645cc4-vwldw: container check-app2 in error:
- deployment/app2: waiting for rollout to finish: 0 of 1 updated replicas are available...
- pod/app2-85d4d7cdf5-rd7rg: creating container app
- deployment/app2 is ready. [1/2 deployment(s) still pending]
I will look into why the init container logs are not present.
@thclark In your example, we can see that the persistent volume claim is not present:
tejaldesai@tejaldesai-macbookpro2 init_container ~/workspace/skaffold/out/skaffold dev -d gcr.io/tejal --status-check=true
Listing files to watch...
Generating tags...
Checking cache...
Tags used in deployment:
Starting deploy...
- deployment.apps/postgres created
- service/postgres created
- configmap/postgres-config created
- persistentvolume/postgres-persistent-volume created
- persistentvolumeclaim/postgres-persistent-volume-claim created
Waiting for deployments to stabilize...
- deployment/postgres: waiting for rollout to finish: 0 of 1 updated replicas are available...
- pod/postgres-7bdc7fd9dc-8fhcz: Unschedulable: persistentvolumeclaim "postgres-persistent-volume-claim" not found
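As a side note, one way to confirm this outside of Skaffold is to inspect the pending pod and the claim directly (a sketch; the pod name is taken from the output above):

```console
kubectl describe pod postgres-7bdc7fd9dc-8fhcz
kubectl get pvc postgres-persistent-volume-claim
```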
This experience is better in v1.11.0 and we will continue to improve this.
Changing the priority from p1 to p2 since we are now surfacing errors to the users.
@giladsh1 Skaffold currently only logs images that it has built. Your initContainer is using busybox and so its logs aren't shown. This is filed as issue #3712.
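Since Skaffold streams logs for the images it builds, one way to get init-container logs in the meantime would be to have Skaffold build the init-container image itself. A minimal sketch (the image name check-app2 and context directory wait-for-app2 are hypothetical, assuming a local Dockerfile for the wait script):

```yaml
apiVersion: skaffold/v2beta2
kind: Config
build:
  artifacts:
    - image: check-app2        # hypothetical image, referenced by the initContainer's image field
      context: wait-for-app2   # hypothetical directory containing its Dockerfile
  local:
    push: false
deploy:
  kubectl:
    manifests:
      - apps/**
```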
Closing this issue as I believe the original issue is now fixed.
@briandealwis I built my initContainers, and neither _logs_ nor _status_ are shown:
$ skaffold version
v1.12.2-k3dv3-patched # 1.12.1 for k3d v3
```console
$ skaffold dev --status-check=true
[...]
Waiting for deployments to stabilize...
```
with:
```yaml
[...]
    initContainers:
      - name: config-initializer
        image: ae-dir/config-initializer
        [...]
      - name: config-checker
        image: ae-dir/openldap-ms
        [...]
    containers:
      - name: ldap
        image: ae-dir/openldap-ms
```
If I'm quick and happy enough, from a different terminal, I can get:
$ kubectl logs aedir-provider-645b58dbf8-gvb8c -c config-checker
5f1ce14e lt_dlopenext failed: (back_mdb) file not found
slaptest: bad configuration file!
Only if I'm quick enough -> unusable.
Hi @blaggacao, I'm going to try to fix your issue.
For now, what I can say is that when the application is fine, I can see the logs for both initContainers and normal containers. Is it the same for you?
This could need some debugging in the diag package.
Is it the same for you?
:smile: I haven't got to the point where the application is fine yet. I assume it will be the same.
I'm going to try to fix your issue.
Thank you, much appreciated. Then I'll be able to answer the above question, for sure.