Hi,
I have an issue that I opened in the kaniko repository, but I think it applies to skaffold as well.
Here is the link:
Issue597
Please go through the above issue.
Below is a possible approach to stream all the error logs before the image builder container dies.
Skaffold should have an extra configuration option, say podGracePeriodSecond, which would prevent the kaniko container, or any other image builder container, from dying.
How to prevent the kaniko container from dying:
    Containers: []v1.Container{
        {
            Name:            constants.DefaultKanikoContainerName,
            Image:           cfg.Image,
            Args:            args,
            ImagePullPolicy: v1.PullIfNotPresent,
            Env:             []v1.EnvVar{},
            VolumeMounts:    []v1.VolumeMount{},
        },
        {
            Name:            "side-car",
            Image:           constants.DefaultBusyboxImage,
            ImagePullPolicy: v1.PullIfNotPresent,
            Env: []v1.EnvVar{{
                Name:  "GOOGLE_APPLICATION_CREDENTIALS",
                Value: "/secret/kaniko-secret",
            }},
            Command: []string{"sh", "-c", "while [[ $(ps -ef | grep kaniko | wc -l) -gt 1 ]] ; do sleep 1; done; sleep " + cfg.PodGracePeriodSeconds},
        },
    }
Users should be able to configure the pod's termination grace period.
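As a rough illustration of the configurable part of the proposal, the sidecar command could be built from an integer number of seconds. This is only a sketch: the helper name `sidecarCommand` is hypothetical, it assumes the grace period is stored as an `int` (the snippet above concatenates it as if it were a string), and the shell loop is copied from the proposal unchanged. The loop also assumes the sidecar can see kaniko's processes, which in Kubernetes generally requires the pod to share its process namespace.

```go
package main

import (
	"fmt"
)

// sidecarCommand is a hypothetical helper (not part of skaffold) that builds
// the command for the proposed "side-car" container. The shell loop is taken
// verbatim from the proposal; only the trailing sleep duration is filled in.
func sidecarCommand(gracePeriodSeconds int) ([]string, error) {
	// Reject negative values so the generated "sleep" call stays valid.
	if gracePeriodSeconds < 0 {
		return nil, fmt.Errorf("grace period must be non-negative, got %d", gracePeriodSeconds)
	}
	script := fmt.Sprintf(
		"while [[ $(ps -ef | grep kaniko | wc -l) -gt 1 ]] ; do sleep 1; done; sleep %d",
		gracePeriodSeconds,
	)
	return []string{"sh", "-c", script}, nil
}
```

The returned slice could then be assigned to the sidecar's `Command` field in place of the string concatenation.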
Thank you,
@balopat what do you think of this approach? I don't know whether we can do something similar with other image builders.
@prary thanks for the issue. are you seeing this with all kaniko build failures? maybe I'm misunderstanding the issue but I'm surprised we didn't catch this earlier.
I wonder if we can use something like terminationGracePeriodSeconds in the pod spec to control the lifecycle a bit so we can grab some info from the pod before it terminates. we might be able to add this to the kaniko pod spec before we create it on the cluster.
@nkubala I think terminationGracePeriodSeconds applies when Kubernetes itself sends the shutdown signal, for example when the pod fails to respond to a health check. In that case Kubernetes does a graceful shutdown and waits for terminationGracePeriodSeconds. It does not apply when the pod terminates on its own, as in the present case.
In our case an internal error occurs due to an invalid config and the container stops normally, so for the pod it is a normal termination and the PreStop hook is not called. I have tried terminationGracePeriodSeconds, and it doesn't work because the request to terminate the pod does not come from the Kubernetes server.
Here is an excerpt from the above link.
PreStop : This hook is called immediately before a container is terminated due to an API request or management event such as liveness probe failure, preemption, resource contention and others. A call to the preStop hook fails if the container is already in terminated or completed state.
Apologies if my understanding is wrong.
@nkubala @balopat Right now we manually fetch the kaniko logs, as in this issue, where the kaniko pod failed before the logger was attached.
Thanks for filing this.
When skaffold detects the kaniko pod failure, we should provide more information, e.g.
* the output of `kubectl describe` for the kaniko pod
Design doc proposal #2083
Wondering why we need to describe the kaniko pod?
Hey @prary, if the kaniko pod fails before it starts to run (say, a secret was unable to mount), then the output of kubectl describe will provide that information. This is useful since there won't be any logs in that scenario.
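To make the idea concrete, here is a minimal sketch of collecting the `kubectl describe` output for a failed pod by shelling out to `kubectl`. It is not how skaffold implements this; the function names `describeArgs` and `describePod` are hypothetical, and the sketch assumes `kubectl` is on the PATH and already configured for the target cluster.

```go
package main

import (
	"fmt"
	"os/exec"
)

// describeArgs returns the kubectl arguments for describing a single pod.
// Hypothetical helper, kept separate so it can be checked without a cluster.
func describeArgs(namespace, podName string) []string {
	return []string{"describe", "pod", podName, "--namespace", namespace}
}

// describePod runs "kubectl describe pod" and returns its combined output.
// The pod events in that output can explain a failure that happens before
// the container starts (for example a secret that could not be mounted),
// when there are no container logs to read.
func describePod(namespace, podName string) (string, error) {
	out, err := exec.Command("kubectl", describeArgs(namespace, podName)...).CombinedOutput()
	if err != nil {
		return string(out), fmt.Errorf("describing pod %s/%s: %w", namespace, podName, err)
	}
	return string(out), nil
}
```

A caller could invoke `describePod` once it sees the build pod fail and print the result next to whatever logs were captured.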
Hi @priyawadhwa @balopat
Kindly review the following comment:
https://github.com/GoogleContainerTools/skaffold/pull/2083/files#r287750800
Closing this issue; it was fixed by https://github.com/GoogleContainerTools/skaffold/issues/2352.
Please let me know @prary if you still see the issue!
Thanks!