Skaffold: Skaffold not streaming all the kaniko logs

Created on 18 Apr 2019 · 10 Comments · Source: GoogleContainerTools/skaffold

Hi,

I have an issue that I opened in the kaniko repository, but I think it applies to Skaffold as well.

Here is the link:
Issue597
Please go through the above issue.

Below are possible approaches to stream all the error logs before the image builder container dies.

Skaffold should have an extra configuration option, say podGracePeriodSeconds, which would prevent kaniko or any other image builder container from dying.

How to prevent the kaniko container from dying:

Spin up a second container in the kaniko pod that keeps checking the health of the kaniko container. As soon as the kaniko container dies, the second container sleeps for whatever duration the user has configured. This is similar to the sidecar container pattern.

Containers: []v1.Container{
    {
        Name:            constants.DefaultKanikoContainerName,
        Image:           cfg.Image,
        Args:            args,
        ImagePullPolicy: v1.PullIfNotPresent,
        Env:             []v1.EnvVar{},
        VolumeMounts:    []v1.VolumeMount{},
    },
    {
        Name:            "side-car",
        Image:           constants.DefaultBusyboxImage,
        ImagePullPolicy: v1.PullIfNotPresent,
        Env: []v1.EnvVar{{
            Name:  "GOOGLE_APPLICATION_CREDENTIALS",
            Value: "/secret/kaniko-secret",
        },
        },
        Command: []string{"sh", "-c", "while [[ $(ps -ef | grep kaniko | wc -l) -gt 1 ]] ; do   sleep 1; done; sleep " + cfg.PodGracePeriodSeconds},
    },
}
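
As a rough sketch of how the side-car command above could be assembled, the helper below builds the same kind of shell loop from an integer grace period. The function name sidecarCommand and the integer parameter are assumptions for illustration, not existing Skaffold code. It converts the number with strconv.Itoa, since string concatenation with cfg.PodGracePeriodSeconds only compiles if that field is a string, and it uses the [k]aniko pattern so that grep does not match its own command line.

```go
package main

import (
	"fmt"
	"strconv"
)

// sidecarCommand builds the shell command for a hypothetical "side-car"
// container: it polls until no kaniko process is visible, then sleeps for
// gracePeriodSeconds so the pod stays alive while logs are collected.
func sidecarCommand(gracePeriodSeconds int) []string {
	// The [k]aniko pattern keeps grep from matching its own command line.
	script := "while ps -ef | grep -q '[k]aniko'; do sleep 1; done; sleep " +
		strconv.Itoa(gracePeriodSeconds)
	return []string{"sh", "-c", script}
}

func main() {
	// 30 is an arbitrary example value for the grace period.
	fmt.Println(sidecarCommand(30))
}
```

Note that containers in a pod do not share a process namespace by default, so a loop like this can only see the kaniko process if the pod spec enables shareProcessNamespace.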

Another approach would be to add a sleep in the kaniko pod after a kaniko failure occurs (i.e. a change in the kaniko code).

Expected behavior

Users should be able to configure the pod termination grace period.

Actual behavior

Information

Skaffold version: 26
Operating system: linux
Contents of skaffold.yaml:

<paste your skaffold.yaml here>

Steps to reproduce the behavior

Download Skaffold
Do the required setup to run skaffold.
Create skaffold.yaml file and Dockerfile.
In your Dockerfile, put an invalid base image:
FROM gcr.io/invalidImageNameORUnReachableImage
Do a skaffold run and you will get the following output:
build step: building [gcr.io/invalidImageNameORUnReachableImage]: kaniko build for [gcr.io/invalidImageNameORUnReachableImage]: waiting for pod to complete: pod already in terminal phase: Failed
The pod is already in a terminal phase.

Thank you,

area/logging build/kaniko good first issue help wanted kind/bug

prary

Most helpful comment

Hey @prary, if the kaniko pod fails before it starts to run (say, a secret was unable to mount), then the output of kubectl describe will provide that information. This is useful since there won't be any logs in that scenario.

priyawadhwa on 8 May 2019

👍2
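
For reference, a minimal sketch of collecting that kubectl describe output from Go is shown below. The helper names describePodArgs and describePod, and the namespace and pod name in main, are placeholders for illustration and not part of Skaffold. It assumes kubectl is on the PATH and configured for the cluster.

```go
package main

import (
	"fmt"
	"os/exec"
)

// describePodArgs returns the kubectl arguments for describing one pod.
func describePodArgs(namespace, podName string) []string {
	return []string{"describe", "pod", podName, "--namespace", namespace}
}

// describePod runs kubectl and returns its combined stdout and stderr.
func describePod(namespace, podName string) (string, error) {
	out, err := exec.Command("kubectl", describePodArgs(namespace, podName)...).CombinedOutput()
	return string(out), err
}

func main() {
	// "default" and "kaniko-abc12" are placeholder values.
	out, err := describePod("default", "kaniko-abc12")
	if err != nil {
		fmt.Println("kubectl describe failed:", err)
	}
	fmt.Print(out)
}
```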

All 10 comments

@balopat what do you suggest about this approach? I don't know whether we can do something similar with other image builders.

Containers: []v1.Container{
  {
      Name:            constants.DefaultKanikoContainerName,
      Image:           cfg.Image,
      Args:            args,
      ImagePullPolicy: v1.PullIfNotPresent,
      Env:             []v1.EnvVar{},
      VolumeMounts:    []v1.VolumeMount{},
  },
  {
      Name:            "side-car",
      Image:           constants.DefaultBusyboxImage,
      ImagePullPolicy: v1.PullIfNotPresent,
      Env: []v1.EnvVar{{
          Name:  "GOOGLE_APPLICATION_CREDENTIALS",
          Value: "/secret/kaniko-secret",
      },
      },
      Command: []string{"sh", "-c", "while [[ $(ps -ef | grep kaniko | wc -l) -gt 1 ]] ; do   sleep 1; done; sleep " + cfg.PodGracePeriodSeconds},
  },
}

prary on 19 Apr 2019

@prary thanks for the issue. are you seeing this with all kaniko build failures? maybe I'm misunderstanding the issue but I'm surprised we didn't catch this earlier.

I wonder if we can use something like terminationGracePeriodSeconds in the pod spec to control the lifecycle a bit so we can grab some info from the pod before it terminates. we might be able to add this to the kaniko pod spec before we create it on the cluster.

nkubala on 2 May 2019

@nkubala I guess terminationGracePeriodSeconds applies when a pod fails to respond to a health check and Kubernetes sends it a signal to shut down; in that case it does a graceful shutdown and waits for terminationGracePeriodSeconds. It does not apply when the pod terminates itself, as in the present case.

In our case an internal error occurs due to invalid config and the container stops normally, so for the pod it is a normal termination and the PreStop hook is not called. I have tried terminationGracePeriodSeconds and it doesn't work, because the request for pod termination does not come from the Kubernetes server.

Here is an excerpt from the above link.

PreStop : This hook is called immediately before a container is terminated due to an API request or management event such as liveness probe failure, preemption, resource contention and others. A call to the preStop hook fails if the container is already in terminated or completed state.

Apologies if my understanding is wrong.

prary on 3 May 2019

@nkubala @balopat Right now we manually fetch the kaniko logs, as in this issue where the kaniko pod failed before the logger was attached.

prary on 5 May 2019

Thanks for filing this.
When skaffold detects the kaniko pod failure, we should provide more information - e.g.