Skaffold: Skaffold not streaming all the kaniko logs

Created on 18 Apr 2019 · 10 Comments · Source: GoogleContainerTools/skaffold

Hi,

I have an issue which I opened in the kaniko repository, but I think it concerns skaffold as well.

Here is the link:
Issue 597
Please go through the above issue.

Below are the possible approaches to stream all the error logs before the image builder container dies.

Skaffold should have an extra configuration option, let's say podGracePeriodSeconds, which would keep the kaniko (or any other image builder) container from dying immediately.
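A minimal sketch of what such an option could look like as a fragment of skaffold's kaniko build config; the field name and yaml tag are hypothetical, not the actual schema:

type KanikoBuild struct {
    // ...existing kaniko options...

    // PodGracePeriodSeconds keeps the builder pod alive for this many seconds
    // after the build container fails, so all error logs can still be streamed.
    // Kept as a string here so it can be interpolated into the sidecar's sleep
    // command in the snippet below.
    PodGracePeriodSeconds string `yaml:"podGracePeriodSeconds,omitempty"`
}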

How to prevent the kaniko container from dying:

  1. By spinning up an additional container in the kaniko pod which keeps checking the health of the kaniko container; as soon as the kaniko container dies, it puts the other container to sleep for however long the user has defined. This concept is similar to a sidecar container, as in the snippet below.
Containers: []v1.Container{
    {
        // The kaniko builder container, unchanged.
        Name:            constants.DefaultKanikoContainerName,
        Image:           cfg.Image,
        Args:            args,
        ImagePullPolicy: v1.PullIfNotPresent,
        Env:             []v1.EnvVar{},
        VolumeMounts:    []v1.VolumeMount{},
    },
    {
        // Sidecar that keeps the pod alive: it waits until the kaniko process
        // disappears, then sleeps for the configured grace period so the logs
        // can still be streamed before the pod terminates.
        Name:            "side-car",
        Image:           constants.DefaultBusyboxImage,
        ImagePullPolicy: v1.PullIfNotPresent,
        Env: []v1.EnvVar{{
            Name:  "GOOGLE_APPLICATION_CREDENTIALS",
            Value: "/secret/kaniko-secret",
        }},
        Command: []string{"sh", "-c", "while [[ $(ps -ef | grep kaniko | wc -l) -gt 1 ]] ; do sleep 1; done; sleep " + cfg.PodGracePeriodSeconds},
    },
}
  2. Another approach would be to put a sleep in the kaniko pod after a kaniko failure occurs (i.e. a change in the kaniko code itself); see the sketch below.
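A rough sketch of approach 2, assuming a change inside kaniko itself rather than in skaffold; all names here are illustrative and this is not kaniko's actual code:

package main

import (
    "fmt"
    "os"
    "time"
)

// executeBuild stands in for kaniko's real build entrypoint (illustrative only).
func executeBuild() error {
    return fmt.Errorf("resolving base image: gcr.io/invalidImageNameORUnReachableImage not found")
}

func main() {
    // Hypothetical grace period; in kaniko this would come from a flag or env var.
    const gracePeriod = 30 * time.Second

    if err := executeBuild(); err != nil {
        fmt.Fprintf(os.Stderr, "build failed: %v\n", err)
        // Keep the container alive so skaffold's log streamer can attach and
        // read everything before the pod reaches the Failed phase.
        time.Sleep(gracePeriod)
        os.Exit(1)
    }
}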

Expected behavior

The user should have the ability to configure the pod's termination grace period.

Actual behavior

Skaffold reports only that the pod is already in a terminal phase (Failed); the kaniko error logs are never streamed.

Information

  • Skaffold version: 26
  • Operating system: linux
  • Contents of skaffold.yaml:
<paste your skaffold.yaml here>

Steps to reproduce the behavior

  1. Download Skaffold.
  2. Do the required setup to run skaffold.
  3. Create a skaffold.yaml file and a Dockerfile.
  4. In your Dockerfile, use an invalid or unreachable base image, e.g.
  5. FROM gcr.io/invalidImageNameORUnReachableImage
  6. Do a skaffold run and you would get the following output:
  7. build step: building [gcr.io/invalidImageNameORUnReachableImage]: kaniko build for [gcr.io/invalidImageNameORUnReachableImage]: waiting for pod to complete: pod already in terminal phase: Failed
  8. The pod is already in a terminal phase, so the actual kaniko error logs are never streamed.

Thank you,

area/logging build/kaniko good first issue help wanted kind/bug

Most helpful comment

Hey @prary, if the kaniko pod fails before it starts to run (say, a secret was unable to mount), then the output of kubectl describe will provide that information. This is useful since there won't be any logs in that scenario.

All 10 comments

@balopat what's your suggestion on this approach? I don't know whether we can do something similar with the other image builders.

Containers: []v1.Container{
  {
      Name:            constants.DefaultKanikoContainerName,
      Image:           cfg.Image,
      Args:            args,
      ImagePullPolicy: v1.PullIfNotPresent,
      Env:             []v1.EnvVar{},
      VolumeMounts:    []v1.VolumeMount{},
  },
  {
      Name:            "side-car",
      Image:           constants.DefaultBusyboxImage,
      ImagePullPolicy: v1.PullIfNotPresent,
      Env: []v1.EnvVar{{
          Name:  "GOOGLE_APPLICATION_CREDENTIALS",
          Value: "/secret/kaniko-secret",
      },
      },
      Command: []string{"sh", "-c", "while [[ $(ps -ef | grep kaniko | wc -l) -gt 1 ]] ; do   sleep 1; done; sleep " + cfg.PodGracePeriodSeconds},
  },
}

@prary thanks for the issue. Are you seeing this with all kaniko build failures? Maybe I'm misunderstanding the issue, but I'm surprised we didn't catch this earlier.

I wonder if we can use something like terminationGracePeriodSeconds in the pod spec to control the lifecycle a bit, so we can grab some info from the pod before it terminates. We might be able to add this to the kaniko pod spec before we create it on the cluster.
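A minimal sketch of what that could look like when skaffold constructs the kaniko pod spec, reusing cfg, args, and constants from the snippet above; the 60-second value and the namespace are illustrative, and the next comment explains why this alone may not help when the container exits on its own:

// Assumes the same imports as the snippet above:
//   v1     = k8s.io/api/core/v1
//   metav1 = k8s.io/apimachinery/pkg/apis/meta/v1
gracePeriod := int64(60) // illustrative; would come from the proposed config option

pod := &v1.Pod{
    ObjectMeta: metav1.ObjectMeta{
        GenerateName: "kaniko-",
        Namespace:    "default", // or the namespace from skaffold's config
    },
    Spec: v1.PodSpec{
        // How long Kubernetes waits between SIGTERM and SIGKILL when *it*
        // terminates the pod, leaving a window to collect logs.
        TerminationGracePeriodSeconds: &gracePeriod,
        RestartPolicy:                 v1.RestartPolicyNever,
        Containers: []v1.Container{
            {
                Name:            constants.DefaultKanikoContainerName,
                Image:           cfg.Image,
                Args:            args,
                ImagePullPolicy: v1.PullIfNotPresent,
            },
        },
    },
}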

@nkubala I guess terminationGracePeriodSeconds works when the pod is failing to respond to a health check and Kubernetes sends a signal to shut it down; only then would it do a graceful shutdown and wait for terminationGracePeriodSeconds. But in the present case the pod terminates itself.

In our case an internal error occurs due to an invalid config and the container stops normally, so for the pod it is a normal termination and the PreStop hook is not called. I have tried it out; terminationGracePeriodSeconds doesn't work because the request for pod termination does not come from the Kubernetes API server.

Here is an excerpt from the above link.

PreStop : This hook is called immediately before a container is terminated due to an API request or management event such as liveness probe failure, preemption, resource contention and others. A call to the preStop hook fails if the container is already in terminated or completed state.

Apologies if my understanding is wrong.

@nkubala @balopat Right now we manually fetch the kaniko logs, like in this issue where the kaniko pod failed before the logger was attached.

Thanks for filing this.
When skaffold detects a kaniko pod failure, we should provide more information (see the rough sketch below), e.g.

  • pod logs if they exist
  • the output of kubectl describe for the kaniko pod

Design doc proposal #2083
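A rough sketch of how that information could be collected with client-go when the kaniko pod ends up in the Failed phase; this is not skaffold's actual code, and the helper name is hypothetical:

package diag

import (
    "context"
    "fmt"
    "strings"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// diagnoseFailedPod gathers whatever is available from a failed kaniko pod:
// its container logs (if any) and the pod's events, which is roughly what
// `kubectl describe pod` would surface.
func diagnoseFailedPod(ctx context.Context, client kubernetes.Interface, ns, podName string) string {
    var out strings.Builder

    // Container logs, if the kaniko container got far enough to produce any.
    if logs, err := client.CoreV1().Pods(ns).GetLogs(podName, &v1.PodLogOptions{}).DoRaw(ctx); err == nil {
        out.Write(logs)
    }

    // Pod events, e.g. FailedMount when a secret could not be mounted.
    events, err := client.CoreV1().Events(ns).List(ctx, metav1.ListOptions{
        FieldSelector: "involvedObject.name=" + podName,
    })
    if err == nil {
        for _, e := range events.Items {
            fmt.Fprintf(&out, "%s\t%s\t%s\n", e.Type, e.Reason, e.Message)
        }
    }
    return out.String()
}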

* the output of `kubectl describe` for the kaniko pod

Wondering why we need to describe the kaniko pod?

Hey @prary, if the kaniko pod fails before it starts to run (say, a secret was unable to mount), then the output of kubectl describe will provide that information. This is useful since there won't be any logs in that scenario.

Hi @priyawadhwa @balopat,
kindly review the following comment:
https://github.com/GoogleContainerTools/skaffold/pull/2083/files#r287750800

Closing this issue; it got fixed by https://github.com/GoogleContainerTools/skaffold/issues/2352.

Please let me know @prary if you still see the issue!

Thanks!
