Flux: Flux may be leaking files in /tmp

Created on 27 Dec 2019  ·  32 Comments  ·  Source: fluxcd/flux

Describe the bug
Flux fills up /tmp

To Reproduce
Large monorepo. Flux runs fine for a few days, then slowly leaks disk and then ultimately runs out of space. I've mounted a PVC at /tmp so that it doesn't consume disk on the node itself.

Expected behavior
Fluxcd should probably clean up after itself?

Logs

ts=2019-12-27T00:58:18.491235642Z caller=images.go:23 component=sync-loop error="getting unlocked automated resources: mkdir /tmp/flux-working546801009: no space left on device"

Additional context

  • Flux version: 1.17.0
bug

All 32 comments

Can you list what's in tmp? How many directories? How much does each one
occupy? Can you show us the rest of the logs?

I've removed the PVC for now and am letting /tmp grow on the host volume until the pod gets evicted on disk pressure.

This is what I'm currently seeing:

bash-5.0# cd /tmp
bash-5.0# ls -la
total 24
drwxrwxrwt    1 root     root           210 Dec 27 23:49 .
drwxr-xr-x    1 root     root            62 Dec 27 03:55 ..
drwx------    7 root     root           156 Dec 27 07:08 flux-gitclone589837000
drwx------   22 root     root          4096 Dec 27 03:58 flux-working047812410
drwx------   22 root     root          4096 Dec 27 07:09 flux-working102335891
drwx------   22 root     root          4096 Dec 27 06:02 flux-working187868532
drwx------   22 root     root          4096 Dec 27 04:22 flux-working298972273
drwx------   22 root     root          4096 Dec 27 04:12 flux-working660156126
drwx------   22 root     root          4096 Dec 27 04:34 flux-working917047553
bash-5.0# du -sh .
4.5G    .
bash-5.0# du -sh *
627.2M  flux-gitclone589837000
1.3G    flux-working047812410
1.3G    flux-working102335891
1.3G    flux-working187868532
1.3G    flux-working298972273
1.3G    flux-working660156126
1.3G    flux-working917047553
bash-5.0#

Here's an example of an eviction message from before I tried mounting a PVC at /tmp

Name:           flux-5785f78767-w29wb
Namespace:      flux
Priority:       0
Node:           ip-10-0-78-34.ec2.internal/
Start Time:     Mon, 23 Dec 2019 13:57:48 +1100
Labels:         name=flux
                pod-template-hash=5785f78767
Annotations:    kubernetes.io/psp: eks.privileged
                prometheus.io/port: 3031
Status:         Failed
Reason:         Evicted
Message:        Pod The node had condition: [DiskPressure].

When I saw the evictions happening, I thought that I could add a large PVC to give flux longer runway, but instead of evictions, it eventually runs into no space left on device. I unfortunately don't have more logs to hand of this happening. I can add back a PVC to reproduce if necessary. It might take a while to happen though, as it only seems to happen after running for some time and after handling some updates.

It may not be a leak (Flux creates a main clone and then an additional
clone for each operation). If the clones end up disappearing eventually, it
could simply have happened during an active period.

Ok, this implies a few things. Can you please confirm for my understanding?

  1. Flux can only use a host filesystem (not possible to run fluxcd on a cluster of nodes without disks) because it relies on automated clean-up of /tmp outside fluxcd.
  2. It relies on the /tmp clean-up semantics of the underlying nodes. This varies by distro. Sometimes only on boot, sometimes using tmpwatch etc.
  3. Before sizing the disks on the kubernetes nodes, one needs to consider the size of the repo being cloned and the frequency of changes?

Flux cleans after itself, but it will create a local clone for each
operation it applies on the repository.

About (3), in the case of a very big repo I guess that's the case. It's not a problem with usual repos.

Thanks for the replies.

Flux cleans after itself, but it will create a local clone for each operation it applies on the repository.

Ok, this sounds good. Do you have more information about how frequently clean-up happens, or whether there is a mechanism to configure it? I looked briefly at the code and nothing jumped out at me. If fluxcd is cleaning up after itself predictably, then it is possible to mount a PVC at /tmp as long as it is sized for the repo (plus future growth) and the number of copies that flux keeps around. That would make items 1 and 2 much less of an issue.

About (3), in the case of a very big repo I guess that's the case. It's not usually a problem with usual repos.

For the monorepo I've got my hands on at the moment, after git gc I'm seeing ~500MB. Surprised me too. It seems that Terraform state files are larger than one might think. There isn't much that I can do really, apart from making the case for a new repo, which is a big ask just to work around running out of disk space. I would be happy to throw 200-250GB of EBS at /tmp, for example, if I could be relatively sure that I won't run out of space too quickly. At the moment, it seems like every change takes ~1GB and the space is not reclaimed. I don't know why it's not ~500MB; probably gc compacts it a lot locally. So after ~200 changes, flux would stop working, which isn't great. Another option, I guess, would be to use a PVC and make sure that the pod gets killed every so often somehow. Any ideas?
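A rough back-of-the-envelope version of that estimate, as a sketch only. It assumes every change leaks exactly one working clone of a fixed size and that nothing is ever reclaimed; the function name and the megabyte figures are illustrative, not something Flux provides.

```shell
# Hypothetical helper: how many leaked working clones fit before the volume fills.
# Arguments are whole megabytes: volume size, mirror clone size, size of one working clone.
changes_until_full() {
    echo $(( ($1 - $2) / $3 ))
}

# 200 GB volume, ~500 MB mirror, ~1 GB leaked per change:
changes_until_full 200000 500 1000   # prints 199
```

Plugging in a larger per-clone size shrinks the runway proportionally, which is why the volume size alone does not fix the problem if clones are never removed.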

Another update on this: it happened again. The pod goes into CrashLoopBackOff, so it's not possible to get into the container to see the contents of /tmp.

Ok, this sounds good. Do you have more information about how frequently clean-up is happening or if there is a mechanism to configure it?

Flux has a main _mirror_ clone which periodically fetches from the repository pointed to by --git-url. This mirror is never removed.

On top of that, Flux creates a working clone from the _mirror_ clone every time it performs an operation (such as automatically updating an image version). The working clone should be removed as soon as the operation is finished.
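The lifecycle described above (one long-lived mirror, plus a throwaway working clone per operation) can be pictured with a small shell analogy. This is not Flux's code, which is written in Go; `with_working_clone` is a hypothetical helper, and `cp -R` stands in for cloning from the mirror. The point of the sketch is that the working directory is removed on both the success and the failure path.

```shell
# Hypothetical helper: run a command inside a temporary copy of the mirror,
# then always remove the copy, whether the command succeeded or not.
with_working_clone() {
    mirror=$1
    shift
    workdir=$(mktemp -d "${TMPDIR:-/tmp}/flux-working.XXXXXX") || return 1
    rc=0
    # Subshell so that the cd does not leak into the caller.
    # cp -R is a stand-in for: git clone "$mirror" "$workdir"
    ( cp -R "$mirror/." "$workdir/" && cd "$workdir" && "$@" ) || rc=$?
    rm -rf "$workdir"   # cleanup happens regardless of rc
    return "$rc"
}

# Example (paths are illustrative):
# with_working_clone /tmp/flux-gitclone162428788 git status
```

A leak of the kind reported in this thread would correspond to some path through the operation that returns before the `rm -rf` step is reached.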

For the monorepo I've got my hands on at the moment, after git gc I'm seeing ~500MB. Surprised me too.

I would need to know more details about the repo's activity, but I would be surprised if you still had an issue after giving Flux, say, 20 GB.

Another update on this: it happened again. The pod goes into CrashLoopBackOff, so it's not possible to get into the container to see the contents of /tmp.

Could you leave a script running which periodically prints out a summary of what's in tmp?
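One possible shape for such a script, as a minimal sketch. The helper name `summarize_flux_tmp` and the 5-minute interval are assumptions; it only uses `find`, `du`, `df` and `date`, which appear to be available in the container judging by the sessions in this thread.

```shell
# Hypothetical helper: print a timestamped summary of Flux's clones under a directory.
summarize_flux_tmp() {
    dir=${1:-/tmp}
    # Count only the per-operation working clones, not the long-lived mirror.
    count=$(find "$dir" -maxdepth 1 -type d -name 'flux-working*' | wc -l | tr -d ' ')
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) working_clones=$count"
    du -sh "$dir"/flux-* 2>/dev/null || true   # per-directory sizes, if any exist
    df -h "$dir"                               # overall usage of the volume
}

# Example: log a summary every 5 minutes until interrupted.
# while true; do summarize_flux_tmp /tmp; sleep 300; done
```

Redirecting the loop's output to a file on a different volume would keep the history available even after the pod starts crash-looping.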

I am also interested in the flux logs. An error that occurs while a working clone exists could send Flux down a code path that doesn't clean the clone up properly, and that would explain the leaks.

We've run out of space again and had to restart flux. After running for approximately 24h, this is what we see:

❯ kubectl -n flux get po
NAME                         READY     STATUS    RESTARTS   AGE
flux-7499cd866f-kgsqh        1/1       Running   0          23h

exec into pod:

bash-5.0# df -ah /tmp
Filesystem                Size      Used Available Use% Mounted on
/dev/nvme1n1             49.1G      8.5G     40.6G  17% /tmp
bash-5.0# ls -la /tmp
total 68
drwxr-xr-x   15 root     root          4096 Jan 25 03:24 .
drwxr-xr-x    1 root     root            51 Jan 24 03:23 ..
drwx------    7 root     root          4096 Jan 24 06:45 flux-gitclone162428788
drwx------   23 root     root          4096 Jan 24 04:51 flux-working014604578
drwx------   23 root     root          4096 Jan 24 06:45 flux-working247434560
drwx------   23 root     root          4096 Jan 24 05:00 flux-working277166046
drwx------   23 root     root          4096 Jan 24 03:26 flux-working464685254
drwx------   23 root     root          4096 Jan 24 04:31 flux-working596953207
drwx------   23 root     root          4096 Jan 24 04:00 flux-working617540452
drwx------   23 root     root          4096 Jan 24 06:36 flux-working740637988
drwx------   23 root     root          4096 Jan 24 06:08 flux-working746937199
drwx------   23 root     root          4096 Jan 24 03:36 flux-working891134660
drwx------   23 root     root          4096 Jan 24 05:42 flux-working901227618
drwx------   23 root     root          4096 Jan 24 03:36 flux-working996116118
drwx------    2 root     root         16384 Jan 24 03:23 lost+found
bash-5.0# du -sh /tmp/*
669.7M  /tmp/flux-gitclone162428788
1.4G    /tmp/flux-working014604578
1.4G    /tmp/flux-working247434560
1.4G    /tmp/flux-working277166046
1.4G    /tmp/flux-working464685254
1.4G    /tmp/flux-working596953207
1.4G    /tmp/flux-working617540452
1.4G    /tmp/flux-working740637988
1.4G    /tmp/flux-working746937199
1.4G    /tmp/flux-working891134660
1.4G    /tmp/flux-working901227618
1.4G    /tmp/flux-working996116118
16.0K   /tmp/lost+found

Logs: I'll send them to your personal email because they are verbose and potentially sensitive, but I don't see any indications of errors in them.

We have a large monorepo, with some large files in the history, not too much we can easily do about that, but it does seem that with every change it consumes more space.

~19h since previous check.

Same pod.

NAME                         READY     STATUS    RESTARTS   AGE
flux-7499cd866f-kgsqh        1/1       Running   0          1d

Exec into Pod:

bash-5.0# df -ah /tmp
Filesystem                Size      Used Available Use% Mounted on
/dev/nvme1n1             49.1G      8.5G     40.6G  17% /tmp

bash-5.0# ls -la /tmp
total 72
drwxr-xr-x   16 root     root          4096 Jan 25 22:58 .
drwxr-xr-x    1 root     root            51 Jan 24 03:23 ..
drwx------    7 root     root          4096 Jan 24 06:45 flux-gitclone162428788
drwx------   23 root     root          4096 Jan 24 04:51 flux-working014604578
drwx------   23 root     root          4096 Jan 24 06:45 flux-working247434560
drwx------   23 root     root          4096 Jan 24 05:00 flux-working277166046
drwx------    7 root     root          4096 Jan 25 22:58 flux-working409479553
drwx------   23 root     root          4096 Jan 24 03:26 flux-working464685254
drwx------   23 root     root          4096 Jan 24 04:31 flux-working596953207
drwx------   23 root     root          4096 Jan 24 04:00 flux-working617540452
drwx------   23 root     root          4096 Jan 24 06:36 flux-working740637988
drwx------   23 root     root          4096 Jan 24 06:08 flux-working746937199
drwx------   23 root     root          4096 Jan 24 03:36 flux-working891134660
drwx------   23 root     root          4096 Jan 24 05:42 flux-working901227618
drwx------   23 root     root          4096 Jan 24 03:36 flux-working996116118
drwx------    2 root     root         16384 Jan 24 03:23 lost+found

bash-5.0# du -sh /tmp/*
669.7M  /tmp/flux-gitclone162428788
1.4G    /tmp/flux-working014604578
1.4G    /tmp/flux-working247434560
1.4G    /tmp/flux-working277166046
1.4G    /tmp/flux-working464685254
1.4G    /tmp/flux-working596953207
1.4G    /tmp/flux-working617540452
1.4G    /tmp/flux-working740637988
1.4G    /tmp/flux-working746937199
1.4G    /tmp/flux-working891134660
1.4G    /tmp/flux-working901227618
1.4G    /tmp/flux-working996116118
16.0K   /tmp/lost+found

So far, it appears nothing has leaked in the last 19h, while there have also been no automated updates to the cluster.

Next time, I'll run a test with a trivial change or addition of a resource to kick flux into action to see the impact on /tmp.

~24h since previous check. Some evidence of leakage. There have been no automated updates to resources by flux.

Same pod.

NAME                         READY     STATUS    RESTARTS   AGE
flux-7499cd866f-kgsqh        1/1       Running   0          2d

Exec into Pod:

bash-5.0# df -ah /tmp
Filesystem                Size      Used Available Use% Mounted on
/dev/nvme1n1             49.1G      9.2G     39.8G  19% /tmp
bash-5.0# ls -la /tmp
total 76
drwxr-xr-x   17 root     root          4096 Jan 26 23:40 .
drwxr-xr-x    1 root     root            51 Jan 24 03:23 ..
drwx------    7 root     root          4096 Jan 25 23:12 flux-gitclone162428788
drwx------   23 root     root          4096 Jan 24 04:51 flux-working014604578
drwx------   23 root     root          4096 Jan 24 06:45 flux-working247434560
drwx------   23 root     root          4096 Jan 24 05:00 flux-working277166046
drwx------   23 root     root          4096 Jan 24 03:26 flux-working464685254
drwx------   23 root     root          4096 Jan 25 23:12 flux-working575359694
drwx------   23 root     root          4096 Jan 24 04:31 flux-working596953207
drwx------   23 root     root          4096 Jan 24 04:00 flux-working617540452
drwx------    7 root     root          4096 Jan 26 23:40 flux-working624824473
drwx------   23 root     root          4096 Jan 24 06:36 flux-working740637988
drwx------   23 root     root          4096 Jan 24 06:08 flux-working746937199
drwx------   23 root     root          4096 Jan 24 03:36 flux-working891134660
drwx------   23 root     root          4096 Jan 24 05:42 flux-working901227618
drwx------   23 root     root          4096 Jan 24 03:36 flux-working996116118
drwx------    2 root     root         16384 Jan 24 03:23 lost+found
bash-5.0# du -sh /tmp/*
669.8M  /tmp/flux-gitclone162428788
1.4G    /tmp/flux-working014604578
1.4G    /tmp/flux-working247434560
1.4G    /tmp/flux-working277166046
1.4G    /tmp/flux-working464685254
1.4G    /tmp/flux-working575359694
1.4G    /tmp/flux-working596953207
1.4G    /tmp/flux-working617540452
1.4G    /tmp/flux-working740637988
1.4G    /tmp/flux-working746937199
1.4G    /tmp/flux-working891134660
1.4G    /tmp/flux-working901227618
1.4G    /tmp/flux-working996116118
16.0K   /tmp/lost+found

/tmp/flux-working575359694 is new since the previous listing.
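Spotting new directories by eye gets harder as the list grows. A small sketch of how two listings could be compared mechanically; `list_working_clones` and `new_since` are hypothetical helper names, and the snapshot file paths are illustrative.

```shell
# Hypothetical helper: sorted list of working clones directly under a directory.
list_working_clones() {
    find "${1:-/tmp}" -maxdepth 1 -type d -name 'flux-working*' | sort
}

# Hypothetical helper: print entries that are in snapshot file $2 but not in $1.
new_since() {
    comm -13 "$1" "$2"
}

# Example:
# list_working_clones /tmp > /root/before.txt
# ... wait for Flux to apply a change ...
# list_working_clones /tmp > /root/after.txt
# new_since /root/before.txt /root/after.txt
```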

I'm now going to have flux create a namespace resource to see what happens to the listings and available space.

After flux applied a resource change (created a namespace), the contents of /tmp are as follows:

bash-5.0# df -ah /tmp
Filesystem                Size      Used Available Use% Mounted on
/dev/nvme1n1             49.1G      9.9G     39.1G  20% /tmp
bash-5.0# ls -la /tmp
total 76
drwxr-xr-x   17 root     root          4096 Jan 27 00:09 .
drwxr-xr-x    1 root     root            51 Jan 24 03:23 ..
drwx------    7 root     root          4096 Jan 27 00:07 flux-gitclone162428788
drwx------   23 root     root          4096 Jan 24 04:51 flux-working014604578
drwx------   23 root     root          4096 Jan 27 00:07 flux-working076747204
drwx------   23 root     root          4096 Jan 24 06:45 flux-working247434560
drwx------   23 root     root          4096 Jan 24 05:00 flux-working277166046
drwx------   23 root     root          4096 Jan 24 03:26 flux-working464685254
drwx------   23 root     root          4096 Jan 25 23:12 flux-working575359694
drwx------   23 root     root          4096 Jan 24 04:31 flux-working596953207
drwx------   23 root     root          4096 Jan 24 04:00 flux-working617540452
drwx------   23 root     root          4096 Jan 24 06:36 flux-working740637988
drwx------   23 root     root          4096 Jan 24 06:08 flux-working746937199
drwx------   23 root     root          4096 Jan 24 03:36 flux-working891134660
drwx------   23 root     root          4096 Jan 24 05:42 flux-working901227618
drwx------   23 root     root          4096 Jan 24 03:36 flux-working996116118
drwx------    2 root     root         16384 Jan 24 03:23 lost+found
bash-5.0# du -sh /tmp/*
670.0M  /tmp/flux-gitclone162428788
1.4G    /tmp/flux-working014604578
1.4G    /tmp/flux-working076747204
1.4G    /tmp/flux-working247434560
1.4G    /tmp/flux-working277166046
1.4G    /tmp/flux-working464685254
1.4G    /tmp/flux-working575359694
1.4G    /tmp/flux-working596953207
1.4G    /tmp/flux-working617540452
1.4G    /tmp/flux-working740637988
1.4G    /tmp/flux-working746937199
1.4G    /tmp/flux-working891134660
1.4G    /tmp/flux-working901227618
1.4G    /tmp/flux-working996116118
16.0K   /tmp/lost+found

This time /tmp/flux-working076747204 appears to be a new leak. I've also captured the logs.

Certainly appears that whenever flux makes a change, there are directories leaking.

bash-5.0# df -ah /tmp
Filesystem                Size      Used Available Use% Mounted on
/dev/nvme1n1             49.1G     10.7G     38.4G  22% /tmp
bash-5.0# ls -la /tmp
total 84
drwxr-xr-x   19 root     root          4096 Jan 27 00:24 .
drwxr-xr-x    1 root     root            51 Jan 24 03:23 ..
drwx------    7 root     root          4096 Jan 27 00:21 flux-gitclone162428788
drwx------   23 root     root          4096 Jan 24 04:51 flux-working014604578
drwx------    7 root     root          4096 Jan 27 00:24 flux-working063421434
drwx------   23 root     root          4096 Jan 27 00:07 flux-working076747204
drwx------   23 root     root          4096 Jan 24 06:45 flux-working247434560
drwx------   23 root     root          4096 Jan 24 05:00 flux-working277166046
drwx------   23 root     root          4096 Jan 24 03:26 flux-working464685254
drwx------   23 root     root          4096 Jan 25 23:12 flux-working575359694
drwx------   23 root     root          4096 Jan 24 04:31 flux-working596953207
drwx------   23 root     root          4096 Jan 24 04:00 flux-working617540452
drwx------   23 root     root          4096 Jan 24 06:36 flux-working740637988
drwx------   23 root     root          4096 Jan 24 06:08 flux-working746937199
drwx------   23 root     root          4096 Jan 24 03:36 flux-working891134660
drwx------   23 root     root          4096 Jan 24 05:42 flux-working901227618
drwx------   23 root     root          4096 Jan 27 00:21 flux-working941249421
drwx------   23 root     root          4096 Jan 24 03:36 flux-working996116118
drwx------    2 root     root         16384 Jan 24 03:23 lost+found
bash-5.0# du -sh /tmp/*
670.1M  /tmp/flux-gitclone162428788
1.4G    /tmp/flux-working014604578
1.4G    /tmp/flux-working063421434
1.4G    /tmp/flux-working076747204
1.4G    /tmp/flux-working247434560
1.4G    /tmp/flux-working277166046
1.4G    /tmp/flux-working464685254
1.4G    /tmp/flux-working575359694
1.4G    /tmp/flux-working596953207
1.4G    /tmp/flux-working617540452
1.4G    /tmp/flux-working740637988
1.4G    /tmp/flux-working746937199
1.4G    /tmp/flux-working891134660
1.4G    /tmp/flux-working901227618
1.4G    /tmp/flux-working941249421
1.4G    /tmp/flux-working996116118
16.0K   /tmp/lost+found

This time /tmp/flux-working941249421 and /tmp/flux-working063421434 are new leaks. There were 2 changes applied by flux automatically and 2 newly leaked directories. Seems too much of a coincidence to be unrelated.

I've now run this on the pod to delete directories older than 5h:
find /tmp -type d -name 'flux-working*' -maxdepth 1 -mmin +300 -exec rm -rf {} \;

The listing is now:

bash-5.0# df -ah /tmp
Filesystem                Size      Used Available Use% Mounted on
/dev/nvme1n1             49.1G      2.8G     46.2G   6% /tmp
bash-5.0# ls -la /tmp
total 36
drwxr-xr-x    7 root     root          4096 Jan 27 00:58 .
drwxr-xr-x    1 root     root            51 Jan 24 03:23 ..
drwx------    7 root     root          4096 Jan 27 00:24 flux-gitclone162428788
drwx------   23 root     root          4096 Jan 27 00:07 flux-working076747204
drwx------   23 root     root          4096 Jan 27 00:24 flux-working680104764
drwx------   23 root     root          4096 Jan 27 00:21 flux-working941249421
drwx------    2 root     root         16384 Jan 24 03:23 lost+found
bash-5.0# du -sh /tmp/*
670.1M  /tmp/flux-gitclone162428788
1.4G    /tmp/flux-working076747204
1.4G    /tmp/flux-working680104764
1.4G    /tmp/flux-working941249421
16.0K   /tmp/lost+found

I will keep an eye on this the next few days for further leaks and look into ways of automatically cleaning up old files.
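As a stop-gap, the one-off `find` command above could be wrapped so it can run on a schedule. This is a sketch of a workaround, not a fix: `prune_working_clones` is a hypothetical helper, the 300-minute default simply mirrors the 5h threshold used above, and it assumes no legitimate Flux operation keeps a working clone around for that long. Note that a directory's modification time only changes when entries directly inside it are added or removed.

```shell
# Hypothetical helper: delete flux-working* directories older than a given age.
# The mirror clone (flux-gitclone*) is deliberately not matched.
prune_working_clones() {
    dir=${1:-/tmp}
    max_age_min=${2:-300}
    find "$dir" -maxdepth 1 -type d -name 'flux-working*' \
        -mmin "+$max_age_min" -print -exec rm -rf {} +
}

# Example: prune once an hour until interrupted.
# while true; do prune_working_clones /tmp 300; sleep 3600; done
```

Printing each path before removal leaves a record in the pod's output of what was deleted and when.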

Great. I will also take a look at the logs you sent me and I will go through the working-directory code looking for leaks.

If you could give further information about the automatic updates which you think caused the last two leaks (e.g. logs and the nature of the changes), that would be helpful.

I think I may have fixed this in https://github.com/fluxcd/flux/pull/2788

@groodt would you mind trying out container image 2opremio/flux:more-robust-clone-cleanup-3c3c0c9d to confirm?

Amazing! Thanks!

Trying it out now and will keep an eye on it over the next few days.

❯ kubectl -n flux get deployment flux -o=jsonpath='{$.spec.template.spec.containers[:1].image}'
docker.io/2opremio/flux:more-robust-clone-cleanup-3c3c0c9d

bash-5.0# df -ah /tmp
Filesystem                Size      Used Available Use% Mounted on
/dev/nvme1n1             49.1G    717.9M     48.4G   1% /tmp

bash-5.0# ls -la /tmp
total 24
drwxr-xr-x    4 root     root          4096 Jan 27 07:04 .
drwxr-xr-x    1 root     root            51 Jan 27 06:56 ..
drwx------    7 root     root          4096 Jan 27 06:59 flux-gitclone379546054
drwx------    2 root     root         16384 Jan 24 03:23 lost+found

bash-5.0# du -sh /tmp/*
665.9M  /tmp/flux-gitclone379546054
16.0K   /tmp/lost+found

Reopening until I get a confirmation from @groodt that we fixed the problem

I believe I may be seeing a similar thing. This pod has been running for a while:

/tmp # du -sh .
7.2G    .
/tmp # ls -l | wc -l
188

Each of the working copies is ~144M.

Once this happens Flux fails to apply any configs, logging the following quite often:

{
caller: sync.go:548
cmd: kubectl apply -f -
err: running kubectl: 
errStatus: Error
method: Sync
output: took: 256.059441ms
ts: 2020-01-27T17:53:47.502774212Z
}

Every once in a while we'll notice a pod dying with OOM, but checking the usage stats, its memory usage is nowhere near its configured limits.

K8s events:

OOM    k8s_flux_flux-d99bd9cf7-j8zsp_default_2473c9de-3276-11ea-baf8-064581248a72_0
OOM    k8s_flux_flux-d99bd9cf7-j8zsp_default_2473c9de-3276-11ea-baf8-064581248a72_0
DIE    k8s_flux_flux-d99bd9cf7-j8zsp_default_2473c9de-3276-11ea-baf8-064581…

Image: docker.io/fluxcd/flux:1.17.0

@bheesham have you tried 2opremio/flux:more-robust-clone-cleanup-3c3c0c9d?

@stefanprodan I have not. I'll try it out on our daily environments and report back.

@2opremio Thanks so much.

I believe the problem we've been seeing has been fixed. I've tested the 2opremio/flux:more-robust-clone-cleanup-3c3c0c9d image overnight (~14h) and saw no leaks.

I then had flux make 2 Kubernetes resource changes (adding a namespace and then removing a namespace, not in the same commit) and while watching both changes apply, I saw the /tmp/flux-working* directories get created and then removed.

I think we can consider this resolved with #2788.

Fantastic! Thank _you_ @groodt for your persistence!

Similar results here with the image you provided (though I only have data for about 2 hours).

Thank you!

I have the same problem. Currently flux is using about 500GB of space with thousands of flux-working directories.

I am using v1.20.2 which I guess should include the fix?

Is it cool if I just delete these for now?

Yeah so I am pretty sure that this bug is still in the latest version 1.21.0

can we reopen this or should I create a new issue?

I can confirm this, @magnusja. The bug seems to be back in 1.21.0.
We have had the problem for about a week: /tmp was not being cleaned up.
Flux was upgraded on our clusters from 1.19.0 to 1.21.0 about two weeks ago.
Rolling back to 1.19.0 solved the issue.

44K     ./diff/tmp/flux-working905991546/data-keeper/data-keeper-image/tests/integration_tests
220K    ./diff/tmp/flux-working905991546/data-keeper/data-keeper-image/tests
500K    ./diff/tmp/flux-working905991546/data-keeper/data-keeper-image
584K    ./diff/tmp/flux-working905991546/data-keeper
36K     ./diff/tmp/flux-working905991546/model-keeper/model-keeper/charts
12K     ./diff/tmp/flux-working905991546/model-keeper/model-keeper/templates/model-keeper
16K     ./diff/tmp/flux-working905991546/model-keeper/model-keeper/templates/mlflow
8,0K    ./diff/tmp/flux-working905991546/model-keeper/model-keeper/templates/tests
48K     ./diff/tmp/flux-working905991546/model-keeper/model-keeper/templates
108K    ./diff/tmp/flux-working905991546/model-keeper/model-keeper
8,0K    ./diff/tmp/flux-working905991546/model-keeper/model-keeper-image/openapi/paths
8,0K    ./diff/tmp/flux-working905991546/model-keeper/model-keeper-image/openapi/components
28K     ./diff/tmp/flux-working905991546/model-keeper/model-keeper-image/openapi
8,0K    ./diff/tmp/flux-working905991546/model-keeper/model-keeper-image/src/model_keeper/api/endpoints
20K     ./diff/tmp/flux-working905991546/model-keeper/model-keeper-image/src/model_keeper/api
52K     ./diff/tmp/flux-working905991546/model-keeper/model-keeper-image/src/model_keeper
56K     ./diff/tmp/flux-working905991546/model-keeper/model-keeper-image/src
24K     ./diff/tmp/flux-working905991546/model-keeper/model-keeper-image/mlflow-plugin/lumi_mlflow
96K     ./diff/tmp/flux-working905991546/model-keeper/model-keeper-image/mlflow-plugin
20K     ./diff/tmp/flux-working905991546/model-keeper/model-keeper-image/docker
8,0K    ./diff/tmp/flux-working905991546/model-keeper/model-keeper-image/tests/unit_tests
20K     ./diff/tmp/flux-working905991546/model-keeper/model-keeper-image/tests/integration_tests
36K     ./diff/tmp/flux-working905991546/model-keeper/model-keeper-image/tests
332K    ./diff/tmp/flux-working905991546/model-keeper/model-keeper-image
444K    ./diff/tmp/flux-working905991546/model-keeper
28K     ./diff/tmp/flux-working905991546/devbox
36K     ./diff/tmp/flux-working905991546/gateway/gateway/charts
32K     ./diff/tmp/flux-working905991546/gateway/gateway/templates
92K     ./diff/tmp/flux-working905991546/gateway/gateway
28K     ./diff/tmp/flux-working905991546/gateway/gateway-image/src/gateway/api/routers
32K     ./diff/tmp/flux-working905991546/gateway/gateway-image/src/gateway/api
88K     ./diff/tmp/flux-working905991546/gateway/gateway-image/src/gateway
92K     ./diff/tmp/flux-working905991546/gateway/gateway-image/src
16K     ./diff/tmp/flux-working905991546/gateway/gateway-image/docker
16K     ./diff/tmp/flux-working905991546/gateway/gateway-image/tests/mock_integration_tests
16K     ./diff/tmp/flux-working905991546/gateway/gateway-image/tests/unit_tests
22M     ./diff/tmp/flux-working905991546/gateway/gateway-image/tests/assets/data
22M     ./diff/tmp/flux-working905991546/gateway/gateway-image/tests/assets
104K    ./diff/tmp/flux-working905991546/gateway/gateway-image/tests/integration_tests
22M     ./diff/tmp/flux-working905991546/gateway/gateway-image/tests
23M     ./diff/tmp/flux-working905991546/gateway/gateway-image
8,0K    ./diff/tmp/flux-working905991546/gateway/ocr-worker-image/docker
52K     ./diff/tmp/flux-working905991546/gateway/ocr-worker-image
23M     ./diff/tmp/flux-working905991546/gateway
16K     ./diff/tmp/flux-working905991546/training/src/training/logger
12K     ./diff/tmp/flux-working905991546/training/src/training/datasource
44K     ./diff/tmp/flux-working905991546/training/src/training
56K     ./diff/tmp/flux-working905991546/training/src
16K     ./diff/tmp/flux-working905991546/training/docker
12K     ./diff/tmp/flux-working905991546/training/tests/unit_tests
8,0K    ./diff/tmp/flux-working905991546/training/tests/assets/scripts
1,5M    ./diff/tmp/flux-working905991546/training/tests/assets/data
1,5M    ./diff/tmp/flux-working905991546/training/tests/assets
44K     ./diff/tmp/flux-working905991546/training/tests/integration_tests/logger
60K     ./diff/tmp/flux-working905991546/training/tests/integration_tests
1,6M    ./diff/tmp/flux-working905991546/training/tests
164K    ./diff/tmp/flux-working905991546/training/deps
2,1M    ./diff/tmp/flux-working905991546/training
16K     ./diff/tmp/flux-working905991546/inference/inference/templates
36K     ./diff/tmp/flux-working905991546/inference/inference
12K     ./diff/tmp/flux-working905991546/inference/inference-image/src/inference/api/routers
16K     ./diff/tmp/flux-working905991546/inference/inference-image/src/inference/api
36K     ./diff/tmp/flux-working905991546/inference/inference-image/src/inference
40K     ./diff/tmp/flux-working905991546/inference/inference-image/src
20K     ./diff/tmp/flux-working905991546/inference/inference-image/docker
41M     ./diff/tmp/flux-working905991546/inference/inference-image/tests/assets/classification
81M     ./diff/tmp/flux-working905991546/inference/inference-image/tests/assets/exported_color_bw
121M    ./diff/tmp/flux-working905991546/inference/inference-image/tests/assets
8,0K    ./diff/tmp/flux-working905991546/inference/inference-image/tests/integration_tests
121M    ./diff/tmp/flux-working905991546/inference/inference-image/tests
164K    ./diff/tmp/flux-working905991546/inference/inference-image/deps
121M    ./diff/tmp/flux-working905991546/inference/inference-image
121M    ./diff/tmp/flux-working905991546/inference
8,0K    ./diff/tmp/flux-working905991546/docs/docker
1,3M    ./diff/tmp/flux-working905991546/docs/docs/assets/images
1,4M    ./diff/tmp/flux-working905991546/docs/docs/assets
3,0M    ./diff/tmp/flux-working905991546/docs/docs
3,0M    ./diff/tmp/flux-working905991546/docs
24K     ./diff/tmp/flux-working905991546/inference-orchestrator/inference-orchestrator-image/src/inference_orchestrator
28K     ./diff/tmp/flux-working905991546/inference-orchestrator/inference-orchestrator-image/src
8,0K    ./diff/tmp/flux-working905991546/inference-orchestrator/inference-orchestrator-image/docker
12K     ./diff/tmp/flux-working905991546/inference-orchestrator/inference-orchestrator-image/tests
72K     ./diff/tmp/flux-working905991546/inference-orchestrator/inference-orchestrator-image
16K     ./diff/tmp/flux-working905991546/inference-orchestrator/inference-operator-image/src
8,0K    ./diff/tmp/flux-working905991546/inference-orchestrator/inference-operator-image/docker
8,0K    ./diff/tmp/flux-working905991546/inference-orchestrator/inference-operator-image/tests
76K     ./diff/tmp/flux-working905991546/inference-orchestrator/inference-operator-image
16K     ./diff/tmp/flux-working905991546/inference-orchestrator/inference-orchestrator/templates/inference-operator
20K     ./diff/tmp/flux-working905991546/inference-orchestrator/inference-orchestrator/templates/inference-orchestrator
48K     ./diff/tmp/flux-working905991546/inference-orchestrator/inference-orchestrator/templates
68K     ./diff/tmp/flux-working905991546/inference-orchestrator/inference-orchestrator
220K    ./diff/tmp/flux-working905991546/inference-orchestrator
16K     ./diff/tmp/flux-working905991546/cluster
12K     ./diff/tmp/flux-working905991546/training-orchestrator/training-operator-image/src
8,0K    ./diff/tmp/flux-working905991546/training-orchestrator/training-operator-image/docker
64K     ./diff/tmp/flux-working905991546/training-orchestrator/training-operator-image
48K     ./diff/tmp/flux-working905991546/training-orchestrator/training-orchestrator-image/src/training_orchestrator
52K     ./diff/tmp/flux-working905991546/training-orchestrator/training-orchestrator-image/src
8,0K    ./diff/tmp/flux-working905991546/training-orchestrator/training-orchestrator-image/docker
96K     ./diff/tmp/flux-working905991546/training-orchestrator/training-orchestrator-image/tests
180K    ./diff/tmp/flux-working905991546/training-orchestrator/training-orchestrator-image
16K     ./diff/tmp/flux-working905991546/training-orchestrator/training-orchestrator/templates/training-operator
20K     ./diff/tmp/flux-working905991546/training-orchestrator/training-orchestrator/templates/training-orchestrator
48K     ./diff/tmp/flux-working905991546/training-orchestrator/training-orchestrator/templates
72K     ./diff/tmp/flux-working905991546/training-orchestrator/training-orchestrator
324K    ./diff/tmp/flux-working905991546/training-orchestrator
8,0K    ./diff/tmp/flux-working905991546/.git/logs/refs/heads
8,0K    ./diff/tmp/flux-working905991546/.git/logs/refs/remotes/origin
12K     ./diff/tmp/flux-working905991546/.git/logs/refs/remotes
24K     ./diff/tmp/flux-working905991546/.git/logs/refs
32K     ./diff/tmp/flux-working905991546/.git/logs
8,0K    ./diff/tmp/flux-working905991546/.git/refs/heads
4,0K    ./diff/tmp/flux-working905991546/.git/refs/tags
8,0K    ./diff/tmp/flux-working905991546/.git/refs/remotes/origin
12K     ./diff/tmp/flux-working905991546/.git/refs/remotes
28K     ./diff/tmp/flux-working905991546/.git/refs
48K     ./diff/tmp/flux-working905991546/.git/hooks
8,0K    ./diff/tmp/flux-working905991546/.git/info
4,0K    ./diff/tmp/flux-working905991546/.git/branches
4,0K    ./diff/tmp/flux-working905991546/.git/objects/info
4,0K    ./diff/tmp/flux-working905991546/.git/objects/pack
12K     ./diff/tmp/flux-working905991546/.git/objects
196K    ./diff/tmp/flux-working905991546/.git
150M    ./diff/tmp/flux-working905991546
52K     ./diff/tmp/flux-working875122815/testing/end2end
56K     ./diff/tmp/flux-working875122815/testing
36K     ./diff/tmp/flux-working875122815/data-keeper/data-keeper/charts
20K     ./diff/tmp/flux-working875122815/data-keeper/data-keeper/templates
80K     ./diff/tmp/flux-working875122815/data-keeper/data-keeper
40K     ./diff/tmp/flux-working875122815/data-keeper/data-keeper-image/openapi/paths
24K     ./diff/tmp/flux-working875122815/data-keeper/data-keeper-image/openapi/components
108K    ./diff/tmp/flux-working875122815/data-keeper/data-keeper-image/openapi
32K     ./diff/tmp/flux-working875122815/data-keeper/data-keeper-image/src/data_keeper/api/endpoints
44K     ./diff/tmp/flux-working875122815/data-keeper/data-keeper-image/src/data_keeper/api
24K     ./diff/tmp/flux-working875122815/data-keeper/data-keeper-image/src/data_keeper/database
12K     ./diff/tmp/flux-working875122815/data-keeper/data-keeper-image/src/data_keeper/core
104K    ./diff/tmp/flux-working875122815/data-keeper/data-keeper-image/src/data_keeper
108K    ./diff/tmp/flux-working875122815/data-keeper/data-keeper-image/src
12K     ./diff/tmp/flux-working875122815/data-keeper/data-keeper-image/docker
12K     ./diff/tmp/flux-working875122815/data-keeper/data-keeper-image/tests/unit_tests
156K    ./diff/tmp/flux-working875122815/data-keeper/data-keeper-image/tests/assets/data
160K    ./diff/tmp/flux-working875122815/data-keeper/data-keeper-image/tests/assets
44K     ./diff/tmp/flux-working875122815/data-keeper/data-keeper-image/tests/integration_tests
220K    ./diff/tmp/flux-working875122815/data-keeper/data-keeper-image/tests
500K    ./diff/tmp/flux-working875122815/data-keeper/data-keeper-image
584K    ./diff/tmp/flux-working875122815/data-keeper
36K     ./diff/tmp/flux-working875122815/model-keeper/model-keeper/charts
12K     ./diff/tmp/flux-working875122815/model-keeper/model-keeper/templates/model-keeper
16K     ./diff/tmp/flux-working875122815/model-keeper/model-keeper/templates/mlflow
8,0K    ./diff/tmp/flux-working875122815/model-keeper/model-keeper/templates/tests
48K     ./diff/tmp/flux-working875122815/model-keeper/model-keeper/templates
108K    ./diff/tmp/flux-working875122815/model-keeper/model-keeper
8,0K    ./diff/tmp/flux-working875122815/model-keeper/model-keeper-image/openapi/paths
8,0K    ./diff/tmp/flux-working875122815/model-keeper/model-keeper-image/openapi/components
28K     ./diff/tmp/flux-working875122815/model-keeper/model-keeper-image/openapi
8,0K    ./diff/tmp/flux-working875122815/model-keeper/model-keeper-image/src/model_keeper/api/endpoints
20K     ./diff/tmp/flux-working875122815/model-keeper/model-keeper-image/src/model_keeper/api
52K     ./diff/tmp/flux-working875122815/model-keeper/model-keeper-image/src/model_keeper
56K     ./diff/tmp/flux-working875122815/model-keeper/model-keeper-image/src
24K     ./diff/tmp/flux-working875122815/model-keeper/model-keeper-image/mlflow-plugin/lumi_mlflow
96K     ./diff/tmp/flux-working875122815/model-keeper/model-keeper-image/mlflow-plugin
20K     ./diff/tmp/flux-working875122815/model-keeper/model-keeper-image/docker
8,0K    ./diff/tmp/flux-working875122815/model-keeper/model-keeper-image/tests/unit_tests
20K     ./diff/tmp/flux-working875122815/model-keeper/model-keeper-image/tests/integration_tests
36K     ./diff/tmp/flux-working875122815/model-keeper/model-keeper-image/tests
332K    ./diff/tmp/flux-working875122815/model-keeper/model-keeper-image
444K    ./diff/tmp/flux-working875122815/model-keeper
28K     ./diff/tmp/flux-working875122815/devbox
36K     ./diff/tmp/flux-working875122815/gateway/gateway/charts
32K     ./diff/tmp/flux-working875122815/gateway/gateway/templates
92K     ./diff/tmp/flux-working875122815/gateway/gateway
28K     ./diff/tmp/flux-working875122815/gateway/gateway-image/src/gateway/api/routers
32K     ./diff/tmp/flux-working875122815/gateway/gateway-image/src/gateway/api
88K     ./diff/tmp/flux-working875122815/gateway/gateway-image/src/gateway
92K     ./diff/tmp/flux-working875122815/gateway/gateway-image/src
16K     ./diff/tmp/flux-working875122815/gateway/gateway-image/docker
16K     ./diff/tmp/flux-working875122815/gateway/gateway-image/tests/mock_integration_tests
16K     ./diff/tmp/flux-working875122815/gateway/gateway-image/tests/unit_tests
22M     ./diff/tmp/flux-working875122815/gateway/gateway-image/tests/assets/data
22M     ./diff/tmp/flux-working875122815/gateway/gateway-image/tests/assets
104K    ./diff/tmp/flux-working875122815/gateway/gateway-image/tests/integration_tests
22M     ./diff/tmp/flux-working875122815/gateway/gateway-image/tests
23M     ./diff/tmp/flux-working875122815/gateway/gateway-image
8,0K    ./diff/tmp/flux-working875122815/gateway/ocr-worker-image/docker
52K     ./diff/tmp/flux-working875122815/gateway/ocr-worker-image
23M     ./diff/tmp/flux-working875122815/gateway
16K     ./diff/tmp/flux-working875122815/training/src/training/logger
12K     ./diff/tmp/flux-working875122815/training/src/training/datasource
44K     ./diff/tmp/flux-working875122815/training/src/training
56K     ./diff/tmp/flux-working875122815/training/src
16K     ./diff/tmp/flux-working875122815/training/docker
12K     ./diff/tmp/flux-working875122815/training/tests/unit_tests
8,0K    ./diff/tmp/flux-working875122815/training/tests/assets/scripts
1,5M    ./diff/tmp/flux-working875122815/training/tests/assets/data
1,5M    ./diff/tmp/flux-working875122815/training/tests/assets
44K     ./diff/tmp/flux-working875122815/training/tests/integration_tests/logger
60K     ./diff/tmp/flux-working875122815/training/tests/integration_tests
1,6M    ./diff/tmp/flux-working875122815/training/tests
164K    ./diff/tmp/flux-working875122815/training/deps
2,1M    ./diff/tmp/flux-working875122815/training
16K     ./diff/tmp/flux-working875122815/inference/inference/templates
36K     ./diff/tmp/flux-working875122815/inference/inference
12K     ./diff/tmp/flux-working875122815/inference/inference-image/src/inference/api/routers
16K     ./diff/tmp/flux-working875122815/inference/inference-image/src/inference/api
36K     ./diff/tmp/flux-working875122815/inference/inference-image/src/inference
40K     ./diff/tmp/flux-working875122815/inference/inference-image/src
20K     ./diff/tmp/flux-working875122815/inference/inference-image/docker
41M     ./diff/tmp/flux-working875122815/inference/inference-image/tests/assets/classification
81M     ./diff/tmp/flux-working875122815/inference/inference-image/tests/assets/exported_color_bw
121M    ./diff/tmp/flux-working875122815/inference/inference-image/tests/assets
8,0K    ./diff/tmp/flux-working875122815/inference/inference-image/tests/integration_tests
121M    ./diff/tmp/flux-working875122815/inference/inference-image/tests
164K    ./diff/tmp/flux-working875122815/inference/inference-image/deps
121M    ./diff/tmp/flux-working875122815/inference/inference-image
121M    ./diff/tmp/flux-working875122815/inference
8,0K    ./diff/tmp/flux-working875122815/docs/docker
1,3M    ./diff/tmp/flux-working875122815/docs/docs/assets/images
1,4M    ./diff/tmp/flux-working875122815/docs/docs/assets
3,0M    ./diff/tmp/flux-working875122815/docs/docs
3,0M    ./diff/tmp/flux-working875122815/docs
24K     ./diff/tmp/flux-working875122815/inference-orchestrator/inference-orchestrator-image/src/inference_orchestrator
28K     ./diff/tmp/flux-working875122815/inference-orchestrator/inference-orchestrator-image/src
8,0K    ./diff/tmp/flux-working875122815/inference-orchestrator/inference-orchestrator-image/docker
12K     ./diff/tmp/flux-working875122815/inference-orchestrator/inference-orchestrator-image/tests
72K     ./diff/tmp/flux-working875122815/inference-orchestrator/inference-orchestrator-image
16K     ./diff/tmp/flux-working875122815/inference-orchestrator/inference-operator-image/src
8,0K    ./diff/tmp/flux-working875122815/inference-orchestrator/inference-operator-image/docker
8,0K    ./diff/tmp/flux-working875122815/inference-orchestrator/inference-operator-image/tests
76K     ./diff/tmp/flux-working875122815/inference-orchestrator/inference-operator-image
16K     ./diff/tmp/flux-working875122815/inference-orchestrator/inference-orchestrator/templates/inference-operator
20K     ./diff/tmp/flux-working875122815/inference-orchestrator/inference-orchestrator/templates/inference-orchestrator
48K     ./diff/tmp/flux-working875122815/inference-orchestrator/inference-orchestrator/templates
68K     ./diff/tmp/flux-working875122815/inference-orchestrator/inference-orchestrator
220K    ./diff/tmp/flux-working875122815/inference-orchestrator
16K     ./diff/tmp/flux-working875122815/cluster
12K     ./diff/tmp/flux-working875122815/training-orchestrator/training-operator-image/src
8,0K    ./diff/tmp/flux-working875122815/training-orchestrator/training-operator-image/docker
64K     ./diff/tmp/flux-working875122815/training-orchestrator/training-operator-image
48K     ./diff/tmp/flux-working875122815/training-orchestrator/training-orchestrator-image/src/training_orchestrator
52K     ./diff/tmp/flux-working875122815/training-orchestrator/training-orchestrator-image/src
8,0K    ./diff/tmp/flux-working875122815/training-orchestrator/training-orchestrator-image/docker
96K     ./diff/tmp/flux-working875122815/training-orchestrator/training-orchestrator-image/tests
180K    ./diff/tmp/flux-working875122815/training-orchestrator/training-orchestrator-image
16K     ./diff/tmp/flux-working875122815/training-orchestrator/training-orchestrator/templates/training-operator
20K     ./diff/tmp/flux-working875122815/training-orchestrator/training-orchestrator/templates/training-orchestrator
48K     ./diff/tmp/flux-working875122815/training-orchestrator/training-orchestrator/templates
72K     ./diff/tmp/flux-working875122815/training-orchestrator/training-orchestrator
324K    ./diff/tmp/flux-working875122815/training-orchestrator
8,0K    ./diff/tmp/flux-working875122815/.git/logs/refs/heads
8,0K    ./diff/tmp/flux-working875122815/.git/logs/refs/remotes/origin
12K     ./diff/tmp/flux-working875122815/.git/logs/refs/remotes
24K     ./diff/tmp/flux-working875122815/.git/logs/refs
32K     ./diff/tmp/flux-working875122815/.git/logs
8,0K    ./diff/tmp/flux-working875122815/.git/refs/heads
4,0K    ./diff/tmp/flux-working875122815/.git/refs/tags
8,0K    ./diff/tmp/flux-working875122815/.git/refs/remotes/origin
12K     ./diff/tmp/flux-working875122815/.git/refs/remotes
28K     ./diff/tmp/flux-working875122815/.git/refs
48K     ./diff/tmp/flux-working875122815/.git/hooks
8,0K    ./diff/tmp/flux-working875122815/.git/info
4,0K    ./diff/tmp/flux-working875122815/.git/branches
4,0K    ./diff/tmp/flux-working875122815/.git/objects/info
4,0K    ./diff/tmp/flux-working875122815/.git/objects/pack
12K     ./diff/tmp/flux-working875122815/.git/objects
196K    ./diff/tmp/flux-working875122815/.git
150M    ./diff/tmp/flux-working875122815
343G    ./diff/tmp
4,0K    ./diff/root/.cache/helm/repository
8,0K    ./diff/root/.cache/helm
12K     ./diff/root/.cache
4,0K    ./diff/root/.kube
4,0K    ./diff/root/.helm/repository
8,0K    ./diff/root/.helm
8,0K    ./diff/root/.config/helm
12K     ./diff/root/.config
4,0K    ./diff/root/.ssh
44K     ./diff/root
4,0K    ./diff/run/secrets/kubernetes.io/serviceaccount
8,0K    ./diff/run/secrets/kubernetes.io
12K     ./diff/run/secrets
16K     ./diff/run
343G    ./diff
4,0K    ./work/work
8,0K    ./work
686G    .
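
For a rough sense of scale, the listing above can be reduced to a single number: how many leaked `flux-working*` clones fit in `/tmp`. The following is only a sketch under stated assumptions: the helper names `parse_size` and `summarise` are hypothetical, the sizes are taken to be `du -h` output with a comma as the decimal separator and units in powers of 1024, and every leaked clone is assumed to be about the same size as the two shown here.

```python
import re

# Assumption: `du -h` style units in powers of 1024.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a human-readable size such as '8,0K' or '150M' to bytes."""
    match = re.fullmatch(r"(\d+(?:[.,]\d+)?)([KMGT]?)", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    # The listing uses a comma as the decimal separator (e.g. '8,0K').
    number = float(match.group(1).replace(",", "."))
    return int(number * UNITS.get(match.group(2), 1))


def summarise(du_lines):
    """Return ({clone_path: bytes}, tmp_total_bytes) from du output lines."""
    clones = {}
    tmp_total = None
    for line in du_lines:
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        size, path = parts[0], parts[1].strip()
        # Only the top-level working directories, not their subdirectories.
        if re.fullmatch(r".*/tmp/flux-working\d+", path):
            clones[path] = parse_size(size)
        elif path.endswith("/tmp"):
            tmp_total = parse_size(size)
    return clones, tmp_total


# A few lines copied from the listing above.
sample = [
    "196K    ./diff/tmp/flux-working905991546/.git",
    "150M    ./diff/tmp/flux-working905991546",
    "150M    ./diff/tmp/flux-working875122815",
    "343G    ./diff/tmp",
]

clones, tmp_total = summarise(sample)
per_clone = max(clones.values())
estimate = tmp_total // per_clone
print(f"{len(clones)} clones in sample, roughly {estimate} clones fit in /tmp")
```

Because `du -h` rounds its figures, the result is an order-of-magnitude estimate (a couple of thousand working clones), not an exact count.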

Unfortunately, this did not cut it for me...

Is this being looked at? We are also hitting this frequently after a recent upgrade.

I don't think so, as v1 is obsolete. It may also be necessary to open a new issue to draw more attention.

My hack for this is to restart the flux pod every hour. The /tmp directory is inside the pod's file system, so it gets deleted along with the pod.

---
kind: ServiceAccount
apiVersion: v1
metadata:
  name: flux-restart
  namespace: flux

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: flux-restart
  namespace: flux
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "patch", "list", "watch", "delete"] 

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: flux-restart
  namespace: flux
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: flux-restart
subjects:
  - kind: ServiceAccount
    name: flux-restart
    namespace: flux

---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: flux-pod-restart
  namespace: flux
spec:
  concurrencyPolicy: Forbid
  schedule: "0 */1 * * *" 
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 600
      template:
        spec:
          serviceAccountName: flux-restart 
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl
              command:
                - kubectl
                - delete
                - pod
                - -l
                - app=flux
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: flux-helm-restart
  namespace: flux
spec:
  concurrencyPolicy: Forbid
  schedule: "0 */1 * * *" 
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 600
      template:
        spec:
          serviceAccountName: flux-restart 
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl
              command:
                - kubectl
                - delete
                - pod
                - -l
                - app=helm-operator