Terragrunt: Terragrunt cache bloats the disk usage really fast.

Created on 4 Sep 2018  路  12Comments  路  Source: gruntwork-io/terragrunt

Each instance of the terragrunt module creates it's own cache that bloats the disk usage.

$ ls -la
-rw-r--r-- 1 ed kvm  3640 Aug  2 10:18 README.md
-rw-r--r-- 1 ed kvm  1313 Sep  4 16:26 terraform.tfvars
drwx------ 3 ed kvm  4096 Sep  3 10:11 .terragrunt-cache
$ du -sh .terragrunt-cache/
289M    .terragrunt-cache/

Is it possible to use a shared cache that re-uses already downloaded modules (and their versions), so I don't have to download all of the dependencies for each module instantiation?
Collectively it is 10GB of cache.

enhancement help wanted

Most helpful comment

There are a few aspects to this:

  1. We want to make debugging easy. We used to download code into a tmp dir or home folder, but that made it tough to find which folders Terragrunt was using. Having everything in the local .terragrunt-cache folder makes it easier to see and figure out what Terragrunt is doing.
  2. However, downloading the full repo every time eats up lots of disk space. Is it possible to download the repo into a common location (e.g., ~/.terragrunt-cache) and symlink it to the local .terragrunt-cache? Does this work properly if you're running apply-all and lots of downloads are happening concurrently? Does this work properly if you are using different versions of a repo? How do we version the repo so you can share code from the same versions but don't mix up code from different versions?
  3. Terraform then downloads the provider binaries. You can reduce the disk space usage here by enabling the terraform provider cache.
  4. Terraform then downloads the modules your .tf files reference. I have no clue if we can do anything to optimize this.

Suggestions on how to improve this are welcome!

All 12 comments

There are a few aspects to this:

  1. We want to make debugging easy. We used to download code into a tmp dir or home folder, but that made it tough to find which folders Terragrunt was using. Having everything in the local .terragrunt-cache folder makes it easier to see and figure out what Terragrunt is doing.
  2. However, downloading the full repo every time eats up lots of disk space. Is it possible to download the repo into a common location (e.g., ~/.terragrunt-cache) and symlink it to the local .terragrunt-cache? Does this work properly if you're running apply-all and lots of downloads are happening concurrently? Does this work properly if you are using different versions of a repo? How do we version the repo so you can share code from the same versions but don't mix up code from different versions?
  3. Terraform then downloads the provider binaries. You can reduce the disk space usage here by enabling the terraform provider cache.
  4. Terraform then downloads the modules your .tf files reference. I have no clue if we can do anything to optimize this.

Suggestions on how to improve this are welcome!

I would be nice to have an option to auto remove cache after execution. I.e. after apply command.

I just deleted 120GB of .terragrunt-cache. I'm working on multiple environments (17 to be exact, 2 mostly destroyed and left on standby, ~750 modules used across), all are aligned with same versions of modules.
Keeping .terragrunt-cache per module is wrong architectural design. terragrunt shouldn't force me to download same repos over and over again. By default it should have common directory and use symlink as @brikis98 said. Having a flag to create local .terragrunt-cache could be an option to debugg (I've never debugged it tho).

I can't imagine supporting and working on infrastructure with 100 clients or more. I would have to either delete local .terragrunt-cache after every apply, forcing me to redownload hundreds of repos or upgrade my SSD (macosx not that easy and not that cheap) with at least 1TB.
terragrunt in this form does not scale.

@3h4x Ideas on how to improve this are welcome, but we need something that explicitly explains how it solves the issues in https://github.com/gruntwork-io/terragrunt/issues/561#issuecomment-418692976.

@brikis98 I have few ideas but I'm not sure how complicated and feasible they are.
One is symlinks already mentioned, second one is proxy and replacing source with adhoc localhost cache repo. Kinda nasty hack.
Unfortunately I don't think I will be able to help to sort this issue.

Proxy sounds a bit too hacky. Symlinks are more promising, but not without a lot of complexities and gotchas. We're certainly open to PRs that can think through those issues, but for now, periodically clearing the cache as documented here is hopefully a good-enough workaround.

I'm also interested in this problem. In my case, we have throttled speed to the git server and a fresh pull every time gets slow really fast.

I may have time to invest in this for a MR if we have a favorable approach.

I may have time to invest in this for a MR if we have a favorable approach.

A PR with a proposal (e.g., just written in a README) that thinks through all the corner cases I mentioned above is welcome!

To help with the problem if you are using terraform 0.12 you can add depth=1 as a param to your source path to have terraform only do a shallow clone of the git repo. Especially when combined with the plugin cache mentioned earlier this really cut down my disk space usage.

e.g:

terraform {
  source = "git::https://github.com/lgallard/terraform-aws-cognito-user-pool.git//?ref=0.4.0&depth=1"
}

It's notable that the plugin cache uses hard links at least in some cases so some tools (including du) inflate how much space is used up, notice the inode numbers at the start of this ls output are identical

$ ls -i ~/.terraform.d/plugin_cache/linux_amd64/terraform-provider-aws_v2.60.0_x4 
55312406 /home/jfharden/.terraform.d/plugin_cache/linux_amd64/terraform-provider-aws_v2.60.0_x4
$ ls -i .terragrunt-cache/TmsRQq5jb8Fikqhb4v0N_CPOD8Y/wvSG5F9NOzb3ZsP4sykMWVf-V1c/.terraform/plugins/linux_amd64/terraform-provider-aws_v2.60.0_x4 
55312406 .terragrunt-cache/TmsRQq5jb8Fikqhb4v0N_CPOD8Y/wvSG5F9NOzb3ZsP4sykMWVf-V1c/.terraform/plugins/linux_amd64/terraform-provider-aws_v2.60.0_x4

I'm pretty convinced symlinking is going to cause all kinds of trouble, especially with the generators creating provider files etc inside the module directory, but one possible solution which does have some caveats:

The repos could be cloned into a cache directory, something like ~/.terragrunt-cache/modules/github.com/owner/repo.git/<gitref>/ and then you could hardlink instead of symlink. Orchestrating this yourself would be painful, but if you were to rsync the directory you could use the --link-dest option which would deal with all the intricacies, this way you cut the amount of disk space used dramatically if the same module has been cloned more than once, or if the same repo has multiple modules in.

The caveats here are:

  1. It means the OS must have rsync installed (if using rsync to orchestrate)
  2. It doesn't work across filesystem boundaries
  3. I am unsure of the windows support, I know it's _possible_ to get rsync to work on windows, but I suspect it's not as trivial as on linux/osx (where it's usually preinstalled, or a simple apt/yum/brew command away).
  4. It only works on _some_ filesystems (but on all the major default ones now that I know of, ext3, ext4, zfs, NTFS, hfs+)

What we ended up doing internally is to create a really small wrapper over terragrunt.

This tool will fetch all sources, clone them with the format ~/path-to-cache-dir-/source-name/<gitref> then apply using this wrapper and use --terragrunt-source to specify the source.

This is highly tailored to our usecase/directory structure and module source :(

I wonder if leveraging gits reference feature may help us here? Or at least worth exploring... Somehow fetch it once and force all the others to be reference clones.

git clone --reference

https://randyfay.com/content/reference-cache-repositories-speed-clones-git-clone-reference

Was this page helpful?
0 / 5 - 0 ratings