Terragrunt: RFC: single terragrunt.hcl per environment?

Created on 19 Jun 2019 · 15 Comments · Source: gruntwork-io/terragrunt

Current state

The current practice for using Terragrunt is to create one folder for each module and put a terragrunt.hcl file in it. You also have one terragrunt.hcl at the root of each environment with shared configurations for that environment. The folder structure looks like this:

staging
├── terragrunt.hcl
├── frontend-app
│   └── terragrunt.hcl
├── mysql
│   └── terragrunt.hcl
└── vpc
    └── terragrunt.hcl

production
├── terragrunt.hcl
├── frontend-app
│   └── terragrunt.hcl
├── mysql
│   └── terragrunt.hcl
└── vpc
    └── terragrunt.hcl
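
As a point of reference, here is a minimal sketch of what the files in this layout tend to contain. The bucket name and the input values are placeholders chosen for illustration, not taken from a real setup.

```hcl
# staging/terragrunt.hcl (root of the environment; shared settings)
remote_state {
  backend = "s3"
  config = {
    bucket = "example-terraform-state" # placeholder bucket name
    region = "us-east-1"
    # Each module gets its own state file, keyed by its folder path
    key = "${path_relative_to_include()}/terraform.tfstate"
  }
}

# staging/mysql/terragrunt.hcl (one file like this per module folder)
include {
  # Pull in the shared settings from the environment root
  path = find_in_parent_folders()
}

terraform {
  # Where the Terraform code for this module lives
  source = "github.com/acme/modules.git//mysql?ref=v0.4.2"
}

inputs = {
  name       = "staging-mysql"
  memory     = 4096
  disk_space = 250
}
```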

New proposal

One thing I've been contemplating is if it makes sense to instead aim for the ability to have one terragrunt.hcl per environment so that the folder structure looks like this:

staging
└── terragrunt.hcl
production
└── terragrunt.hcl

To make this work, terragrunt.hcl would allow you to define the configuration for multiple modules in a single file. The code would look almost exactly like normal Terraform syntax. Here's a rough sketch of what that might look like:

# You can include one or more modules using a syntax nearly identical to Terraform
module "frontend_app" {
  # Use the source param to specify where the code for this module lives
  source = "github.com/acme/modules.git//frontend-app?ref=v0.3.4"

  # Specify input parameters for that module the same as how Terraform does it!
  instance_type = "t2.micro"
  instance_count = 10

  # You can reference outputs from other modules using the normal Terraform syntax!
  # This also defines dependencies between modules for apply-all style commands.
  vpc_id = module.vpc.id
  db_url = module.mysql.endpoint
}

module "mysql" {
  # Each source can point to different module and different versions
  source = "github.com/acme/modules.git//mysql?ref=v0.4.2"

  # You can reference shared inputs defined in this file below
  name = "${local.env_name}-mysql"

  memory = 4096
  disk_space = 250

  vpc_id = module.vpc.id
}

module "vpc" {
  source = "github.com/acme/modules.git//vpc?ref=v0.4.2"

  name = local.env_name
  cidr_block = "10.0.0.0/16"
}

# Local variables you can share amongst all the modules above
locals {
  env_name = "staging"
}

# Backend configuration applied to all the modules above
remote_state {
  backend = "s3"
  config = {
    bucket = "foo"
    region = "us-east-1"
    # module_name returns the name of the current module (e.g., frontend-app, mysql, etc)
    key = "${module_name()}/terraform.tfstate"
  }
}

Note that, instead of a single (potentially massive) .hcl file per environment, we could support loading all .hcl files in a folder, just like Terraform loads all .tf files in a folder. This would allow you to break up your definition into a few smaller files:

.
├── production
│   ├── apps.hcl
│   ├── data-stores.hcl
│   └── networking.hcl
└── staging
    ├── apps.hcl
    ├── data-stores.hcl
    └── networking.hcl

Inside of apps.hcl you could have all your app modules (e.g., module.frontend_app from the code snippet above), inside data-stores.hcl you could have all your data stores (e.g., module.mysql from the code snippet above), and so on. The key is that all the references between modules and usage of local variables would work just as if everything was in one big file (as in the original code snippet above).

Example usage

Deploy the VPC module:

$ terragrunt apply --terragrunt-module vpc

This will check out the code for the VPC module into a temp dir, run terraform apply, and pass it the input variables you set.

Deploy the VPC and mysql modules:

$ terragrunt apply --terragrunt-module vpc --terragrunt-module mysql

This will check out both modules into temp dirs and run apply in each one, but since one module depends on the other, it will do so in the right order (vpc first, then mysql).

Deploy all modules:

$ terragrunt apply-all

This will check out all modules into temp dirs and run apply on all of them, again respecting the dependency order.
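
For comparison, in the current folder-per-module layout the ordering for apply-all has to be declared by hand in each module's terragrunt.hcl. A minimal sketch, assuming mysql sits next to vpc as in the folder trees above:

```hcl
# staging/mysql/terragrunt.hcl
include {
  path = find_in_parent_folders()
}

# Tell apply-all that vpc must be applied before this module
dependencies {
  paths = ["../vpc"]
}
```

Under the proposal, the module.vpc.id reference alone would carry that ordering information.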

Advantages

Just one file (or a small number of files) per environment to manage instead of dozens or hundreds of files and folders
Easier to enforce dependencies
Easier to share variables
Easier to share backend configuration
Unlike writing the same code in one big .tf file, you get separate state files and can have separate permissions for each module, reducing the blast radius of errors

Disadvantages

You no longer get separation between modules by being in different folders (though environments are still in different folders), but only through the commands you run.

Other thoughts

If you really think about it, this is _very close_ to defining all of the exact same infrastructure in a single .tf file (or several .tf files in the same folder) with pure Terraform (no Terragrunt), and running terraform apply -target=module.<MODULE>. You could then apply changes to just one module at a time, without affecting any of the others, and get full code reuse, dependency management, etc. from Terraform natively.

However, the downsides of doing this with pure Terraform are:

All the state is stored in one state file, so if you mess up anywhere, you could break everything.
Even with -target=module.<MODULE>, I believe Terraform applies not just <MODULE>, but all of the dependencies of <MODULE> too.
Does the state and all data sources have to be updated for everything, even if you're only targeting a subset of the modules? If so, plan and apply would be very slow.
Permissions management is crappy, as you'd have to have permissions to at least read everything, even if you're only updating a subset of the data.
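
To make the first downside concrete, here is a rough sketch of the pure Terraform version. There is one backend block for the whole folder, so every module shares a single state key. The key name and the vpc source path are assumptions made for illustration.

```hcl
terraform {
  backend "s3" {
    bucket = "foo"
    region = "us-east-1"
    # One state file holds the resources of every module below
    key = "staging/terraform.tfstate" # placeholder key
  }
}

module "vpc" {
  source     = "github.com/acme/modules.git//vpc?ref=v0.4.2" # assumed path
  name       = "staging"
  cidr_block = "10.0.0.0/16"
}

module "mysql" {
  source = "github.com/acme/modules.git//mysql?ref=v0.4.2"
  name   = "staging-mysql"
  # Ordering comes for free, but the blast radius is the whole file
  vpc_id = module.vpc.id
}
```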

But still... It's so close!

enhancement help wanted

brikis98

👍53 👀2

Most helpful comment

@brikis98 This may have been mentioned elsewhere (or possible and I can't figure it out) but why don't you recursively include all terragrunt.hcl files? If you did, I wouldn't have to move my state files from environments/development to nonprod/development and declare my remote_state multiple times.

terraform
|-- environments
|   |-- development
|   |   |-- frontend
|   |   |   `-- terragrunt.hcl
|   |   |-- mysql
|   |   |   `-- terragrunt.hcl
|   |   `-- terragrunt.hcl  (inputs {} only)
|   |-- production
|   |   |-- frontend
|   |   |   `-- terragrunt.hcl
|   |   |-- mysql
|   |   |   `-- terragrunt.hcl
|   |   `-- terragrunt.hcl (inputs {} only)
|   `-- staging
|       |-- frontend
|       |   `-- terragrunt.hcl
|       |-- mysql
|       |   `-- terragrunt.hcl
|       `-- terragrunt.hcl (inputs {} only)
|-- infra
|   |-- nonprod
|   |   |-- s3
|   |   |   `-- terragrunt.hcl
|   |   `-- terragrunt.hcl (inputs {} only)
|   `-- prod
|       |-- s3
|       |   `-- terragrunt.hcl
|       `-- terragrunt.hcl (inputs {} only)
`-- terragrunt.hcl (remote_state declaration and global vars)

This allows the flexibility to define variables at any hierarchy level.
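
This is not the recursive include asked for here, but a sketch of one way to pull values from more than one level. It assumes a Terragrunt version that provides read_terragrunt_config, and it assumes the mid-level terragrunt.hcl files are renamed to env.hcl (a hypothetical name) so that find_in_parent_folders() still resolves to the root file.

```hcl
# environments/development/env.hcl (hypothetical file name)
locals {
  environment = "development"
}

# environments/development/frontend/terragrunt.hcl
include {
  # Root file with the remote_state declaration and global vars
  path = find_in_parent_folders()
}

locals {
  # Load the nearest env.hcl found while walking up the folder tree
  env = read_terragrunt_config(find_in_parent_folders("env.hcl"))
}

# Merge the environment-level values with module-specific ones
inputs = merge(
  local.env.locals,
  { name = "frontend" }
)
```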

soneil123 on 27 Jun 2019

👍12

All 15 comments

Before I begin, I'll throw in a caveat that I'm working in an enterprise trying to balance our Infrastructure as Code for ~15 business units and ~500 systems containing ~1500 applications in a centralized cloud team.

We are also currently still on v0.11.13 so I haven't had the chance to consider exactly how the v0.12 changes affect us both from a Terraform perspective and the v0.19 Terragrunt changes. Terraform not allowing passing variables not in use will likely require us to refactor some things.

Our usage of Terragrunt is a layered approach, which I'll attempt to break down:

Build our Enterprise-wide shared services that all accounts will utilize
- Transit Gateway(s) that hook back to our Direct Connect Gateway to our edge and back on-prem
- An AWS Endpoints VPC that resolution for any AWS services will be routed to
Build each business unit a VPC in their own account (split by both unit and nonprod/prod)
- Includes building the VPC, subnets, 'default' security groups that have been vetted and approved by IT and Infosec

That's the first core usage. Our current plan is that it all lives within a single repository, and churn of configuration here will be minimal.

The repository currently looks like this:

terraform
├── terraform.tfvars
├── shared-services
|  ├── nonprod
|  |  ├── environment-class.tfvars
|  |  ├── information-technology
|  |  |  ├── account.tfvars
|  |  |  ├── _global
|  |  |  |  └── iam
|  |  |  |     └── terraform.tfvars
|  |  |  ├── us-east-2
|  |  |     ├── region.tfvars
|  |  |     └── endpoints
|  |  |        └── terraform.tfvars
|  |  |     └── transit-gateway
|  |  |        └── terraform.tfvars
|  |  ├── information-security
|  ├── prod
├── business-units
|  ├── nonprod
|  |  ├── environment-class.tfvars
|  |  ├── information-technology
|  |  |  ├── account.tfvars
|  |  |  ├── _global
|  |  |  ├── us-east-2
|  |  |     └── nonprod
|  |  |        └── terraform.tfvars
|  |  ├── <unit2>
|  |  ├── <unit3>
|  |  ├── <unit4>
|  |  ├── <unit5>
|  |  ├── <unit6>
|  |  ├── <unit7>
|  |  ├── <unit8>
|  |  ├── <unit9>
|  |  ├── <unit10>
|  |  ├── <unit11>
|  |  ├── <unit12>
|  |  ├── <unit13>
|  |  ├── <unit14>
|  |  ├── <unit15>
|  └── prod
|     ├── <unit1>
|     ├── <unit2>
|     ├── <unit3>
|     ├── <unit4>
|     ├── <unit5>
|     ├── <unit6>
|     ├── <unit7>
|     ├── <unit8>
|     ├── <unit9>
|     ├── <unit10>
|     ├── <unit11>
|     ├── <unit12>
|     ├── <unit13>
|     ├── <unit14>
|     ├── <unit15>
├── isolated-systems
|  ├── nonprod
|  |  ├── <isolatedsystem1>
|  |  └── <isolatedsystem2>
|  └── prod
|     ├── <isolatedsystem1>
|     └── <isolatedsystem2>
└── sandboxes
   └── developersandbox1

With this setup, we are creating an Infrastructure Module for the following pieces:

Shared services
Business Units
Isolated Systems
Sandboxes

Within those Infrastructure Modules we are implementing our Core Modules, which describe how we implement resources. The Infrastructure Modules for each of the individual business units will be quite simple, at least to begin with, most just creating a VPC. Since that is all in Git, it allows us to simply pass in only the changes required between environments. It is in this portion that our usage likely differs from most, electing to call each module from Terragrunt rather than building the Infrastructure Modules.

After all of that is complete, we will have another set of Terraform configs (likely structured very similarly) that will just take an account and VPC ID, look up various resources (subnets/security groups) via a defined contract of tags, and build out that particular system's configuration (servers, ALB, IAM roles, S3 buckets, etc.).

There is a whole lot of connectivity (as well as resource sharing on the DB side to reduce cost) required between systems of a business unit and on-prem, which is why each system is not its own 'stack'.

With all that said, I probably didn't address what you're asking.

jakauppila on 20 Jun 2019

I am generally against using commandline arguments for targeting resources to deploy in a routine scenario, primarily because it is so easy to use Ctrl+R or up arrow in the history to accidentally apply the wrong thing. One of the nice things about the current terragrunt folder structure implementation is that when you are iterating, it is relatively hard to accidentally apply the wrong plan because you are unlikely to be in that folder.

yorinasub17 on 20 Jun 2019

Sounds a bit scary to me. The individual tfvars files let various teams in our organisation work on different aspects of the infrastructure in nice isolation, and there is a low barrier to entry for newcomers when you are dealing with a single thing at a time. We also have a bunch of tfvars files generated by build automation and self-services (like deploying a new Fargate service or creating a new RDS cluster), which sounds very problematic if everything needed to be appended to a single shared file.

Also, how do you organise everything in that single file in a way that makes sense when you have hundreds or thousands of resources to manage? I think the possibility to use normal directory structures provides a nice way of grouping relevant things and makes them easy to find.

Personally, I generally dislike massive configuration files and prefer a larger number of well-structured smaller files. But I guess that's a bit like the monorepo discussion: everyone has their opinion and there are multiple sides to it :)

terozio on 20 Jun 2019

I prefer the current structure of different folders for prod/staging etc with sub folders underneath for each module. It makes it easier to digest what is being used for a given environment by just looking at the folder structure.

kelsmj on 25 Jun 2019

I think this is awesome, personally. ~Everyone~ Some people on our team were apprehensive of using terragrunt because of how nested the structure gets, compared to our hand-rolled terraform helper that simply selected a tfvars file depending on your environment/workspace. This suggestion is a very welcome middle ground -- and like you said, it very much looks like vanilla terraform, which is a good thing.

That said if you're worried about the file getting too large, why not allow loading multiple .hcl files in a directory? That way people can still break up the module configs however they want, without needing to make a bunch of subdirectories.

jfunnell on 28 Jun 2019

> That said if you're worried about the file getting too large, why not allow loading multiple .hcl files in a directory? That way people can still break up the module configs however they want, without needing to make a bunch of subdirectories.

Yup! I mentioned just that in the original description:

> The one file could become massive. Perhaps we'd instead need to load all *-terragrunt.hcl files, similar to how Terraform loads all *.tf files?

brikis98 on 28 Jun 2019

👍6

This feature is truly either an enabler or a blocker for moving from our current standard Terraform repos to Terragrunt-powered repos. There is no way we will start splitting up our state files using the current Terragrunt modular approach. However, we obviously want to move away from the main.tf modules approach, which requires a lot of unnecessary code duplication to reach the same state file we have right now. (Our safety net at the moment is checking whether the planned state file is unchanged after refactoring our code base into a cleaner, modular form with no duplication.)

Also, it would be a great plus to completely avoid duplicating the provider, template, and Terraform version declarations between modules, and to be able to apply those dynamically from the central hcl file to all modules used in a stack as required, in the same fashion that we declare the state file backend generically. I am not sure if any or all of these items can be achieved with current Terragrunt; so far I have only seen duplicate declarations of provider and TF version requirements in the module examples.
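
On the provider duplication point, here is a sketch of how this could look with a generate block in the root terragrunt.hcl. It assumes a Terragrunt release that supports generate, which I believe arrived after this comment was written. The region and the version constraint are placeholder values.

```hcl
# Root terragrunt.hcl: this file is written into every module's working directory
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
terraform {
  required_version = ">= 0.12"
}

provider "aws" {
  region = "us-east-1"
}
EOF
}
```

Each child terragrunt.hcl that includes the root file would then get the same provider.tf without declaring it in the module itself.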

singularit-io on 30 Oct 2019

I would prefer to have a single .hcl file that is dynamically resolved based on variables.

for example:
set environment variable TF_VAR_ENV=["dev" | "qa" | "prod"]

variables.tf

# Empty, injected from TF_VAR_ENV
variable ENV {}

variable AWS_REGION {
    default = {
        dev = "eu-west-1"
        qa = "<some_region>"
        prod = "<some_region>"
    }
}

terragrunt.hcl

remote_state {
  backend = "s3"
  config = {
    bucket         = "<your-bucket-name>-terraform-state-${var.ENV}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = var.AWS_REGION[var.ENV]
    encrypt        = true
    dynamodb_table = "terraform-lock-table"
  }
}
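
Note that terragrunt.hcl has no var.* namespace, so the snippet above describes a wish rather than working config. Here is a sketch of the closest thing I know of, using get_env and locals. It assumes a Terragrunt version with locals support; the "dev" fallback is my own choice, and the qa and prod regions are left as the placeholders from the comment.

```hcl
locals {
  # Read the environment name from the shell, falling back to "dev"
  env = get_env("TF_VAR_ENV", "dev")

  # Region lookup table, mirroring the AWS_REGION variable above
  aws_region = {
    dev  = "eu-west-1"
    qa   = "<some_region>"
    prod = "<some_region>"
  }
}

remote_state {
  backend = "s3"
  config = {
    bucket         = "<your-bucket-name>-terraform-state-${local.env}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = local.aws_region[local.env]
    encrypt        = true
    dynamodb_table = "terraform-lock-table"
  }
}
```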

Bryksin on 21 Nov 2019

I'd like to implement stack-based Terragrunt (no need for it to be environment-based).

For example, I need to create one application that references several modules. But in the same environment, I may have several other applications to manage as well.

So terragrunt.hcl should support something like this:

terragrunt "module_1"  {         # < = that's the key part to support multiple modules.
  source = "<git_module_1_path>"

  include {
    path = find_in_parent_folders()
  }

  inputs = {
    name = "module_1"
  }
}

terragrunt "module_2"  {
  source = "<git_module_2_path>"

  dependency "module_1" {
    config_path = "."
  }
  include {
    path = find_in_parent_folders()
  }

  inputs = {
    name = "module_2"
  }
}

I have mentioned the same idea in #957

So nothing else needs to change. We just need to support terragrunt "module_2" {} in addition to terragrunt {}. The rest should be exactly the same.

ozbillwang on 28 Nov 2019

👍2

One more thought on this RFC: if you take a step back and squint at it, I'm basically arguing for turning Terragrunt into a preprocessor for Terraform. This is similar to how SASS and Less are preprocessors for CSS code.

Perhaps the way to think of it is that users write normal Terraform code, perhaps even in normal .tf files:

.
├── production
│   ├── main.tf
│   ├── outputs.tf
│   └── variables.tf
└── staging
    ├── main.tf
    ├── outputs.tf
    └── variables.tf

The code in those files will look almost like 100% vanilla Terraform code:

# Use normal Terraform syntax to define modules
module "frontend_app" {
  # Use the source param to specify where the code for this module lives
  source = "github.com/acme/modules.git//frontend-app?ref=v0.3.4"

  # Specify input parameters the usual way
  instance_type = "t2.micro"
  instance_count = 10

  # You can reference outputs from other modules the usual way
  vpc_id = module.vpc.id
  db_url = module.mysql.endpoint
}

module "mysql" {
  # Each source can point to different module and different versions
  source = "github.com/acme/modules.git//mysql?ref=v0.4.2"

  # You can reference local variables the usual way
  name = "${local.env_name}-mysql"

  memory = 4096
  disk_space = 250

  vpc_id = module.vpc.id
}

module "vpc" {
  source = "github.com/acme/modules.git//vpc?ref=v0.4.2"

  name = local.env_name
  cidr_block = "10.0.0.0/16"
}

# The normal way Terraform handles local variables
locals {
  env_name = "staging"
}

# Backend configuration defined the usual way
terraform {
  backend "s3" {
    bucket = "foo"
    region = "us-east-1"
    # The terragrunt_module_name() helper is the only Terragrunt-specific thing here
    key = "${terragrunt_module_name()}/terraform.tfstate"
  }
}

When you run terragrunt apply --terragrunt-module vpc, Terragrunt "preprocesses" your Terraform code by rewriting it into a temporary folder where each module is defined in a separate folder with a backend config with a unique key value, and then runs terraform init and terraform apply in one of those folders.
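
None of this exists yet, so the following is purely a guess at what the generated folder for the mysql module could look like. The terraform_remote_state lookup is just one possible way to resolve module.vpc.id across separate state files, and it assumes the generated vpc folder exposes an id output.

```hcl
# <temp dir>/mysql/main.tf (hypothetical preprocessor output)
terraform {
  backend "s3" {
    bucket = "foo"
    region = "us-east-1"
    # terragrunt_module_name() resolved to this module's name
    key = "mysql/terraform.tfstate"
  }
}

# Stand-in for the module.vpc reference, read from the vpc state file
data "terraform_remote_state" "vpc" {
  backend = "s3"
  config = {
    bucket = "foo"
    region = "us-east-1"
    key    = "vpc/terraform.tfstate"
  }
}

locals {
  env_name = "staging"
}

module "mysql" {
  source = "github.com/acme/modules.git//mysql?ref=v0.4.2"

  name       = "${local.env_name}-mysql"
  memory     = 4096
  disk_space = 250

  # Was: vpc_id = module.vpc.id
  vpc_id = data.terraform_remote_state.vpc.outputs.id
}
```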

In other words, it's doing what we think Terraform should do natively, on top of more or less normal Terraform code.

brikis98 on 12 Feb 2020

👍10 👀2

A pre-processor would be the perfect solution for Terraform and Terragrunt can handle dependency management.

Right now terragrunt seems to be solving for 3 things:

Generating Terraform Config with less code
Adding new HCL functions that solve common problems
Dependency management and ordering.

Those could be 3 separate tools.

kzap on 5 Sep 2020

About less code, that does sometimes make me think. I'm relatively new as a serious Terraformer / Terragrunter, but let's say I have similar environments across several regions, and I change something common like a module input variable. I have to go through all my terragrunt.hcls e.g.

sandbox/us-east-1/some-module/terragrunt.hcl
prod/eu-central-1/some-module/terragrunt.hcl
# (another few of those)

and make the exact same change e.g. my application module now outputs load_balancer_url instead of endpoint_url.

I also have to go into sandbox/us-east-1, prod/eu-central-1, and every other directory to run terragrunt apply-all.

This grates, but is still better than using vanilla Terraform or manual configuration.

Are these things that resonate, or have I missed something (even something obvious) which is quite possible because there's a lot to learn? If so, happy to be pointed in the right direction.

antgel on 5 Sep 2020

I believe the answer to your question is 'automation'. As in some kind of CI setup if you feel that you are safe to deploy everything at once. I can't imagine you actually want to deploy _everything_ at the same time without some kind of promotion strategy though, as the larger your environments become, the more fragile they can be.

I think you want to have a CI tool that applies your stacks 'constantly' (ie: every time they change), and you should be using module references (tags or a commit SHA) to ensure you have your terragrunt files correctly updated, before then committing changes to each environment one at a time and letting the CI tool roll the changes out.

e: And I don't really think this proposal would really change your specific issue? It might make some shared code easier to share, I guess.

jfunnell on 7 Sep 2020

Hello,

I was rethinking the way I'm managing a whole GCP infrastructure and asking myself how to use additional layers of modules, and this looks promising.

I have an HCL hierarchy that represents all the resources deployed across all the projects and folders of my organization.

Each of the folders hosts an HCL file that calls a module. For now, these modules group all the resources needed for that particular folder and subsequent project.

Obviously, continuing that way will not be DRY at all.

So I was thinking about how I could manage that:
any folder with .hcl -> source module (service definition) -> module (resource blocks to answer the service needs)

And the RFC looks pretty nice, because it would bring service composition to the first layer.

Do you have any update on this?