terraform init seems to not support basic terraform best practice

Created on 8 Aug 2018 · 12 comments · Source: hashicorp/terraform

Terraform all versions.

It's Terraform best practice to store environment config in tfvars files, e.g.:

    environments/
        |-- dev.tfvars     # store dev configuration
        |-- prod.tfvars    # store prod configuration
    resources/
        |-- main.tf           # resources etc.

(Nice clean separation of environment config from other config from terraform resources - good!)

And to run, e.g., terraform plan -var-file=environments/dev.tfvars

But (for AWS at least) it's also best practice to separate dev and prod into different AWS accounts. For one, you likely want business/enterprise support on prod, which is charged as a percentage of total account spend, so putting everything in the same account makes support very expensive; for another, accounts are good security boundaries anyway, since some security fundamentals can only be applied per account. This makes it a natural fit to store state in a different S3 bucket for each environment too, e.g. terraform-dev in the dev account and terraform-prod in the prod account. (If you don't do this, you end up with far more cross-account assume-roles, because the state has to live in one account while the terraforming is done in another; but even if you do store everything in the same bucket, the following still applies...)

Up to now, all fine and clean, but terraform init doesn't cope with this at all. Why? Because terraform init is rigid: it has to use a single local .terraform directory to store its local copy of a single state.

Therefore a backend that looks like this:

terraform {
  backend "s3" {
#   The bucket cannot be specified here as it changes between environments
    dynamodb_table = "terraform_lock"
    encrypt        = true
    key            = "main.tfstate"
    region         = "eu-west-1"
  }
}

Should work in conjunction with terraform init -input=false -backend-config=bucket=terraform-dev. Likewise, it should work when "switching" with terraform init -input=false -backend-config=bucket=terraform-prod.

But it's made needlessly complicated. Firstly, the second terraform init needs access to both buckets unless you add -reconfigure (presumably because it wants to support state migration?):

Prior to changing backends, Terraform inspects the source and destination
states to determine what kind of migration steps need to be taken, if any.
Terraform failed to load the states. The data in both the source and the
destination remain unmodified. Please resolve the above error and try again.

And secondly, even with -reconfigure, it introduces a huge delay anyway: downloading modules takes a while, and syncing/reading state from a different place each time takes a while, so switching environments makes for a slow and frustrating Terraform experience.

Stored state should live in different files or buckets for each environment; otherwise anyone reading/writing dev state can also read/write prod state, and you have no graduated security in your team (which is fine in some places, but not all). But constantly having to redo terraform init every time you switch from dev to prod is a complete pain. Workspaces are not the answer to this problem either (once again, they store everything in the same place).

Something as simple as allowing different locations for the .terraform directory would completely fix this (#3503). I could then init twice (once for dev, once for prod), keep both, and be a happy terraformer by means of a single environment variable. Not only that, it would also let you run multiple Terraform operations concurrently against different environments from the same source tree (which terraform init currently blocks by assuming a single location).
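For reference, newer Terraform releases support a TF_DATA_DIR environment variable that does essentially this. A sketch of the workflow it enables, reusing the bucket names from the example above (hedged: flag and variable availability depend on your Terraform version):

```shell
# Keep a separate .terraform data directory per environment, so neither
# init clobbers the other (bucket names as in the example above).
TF_DATA_DIR=.terraform-dev  terraform init -input=false -backend-config=bucket=terraform-dev
TF_DATA_DIR=.terraform-prod terraform init -input=false -backend-config=bucket=terraform-prod

# Switching environments then needs no re-init, and the two can even
# run concurrently from the same working tree:
TF_DATA_DIR=.terraform-dev  terraform plan -var-file=environments/dev.tfvars
TF_DATA_DIR=.terraform-prod terraform plan -var-file=environments/prod.tfvars
```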

This seems like such a trivial enhancement; can it be fixed? Or can others elaborate on what they do? Maybe I'm missing something obvious. My requirements are:

  • Store state remotely in different places for each environment for security reasons
  • Not have to run terraform init every time I switch environments as it takes ages

with a bonus (not requirement) that:

  • I can run terraform in dev and prod concurrently off the same working tree

Right now I work around this with symlinks and by copying the whole working directory to a tmp location. It really sucks. But environments are a fundamental part of using Terraform, so what am I missing?

Thanks!

@apparentlymart you might be interested in this?

enhancement

All 12 comments

+1

Hi @gtmtech! Thanks for writing this up.

In terms of customizing the location of the .terraform directory to start: this isn't _directly_ possible, but you can get the same effect by using multiple working directories:

(assuming the current working directory is the one containing the root module config files)
$ mkdir prod
$ cd prod
$ terraform init .. -backend-config=bucket=terraform-prod
...
$ terraform apply ..
...

The above, while admittedly not intuitive, gives you a separate .terraform directory per working directory for a single configuration within Terraform's current functionality.

I expect we could accept a PR to make the .terraform directory location configurable via an environment variable; I can't think of a reason right now why that would make any future development harder. With that said, I have some broader feedback on the approach you've sketched out here...


I'd actually recommend a different layout as a "best practice":

    environments/
        |-- dev/ # dev configuration
            |-- dev.tf
        |-- prod/ # prod configuration
            |-- prod.tf
    resources/ # shared module for elements common to all environments
        |-- main.tf

Then environments/prod/prod.tf might look like this:

terraform {
  backend "s3" {
    bucket         = "terraform-prod"
    dynamodb_table = "terraform_lock"
    encrypt        = true
    key            = "main.tfstate"
    region         = "eu-west-1"
  }
}

module "main" {
  source = "../resources"

  # per-environment settings here, as you had in the .tfvars files in your example
}

By having a separate _root module_ for each environment, you can gather together all of the necessary information in a single place that is separate for each one:

  • The complete backend configuration (no need for terraform init arguments)
  • The variable values
  • Once you run terraform init in that directory, the .terraform directory as well.

To switch between environments, you just switch directories, and there's no risk at all of accidentally tangling up any of the above and applying the wrong thing to the wrong target.

This also allows for more elaborate differences between environments in situations where it's warranted. For example, the access control settings might be configured entirely differently for each environment even though the main infrastructure is the same, because you generally want a more restrictive configuration for production than you would for a development environment. To model that, you can put the access control resources directly inside environments/dev and environments/prod, while still sharing the resources module across them both.
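As an illustrative (and entirely hypothetical) sketch of that idea, the prod root module might place a stricter, prod-only access control resource alongside the shared module, while dev simply omits it:

```hcl
# environments/prod/prod.tf (illustrative names; the password policy is
# just one example of an access control resource kept out of the shared
# resources module)
module "main" {
  source = "../../resources"
}

# Stricter controls that exist only in the prod environment:
resource "aws_iam_account_password_policy" "strict" {
  minimum_password_length = 16
  require_symbols         = true
}
```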

Workspaces are not a good fit for modelling environments, because they only address part of the problem: separating the state. This is one reason why they were renamed from "environments" to "workspaces" in an earlier release, since the initial name gave the impression that they were analogous to what most teams mean when they say "environment": an entirely isolated set of infrastructure.

I'm aware that we have some leftover old documentation that still uses example workspace names like "dev" and "prod" which give the impression that this is the intended use of workspaces. We intend to fix this, and have started by revising the _When To Use Workspaces_ section to be a lot more explicit about what workspaces are suited for and what they are less suited for.

The purpose that workspaces were designed to serve -- _temporary_ copies of an infrastructure during development -- is something that we now know most teams don't do in practice, since it's often too expensive to give each developer a separate stack and more reasonable to simply have a shared, long-lived "staging" or "dev" environment to rehearse changes. A shared long-lived environment has its _own_ problems, such as that different developers will often conflict with one another, and so we intend to revisit this in a future major release to find a better workflow that works well for safely developing, testing, and deploying changes.

In the meantime, I think most teams should not use workspaces, and should instead use the module-per-environment pattern I described above. We intend to describe this in more detail in the documentation once we get past our current focus for the 0.12 release of improving the configuration language. (Indeed, many of the improvements in 0.12 are aimed at making it easier to write configurable modules, which will hopefully reduce some of the friction that the above suggested pattern has today due to issues like not being able to pass complex data structures into and out of modules.)

Thanks again for sharing this! We definitely do intend to revisit this in a later release and find a more intuitive workflow, since we're aware that there is lots of friction right now. I expect that we will lean towards removing remaining friction from the above pattern, whatever that might mean, because it gives access to the full power of the configuration language when defining an environment rather than cobbling together variables files and command line arguments.

@apparentlymart Thanks for taking the time for a detailed explanation! I have seen the directory per environment approach used above, and appreciate the comments you make about it.

In my experience this approach leads to teams having two places to put resources: the (common) resources folder, or the (environment-specific) folder. Over time teams put more and more in the environment-specific folders (as things may initially be needed in only one environment), and they end up with a difficult-to-compare set of resources, and possibly multiple copy+pastes across different environments, with the inevitable "why isn't production anything like staging?" question being asked later down the line.

It's for this reason I prefer instead to mandate a single resources area, use feature flags (a variable map, "flags") with copious use of count on all resources to toggle features on and off per environment, and use tfvars lists and maps to wire in other config. That way the difference between two environments is just the diff of two tfvars files, which I find simpler for a team to get a handle on than comparing trees of differently configured resources and submodules. And with a single resources area, teams have to make everything work across all environments, so they don't end up putting environment-specific resources in the resource definitions themselves. This tends to lead to cleaner, DRYer code.
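A minimal sketch of that feature-flag pattern, in the 0.11-era syntax current when this thread was written (all names here are illustrative, not from the original):

```hcl
# resources/main.tf -- toggle resources per environment via a flags map
variable "flags" {
  type    = "map"
  default = {}
}

variable "audit_bucket" {
  default = "my-audit-bucket" # illustrative
}

resource "aws_cloudtrail" "audit" {
  # 0 or 1 copies of this resource, depending on the environment's flags
  count          = "${lookup(var.flags, "enable_audit_trail", 0)}"
  name           = "audit"
  s3_bucket_name = "${var.audit_bucket}"
}
```

Each environment's tfvars file then just sets, e.g., `flags = { enable_audit_trail = 1 }`, and comparing environments is a diff of those small files.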

However, I also appreciate that your approach solves the immediate problem I'm having with separate state files (although it requires me to cd quite a lot).

I think on balance, if you're up for a PR, I might see if I can do one to use an optional environment variable for the location of the .terraform directory; then we can use both approaches in the situations where they make sense for end users.

Thanks for that extra context, @gtmtech!

It is true that the module-per-environment approach does rely on humans to police themselves and each other (complying with a policy on which resources -- if any -- belong in the environment-specific module) whereas using variables implicitly forces that through the limitations of a .tfvars file. Our usual attitude is that Terraform should encourage a best-practice but give you room to step out of it when it's not appropriate, and so we generally lean towards the "trust the humans" angle when designing features.

However, given that moving the .terraform directory doesn't seem like it would create any particular _harm_ (either directly for end-users or making future development more difficult) I'm happy to be flexible and make room for this alternative approach, even though I don't feel ready to call it a "best practice" yet. (Good experience reports may change my mind! :grinning:)

I think it is probably best to wait until the long-running 0.12 development branch is merged into master before starting a PR on this, since I expect they'll cover similar ground and we'd end up having to resolve merge conflicts down the road otherwise. However, once that is true (still gonna be at least a few weeks away though, I expect :confounded:) we'd be happy to review a PR for this.

Thanks again for writing this up, and for the interest in working on a PR!

@apparentlymart A couple of questions about the recommended structure above:

    environments/
        |-- dev/ # dev configuration
            |-- dev.tf
        |-- prod/ # prod configuration
            |-- prod.tf
    resources/ # shared module for elements common to all environments
        |-- main.tf

using example dev.tf

module "main" {
  source = "../resources"

  # per-environment settings here, as you had in the .tfvars files in your example
}

  1. Is the first source attribute meant to be ../../resources instead?
  2. How do outputs work in this structure? I used to have an outputs.tf in the root directory, and now I see no output when running

terraform output
The state file either has no outputs defined, or all the defined
outputs are empty. Please define an output in your configuration
with the output keyword and run terraform refresh for it to
become available. If you are using interpolation, please verify
the interpolated value is not empty. You can use the
terraform console command to assist.

My old structure was:

        |-- file1.tf
        |-- file2.tf
        |-- variables.tf
        |-- outputs.tf

Now I have:

    environments/
        |-- dev/ 
            |-- dev.tf
        |-- prod/ 
            |-- prod.tf
    resources/ # Everything that used to be at the root directory level
         |-- file1.tf
         |-- file2.tf
         |-- variables.tf
         |-- outputs.tf

My dev.tf looks like this:

terraform {
  backend "s3" {
    ...
  }
  required_version = "~> 0.11.8"
}

provider "aws" {
...
}

module "main" {
  source = "../../resources"
}

When I run terraform apply or terraform output from the dev directory, I see no outputs. Is there a way to "import" outputs.tf so that all outputs are printed?

@apparentlymart I think I figured out a way to get the outputs in the structure, from within the dev directory I now run terraform output --module=main
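An alternative to `-module=main` (a sketch; the output names here are illustrative, not from the original thread) is to re-export the shared module's outputs from each environment root module, so that a plain `terraform output` works:

```hcl
# environments/dev/outputs.tf -- pass the shared module's outputs
# through to the root module (0.11-era interpolation syntax)
output "vpc_id" {
  value = "${module.main.vpc_id}"
}
```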

It can also help with automation when you want to run terraform init from an operations directory, instead of going into each directory of the project.

Does there exist a repo on github (or elsewhere) that demonstrates this layout by @apparentlymart (either an example repo or a real-thing)?

I'm diving into a new AWS project as a newbie, and while trawling for info on how to organize my terraform code while handling multiple environments I ran across this issue. The examples given are a great start, but if I could see a non-trivial example of this in action, it would help immensely.

I think a general challenge with showing "full examples" for something like this is that they necessarily involve making some decisions that are not really relevant to the pattern and can distract from the point being made. Defining environments in Terraform involves deciding what an environment is "made of" in your organization, which is a subjective architectural decision separate from how the Terraform configurations for representing those environments are organized.

With that said, in a prior role (before joining the Terraform team at HashiCorp) I wrote up an overview of a generalization of that organization's structure in my personal blog: Creating Environments. It's part of a bigger series of posts that describe the overall approach, but that particular article is about codifying the environments themselves in Terraform. That content is from 2017, so some details may have shifted in the mean time, but I think the general approach is still valid.

I've been sent this issue and, after reviewing it, I don't agree with the statement below.

I'd actually recommend a different layout as a "best practice":

    environments/
        |-- dev/ # dev configuration
            |-- dev.tf
        |-- prod/ # prod configuration
            |-- prod.tf
    resources/ # shared module for elements common to all environments
        |-- main.tf

What this suggests is managing multiple .tf files, one per environment. Stating this as best practice concerns me, as it really seems to be a workaround for a missing piece of functionality that Terraform doesn't provide, as stated by the original author @gtmtech.

My main issue with this approach is that I now have to keep multiple .tf files in sync, which doesn't follow the principle of abstraction. Keeping different files for different environments in sync is a big concern, especially as my environments grow and the files get more complicated. As my environments get more complex, my references to modules also increase as new functionality comes in. We should abstract the environment specifics and parameterize the Terraform files, passing in environment-specific values in the form of a tfvars file.

@apparentlymart, are we sure this is best practice?

Ultimately, following this best-practice advice, there is no guarantee that what gets tested and deployed in dev uses the same deployment files as prod, because they are, in fact, different files.

Thoughts anyone?

You can mitigate that concern by making sure that each environment module contains only a backend configuration and a single module block with the same source in each environment. Then the environment modules contain only the information that is actually different between the environments, and the shared module contains the common elements.
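Concretely, under that discipline a minimal environment root module might contain nothing but the following (a sketch; names follow the earlier examples in this thread):

```hcl
# environments/dev/dev.tf -- only the backend settings and the module
# arguments differ between environments; everything else lives in the
# shared ../../resources module.
terraform {
  backend "s3" {
    bucket         = "terraform-dev"
    dynamodb_table = "terraform_lock"
    encrypt        = true
    key            = "main.tfstate"
    region         = "eu-west-1"
  }
}

module "main" {
  source      = "../../resources"
  environment = "dev" # illustrative per-environment setting
}
```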

How similar the environments are is ultimately up to you.

@apparentlymart - this is indeed what I do now, however it's a bit more complicated than that:

You need, at a minimum: backend.tf, a single module block, variable declarations, and provider declarations. Well, I guess you can actually define providers within the module, but the variables are needed.

This can be condensed into a single file, but the previous tfvars collections of files look and feel so much better that I've gone back to using them this way, and to wrapping terraform again.

I think when I get time I'll work on support for storing temp files in a user-specified directory rather than .terraform, as this will make it a lot simpler to do things at scale.

