Terraform: Documentation: Terraform remote state workflow

Created on 12 Jul 2016 · 16 Comments · Source: hashicorp/terraform

The documentation does not make it readily apparent what the appropriate workflow for Terraform remote state management should be. A general overview of how the documentation is currently organized:

The Remote State page indicates that without remote storage, Terraform stores state in a local terraform.tfstate file. Even with remote state configured, there will still be a local terraform.tfstate file at the default path of .terraform/terraform.tfstate, whose existence is undocumented.

It's not until we see the documentation for the remote config command that we get an understanding of how remote state is managed. Here, it is prominently mentioned in the first paragraph that once remote state is configured, Terraform will pull automatically before plan and push after apply. This is all well and good, but what about this pesky local state file?

And what about transient failures? If remote state fails to download, then what? This is only covered in one place: remote push. Also, the existence of remote config push and terraform push was somewhat confusing to me--particularly when the latter referenced the former.

So the remote config documentation says not to store state locally. That's fine, but now we have to document how to get started with _our_ Terraform setup, right? We chose to do this by wrapping Terraform in Makefiles to abstract away the initial remote config, but then there's no really clear picture about how this process should look. Do we have to pull? Do we rely on terraform apply or plan to pull for us?

If one is just getting started with Terraform and finds that remote state is The Way To Go(tm) (which it is on a team of more than one or two people), it can be quite intimidating to approach remote state management--particularly while you are simultaneously learning the general Terraform workflow itself.

It seems to me that the natural place to centralize this workflow would be around the Remote state page. This page currently only serves as a _very_ generic explanation of what Remote State is. It might be helpful to document the general concepts, workflow, and maybe suggested ways to implement managing remote state for a team at this page. Further detail on specific components of that would then be provided by other pages like remote config, remote config pull, etc.

It would be useful to explain the relationship between the remote state file and the local state cache (or even explain that .terraform/terraform.tfstate _is_ a local cache). Basically, a good approach to documentation for something like this would be:

- Here's generally what remote state is
- Here's why remote state
- Here are the components of remote state
  - Backend
  - Remote state "file"
  - Local state cache
- Here's the process of managing remote state
  - Synchronize before plan
  - Synchronize after apply
  - Push/pull manually after transient failures
- Example usage with explanation of what happens at each step.

Something like that. That would be thorough and exquisitely helpful.

While learning how to deal with remote state, I found this article quite helpful: http://code.hootsuite.com/how-to-use-terraform-and-remote-state-with-s3/

I just wish I didn't _have_ to refer to third-party documentation to get a clearer understanding of remote state.

documentation

grepory

👍22

All 16 comments

Not to mention all of Charity's writings. After our release in September, I'm going to be doing a lot of writing, so maybe I'll pick this task up then.

For now it is VERY unclear to me why you need to do terraform remote config if you already have a terraform_remote_config {} resource defined. We started to migrate to resourced remote state from command line (shell script) remote state and then found there was absolutely no benefit. So much so that I'm filing a bug on it.

https://github.com/hashicorp/terraform/issues/7759

spanktar on 21 Jul 2016

So I don't understand the terraform_remote_config resource at all. Like _at all_.

I assumed that was what I used to get outputs from other remote states.

grepory on 21 Jul 2016

@grepory You set up remote state to store the terraform state file (usually terraform.tfstate) somewhere other than locally (often S3). That way if another developer is using the same TF build, they can use the same state and not clobber each other.

You can ALSO get values out of it if you'd like, but I've found most of the values are already available to you.

See the image on this page: http://code.hootsuite.com/how-to-use-terraform-and-remote-state-with-s3/

To discuss further, join #terraform-tool on IRC on Freenode

spanktar on 21 Jul 2016

Yeah, I get remote state. We have several terraform remote states in an S3 bucket.

I mean, it took me a while to get remote state. I used the command line to configure it (we do this in a Makefile now). I didn't realize that you could use the remote state resource to do that.

What I'm trying to figure out (now) is how to _share_ outputs from one state to another terraform stack. I thought I'd be able to use terraform_remote_config to do that.

E.g. we have infrastructure.tfstate and service_a.tfstate, and I want to get say the ecs_cluster id from infrastructure.tfstate to use in service_a. Does that make sense?

grepory on 21 Jul 2016

@grepory The terraform_remote_state resource (actually a data source as of 0.7) is a read-only counterpart to the writes done by the various terraform remote subcommands (which, as you note, are rather poorly documented today.)

You have the right idea that it's for passing values from one configuration to another. The idea is that your "parent" configuration would have its remote state stored via one of the remote storage backends, and then the "child" configuration would have a terraform_remote_state resource with the same configuration, which will then read and expose the _outputs_ from the parent configuration's state for use in the child configuration.

Your example of having your "service a" obtain values from the "infrastructure" is exactly the intended pattern. To do this, you need to first set up remote state for your infrastructure configuration using a command like the one shown under "Using Remote State on S3" in that great Hootsuite article. For the sake of example I'll populate it with some values:

```console
terraform remote config -backend=s3 -backend-config="bucket=example-bucket" -backend-config="key=infrastructure/terraform_state"
```

Now within your "service_a" configuration you can retrieve the outputs from that configuration. Here's how that's written in 0.6 and earlier:

```hcl
resource "terraform_remote_state" "infrastructure" {
  backend = "s3"
  config {
    bucket = "example-bucket"
    key    = "infrastructure/terraform_state"
  }
}

resource "aws_ecs_service" "example" {
  cluster = "${terraform_remote_state.infrastructure.output.ecs_cluster_id}"
  # ...
}
```

Here's how it'll look in 0.7, once it's released:

```hcl
data "terraform_remote_state" "infrastructure" {
  backend = "s3"
  config {
    bucket = "example-bucket"
    key    = "infrastructure/terraform_state"
  }
}

resource "aws_ecs_service" "example" {
  cluster = "${data.terraform_remote_state.infrastructure.ecs_cluster_id}"
  # ...
}
```
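
Both versions assume that the infrastructure configuration declares an output with the matching name, since only outputs are exposed through the remote state. A minimal sketch of that parent-side declaration, in the same interpolation syntax as above; the aws_ecs_cluster.main resource and its name are hypothetical and stand in for whatever actually defines the cluster:

```hcl
# In the "infrastructure" configuration (the parent).
# aws_ecs_cluster.main is a hypothetical resource used for illustration.
resource "aws_ecs_cluster" "main" {
  name = "example-cluster"
}

# The output name must match the attribute read in service_a.
output "ecs_cluster_id" {
  value = "${aws_ecs_cluster.main.id}"
}
```

The output should only become readable from service_a once the infrastructure configuration has been applied and its state written to the backend.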

I totally agree with you that the remote state page should do a better job of describing the usage and patterns around remote state. In the meantime, I hope the above helps clarify the role of the terraform_remote_state resource in particular.

apparentlymart on 25 Jul 2016

👍4 ❤1

This really does help a lot. Thanks for making me feel less crazy! <3

grepory on 25 Jul 2016

Hi, I've tried the workflow described above by @apparentlymart. However, I'm getting errors like the following:

```
* Resource 'data.terraform_remote_state.vpc' does not have attribute 'default_vpc_id' for variable 'data.terraform_remote_state.vpc.default_vpc_id'
```

The values are correctly stored in S3. Any ideas?

brentmurphy on 6 Dec 2016

👍11

I'm seeing similar issues to @brentmurphy using the Azure remote state data source.

```hcl
data "terraform_remote_state" "rs" {
  backend = "azure"

  config {
    storage_account_name = "<snip>"
    container_name       = "terraform-state"
    key                  = "dev.terraform.tfstate"
  }
}
```

donaldgray on 1 Mar 2017

@donaldgray @brentmurphy this solved it for us: https://github.com/hashicorp/terraform/issues/8853

jordzn on 13 Mar 2017

@apparentlymart thanks for the detailed explanation. Your example above works fine when a resource accesses ${data.terraform_remote_state.infrastructure.ecs_cluster_id}, but I can't make it work for a module like:

```hcl
data "terraform_remote_state" "vpc" {
  backend = "s3"

  config {
    access_key = "xxxxxx"
    secret_key = "xxxxxxxxx"
    bucket     = "stage-terraform-remote-state-storage"
    key        = "vpc/us-east1/terraform.tfstate"
    region     = "us-east1"
    encrypt    = "true"
    kms_key_id = "xxxxxxxxx"
  }
}

module "rds" {
  source = "[email protected]:xxxxxxxx/terraform-aws-modules.git//rds?ref=v0.0.4"

  # RDS instance inputs, should be loaded from the terraform.tfvars file
  rds_instance_identifier = "${var.rds_instance_identifier}"
  rds_allocated_storage   = "${var.rds_allocated_storage}"
  rds_engine_type         = "${var.rds_engine_type}"
  rds_instance_class      = "${var.rds_instance_class}"
  rds_engine_version      = "${var.rds_engine_version}"
  db_parameter_group      = "${var.db_parameter_group}"

  database_name     = "${var.database_name}"
  database_user     = "${var.database_user}"
  database_password = "${var.database_password}"
  database_port     = "${var.database_port}"

  # RDS network inputs; the subnets and vpc_id should be loaded from remote state
  subnets      = ["${data.terraform_remote_state.vpc.private_subnets_ids.*.id}"]
  rds_vpc_id   = "${data.terraform_remote_state.vpc.vpc_id}"
  private_cidr = "${var.private_cidr}"
}
```

Running init pulls my module at the exact version:

```console
terraform init -backend-config=config.tfvars
Downloading modules (if any)...
Get: git::ssh://[email protected]/xxxxxx/terraform-aws-modules.git?ref=v0.0.4
Initializing the backend...


Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your environment. If you forget, other
commands will detect it and remind you to do so if necessary.
```

But I cannot access `data.terraform_remote_state.vpc`:

```console
terraform plan
var.rds_vpc_id
  VPC to connect to, used for a security group

  Enter a value:
```

PS:

```console
terraform version
Terraform v0.9.2
```

cc @mo-mughrabi @youssefNM

adilnaimi on 29 Mar 2017

@adilnaimi see the recently added "Root Outputs Only" section of https://www.terraform.io/docs/providers/terraform/d/remote_state.html: any module outputs need to be explicitly exposed as root outputs.

mwhipple on 29 Mar 2017

👍1

@mwhipple thank you for looking into my case.

My VPC module already exposes its outputs in outputs.tf.

This is my Terraform directory structure:

```
.
├── rds
│   ├── config.tfvars
│   ├── main.tf
│   ├── outputs.tf
│   ├── terraform.tfvars
│   └── variables.tf
└── vpc
    ├── config.tfvars
    ├── main.tf
    ├── outputs.tf
    ├── README.md
    ├── terraform.tfvars
    └── variables.tf
```

My VPC configuration exposes its outputs in vpc/outputs.tf as:

```hcl
output "private_subnets_ids" {
  value = ["${module.vpc.private_subnets}"]
}

output "database_subnets_ids" {
  value = ["${module.vpc.database_subnets}"]
}

output "public_subnets" {
  value = ["${module.vpc.public_subnets}"]
}

output "vpc_id" {
  value = "${module.vpc.vpc_id}"
}
```

Terraform is correctly writing the outputs to the S3 tfstate bucket.

My RDS module is looking for the remote state in the right S3 location:

```hcl
data "terraform_remote_state" "vpc" {
  backend = "s3"

  config {
    access_key = "xxxxxx"
    secret_key = "xxxxxxxxx"
    bucket     = "stage-terraform-remote-state-storage"
    key        = "vpc/us-east1/terraform.tfstate"
    region     = "us-east1"
    encrypt    = "true"
    kms_key_id = "xxxxxxxxx"
  }
}
```

I'm not sure whether I'm using remote_state + S3 backend + module in the right way.

adilnaimi on 30 Mar 2017

👍1

My previous issue is fixed: I had to specify the subnets variable in variables.tf.
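
For anyone hitting the same prompt, here is a sketch of what such declarations might look like in Terraform 0.9 syntax. This is an assumption about the layout, not the actual file from this configuration; only the rds_vpc_id description is taken from the plan prompt quoted in the earlier comment:

```hcl
# variables.tf (hypothetical contents, for illustration only)

# List of subnet IDs, to be fed from the VPC remote state outputs.
variable "subnets" {
  type        = "list"
  description = "Subnet IDs to place the RDS instance in"
}

# Description copied from the "terraform plan" prompt in this thread.
variable "rds_vpc_id" {
  description = "VPC to connect to, used for a security group"
}
```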

adilnaimi on 1 Apr 2017

Hi folks, the page @grepory links to is a broken link now, with https://www.terraform.io/docs/state/remote.html as the current intro to remote state. I think the docs cover a lot of the points described above now: https://www.terraform.io/docs/backends/state.html in particular.

Having a diagram of the workflow that includes the local cache and shows when locks are acquired and released would still be pretty useful. Questions like "so how do I recover safely if I hit an issue?" and "how does remote state affect performance?" would be easier to get a handle on with a more granular look into how and when TF interacts with the state.

The questions I was investigating when my googling brought up this issue:

- Is TF matching a checksum against the remote state file or pulling it all?
- Is there a backend-independent way I can check when a lock was last acquired?
- Are there cases where Terraform might fail to update the remote state if terraform apply fails or is killed? How do I untangle those?

I'm informed by this excellent Charity Majors post - https://charity.wtf/2016/03/30/terraform-vpc-and-why-you-want-a-tfstate-file-per-env/ , as I'm sure a lot of TF newcomers are. That post's almost two years old now - are there newer resources about debugging and managing complexity in TF state to add to the community shared wisdom?

kwerey on 13 Feb 2018

Hi all! Sorry for the long silence here.

This issue is discussing the "remote state" feature as it existed in Terraform 0.8 and earlier. In Terraform 0.9 this feature was reorganized into the idea of "backends", and the various terraform remote subcommands were replaced with configuration settings and the single terraform init command.
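
As a rough illustration of that change, the S3 settings that earlier comments passed to terraform remote config can instead be written as a backend block in the configuration. This is only a sketch that reuses the example bucket and key from earlier in the thread; the region value is an assumption:

```hcl
terraform {
  backend "s3" {
    bucket = "example-bucket"
    key    = "infrastructure/terraform_state"
    region = "us-east-1" # assumed; use the bucket's actual region
  }
}
```

Running terraform init in that directory then configures the backend, taking the place of the earlier terraform remote config step.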

For that reason, we're going to close this issue out now. If you have questions about the modern incarnation of Terraform backends, please feel free to ask them in the community forum.

teamterraform on 20 Jul 2019

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.