Terraform: State migration actions driven by configuration

Created on 13 Nov 2018 · 7 Comments · Source: hashicorp/terraform

Use-cases

Currently, when Terraform scripts are refactored, custom state manipulation is often needed to avoid destroying and recreating resources that have been moved around, renamed, etc. These procedures must be carried out manually, and often repeated across dev/staging/production environments.

This proposal is for a method of recording these steps, so that they can be carried out across all environments in an automated, standardized fashion, rather than the current manual, ad-hoc way of doing this.

Attempted Solutions

None.

Proposal

If we consider terraform state as analogous to a database and terraform scripts as analogous to code, this problem is similar (perhaps identical) to database migrations. ORM software packages have long handled this problem by:

  1. Recording the current version in the database, as a number.
  2. Having a framework that applies migration scripts, in order, between the current version in the database and the latest version.

In our case, this would imply that we store the current version in the terraform state.

Migration scripts should be written in some dialect of HCL and stored in a migrations directory. Migration versions do not need to be consecutive numbers; they just need to increase, so a timestamp works.
Migrations should be managed with the CLI.
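
The version-based scheme above can be sketched in a few lines. This is only an illustration in Python, not a Terraform API: the Migration type, the migrate function, the "migration_version" key in the state, and the instance id "i-123" are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Migration:
    version: int                    # e.g. a Unix timestamp; only needs to increase
    name: str
    apply: Callable[[dict], None]   # mutates the (hypothetical) state dict in place


def migrate(state: dict, migrations: List[Migration],
            target: Optional[int] = None) -> List[str]:
    """Apply, in version order, each migration newer than the recorded version."""
    applied = []
    for m in sorted(migrations, key=lambda m: m.version):
        if m.version <= state.get("migration_version", 0):
            continue                # already applied to this state
        if target is not None and m.version > target:
            break                   # stop at the requested version
        m.apply(state)
        state["migration_version"] = m.version   # record progress after each step
        applied.append(m.name)
    return applied


def rename_foo_to_bar(state: dict) -> None:
    resources = state["resources"]
    resources["aws_instance.bar"] = resources.pop("aws_instance.foo")


state = {"resources": {"aws_instance.foo": {"id": "i-123"}}}
migrations = [Migration(1542068899, "my-migration", rename_foo_to_bar)]
first_run = migrate(state, migrations)    # applies "my-migration"
second_run = migrate(state, migrations)   # nothing left to apply
```

Because the version lives in each state, running the same migrations directory against dev, staging, and production applies only what each environment is missing.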

File format

TBD

CLI

# Create a new migration script with a timestamp, eg "1542068899-my-migration.tf"
terraform state generate migration my-migration 

# Migrate to the latest version. 
terraform state migrate

# Migrate to a specific version
terraform state migrate --version 22

Labels: core, enhancement

Most helpful comment

Hi @sheax0r! Thanks for describing this use-case.

We've been having some very similar discussions internally, and have some design ideas that we plan to prototype once the current v0.12.0 work is complete.

Your analogy to database migrations is a good one, and aligns with how we were thinking about the problem. In our sketches, though, the "migrations" are just included as part of the configuration and handled automatically during normal plan / apply operations. This is still just an early design sketch, so there are certainly details here that can be refined, but I'll describe the general idea here for context.

The mechanism would be driven by a special new state block in configuration, which contains blocks describing state migrations that should be applied. It might look something like this:

state {
  # "move" does a similar operation to "terraform state mv"
  move "dbf2971ad1d74715a6462aede1913920" {
    from = aws_instance.foo
    to   = module.foo.aws_instance.main
  }

  # "forget" does a similar operation to "terraform state rm"
  forget "d71db678ec094edeb8f8db25bc843f83" {
    from = aws_instance.baz
  }
}

Each nested block inside the state block is an instruction to Terraform to treat the mentioned from and to resources differently during planning. In this particular case:

  • If the plan would normally include destroying aws_instance.foo and creating module.foo.aws_instance.main, instead generate a single action to migrate the existing state from the former to the latter and then update module.foo.aws_instance.main in-place for any further changes.
  • If the plan would normally include destroying aws_instance.baz then instead plan to drop it from the state immediately without touching the remote object at all.

Terraform would also check during plan to ensure that the rest of the configuration is consistent with any outstanding state actions. If the above move block were present but either aws_instance.foo still exists in configuration or module.foo.aws_instance.main _does not_ exist in configuration, that'd be an error.
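
The planning rules and the consistency check described above could look roughly like the following. This is a minimal Python sketch under stated assumptions: configuration and state are modelled as plain sets of resource addresses, and plan_with_state_actions is a hypothetical helper, not part of Terraform.

```python
def plan_with_state_actions(config_addrs, state_addrs, moves, forgets):
    """Return planned actions as tuples such as ("move", src, dst).

    config_addrs: addresses declared in configuration
    state_addrs:  addresses currently tracked in state
    moves:        list of (from_addr, to_addr) pairs
    forgets:      list of addresses to drop from state
    """
    config_addrs = set(config_addrs)

    # Consistency check: the configuration must already reflect each move.
    for src, dst in moves:
        if src in config_addrs:
            raise ValueError(f"move source {src} still exists in configuration")
        if dst not in config_addrs:
            raise ValueError(f"move target {dst} does not exist in configuration")

    plan = []
    tracked = set(state_addrs)
    for src, dst in moves:
        # Replaces what would otherwise be destroy(src) + create(dst).
        if src in tracked and dst not in tracked:
            plan.append(("move", src, dst))
            tracked.remove(src)
            tracked.add(dst)
    for addr in forgets:
        # Replaces what would otherwise be destroy(addr).
        if addr in tracked and addr not in config_addrs:
            plan.append(("forget", addr))
            tracked.remove(addr)

    # Whatever is left is planned the ordinary way.
    plan += [("destroy", a) for a in sorted(tracked - config_addrs)]
    plan += [("create", a) for a in sorted(config_addrs - tracked)]
    return plan


config = {"module.foo.aws_instance.main"}
tracked_in_state = {"aws_instance.foo", "aws_instance.baz"}
plan = plan_with_state_actions(
    config,
    tracked_in_state,
    moves=[("aws_instance.foo", "module.foo.aws_instance.main")],
    forgets=["aws_instance.baz"],
)
```

With the two actions present, the plan contains one move and one forget; without them, the same inputs would plan two destroys and one create.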

The uuids included in the block headers are the mechanism for tracking whether each state action has been applied yet. After applying a plan that takes both of these actions, Terraform would include a new section in the state:

{
  ...
  "state_actions": [
    "dbf2971ad1d74715a6462aede1913920",
    "d71db678ec094edeb8f8db25bc843f83"
  ],
  ...
}

On subsequent plans, Terraform would disregard any state actions whose ids are already recorded in the state, assuming that they've already been applied. If the same configuration is applied multiple times (e.g. for different environments) then each state will have its own state_actions list, so each environment will deal with the state migrations exactly once.

Once you (the operator) know that you've applied the state actions to all existing states, you can safely remove them from the configuration to clean up. On the next apply, Terraform will detect that the state actions recorded in the state are no longer present in configuration and clean those up too, so that they don't accumulate endlessly.
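
The id bookkeeping from the last two paragraphs (skip what is already recorded, then prune what has been removed from configuration) can be sketched as follows. This is an illustrative Python sketch, not Terraform's implementation; pending_ids and finish_apply are hypothetical helper names, and the state is modelled as a plain dict holding the "state_actions" list shown above.

```python
MOVE_ID = "dbf2971ad1d74715a6462aede1913920"
FORGET_ID = "d71db678ec094edeb8f8db25bc843f83"


def pending_ids(state, config_ids):
    """Ids present in configuration but not yet recorded in this state."""
    recorded = set(state.get("state_actions", []))
    return [i for i in config_ids if i not in recorded]


def finish_apply(state, config_ids):
    """After a successful apply, record new ids and prune removed ones."""
    recorded = state.get("state_actions", [])
    kept = [i for i in recorded if i in config_ids]      # still in configuration
    new = [i for i in config_ids if i not in recorded]   # applied just now
    state["state_actions"] = kept + new


# Two environments sharing one configuration, each with its own state.
dev = {"state_actions": [MOVE_ID]}   # the move was already applied here
prod = {}                            # nothing applied yet

dev_pending = pending_ids(dev, [MOVE_ID, FORGET_ID])
prod_pending = pending_ids(prod, [MOVE_ID, FORGET_ID])

finish_apply(dev, [MOVE_ID, FORGET_ID])
dev_after_apply = list(dev["state_actions"])

# Later the operator deletes the state block from configuration.
finish_apply(dev, [])
dev_after_cleanup = list(dev["state_actions"])
```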


The other detail you touched on in your proposal, @sheax0r, is that database migration scripts will usually have a strict ordering, because in general database migrations are not idempotent. The state actions I showed here _are_ idempotent, so it's not necessary to assign a strict ordering to them, and instead we can use dependencies to order them in a similar way to how Terraform orders "normal" actions. For example, if we take the above configuration and adjust it so that we're moving the same resource twice:

state {
  move "dbf2971ad1d74715a6462aede1913920" {
    from = aws_instance.foo
    to   = module.foo.aws_instance.main
  }

  move "d71db678ec094edeb8f8db25bc843f83" {
    from = module.foo.aws_instance.main
    to   = module.foo.aws_instance.other
  }
}

...Terraform can infer that the second action builds on the first by observing that the from of the second matches the to of the first. In this case Terraform might just optimize this into a single move from aws_instance.foo to module.foo.aws_instance.other, skipping module.foo.aws_instance.main altogether.
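
One way to picture that inference is to treat the move blocks as edges and follow each chain to its end. The Python sketch below is only an illustration under that assumption; collapse_moves is a hypothetical name, and it assumes each address appears at most once as a from.

```python
def collapse_moves(moves):
    """Collapse chained (from, to) pairs into {original_source: final_target}.

    The result does not depend on the order in which the pairs are listed.
    """
    next_hop = dict(moves)               # from -> to
    targets = set(next_hop.values())
    collapsed, visited = {}, set()
    for src in next_hop:
        if src in targets:
            continue                     # intermediate hop, reached via another move
        cur = src
        while cur in next_hop:
            if cur in visited:
                raise ValueError("move blocks form a cycle or share a target")
            visited.add(cur)
            cur = next_hop[cur]
        collapsed[src] = cur
    if visited != set(next_hop):
        raise ValueError("move blocks form a cycle")
    return collapsed


chain = [
    ("aws_instance.foo", "module.foo.aws_instance.main"),
    ("module.foo.aws_instance.main", "module.foo.aws_instance.other"),
]
collapsed = collapse_moves(chain)
collapsed_reversed = collapse_moves(list(reversed(chain)))
```

Both calls yield a single move from aws_instance.foo to module.foo.aws_instance.other, which shows that under this dependency-based reading the order of the blocks carries no meaning.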


In practice I expect that the state actions would not be hand-written by the user but would instead be generated by new Terraform commands that could then generate the uuid for each one and ensure that the operation makes sense (e.g. you can't move an aws_instance to an aws_security_group).

One particularly interesting possibility is a potential new terraform rename command:

terraform rename aws_instance.foo bar

Given an address of a resource in the _root_ module, this command could both generate the necessary move action in configuration _and_ rewrite the rest of the configuration so that all references to aws_instance.foo are replaced with aws_instance.bar. This is one of the advantages of Terraform having a declarative configuration language: we can make such updates to the configuration safe in the knowledge that the overall meaning of the configuration won't change.
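
As a rough illustration of what such a command would have to do, the Python sketch below rewrites references with regular expressions and emits a move block in the syntax sketched earlier. Everything here is an assumption made for the example: rename_in_config is a hypothetical function, and a real implementation would work on the parsed configuration rather than on raw text, since a regex also touches matches inside strings and comments.

```python
import re
import uuid


def rename_in_config(source, old_addr, new_name):
    """Return (rewritten_source, move_block) for renaming a root-module resource."""
    rtype, old_name = old_addr.split(".")
    new_addr = f"{rtype}.{new_name}"

    # References such as aws_instance.foo.id, but not data.aws_instance.foo
    # (preceded by a dot) or aws_instance.foobar (followed by a name character).
    ref = re.compile(r"(?<![\w.])" + re.escape(old_addr) + r"(?![\w-])")
    source = ref.sub(lambda _: new_addr, source)

    # The resource block header: resource "aws_instance" "foo"
    header = re.compile(
        r'(resource\s+"' + re.escape(rtype) + r'"\s+")' + re.escape(old_name) + r'"'
    )
    source = header.sub(lambda m: m.group(1) + new_name + '"', source)

    # A fresh 32-character hex id, like the ones in the examples above.
    move_block = (
        f'state {{\n  move "{uuid.uuid4().hex}" {{\n'
        f"    from = {old_addr}\n    to   = {new_addr}\n  }}\n}}\n"
    )
    return source, move_block


config_text = (
    'resource "aws_instance" "foo" {}\n'
    'output "ip" { value = aws_instance.foo.private_ip }\n'
    "# data.aws_instance.foo and aws_instance.foobar stay\n"
)
new_text, move_block = rename_in_config(config_text, "aws_instance.foo", "bar")
```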

This process of updating configuration and state together unfortunately does not generalize to operations _between_ modules, since Terraform cannot atomically modify child modules and the root module unless they are all in the same repository. But the rest of the state actions concept _does_ generalize for cross-module actions, if we treat the from and to addresses as relative to the module where they are defined.
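
Treating addresses as relative to the defining module amounts to prefixing them with that module's path. A minimal Python sketch, with hypothetical helper names and a made-up child module "bar":

```python
def absolute_address(module_path, relative_addr):
    """Prefix an address with the path of the module that declares it.

    module_path is the list of module names from the root, e.g. ["foo"].
    """
    prefix = "".join(f"module.{name}." for name in module_path)
    return prefix + relative_addr


def resolve_move(module_path, move):
    """Turn a (from, to) pair written inside a module into absolute addresses."""
    src, dst = move
    return absolute_address(module_path, src), absolute_address(module_path, dst)


# A move declared inside module "foo" that pushes a resource into its child "bar".
in_child = resolve_move(["foo"], ("aws_instance.main", "module.bar.aws_instance.main"))

# The same pair declared in the root module is already absolute.
in_root = resolve_move([], ("aws_instance.main", "module.bar.aws_instance.main"))
```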


As noted above, this is just a design sketch for now and details may shift during subsequent prototyping and implementation. Since our current focus as I write this is on the configuration language changes for v0.12 we're not actively working on this idea but hope to pick it up again for a later release.

Thanks again for sharing this use-case and design proposal!

All 7 comments

@apparentlymart awesome, thank you for the reply and the in-depth explanation of where you're going with this. You've clearly given it a lot more thought than the original back-of-the-napkin sketch I presented here. I'm excited to see where this goes!

I'll close this issue. Cheers!

@apparentlymart Actually, I'm reopening; I shall leave it to you to do as you wish.

Thanks @sheax0r! Yes, let's leave this issue open to represent the use-case, and then once we're ready to do more design work and prototyping here we can post updates.

In the meantime, if anyone else comes across this issue and would like to signal that they want the feature, please add a :+1: reaction to the initial comment (not _this_ comment), since we use the counts of those as an input to prioritization of work. (Posting comments saying "me too" or "+1", on the other hand, just creates notification noise for those who are already watching the issue.)

"Terraform can infer that the second action builds on the first by observing that the from of the second matches the to of the first."

Is this inference necessarily correct? If the blocks were swapped I would expect this to preserve the original module.foo.aws_instance.main as module.foo.aws_instance.other. With dependency-based ordering, how could I achieve that?

Great idea! Is it a work in progress or still waiting for prioritisation?
Would it make sense to implement it as a custom Terraform provider, if that is at all possible?

FWIW, https://github.com/minamijoyo/tfmigrate tries to mitigate this as an external tool for now. It would be nice to see this kind of feature supported out of the box by Terraform. It would help module developers provide an upgrade path to users. This is a real pain for us; we struggle with it in all terraform-aws-modules.

@apparentlymart or someone else, any chance of getting HashiCorp's attention on this issue? To me there is no path to Terraform v1 without something to deal with migrations.
