I am envisioning rollbacks to be as simple as reapplying changes from old snapshots/deployment records. If you use Git for versioning of deployment records, then it is pretty straightforward.
For example, imagine we have tagged a stable release as v1.2:
$ git tag v1.2
$ pulumi update prod
... 1.2 is in production now ...
Now later on, we've done some things, and horked our environment:
$ git commit
$ pulumi update prod
... oops! ...
To get back to a sane state:
$ git checkout v1.2
$ pulumi update prod
... 1.2 is back in production, we're good ...
At the very least, we need to document the "playbook" for how to go about this. And it's not quite as simple as shown above, because the rollback will need the old deployment record (which will have been obliterated by the git checkout v1.2 -- unless we are using a Pulumi deployment server). Perhaps this is left as-is, since it's plumbing and you need to know what you're doing.
To make this nicer, we might consider a rollback command which is just porcelain on top:
$ pulumi deploy rollback prod v1.2
This effectively would do the same as above, but it automatically smuggles the current targets file so that you don't need to do anything manual. Pretty cool demo too!
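As a rough illustration of what such porcelain might do under the hood, here is a minimal Python sketch. The `rollback` function, the `record` path, and the injectable `run` callback are all hypothetical; the only commands it issues are the `git checkout` and `pulumi update` invocations already shown in this thread. It assumes the deployment record is a single file in the working tree that the checkout would overwrite.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    """Default runner: execute the command and raise if it fails."""
    subprocess.run(list(cmd), check=True)


def rollback(stack: str, tag: str, record: Path, run: Runner = _run) -> None:
    """Check out `tag` and redeploy `stack`, carrying the *current*
    deployment record across the checkout (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp:
        saved = Path(tmp) / record.name
        shutil.copy2(record, saved)    # stash the record describing what is live now
        run(["git", "checkout", tag])  # this may replace the record with the old one
        shutil.copy2(saved, record)    # put the live record back in place
    run(["pulumi", "update", stack])   # roll forward to the old program
```

Passing a fake `run` makes the flow easy to dry-run without touching Git or a real stack. With a deployment server holding the record, the stash-and-restore step would presumably be unnecessary.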
@mmdriley As you're designing the deployment service, we should think about making sure we retain enough information to support rollback.
We frequently get asked by potential customers about whether Pulumi supports rollback. The short answer is "yes", since you can always accomplish rollback by rolling forward to a prior program plus configuration. The longer answer is "not what you probably meant" -- an opportunity to do better.
There are two primary scenarios in which I've heard people want rollback.
The first is the ability to trigger a rollback when something has gone wrong. Particularly with a history command, and the ability to export specific checkpoints from it, it would seem we have some of the pieces needed to do this. This calls into question, however, whether it's valid to use a goal state from a checkpoint rather than running a program. It could very well be that our answer is "re-deploy the code," but we will need to understand whether this is an adequate answer (certainly when I've told some customers this -- especially those not using a Git-oriented workflow -- it was not).
The second is the ability to automatically roll back changes for a partially applied update. This would act more like AWS CloudFormation. Presumably, given our system, we'd need to rewind to the goal state as it existed just prior to performing the update. I am skeptical that we should do this by default, so it would probably materialize as a flag such as --rollback-on-failure or somesuch.
The two are related, and it would be wonderful if they leveraged similar mechanisms in how they work.
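To make the "similar mechanisms" point concrete, here is a small Python sketch in which both scenarios reduce to converging on a saved goal state. Everything in it is an assumption for illustration: state is modeled as a plain dict of resource name to properties, and `plan`, `converge`, and `update` are hypothetical names, not Pulumi engine APIs. It also simply assumes that a checkpoint is a valid goal state, which is exactly the open question raised above.

```python
import copy
from typing import Callable, Dict, List, Tuple

State = Dict[str, dict]  # resource name -> properties (hypothetical shape)
Op = Tuple[str, str]     # ("create" | "update" | "delete", resource name)


def plan(current: State, goal: State) -> List[Op]:
    """List the operations needed to turn `current` into `goal`."""
    ops: List[Op] = []
    for name in sorted(goal):
        if name not in current:
            ops.append(("create", name))
        elif current[name] != goal[name]:
            ops.append(("update", name))
    for name in sorted(current):
        if name not in goal:
            ops.append(("delete", name))
    return ops


def apply_op(state: State, goal: State, op: Op) -> None:
    """Apply one operation to the in-memory state."""
    kind, name = op
    if kind == "delete":
        del state[name]
    else:
        state[name] = dict(goal[name])


def converge(state: State, goal: State,
             apply: Callable[[State, State, Op], None] = apply_op) -> None:
    """Scenario 1: roll back (or forward) to any saved goal state."""
    for op in plan(state, goal):
        apply(state, goal, op)


def update(state: State, goal: State,
           apply: Callable[[State, State, Op], None] = apply_op,
           rollback_on_failure: bool = False) -> bool:
    """Scenario 2: on a partial failure, optionally rewind to the
    state captured just before the update started."""
    before = copy.deepcopy(state)
    try:
        converge(state, goal, apply)
        return True
    except Exception:
        if rollback_on_failure:
            converge(state, before)  # same mechanism as a manual rollback
        return False
```

A manual rollback is then `converge(state, checkpoint)` with a checkpoint taken from history, while the flag just calls the same function with the pre-update snapshot. Real resources are not this forgiving, of course: a rewind can itself fail, and a re-created resource is not the one that was deleted.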
Hey there! My team is evaluating whether/when to move to Pulumi from TF, and one of the things I'm interested in that Terraform can't currently do is being able to roll back safely from a failed rollout. The reason I think it's compelling here is that the direct integration between Pulumi and a proper coding language makes it much more interesting to me to go ahead and write the rollback code myself as part of a custom provider. If, as part of the provider definition, you had the option of defining a callback function for each of your CRUD definitions that gets passed the little bit of state necessary to perform a rollback (helping with the less straightforward cases like undoing a delete), that would be excellent.
Longer term and more generally, I'm interested in options to get further into the weeds of how Pulumi applies the actual state once it's been evaluated. In particular, things like complex destructive resource updates with a rollover migration in the middle (obviously out of scope of this particular issue).
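For what it's worth, the per-CRUD callback idea in this comment could look roughly like the Python sketch below. This is not Pulumi's provider interface: `InMemoryProvider`, `rollout`, and the convention that each CRUD method returns an undo callback are all made up for illustration, with resources held in a plain dict.

```python
from typing import Callable, Dict, List

Undo = Callable[[], None]


class InMemoryProvider:
    """Hypothetical provider whose CRUD methods each return an undo
    callback that closes over the state needed to reverse the call."""

    def __init__(self) -> None:
        self.resources: Dict[str, dict] = {}

    def create(self, name: str, props: dict) -> Undo:
        self.resources[name] = dict(props)

        def undo() -> None:
            self.resources.pop(name, None)
        return undo

    def update(self, name: str, props: dict) -> Undo:
        old = dict(self.resources[name])  # remember the previous properties
        self.resources[name] = dict(props)

        def undo() -> None:
            self.resources[name] = old
        return undo

    def delete(self, name: str) -> Undo:
        old = self.resources.pop(name)    # keep what is needed to re-create

        def undo() -> None:
            self.resources[name] = old
        return undo


def rollout(steps: List[Callable[[], Undo]]) -> bool:
    """Run steps in order; if one raises, call the undo callbacks of the
    steps that already completed, newest first."""
    undos: List[Undo] = []
    try:
        for step in steps:
            undos.append(step())
        return True
    except Exception:
        for undo in reversed(undos):
            undo()
        return False
```

The delete case is where the captured state earns its keep, though against a real cloud resource "undo" would mean re-creating something new from the saved inputs, not resurrecting the original.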
@joeduffy would be great to see up/down hooks (as any ORMs do with their migrations). This way anyone can inject anything that must happen if a deployment fails for each particular Component.
That is more or less what https://github.com/pulumi/pulumi/issues/1691 is tracking - does that look like it's in the ballpark of what you are looking for? Would be great to add any details on requirements there.
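To help pin down requirements, here is one possible reading of the up/down hooks suggestion as a Python sketch. The `Component` class, its `up`/`down` methods, and `deploy` are hypothetical names, not anything Pulumi exposes; the sketch uses the standard library's `contextlib.ExitStack` to run the `down` hooks of already-deployed components in reverse order when a later one fails.

```python
from contextlib import ExitStack
from typing import List


class Component:
    """Hypothetical component with migration-style up/down hooks."""

    def __init__(self, name: str, log: List[str], fail: bool = False) -> None:
        self.name, self.log, self.fail = name, log, fail

    def up(self) -> None:
        if self.fail:
            raise RuntimeError(f"{self.name} failed to deploy")
        self.log.append(f"up {self.name}")

    def down(self) -> None:
        self.log.append(f"down {self.name}")


def deploy(components: List[Component]) -> None:
    """Bring components up in order. If one raises, the down hooks of the
    components that already came up run newest first, then the error
    propagates to the caller."""
    with ExitStack() as stack:
        for component in components:
            component.up()
            stack.callback(component.down)  # registered only after a successful up
        stack.pop_all()  # everything succeeded: discard the down hooks
```

One design choice worth calling out: the component whose `up` failed does not get its own `down` called, on the assumption that a failed `up` cleans up after itself. Whether that matches what people want is the kind of detail worth adding to the linked issue.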