0.11.13
I'm using Terraform inside a bash script to deploy a complex demo environment. After terraform apply, further script steps are executed, but the script should exit if there are any errors with terraform apply. However, when the deployment fails, the exit code produced by apply does not change; it always returns 0.
I need a feature similar to terraform plan --detailed-exitcode, to return a detailed exit code when the command exits. When provided, this argument would change the exit codes and their meanings to give more granular information about the result of the apply:
0 = Succeeded apply
1 = Error
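For reference, the existing flag on terraform plan (per the Terraform CLI documentation) already distinguishes three outcomes; an apply equivalent could follow the same pattern:

```shell
# Existing behavior today (plan only):
terraform plan -detailed-exitcode
# exit 0: succeeded, no changes (empty diff)
# exit 1: error
# exit 2: succeeded, changes present (non-empty diff)
```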
Right now I'm using a workaround that checks for any "Error applying plan" inside the output:
terraform apply | tee /dev/tty | ( ! grep "Error applying plan" )
An optional parameter --detailed-exitcode for terraform apply would be nice.
https://devops.stackexchange.com/questions/871/terraform-apply-exit-code-on-error
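A side note on the tee workaround above: even when terraform does exit non-zero, piping through tee discards that status unless pipefail is set, because a pipeline's exit code is that of its last command. A minimal sketch, using a hypothetical stub function in place of terraform apply purely for illustration:

```shell
#!/usr/bin/env bash
# Stub standing in for a failing `terraform apply`: prints the error
# banner and exits non-zero.
fail_like_terraform() { echo "Error applying plan:"; return 1; }

fail_like_terraform | tee /dev/null
echo "without pipefail: $?"   # prints 0 -- tee's exit status wins

set -o pipefail
fail_like_terraform | tee /dev/null
echo "with pipefail: $?"      # prints 1 -- the failure now propagates
```

With pipefail set, the grep fallback is only needed for the cases where apply itself wrongly exits 0.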
Hi @automatecloud,
The intended behavior is to return a non-zero exit code on error by default, and we have tests verifying that behavior for several situations, but it seems like you've found a case where it isn't working. That's a bug, not a missing feature.
Can you share some more details on exactly how you are running Terraform (the automation code you have around it, how the plan and apply are connected, etc.), and then we can try to figure out what is different in your case from the existing test cases?
Thanks!
The challenge I face is that sometimes the resource is already available. I'm managing DNS entries in GCP, and sometimes they were already added by someone manually. If you use terraform apply, it exits with an "Error applying plan" message. My script should stop there, as I can't guarantee the DNS entry is configured with the right CNAME, for example; if it isn't, the rest of the script could fail. So I'm using the workaround of grepping the logs for the "Error applying plan" entry. That is a decent workaround, but terraform apply should be able to return the right exit code when I need it.
terraform refresh and terraform plan didn't help, as they do not detect the pre-existing GCP resources.
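When the resource already exists outside of Terraform, one way to reconcile state (rather than grepping logs) is terraform import, which brings the existing object under management so a later apply no longer conflicts. A sketch for the DNS record case; the import ID format shown is an assumption and varies by provider version, so check the google provider documentation:

```shell
# Hypothetical import of the manually created record; the ID format
# (project/managed_zone/name/type) is an assumption and may differ
# between google provider versions.
terraform import google_dns_record_set.master \
  "myproject/andreas-lab/example.andreas.com./A"

# Afterwards, plan should no longer propose creating this resource.
terraform plan
```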
Hi @automatecloud,
The thing that is strange to me is that Terraform should _already_ be generating a nonzero exit code in that situation, but apparently in your situation it is not. I'd like to understand more about what you're doing so I can understand how it differs from the various automated tests we have in Terraform that are verifying that errors during terraform apply produce a non-zero exit code.
In other words, Terraform should already be behaving the way you describe, so I'd like to try to understand why it isn't in your case.
It's easy to reproduce. You only need to create a resource manually. Let's say you plan to manage a DNS record: add it manually inside GCP, then define the resource in a Terraform file:
resource "google_dns_record_set" "master" {
  name         = "example.andreas.com"
  provider     = "google"
  project      = "myproject"
  type         = "A"
  ttl          = 60
  managed_zone = "andreas-lab"
  rrdatas      = ["10.10.10.10"]
}
and apply it. It will show the following error:
Error applying plan:

1 error(s) occurred:

* google_dns_record_set.master: 1 error(s) occurred:

* google_dns_record_set.master: Error creating DNS RecordSet: googleapi: Error 409: The resource 'entity.change.additions[0]' named 'example.andreas.com. (A)' already exists, alreadyExists
The exit code of terraform is 0, not non-zero.
If terraform apply is not successful, its exit code should always be non-zero.
You can also find another example of an apply that failed because of existing resources here: https://github.com/hashicorp/terraform/issues/20344
I am seeing similar behavior with Azure Key Vault.
I run a terraform plan -out terraform.plan and see 12 resources to be created.
I then run terraform apply -auto-approve terraform.plan and error out creating the Key Vault with:
Error: Error applying plan:
1 error(s) occurred:
* azurerm_key_vault.client-keyvault: 1 error(s) occurred:
* azurerm_key_vault.client-keyvault: Error updating Key Vault "xxxxx-keyvault" (Resource Group "tf_int_201949012204_4e11"): keyvault.VaultsClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="An invalid value was provided for 'accessPolicies'."
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
The actual cause of the error is a problem on my end with a bad variable, but TF returns a 0 exit code during the apply, so this isn't being caught in CI.
IMHO any error during the apply should return a non-zero exit code.
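Until the exit code is fixed, a CI step can guard itself by checking both the status and the output. A minimal sketch; ci_apply is a hypothetical stub mimicking the buggy behavior, and in real use the wrapper would run terraform apply -auto-approve terraform.plan instead:

```shell
#!/usr/bin/env bash
# Stub mimicking the reported bug: prints an error but exits 0.
ci_apply() { echo "Error: Error applying plan:"; return 0; }

# Returns non-zero if the command fails OR its output carries an
# error banner, so CI can rely on the status alone.
guarded_apply() {
  local out rc
  out=$("$@" 2>&1); rc=$?
  printf '%s\n' "$out"
  if [ "$rc" -ne 0 ] || printf '%s\n' "$out" | grep -q "^Error"; then
    return 1
  fi
}

if guarded_apply ci_apply; then
  echo "CI: build passed"
else
  echo "CI: marking build as failed"
fi
```

Matching on "^Error" is a heuristic tied to Terraform's current output format, so it is a stopgap, not a substitute for a correct exit code.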
For me, in version 0.11.13 the -detailed-exitcode parameter doesn't work.
The init and apply commands should return a non-zero exit code.
Sorry, my fault: plan and apply return exit codes as expected, but it would be great if init could have one too.
@apparentlymart, I'd like to take this one on.
Have you any advice on where to get started, please?
Not sure if this is still an issue or not. I tried to replicate using AWS Route 53 by creating a record that already exists, and the exit code returned 1. Maybe it is specific to a provider? I tried replicating using v0.12.7.
We see a similar issue during terraform apply (v0.12.3):
Error: Error modifying Target Group Attributes: AccessDenied: User: arn:aws:sts::XXXXXXXXXXXX:assumed-role/ZZZZ-role/YYYY is not authorized to perform: elasticloadbalancing:ModifyTargetGroupAttributes on resource: arn:aws:elasticloadbalancing:us-east-2:XXXXXXXXXXXX:targetgroup/AAAA/b8f487cdb9069c7b
status code: 403, request id: 38eeb95d-c901-11e9-b5e4-d1fbf73e5627
This is returning a 0 error code, preventing our automation from being able to notify us of deployment failures.
We've also experienced this with the vault_database_secret_backend_role resources (e.g. when the database name/schema is a reserved word or the password used is wrong). This is a fairly major pain because the problems are not detected in the CI/CD pipeline.
@apparentlymart, any way that I can help?
Hello,
Can we get an update on this issue? This continues to be a problem 10 months on, and is making integration with CI/CD more difficult than necessary.
Thank you!
Having the same problem here, exiting 0 when it fails to create a necessary S3 bucket due to the infamous
Error: Error creating S3 bucket: Error creating S3 bucket sp-terraform-s3-state-test
retrying: OperationAborted: A conflicting conditional operation is currently in progress against this resource. Please try again.
status code: 409, request id: E3ACB8F77041E863, host id: qCXGnd/ZkZ7ymm25s3v8fiyABmhrmLcLoOGpeNTJ57pgLESQhljeKNHoI93gW8DjI78l5QcgkEk=
on ../module/s3.tf line 1, in resource "aws_s3_bucket" "s3_terraform_state":
1: resource "aws_s3_bucket" "s3_terraform_state" {
Releasing state lock. This may take a few moments...
me@myhost ~/ws/terraform_sre/Modules/state_management/tf_state_test
% echo $?
0
Same issue here.
Same issue here - plus, sometimes a successful run returns a non-zero exit code
Adding my 2 cents that it's happening for me on Azure resources with v0.12.20.
Same problem in the context of ZSH and a Makefile, running Terraform v0.12.20. I see big red errors, but my make task completes successfully.
Same issue found in AzureRM with error results from existing diagnostic log settings against network security groups
Issue still present in 0.13 with both AWS (insufficient permissions) and the PagerDuty provider (subscription-level errors).
hey folks - I see a lot of people saying that they're encountering a successful exit code even when errors occur. I'd like to explain what help we need in order to move this forward. The practical thing I need to translate this into action is very concrete examples, e.g.:
To work on or even prioritize this for engineers, I have to be able to run each case on my workstation without inventing any details in order to be confident we're seeing the same behavior. As-is, it's not clear to me whether these are all the same issue, or if there are multiple different issues (e.g. bugs in different providers), and so without details on reproductions, I'm stuck and can't plan a fix.
In order to figure out whether these are the same or different issues, can you please write up reproduction cases such that I can copy-paste and run them locally? Ideally, these would use the null resource provider rather than a real provider in order to minimize external dependencies, but I understand that some cases will require real cloud providers. The key is that I need to be able to copy-paste and run the reproduction, so the easier it is to set up, the easier it will be for me to reproduce and prioritize.
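To make that request concrete, here is the shape such a reproduction could take using only the null provider. This is a sketch, not verified against any particular Terraform version; the local-exec provisioner fails on purpose, so a correctly behaving binary must exit non-zero:

```shell
#!/usr/bin/env bash
# Sketch of a self-contained reproduction attempt. Requires terraform
# on PATH; the null provider avoids any cloud credentials.
set -u
workdir=$(mktemp -d)
cd "$workdir"

cat > main.tf <<'EOF'
resource "null_resource" "always_fails" {
  provisioner "local-exec" {
    # Fails on purpose; apply must report an error.
    command = "exit 1"
  }
}
EOF

terraform init -input=false
terraform apply -auto-approve
echo "apply exit code: $?"   # expected non-zero; the bug would show 0
```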