0.11.13
I'm using Terraform inside a bash script to deploy a complex demo environment. After terraform apply, further script steps are executed, but the script should exit if there are any errors with terraform apply. However, when the deployment fails, the exit code produced by apply does not change; it always returns 0.
I need a feature similar to terraform plan --detailed-exitcode, to return a detailed exit code when the command exits. When provided, this argument would change the exit codes and their meanings to give more granular information about the result of the apply:
0 = Succeeded apply
1 = Error
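For reference, the existing flag on terraform plan (per the Terraform CLI documentation) already distinguishes three outcomes; an apply equivalent could follow the same pattern:

```shell
# Existing behavior today (plan only):
terraform plan -detailed-exitcode
# exit 0: succeeded, no changes (empty diff)
# exit 1: error
# exit 2: succeeded, changes present (non-empty diff)
```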
Right now I'm using a workaround that checks for any "Error applying plan" inside the output:
terraform apply | tee /dev/tty | ( ! grep "Error applying plan" )
An optional parameter --detailed-exitcode for terraform apply would be nice.
https://devops.stackexchange.com/questions/871/terraform-apply-exit-code-on-error
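A side note on the tee workaround above: even when terraform does exit non-zero, piping through tee discards that status unless pipefail is set, because a pipeline's exit code is that of its last command. A minimal sketch, using a hypothetical stub function in place of terraform apply purely for illustration:

```shell
#!/usr/bin/env bash
# Stub standing in for a failing `terraform apply`: prints the error
# banner and exits non-zero.
fail_like_terraform() { echo "Error applying plan:"; return 1; }

fail_like_terraform | tee /dev/null
echo "without pipefail: $?"   # prints 0 -- tee's exit status wins

set -o pipefail
fail_like_terraform | tee /dev/null
echo "with pipefail: $?"      # prints 1 -- the failure now propagates
```

With pipefail set, the grep fallback is only needed for the cases where apply itself wrongly exits 0.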
Hi @automatecloud,
The intended behavior is to return a non-zero exit code on error by default, and we have tests verifying that behavior for several situations, but it seems like you've found a case where it isn't working. That's a bug, not a missing feature.
Can you share some more details on exactly how you are running Terraform (the automation code you have around it, how the plan and apply are connected, etc.), and then we can try to figure out what is different in your case from the existing test cases?
Thanks!
The challenge I face is that sometimes the resource is already available. I'm managing DNS entries in GCP, and sometimes they were already added by someone manually. If you use terraform apply, it exits with an "Error applying plan" message. My script should stop there, as I can't guarantee the DNS entry is configured with the right CNAME, for example; if it isn't, the rest of the script could fail. So I'm using the workaround of grepping the logs for the "Error applying plan" entry. That is a decent workaround, but terraform apply should be able to return the right exit code when I need it.
terraform refresh and terraform plan didn't help, as they do not detect the pre-existing GCP resources.
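When the resource already exists outside of Terraform, one way to reconcile state (rather than grepping logs) is terraform import, which brings the existing object under management so a later apply no longer conflicts. A sketch for the DNS record case; the import ID format shown is an assumption and varies by provider version, so check the google provider documentation:

```shell
# Hypothetical import of the manually created record; the ID format
# (project/managed_zone/name/type) is an assumption and may differ
# between google provider versions.
terraform import google_dns_record_set.master \
  "myproject/andreas-lab/example.andreas.com./A"

# Afterwards, plan should no longer propose creating this resource.
terraform plan
```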
Hi @automatecloud,
The thing that is strange to me is that Terraform should _already_ be generating a nonzero exit code in that situation, but apparently in your situation it is not. I'd like to understand more about what you're doing so I can understand how it differs from the various automated tests we have in Terraform that are verifying that errors during terraform apply produce a non-zero exit code.
In other words, Terraform should already be behaving the way you describe, so I'd like to try to understand why it isn't in your case.
It's easy to reproduce. You only need to create a resource manually. Let's say you plan to manage a DNS record: add it manually inside GCP, then define the resource in a Terraform file:
resource "google_dns_record_set" "master" {
  name         = "example.andreas.com"
  provider     = "google"
  project      = "myproject"
  type         = "A"
  ttl          = 60
  managed_zone = "andreas-lab"
  rrdatas      = ["10.10.10.10"]
}
and apply it. It will show the following error:
Error applying plan:

1 error(s) occurred:

* google_dns_record_set.master: 1 error(s) occurred:

* google_dns_record_set.master: Error creating DNS RecordSet: googleapi: Error 409: The resource 'entity.change.additions[0]' named 'example.andreas.com. (A)' already exists, alreadyExists
The exit code of terraform is 0, not non-zero.
If terraform apply is not successful, its exit code should always be non-zero.
You can also find another example of an apply that failed because of existing resources here: https://github.com/hashicorp/terraform/issues/20344
I am seeing similar behavior with Azure Key Vault.
I run a terraform plan -out terraform.plan and see 12 resources to be created.
I then run terraform apply -auto-approve terraform.plan and error out creating the Key Vault with:
Error: Error applying plan:
1 error(s) occurred:
* azurerm_key_vault.client-keyvault: 1 error(s) occurred:
* azurerm_key_vault.client-keyvault: Error updating Key Vault "xxxxx-keyvault" (Resource Group "tf_int_201949012204_4e11"): keyvault.VaultsClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="An invalid value was provided for 'accessPolicies'."
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
The actual cause of the error is a problem on my end with a bad variable, but TF returns a 0 exit code during the apply, so this isn't being caught in CI.
IMHO any error during the apply should return a non-zero exit code.
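Until the exit code is fixed, a CI step can guard itself by checking both the status and the output. A minimal sketch; ci_apply is a hypothetical stub mimicking the buggy behavior, and in real use the wrapper would run terraform apply -auto-approve terraform.plan instead:

```shell
#!/usr/bin/env bash
# Stub mimicking the reported bug: prints an error but exits 0.
ci_apply() { echo "Error: Error applying plan:"; return 0; }

# Returns non-zero if the command fails OR its output carries an
# error banner, so CI can rely on the status alone.
guarded_apply() {
  local out rc
  out=$("$@" 2>&1); rc=$?
  printf '%s\n' "$out"
  if [ "$rc" -ne 0 ] || printf '%s\n' "$out" | grep -q "^Error"; then
    return 1
  fi
}

if guarded_apply ci_apply; then
  echo "CI: build passed"
else
  echo "CI: marking build as failed"
fi
```

Matching on "^Error" is a heuristic tied to Terraform's current output format, so it is a stopgap, not a substitute for a correct exit code.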
For me, in version 0.11.13 the -detailed-exitcode parameter doesn't work.
The init and apply commands should return a non-zero exit code.
Sorry, my fault: plan and apply return exit codes as expected, but it would be great if init could have one too.
@apparentlymart, I'd like to take this one on.
Have you any advice on where to get started, please?
Not sure if this is still an issue or not. I tried to replicate using AWS Route 53 by creating a record that already exists, and the exit code returned 1. Maybe it is specific to a provider? I tried replicating using v0.12.7.
We see a similar issue during terraform apply (v0.12.3):
Error: Error modifying Target Group Attributes: AccessDenied: User: arn:aws:sts::XXXXXXXXXXXX:assumed-role/ZZZZ-role/YYYY is not authorized to perform: elasticloadbalancing:ModifyTargetGroupAttributes on resource: arn:aws:elasticloadbalancing:us-east-2:XXXXXXXXXXXX:targetgroup/AAAA/b8f487cdb9069c7b
status code: 403, request id: 38eeb95d-c901-11e9-b5e4-d1fbf73e5627
This is returning a 0 error code, preventing our automation from being able to notify us of deployment failures.
We've also experienced this with the vault_database_secret_backend_role resources (e.g. when the database name/schema is a reserved word or the password used is wrong). This is a fairly major pain because the problems are not detected in the CI/CD pipeline.
@apparentlymart, any way that I can help?
Hello,
Can we get an update on this issue? This continues to be a problem 10 months on, and is making integration with CI/CD more difficult than necessary.
Thank you!
Having the same problem here, exiting 0 when it fails to create a necessary S3 bucket due to the infamous
Error: Error creating S3 bucket: Error creating S3 bucket sp-terraform-s3-state-test
retrying: OperationAborted: A conflicting conditional operation is currently in progress against this resource. Please try again.
status code: 409, request id: E3ACB8F77041E863, host id: qCXGnd/ZkZ7ymm25s3v8fiyABmhrmLcLoOGpeNTJ57pgLESQhljeKNHoI93gW8DjI78l5QcgkEk=
on ../module/s3.tf line 1, in resource "aws_s3_bucket" "s3_terraform_state":
1: resource "aws_s3_bucket" "s3_terraform_state" {
Releasing state lock. This may take a few moments...
me@myhost ~/ws/terraform_sre/Modules/state_management/tf_state_test
% echo $?
0
Same issue here.
Same issue here - plus, sometimes a successful run returns a non-zero exit code
Adding my 2 cents that it's happening for me on Azure resources with v0.12.20.
Same problem in the context of ZSH and a Makefile, running Terraform v0.12.20. I see big red errors, but my make task completes successfully.
Same issue found in AzureRM with error results from existing diagnostic log settings against network security groups
Issue still present in 0.13 with both AWS (insufficient permissions) and the PagerDuty provider (subscription-level errors).
hey folks - I see a lot of people saying that they're encountering a successful exit code even when errors occur. I'd like to explain what help we need in order to move this forward. The practical thing I need to translate this into action is very concrete examples, e.g.:
To work on or even prioritize this for engineers, I have to be able to run each case on my workstation without inventing any details in order to be confident we're seeing the same behavior. As-is, it's not clear to me whether these are all the same issue, or if there are multiple different issues (e.g. bugs in different providers), and so without details on reproductions, I'm stuck and can't plan a fix.
In order to figure out whether these are the same or different issues, can you please write up reproduction cases such that I can copy-paste and run them locally? Ideally, these would use the null resource provider rather than a real provider in order to minimize external dependencies, but I understand that some cases will require real cloud providers. The key is that I need to be able to copy-paste and run the reproduction, so the easier it is to set up, the easier it will be for me to reproduce and prioritize.
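To make that request concrete, here is the shape such a reproduction could take using only the null provider. This is a sketch, not verified against any particular Terraform version; the local-exec provisioner fails on purpose, so a correctly behaving binary must exit non-zero:

```shell
#!/usr/bin/env bash
# Sketch of a self-contained reproduction attempt. Requires terraform
# on PATH; the null provider avoids any cloud credentials.
set -u
workdir=$(mktemp -d)
cd "$workdir"

cat > main.tf <<'EOF'
resource "null_resource" "always_fails" {
  provisioner "local-exec" {
    # Fails on purpose; apply must report an error.
    command = "exit 1"
  }
}
EOF

terraform init -input=false
terraform apply -auto-approve
echo "apply exit code: $?"   # expected non-zero; the bug would show 0
```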