Terraform: 0.12 TypeSet Preferring Configuration Value Over Refreshed Value in State

Created on 27 Jan 2019  ยท  5Comments  ยท  Source: hashicorp/terraform

Terraform Version

# go.mod
github.com/hashicorp/terraform v0.12.0-alpha4.0.20190124012834-1592b4aa67ad

Terraform Configuration Files

resource "aws_vpc" "test" {
  cidr_block = "10.3.0.0/16"
}

resource "aws_network_acl" "test" {
  vpc_id = "${aws_vpc.test.id}"

  egress {
    protocol   = "tcp"
    rule_no    = 2
    action     = "allow"
    cidr_block =  "10.3.0.0/18"
    from_port  = 443
    to_port    = 443
  }

  ingress {
    protocol   = "tcp"
    rule_no    = 1
    action     = "allow"
    cidr_block =  "10.3.0.0/18"
    from_port  = 80
    to_port    = 80
  }
}

Debug Output

OSS TeamCity logs:

Expected Behavior

Acceptance tests should pass as they previously did in Terraform 0.11.

Actual Behavior

In Terraform 0.12 the acceptance tests show the state having a different value based on the configuration instead of based on the refresh.

Affected resources are attempting to consistently hash the TypeSet based on some form of normalization of the value. In the aws_network_acl resource, it coerces hashing the "numeric" (number as a string) protocol regardless of whether a protocol number or protocol name is passed in, thereby using the lack of hash difference to suppress a protocol attribute difference in the plan. For aws_db_parameter_group its always lowercasing the value, etc.

--- FAIL: TestAccAWSNetworkAcl_importBasic (10.11s)
    testing.go:568: Step 1 error: ImportStateVerify attributes not equivalent. Difference is shown below. Top is actual, bottom is expected.

        (map[string]string) (len=2) {
         (string) (len=26) "egress.3111164687.protocol": (string) (len=1) "6",
         (string) (len=27) "ingress.1871939009.protocol": (string) (len=1) "6"
        }


        (map[string]string) (len=2) {
         (string) (len=26) "egress.3111164687.protocol": (string) (len=3) "tcp",
         (string) (len=27) "ingress.1871939009.protocol": (string) (len=3) "tcp"
        }

--- FAIL: TestAccAWSNetworkAcl_EgressAndIngressRules (6.31s)
    testing.go:568: Step 0 error: Check failed: 2 errors occurred:
            * Check 2/14 error: aws_network_acl.bar: Attribute 'ingress.1871939009.protocol' expected "6", got "tcp"
            * Check 8/14 error: aws_network_acl.bar: Attribute 'egress.3111164687.protocol' expected "6", got "tcp"

There are a few other cases of similarly behaving acceptance tests:

--- FAIL: TestAccAWSDBParameterGroup_limit (4.37s)
    testing.go:568: Step 0 error: Check failed: Check 57/87 error: aws_db_parameter_group.large: Attribute 'parameter.1937131004.value' expected "on", got "ON"

--- FAIL: TestAccAWSELB_basic (4.26s)
    testing.go:568: Step 0 error: Check failed: Check 12/13 error: aws_elb.bar: Attribute 'listener.206423021.lb_protocol' expected "http", got "HttP"

--- FAIL: TestAccAWSELB_listener (4.93s)
    testing.go:568: Step 0 error: Check failed: Check 6/6 error: aws_elb.bar: Attribute 'listener.206423021.lb_protocol' expected "http", got "HttP"

--- FAIL: TestAccAWSNetworkAcl_OnlyIngressRules_basic (8.88s)
    testing.go:568: Step 0 error: Check failed: 1 error occurred:
            * Check 2/8 error: aws_network_acl.foos: Attribute 'ingress.4245812720.protocol' expected "6", got "tcp"

--- FAIL: TestAccAWSNetworkAcl_OnlyIngressRules_update (8.45s)
    testing.go:568: Step 0 error: Check failed: 1 error occurred:
            * Check 3/11 error: aws_network_acl.foos: Attribute 'ingress.401088754.protocol' expected "6", got "tcp"

--- FAIL: TestAccAWSNetworkAcl_ipv6Rules (13.39s)
    testing.go:568: Step 0 error: Check failed: Check 3/8 error: aws_network_acl.foos: Attribute 'ingress.1976110835.protocol' expected "6", got "tcp"

Steps to Reproduce

Run any of the aforementioned acceptance tests in a 0.12 upgraded branch in the AWS provider.

Additional Context

TypeSet is not explicitly indexable in practice (if I understand correctly) so while that may reduce the impact of this new difference generally speaking, the new behavior makes it harder/not possible to acceptance test these attributes via ImportStateVerify without ignoring them and needing to test the imported state in another fashion.

bug provider-sdk

Most helpful comment

Further to James's note, I just wanted to also point out for posterity that:

  • #20236 introduced a check that the value after apply agrees with what was predicted during plan, but includes an opt-out for the legacy SDK since it cannot make such a guarantee due to its legacy type system not exactly matching the new Terraform type system.
  • The follow up change #20282 also adds some similar rules -- and a similar opt-out -- for making sure the plan agrees with configuration.
  • #20282 also includes an adjustment to the opt-out behavior for both: we now run the safety check even when the provider has opted out, but in the opt-out case we write the problem to the log with level WARNING, instead of returning an error.

Because of the last item there, we will should now be able to use the additional details in TRACE logs gathered with bug reports to more easily trace the new equivalent of "diffs didn't match during apply" (which is now phrased as "Provider produced inconsistent final plan" for v0.12) back to the upstream resource inconsistencies that caused them.

Here's an example of what sort of log line is generated when the plan disagrees with the configuration in the legacy SDK:

2019/02/12 11:58:30 [WARN] Provider "test" produced an invalid plan for test_resource_diff_suppress.foo, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .val_to_upper: planned value cty.StringVal("FOO") does not match config value cty.StringVal("foo")

The message for an apply-time inconsistency (final result disagrees with plan) produces a similarly-shaped but differently-worded log message:

2019/02/12 11:58:30 [WARN] Provider "test" produced an unexpected new value for test_resource_diff_suppress.foo, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .network: was cty.StringVal("default"), but now cty.StringVal("modified")

If provider developers get error reports with the "Provider produced inconsistent final plan" message post-0.12 which seem to report changes that cannot be explained by the provider's own behavior, I'd advise looking for either of the above message types in the logs (search for "produced an invalid plan" or "produced an unexpected new value") to see if there are reports about an upstream resource changing a result value in a way that seems incorrect. We must rely on human intuition to separate these issues from the more common "old SDK is weird" sort of issues, but at least we now have a path to possibly tracing such errors back to their root cause, rather than them just accumulating on GitHub.

All 5 comments

Potentially related: #20183

The issue here seems to be resources changing their values during apply, rather than refresh.
The values presented here are the values that the resource planned, which were not marked as unknown, so should have been what was applied.

Because of conceptual issues with sets in general, the shims have to prefer wholly-known planned values if there is a discrepancy. If the shims were to pass through the incorrect values (which can still happen in many cases), terraform should report this as an error about the unexpected change.

Just to add some clarification before closing this;

Providers must "stick to their plan". Besides the fact that a resource applying a different value than what was planned breaks the contract between core and the provider, it causes cascading errors in resource's dependencies. If a value is returned from plan, terraform expects to be able to use that value in any resource that references it without waiting for it to be applied. If that value later changes, the incorrect value was either planned or applied to those dependencies, causing an "unexpected diff during apply" error, or a non-empty plan after apply.

While we can't catch all these errors while still using the legacy sdk, core will attempt to notify the users appropriately to better avoid these situation.

Further to James's note, I just wanted to also point out for posterity that:

  • #20236 introduced a check that the value after apply agrees with what was predicted during plan, but includes an opt-out for the legacy SDK since it cannot make such a guarantee due to its legacy type system not exactly matching the new Terraform type system.
  • The follow up change #20282 also adds some similar rules -- and a similar opt-out -- for making sure the plan agrees with configuration.
  • #20282 also includes an adjustment to the opt-out behavior for both: we now run the safety check even when the provider has opted out, but in the opt-out case we write the problem to the log with level WARNING, instead of returning an error.

Because of the last item there, we will should now be able to use the additional details in TRACE logs gathered with bug reports to more easily trace the new equivalent of "diffs didn't match during apply" (which is now phrased as "Provider produced inconsistent final plan" for v0.12) back to the upstream resource inconsistencies that caused them.

Here's an example of what sort of log line is generated when the plan disagrees with the configuration in the legacy SDK:

2019/02/12 11:58:30 [WARN] Provider "test" produced an invalid plan for test_resource_diff_suppress.foo, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .val_to_upper: planned value cty.StringVal("FOO") does not match config value cty.StringVal("foo")

The message for an apply-time inconsistency (final result disagrees with plan) produces a similarly-shaped but differently-worded log message:

2019/02/12 11:58:30 [WARN] Provider "test" produced an unexpected new value for test_resource_diff_suppress.foo, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .network: was cty.StringVal("default"), but now cty.StringVal("modified")

If provider developers get error reports with the "Provider produced inconsistent final plan" message post-0.12 which seem to report changes that cannot be explained by the provider's own behavior, I'd advise looking for either of the above message types in the logs (search for "produced an invalid plan" or "produced an unexpected new value") to see if there are reports about an upstream resource changing a result value in a way that seems incorrect. We must rely on human intuition to separate these issues from the more common "old SDK is weird" sort of issues, but at least we now have a path to possibly tracing such errors back to their root cause, rather than them just accumulating on GitHub.

I'm going to lock this issue because it has been closed for _30 days_ โณ. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

Was this page helpful?
0 / 5 - 0 ratings