Terraform-provider-aws: resource/aws_security_group_rule crashes with provider version 1.43.1

Created on 9 Nov 2018  ·  16 Comments  ·  Source: hashicorp/terraform-provider-aws

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

Terraform v0.11.8

  • provider.aws v1.43.1
  • provider.http v1.0.1

Terraform Configuration Files

https://github.com/terraform-providers/terraform-provider-aws/tree/master/examples/eks-getting-started

Modified to use eu-west-1 region.

Output

https://gist.github.com/yorinasub17/99184f7fcf662ce09f6fb5c7b9e7389f

Panic output

https://gist.github.com/yorinasub17/06c6be4e7199069cf204412e59ebe075

Expected Behavior

No panic.

Actual Behavior

Panic!

Steps to Reproduce

  1. terraform init
  2. terraform apply
  3. terraform destroy <= this fails

Important Factoids

  • I am using a utility that uses STS assume roles to authenticate to AWS.
  • The destroy worked when I rolled back to v1.42.0
Labels: bug, crash, service/ec2

Most helpful comment

We hit this too when trying to refresh the state of aws_security_group_rule.
Are you planning to release the fix right away, or are we better off pinning the provider version to 1.43.0?

All 16 comments

Relevant portion of the crash log:

2018-11-09T13:57:47.538-0800 [DEBUG] plugin.terraform-provider-aws_v1.43.1_x4: panic: runtime error: invalid memory address or nil pointer dereference
2018-11-09T13:57:47.538-0800 [DEBUG] plugin.terraform-provider-aws_v1.43.1_x4: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2f5818c]
2018-11-09T13:57:47.538-0800 [DEBUG] plugin.terraform-provider-aws_v1.43.1_x4: 
2018-11-09T13:57:47.538-0800 [DEBUG] plugin.terraform-provider-aws_v1.43.1_x4: goroutine 339 [running]:
2018-11-09T13:57:47.538-0800 [DEBUG] plugin.terraform-provider-aws_v1.43.1_x4: github.com/terraform-providers/terraform-provider-aws/aws.findRuleMatch(0xc000910f00, 0xc000164a20, 0x2, 0x2, 0x1, 0x0)
2018-11-09T13:57:47.538-0800 [DEBUG] plugin.terraform-provider-aws_v1.43.1_x4:  /opt/teamcity-agent/work/5d79fe75d4460a2f/src/github.com/terraform-providers/terraform-provider-aws/aws/resource_aws_security_group_rule.go:433 +0x6c
2018-11-09T13:57:47.539-0800 [DEBUG] plugin.terraform-provider-aws_v1.43.1_x4: github.com/terraform-providers/terraform-provider-aws/aws.resourceAwsSecurityGroupRuleRead(0xc0001d22a0, 0x385e7a0, 0xc00025e300, 0xc0001d22a0, 0x0)
2018-11-09T13:57:47.539-0800 [DEBUG] plugin.terraform-provider-aws_v1.43.1_x4:  /opt/teamcity-agent/work/5d79fe75d4460a2f/src/github.com/terraform-providers/terraform-provider-aws/aws/resource_aws_security_group_rule.go:286 +0x5fd
2018-11-09T13:57:47.539-0800 [DEBUG] plugin.terraform-provider-aws_v1.43.1_x4: github.com/terraform-providers/terraform-provider-aws/vendor/github.com/hashicorp/terraform/helper/schema.(*Resource).Refresh(0xc000744850, 0xc0002dd8b0, 0x385e7a0, 0xc00025e300, 0xc0003d4a58, 0x10bc001, 0x333cd80)
2018-11-09T13:57:47.539-0800 [DEBUG] plugin.terraform-provider-aws_v1.43.1_x4:  /opt/teamcity-agent/work/5d79fe75d4460a2f/src/github.com/terraform-providers/terraform-provider-aws/vendor/github.com/hashicorp/terraform/helper/schema/resource.go:352 +0x160
2018-11-09T13:57:47.539-0800 [DEBUG] plugin.terraform-provider-aws_v1.43.1_x4: github.com/terraform-providers/terraform-provider-aws/vendor/github.com/hashicorp/terraform/helper/schema.(*Provider).Refresh(0xc0009810a0, 0xc0002dd860, 0xc0002dd8b0, 0xc00005ac00, 0xc0000886e0, 0x7d9a6c0)
2018-11-09T13:57:47.539-0800 [DEBUG] plugin.terraform-provider-aws_v1.43.1_x4:  /opt/teamcity-agent/work/5d79fe75d4460a2f/src/github.com/terraform-providers/terraform-provider-aws/vendor/github.com/hashicorp/terraform/helper/schema/provider.go:308 +0x92

There is a missing nil check for p.FromPort and p.ToPort in findRuleMatch(). I'll make sure this gets a covering acceptance test.

We hit this too when trying to refresh the state of aws_security_group_rule.
Are you planning to release the fix right away, or are we better off pinning the provider version to 1.43.0?

@LucaLanziani could you share the configuration of the rule that you think is causing the issue? Definitely want to be sure we get the right amount of test coverage in place.

I just came across this issue too. This seems to be a regression in the AWS provider with version = "~> 1.35"; when I specify the more limited range version = "~> 1.35.0", the issue goes away.

@soorena776 and anyone else chiming in -- can you please provide the configuration of the rule(s) you believe might be causing this? This will ensure we're covering all the troublesome behavior in one fix. Thanks!


resource "aws_security_group_rule" "worker-node-ingress-self-sgr" {
  description              = "Allow node to communicate with each other"
  from_port                = 0
  protocol                 = "-1"
  security_group_id        = "${aws_security_group.worker-sg.id}"
  source_security_group_id = "${aws_security_group.worker-sg.id}"
  to_port                  = 65535
  type                     = "ingress"
}

resource "aws_security_group_rule" "worker-node-ingress-cluster-sgr" {
  description              = "Allow worker Kubelets and pods to receive communication from the cluster control plane"
  from_port                = 1025
  protocol                 = "tcp"
  security_group_id        = "${aws_security_group.worker-sg.id}"
  source_security_group_id = "${aws_security_group.master-sg.id}"
  to_port                  = 65535
  type                     = "ingress"
}

resource "aws_security_group_rule" "worker-node-ingress-https-sgr" {
  description              = "Allow pods to communicate with the cluster API Server"
  from_port                = 443
  protocol                 = "tcp"
  security_group_id        = "${aws_security_group.master-sg.id}"
  source_security_group_id = "${aws_security_group.worker-sg.id}"
  to_port                  = 443
  type                     = "ingress"
}

We're seeing this for both v0.11.7 & v0.11.10.

Running 1.43.1 gets different results for me, depending on the environment:

  • some environments throw connection is shut down / unexpected EOF as soon as plan hits a module with a security group rule
  • some plan successfully, but call for an unnecessary forced replacement of security group rules, e.g.:
-/+ module.vpc.aws_security_group_rule.web_egress (new resource required)
      id:                       "sgrule-XXXXXX" => <computed> (forces new resource)
      cidr_blocks.#:            "1" => "1"
      cidr_blocks.0:            "0.0.0.0/0" => "0.0.0.0/0"
      from_port:                "0" => "0"
      protocol:                 "-1" => "-1"
      security_group_id:        "sg-XXXXXX" => "sg-XXXXXX"
      self:                     "false" => "false"
      source_security_group_id: "" => <computed>
      to_port:                  "0" => "65535" (forces new resource)
      type:                     "egress" => "egress"

@bflad I'll try to share my config first thing in the morning.

After diving into the code path and with the help of the example configurations above, I was able to generate the crash via acceptance testing. I have submitted the bug fix pull request here: #6419

This will absolutely go out with the next provider release, whether that's 1.43.2 or 1.44.0.

@bflad thanks a lot for the work and the quick fix, do you have any ETA for the next release?

Sorry, I should have mentioned that in my previous response. The fix will go out Monday at the latest.

If you're in need of an immediate workaround, you can pin your AWS provider version in your configuration via:

provider "aws" {
  # ... potentially other configuration ...
  version = "1.43.0"
}

Then run terraform init.

I've also opened a technical debt issue (#6422) to add additional multiple rule acceptance test coverage for this resource, since it is one of the few resources that is heavily influenced by other configuration (e.g. other security group rules present in the same security group).

Version 1.43.2 of the AWS provider has been released with the fix for this situation. If you continue to have trouble on that updated version, please open a fresh bug report with all the relevant details.

Thanks again to all the reporters and apologies for the inconvenience the last day or so.

It works for me with the latest v1.43.2 provider.

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!
