Terraform-provider-aws: aws_ram_resource_share_accepter fails to refresh state

Created on 16 Sep 2020 · 2 Comments · Source: hashicorp/terraform-provider-aws

Terraform CLI and Terraform AWS Provider Version

Terraform v0.13.1
+ provider registry.terraform.io/hashicorp/aws v3.6.0
+ provider registry.terraform.io/hashicorp/time v0.5.0

Your version of Terraform is out of date! The latest version
is 0.13.2. You can update by downloading from https://www.terraform.io/downloads.html

Affected Resource(s)

aws_ram_resource_share_accepter

Terraform Configuration Files

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.3"
    }
    time = {
      source  = "hashicorp/time"
      version = "~> 0.5.0"
    }
  }
  required_version = ">= 0.13"
}

provider "aws" {
  alias   = "dev_nsoc"
  region  = "us-east-2"
  profile = "dev-nsoc"
}

provider "aws" {
  alias   = "dev_multi"
  region  = "us-east-2"
  profile = "dev-multi"
}

data "aws_caller_identity" "dev_multi" {
  provider = aws.dev_multi
}

resource "aws_ec2_transit_gateway" "tgw" {
  provider                        = aws.dev_nsoc
  amazon_side_asn                 = 65000
  auto_accept_shared_attachments  = "enable"
  default_route_table_association = "disable"
  default_route_table_propagation = "disable"
  description                     = "TGW for VPNs to on-prem sites"
  dns_support                     = "enable"
  vpn_ecmp_support                = "enable"
  tags = {
    Name = "dhagan-testing"
  }
}

resource "aws_ram_resource_share" "tgw" {
  provider                  = aws.dev_nsoc
  name                      = "dhagan-test"
  allow_external_principals = true
  tags = {
    Name = "dhagan-test"
  }
}

resource "aws_ram_resource_association" "tgw" {
  provider           = aws.dev_nsoc
  resource_share_arn = aws_ram_resource_share.tgw.arn
  resource_arn       = aws_ec2_transit_gateway.tgw.arn
}

resource "aws_ram_principal_association" "tgw" {
  provider           = aws.dev_nsoc
  principal          = data.aws_caller_identity.dev_multi.account_id
  resource_share_arn = aws_ram_resource_share.tgw.arn
  depends_on = [
    aws_ram_resource_association.tgw
  ]
}

resource "aws_ram_resource_share_accepter" "tgw" {
  provider  = aws.dev_multi
  share_arn = aws_ram_principal_association.tgw.resource_share_arn
}

Debug Output

https://gist.github.com/dthvt/2071c4ac7efadf383ea9907a448c046e

Panic Output

Expected Behavior

Expect the RAM share to be accepted.

Actual Behavior

This error occurs very often, but not on every run. The RAM share accepter fails with the error:

Error: error retrieving resource shares: UnknownResourceException: ResourceShare arn:aws:ram:us-east-2:024660257967:resource-share/e3eeada4-b908-4a0a-8c01-045441086956 could not be found.

In the debug output, the call to /getresourceshares (which looks correct to me at first glance) returns an HTTP 400 from AWS. I'm not sure whether this is a timing issue that calls for a retry.
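One way to probe the timing hypothesis outside Terraform is to poll the same RAM call from the accepting account and see whether it starts succeeding after a few seconds. The sketch below is a generic retry helper, not the provider's code; the function name `retry`, the attempt count, and the delay are assumptions chosen for illustration.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS is reached.
# Hypothetical helper for illustration; not part of Terraform or the AWS CLI.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}
```

For example, `retry 5 3 aws ram get-resource-shares --resource-owner OTHER-ACCOUNTS --resource-share-arns "$SHARE_ARN" --profile dev-multi --region us-east-2` would poll up to five times, three seconds apart, where `SHARE_ARN` is assumed to hold the ARN from the error message. If the call only succeeds on a later attempt, that would point toward eventual consistency rather than a malformed request.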

Sometimes the accepter resource is missing from the state file (even though the share has been accepted). Sometimes (as in the example attached here) the accepter is added to the state file but marked as tainted. An example of the plan debug output is here: https://gist.github.com/dthvt/fab84469caaf7fe8b4a275a15ddc67fd

I've tried adding a time_sleep resource between the share and accepter, but that seems irrelevant since it's actually the API calls within the accepter itself that end up erroring out.

As a workaround, you can use terraform import to add the accepter to the state file. If the accepter is already in the state and marked as tainted, running terraform state rm first and then importing seems to work.
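The workaround can be sketched as a small shell helper. This is a sketch, assuming the resource address `aws_ram_resource_share_accepter.tgw` from the configuration and that the accepter is imported by its resource share ARN; the function name `reimport_accepter` and the `TERRAFORM_BIN` override are hypothetical and exist only to make the steps explicit.

```shell
# reimport_accepter ADDRESS SHARE_ARN
# Hypothetical helper: drop a stale or tainted state entry, then import
# the already-accepted share. TERRAFORM_BIN is an assumed override that
# defaults to the terraform binary on PATH.
reimport_accepter() {
  local addr="$1" share_arn="$2"
  local tf="${TERRAFORM_BIN:-terraform}"

  # state rm exits non-zero when the address is not in the state;
  # that case is fine here, so the failure is ignored.
  "$tf" state rm "$addr" || true

  # Import the accepter using the resource share ARN as the import ID.
  "$tf" import "$addr" "$share_arn"
}

# Usage (ARN taken from the error message):
#   reimport_accepter aws_ram_resource_share_accepter.tgw \
#     arn:aws:ram:us-east-2:024660257967:resource-share/e3eeada4-b908-4a0a-8c01-045441086956
```

Running terraform plan afterwards should show whether the accepter is now tracked without a pending replacement.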

Steps to Reproduce

terraform apply

May take more than one attempt unfortunately. However, it happens VERY commonly for me and should be reproducible.

Important Factoids

Two accounts in the same AWS organization with RAM sharing turned off (so an accepter is required).

bug service/ec2 service/ram service/sts

Source

dthvt

👍6

Most helpful comment

Yes, I've confirmed that I could run the same query via the CLI (sorry, should have mentioned that). That's why I suspect it might be a race condition or something that just needs to be retried in the provider during creation.

dthvt on 17 Sep 2020

👍2

All 2 comments

Hi @dthvt, thank you for reporting this issue. Looking at the request that returns the 400 error code (GetResourceShares), my first guess is that it could be related to the {"resourceOwner":"OTHER-ACCOUNTS"} portion of the request, which is set by the AWS provider.

I've seen that "not found" error in the AWS CLI when running aws ram get-resource-shares --resource-owner OTHER-ACCOUNTS --resource-share-arn arn:aws:ram:xxx.... for a resource share created and shared by another account in the same AWS organization. When running the same command with SELF as the resource-owner, the resource share is returned in the output.

On your end, do you mind confirming whether the resource share from one account is viewable from the other account with the AWS CLI --resource-owner OTHER-ACCOUNTS option? That way we can begin to narrow down where the request is running into issues.
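The check requested here can be sketched as a pair of CLI calls, one per owner filter. This is a sketch under assumptions: the profile names and region come from the configuration in the issue, the function name `get_share` and the `AWS_BIN` override are hypothetical, and the option is spelled with its full name in the AWS CLI, `--resource-share-arns`. Which account should run which filter is my reading of the request: the owning account queries with SELF, the accepting account with OTHER-ACCOUNTS.

```shell
# get_share OWNER SHARE_ARN PROFILE
# Hypothetical helper: query RAM for one share under a given owner filter.
# OWNER is SELF or OTHER-ACCOUNTS. AWS_BIN is an assumed override that
# defaults to the aws binary on PATH.
get_share() {
  "${AWS_BIN:-aws}" ram get-resource-shares \
    --resource-owner "$1" \
    --resource-share-arns "$2" \
    --profile "$3" \
    --region us-east-2
}

# Usage, with SHARE_ARN set to the ARN from the error message:
#   get_share SELF "$SHARE_ARN" dev-nsoc             # owning account
#   get_share OTHER-ACCOUNTS "$SHARE_ARN" dev-multi  # accepting account
```

If the second call fails with the same UnknownResourceException while the first succeeds, the problem would be reproducible outside the provider.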