Terraform-provider-aws: aws_ram_resource_share_accepter wanting to create again

Created on 10 Sep 2019  ·  26 Comments  ·  Source: hashicorp/terraform-provider-aws

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

> terraform -v
Terraform v0.12.7
+ provider.aws v2.24.0

Affected Resource(s)

  • aws_ram_resource_share_accepter

Terraform Configuration Files

This is essentially what we have in our module:

variable "project_account" {
  description = "The project's account number"
  type        = "string"
}

variable "core_transitgw_account" {
  description = "The core account number who owns the TransitGW"
  type        = "string"
}

provider "aws" {
  alias   = "core_transitgw_account"
  region  = "us-east-2"
  profile = "AmazonAWSAdmins_profile"
  assume_role {
    role_arn = "arn:aws:iam::${var.core_transitgw_account}:role/admin"
  }
}

provider "aws" {
  alias   = "project_account"
  region  = "us-east-2"
  profile = "AmazonAWSAdmins_profile"
  assume_role {
    role_arn = "arn:aws:iam::${var.project_account}:role/admin"
  }
}

data "aws_ram_resource_share" "transitgw" {
  provider       = "aws.core_transitgw_account"
  name           = "tgw-share"
  resource_owner = "SELF"
}

# Send a sharing invite to the project account
resource "aws_ram_principal_association" "transitgw" {
  provider           = "aws.core_transitgw_account"
  principal          = "${var.project_account}"
  resource_share_arn = "${data.aws_ram_resource_share.transitgw.arn}"
}

# ...and accept the invite.
resource "aws_ram_resource_share_accepter" "transitgw" {
  provider  = "aws.project_account"
  share_arn = "${aws_ram_principal_association.transitgw.resource_share_arn}"
}

# Create the Satellite VPC
resource "aws_vpc" "satellite" {
  provider   = "aws.project_account"
  cidr_block = "${var.vpc_cidr_block}"
  tags = {
    Name = "satellite"
  }
}

# Create the TransitGW attachment
resource "aws_ec2_transit_gateway_vpc_attachment" "satellite" {
  provider                                        = "aws.project_account"
  subnet_ids                                      = "${aws_subnet.satellite.*.id}"
  transit_gateway_id                              = "${data.aws_ec2_transit_gateway.satellite.id}"
  vpc_id                                          = "${aws_vpc.satellite.id}"
  transit_gateway_default_route_table_propagation = false
  tags = {
    Name = "transitgw"
    Side = "creator"
  }
  depends_on = ["aws_ram_resource_share_accepter.transitgw"]
}

# ...and accept it in the core_transitgw_account.
resource "aws_ec2_transit_gateway_vpc_attachment_accepter" "transitgw" {
  provider                                        = "aws.core_transitgw_account"
  transit_gateway_attachment_id                   = "${aws_ec2_transit_gateway_vpc_attachment.satellite.id}"
  transit_gateway_default_route_table_propagation = false
  tags = {
    Name = "satellite-${var.project_name}"
    Side = "accepter"
  }
}

Debug Output

Encrypted full debug output located here: https://gist.github.com/cacack/e285bb2991cde0a3cede3d4bffc6e6db

Expected Behavior

No changes in plan.

Actual Behavior

  # module.project_training_satellite_vpc.aws_ram_resource_share_accepter.transitgw will be created
  + resource "aws_ram_resource_share_accepter" "transitgw" {
      + id                  = (known after apply)
      + invitation_arn      = (known after apply)
      + receiver_account_id = (known after apply)
      + resources           = (known after apply)
      + sender_account_id   = (known after apply)
      + share_arn           = "arn:aws:ram:us-east-2:12345678:resource-share/44b605ed-97da-6010-130a-9efb4006e4a1"
      + share_id            = (known after apply)
      + share_name          = (known after apply)
      + status              = (known after apply)

Steps to Reproduce

  1. terraform plan

Important Factoids

We are pretty infant with AWS and Terraform...

References

  • #7601
bug service/ram

All 26 comments

Anyone have thoughts on this? Is it a bug with the aws_ram_resource_share_accepter, or a problem with our usage?

It could be either, I saw this too:

aws_ram_resource_share_accepter.resource_name: Refreshing state... (ID: arn:aws:ram:ap-southeast-1:<redacted>)
2019-09-18T17:21:30.162+0800 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: 2019/09/18 17:21:30 [DEBUG] [aws-sdk-go] DEBUG: Response RAM/GetResourceShareInvitations Details:
2019-09-18T17:21:30.162+0800 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: ---[ RESPONSE ]--------------------------------------
2019-09-18T17:21:30.162+0800 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: HTTP/2.0 200 OK
2019-09-18T17:21:30.162+0800 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: Content-Length: 440
2019-09-18T17:21:30.162+0800 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: Content-Type: application/json
2019-09-18T17:21:30.162+0800 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: Date: Wed, 18 Sep 2019 09:21:30 GMT
2019-09-18T17:21:30.162+0800 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: X-Amz-Apigw-Id: ANOMFEVzSQ0Fesg=
2019-09-18T17:21:30.162+0800 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: X-Amzn-Requestid: <redacted>
2019-09-18T17:21:30.162+0800 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: X-Amzn-Trace-Id: <redacted>
2019-09-18T17:21:30.162+0800 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: 
2019-09-18T17:21:30.162+0800 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: 
2019-09-18T17:21:30.162+0800 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: -----------------------------------------------------
2019-09-18T17:21:30.162+0800 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: 2019/09/18 17:21:30 [DEBUG] [aws-sdk-go] {"resourceShareInvitations":[{"invitationTimestamp":1568186025.509,"receiverAccountId":"<redacted>","resourceShareArn":"arn:aws:ram:ap-southeast-1:<redacted>","resourceShareInvitationArn":"arn:aws:ram:ap-southeast-1:<redacted>","resourceShareName":"<redacted>","senderAccountId":"<redacted>","status":"EXPIRED"}]}
2019-09-18T17:21:30.162+0800 [DEBUG] plugin.terraform-provider-aws_v2.25.0_x4: 2019/09/18 17:21:30 [WARN] No RAM resource share invitation by ARN (arn:aws:ram:ap-southeast-1:<redacted>) found, removing from state

This was working a week ago, so I suspect the expiry of the invitation has something to do with it. Not sure if that should cause the resource to require being recreated, because the share is still active currently.

Perhaps I may also be using the wrong resource?

It seems that aws ram get-resource-shares returns the share with status: ACTIVE, so after the invitation is accepted it apparently lapses into an expired state while the share itself stays active. This prevents us from querying anything about the resources which were shared. I think we need to extend the aws_ram_resource_share data source to also expose the resources which were shared. Thoughts?
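
As a rough illustration of the check described above, a minimal boto3-style sketch could ask RAM for the share from the receiving account's side instead of looking at the invitation. The function names are hypothetical, and ram_client is assumed to be a boto3 "ram" client created with the receiving account's credentials.

```python
def find_received_share(ram_client, share_arn):
    """Return the share as seen from the receiving account, or None.

    Assumption: ram_client is a boto3 "ram" client for the receiving account.
    Pagination is ignored because a single ARN is requested.
    """
    response = ram_client.get_resource_shares(
        resourceOwner="OTHER-ACCOUNTS",  # shares owned by another account
        resourceShareArns=[share_arn],
    )
    for share in response.get("resourceShares", []):
        if share.get("resourceShareArn") == share_arn:
            return share
    return None


def share_is_active(ram_client, share_arn):
    """True when the share exists on the receiving side with status ACTIVE."""
    share = find_received_share(ram_client, share_arn)
    return share is not None and share.get("status") == "ACTIVE"
```

With real credentials, the client would typically come from something like boto3.Session(...).client("ram", region_name=...); passing the client in keeps the check easy to exercise with a stand-in object.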

That would make sense. We are able to re-run a plan/apply right after a successful apply without the plan wanting to (re)create anything. Some amount of time had elapsed between the initial apply and the run shown above.

This may also play into another problem we're seeing, which I may open a separate issue for. I'm trying to import another account with the same resources, but it is failing to import the aws_ram_resource_share_accepter:

> terraform import -provider=aws.transitgw_account module.project_satellite_vpc.aws_ram_resource_share_accepter.transitgw 'arn:aws:ram:us-east-2:123456789012:resource-share/44b605ed-97da-6010-130a-9efb4006e4a1'
module.project_satellite_vpc.aws_ram_resource_share_accepter.transitgw: Importing from ID "arn:aws:ram:us-east-2:123456789012:resource-share/44b605ed-97da-6010-130a-9efb4006e4a1"...
module.project_satellite_vpc.aws_ram_resource_share_accepter.transitgw: Import prepared!
  Prepared aws_ram_resource_share_accepter for import
module.project_satellite_vpc.aws_ram_resource_share_accepter.transitgw: Refreshing state... [id=arn:aws:ram:us-east-2:123456789012:resource-share/44b605ed-97da-6010-130a-9efb4006e4a1]

Error: Cannot import non-existent remote object

While attempting to import an existing object to
aws_ram_resource_share_accepter.transitgw, the provider detected that no
object exists with the given id. Only pre-existing objects can be imported;
check that the id is correct and that it is associated with the provider's
configured region or endpoint, or use "terraform apply" to create a new remote
object for this resource.

Just pinging this again... I suspect @lowjoel is correct and the RAM invitation moves to an expired/archive state causing Terraform to want to create this resource again.

We have another RAM use-case with sharing Route53 rules between multiple accounts, but are hesitant to head down this route (pun intended) if we're going to experience the same invitation problems.

For now, we've removed management of the RAM share from Terraform and will handle it directly with API as part of some initial bootstrapping we must perform. However, I'd really like to have this under Terraform's management.

I can confirm that we are seeing this too:

Terraform v0.11.14
+ provider.aws v2.24.0

My code is much the same as @cacack's in the issue description; however, we are using RAM to share Route 53 Resolver rules as well as TGWs.

The initial terraform plan and apply work just fine. But at some point in the future (anecdotally about 1 week) the RAM resource share invitations expire and are removed, even after they have been ACCEPTED.

As stated, the consequence is that terraform wants to create a new aws_ram_resource_share_accepter resource for an invitation that no longer exists and therefore recreate all resources that depend on the accepter resource.

How do I work around the issue until we get a fix?

Where possible, I've been using -target to avoid touching the aws_ram_resource_share resources. But that won't work in all cases, such as when you need to update the RAM shares themselves.

I destroyed the existing resources and recreated the RAM Share resource and all the dependent resources as well. Worked fine for a few days. And strangely, I got the same issue again after that.

I've been experiencing the same issue as well. In my case, I ended up using a data "external" source that runs a Python script, which should be triggered whenever the RAM resource share ARN changes:

data "external" "RESOURCE-X_share_accepter_in_ACCOUNT-X" {  ## FIXME: go back to using a resource "aws_ram_resource_share_accepter" once #10064 is fixed
  program = [ "python3", "${path.module}/../dist/accept-ram-share-invitation.py" ]

  query = {
    account   = "${var.account}"
    region    = "${var.awsRegion}"
    share_arn = "${aws_ram_resource_share.RESOURCE-X_ram_share.arn}"
  }
}

Here's the content of my accept-ram-share-invitation.py:

import json
import sys

import boto3

ROLE_ARN = ""  # provide an IAM role to assume
ASSUME_ROLE_SESSION = ""  # provide a name for the IAM role session


def assume_role(region, account):
    # region and account are accepted for symmetry with the Terraform query
    # but are not used here: the role comes from ROLE_ARN above.
    client = boto3.client('sts')

    response = client.assume_role(RoleArn=ROLE_ARN, RoleSessionName=ASSUME_ROLE_SESSION)

    return boto3.Session(
        aws_access_key_id=response['Credentials']['AccessKeyId'],
        aws_secret_access_key=response['Credentials']['SecretAccessKey'],
        aws_session_token=response['Credentials']['SessionToken']
    )


def main(**kwargs):
    session = assume_role(region=kwargs['region'], account=kwargs['account'])

    client_ram_identity = session.client('ram', region_name=kwargs['region'])

    # Look up the invitations (if any are still listed) for this resource share.
    response_dict = client_ram_identity.get_resource_share_invitations(
        resourceShareArns=[
            kwargs['share_arn']
        ]
    )

    invitations = [
        {'arn': s['resourceShareInvitationArn'], 'status': s['status']}
        for s in response_dict.get('resourceShareInvitations', [])
    ]

    if not invitations:
        # No invitation is listed any more.
        result_dict = {'status': 'EXPIRED'}
    else:
        for i in invitations:
            # Only PENDING invitations are accepted; calling accept on an
            # ACCEPTED, REJECTED or EXPIRED invitation raises an error.
            if i['status'] == 'PENDING':
                client_ram_identity.accept_resource_share_invitation(
                    resourceShareInvitationArn=i['arn'],
                )
        result_dict = {'status': 'ACCEPTED'}

    return result_dict


if __name__ == "__main__":
    args = json.load(sys.stdin)
    # The external data source reads a JSON object from stdout and expects
    # exit code 0; sys.exit(<string>) would print to stderr and exit with 1.
    print(json.dumps(main(**args)))

@jasonalex I think you could also do something similar to run the AcceptResourceShareInvitation API manually and handle the output as you wish.

Looks like maybe resourceAwsRamResourceShareAccepterRead() needs to check for existing accepted shares before looking at invitations. Poking around it looks like the ARN is the same between the two APIs.

I confirm this issue, same behaviour here.

As @dthvt mentioned, the issue lies in the fact that the current code in resourceAwsRamResourceShareAccepterRead() first checks for the corresponding ResourceShareInvitation (assuming that it will always be here) and then the ResourceShare itself.

The assumption that the ResourceShareInvitation will be there forever is wrong. In my experience, it disappears from AWS systems after a little while. The AWS documentation currently says (here) that we have 7 days to accept an invitation. From the experience of other users above, it seems that the invitation is in fact always deleted after 7 days, whether or not it was accepted.

From the error message associated with the check for invitation, I could imagine that this has been done to cover the case where the RAM Sharing has been activated within the organisation.

So we should probably change the code to not take for granted that the invitation will always be there.
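
To make the proposed lookup order concrete, here is a minimal sketch in Python with boto3-style calls (the provider itself is written in Go, so this only illustrates the logic and is not the provider's code). The function name and the returned dictionary are hypothetical; ram_client is assumed to be a boto3 "ram" client for the receiving account.

```python
def read_accepter(ram_client, share_arn):
    """Sketch of a refresh that does not depend on the invitation.

    Returns a small dict describing where the state was found, or None when
    neither an active share nor a usable invitation exists (the case in which
    the resource would be dropped from state).
    """
    # 1. Prefer the resource share itself, as seen by the receiving account.
    shares = ram_client.get_resource_shares(
        resourceOwner="OTHER-ACCOUNTS",
        resourceShareArns=[share_arn],
    ).get("resourceShares", [])
    for share in shares:
        if share.get("status") == "ACTIVE":
            return {"found_via": "resource_share", "status": "ACTIVE"}

    # 2. Fall back to the invitation, which may or may not still be listed.
    invitations = ram_client.get_resource_share_invitations(
        resourceShareArns=[share_arn],
    ).get("resourceShareInvitations", [])
    for invitation in invitations:
        if invitation.get("status") in ("PENDING", "ACCEPTED"):
            return {"found_via": "invitation", "status": invitation["status"]}

    return None
```

With this ordering, the situation from the debug log earlier in the thread (invitation EXPIRED, share still ACTIVE) would keep the resource in state instead of removing it.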

I could propose a PR in a few days if nobody already started to work on this.

Unfortunately, this issue with the invitation being deleted after 7 days does not seem to be something testable with the current test framework. Since we test against real AWS resources, the invitation will always be there, because the ResourceShare will have just been created. Any thoughts on this? @bflad: any insight on this specific testing issue would be greatly appreciated.

My suggestion would be to stop relying too much on invitations for the read, finding a more balanced solution between the (which is not an easy task). It also means that we have to stop using the invitationId as the ID for the resource and go back to the resource_share_arn as the ID (which contradicts what was decided in the original pull request). @YakDriver @ewbankkit: any recommendation on this?

I analysed the code of the accepter further.

First of all, something is wrong in my previous comment: the ID is already the resource_share_arn, so there is no change in this respect.

The biggest change is that a lot of attributes are populated from the invitation. Since we cannot rely on the invitation to always be there (especially for "import" operations, which might be useful in some corner cases e.g. when the state got corrupted for one reason or another), I believe that we have to change this behaviour and populate the attributes from the resource share itself (the one on the receiving side).

This would imply the following changes :

  • status => to be populated from resource share "status"
  • receiver_account_id => cannot be populated from the resource share and is always equal to the current account; is there any value in storing this? It is currently used in the delete operation: if receiver_account_id is empty, the resource cannot be deleted. I do not see the purpose of this check before the deletion.
  • sender_account_id => to be populated from resource share "owningAccountId"
  • share_arn => to be populated from resource share "resourceShareArn"
  • invitation_arn => not always accessible, especially for imports, to be removed
  • share_id => currently the value is computed, and I do not see this as a very reliable computation (it is only based on an assumption; I did not find any AWS documentation on this). I would prefer to remove this attribute and let end users compute it in their Terraform code if they need it, so that they can adapt to any future change in this computation on the AWS side (but maybe I'm wrong and this is not the Terraform philosophy)
  • share_name => to be populated from resource share "name"
  • resources => to be populated as today from a call to the ListResources endpoint

So in my mind there would be 3 "breaking" changes in the removal of the "receiver_account_id", "invitation_arn", "share_id" attributes.

I intend to modify the "..Read" function to populate attributes from the resource share in the receiving account (instead of the invitation which is "rarely" present indeed).
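
The attribute mapping listed above could look roughly like the following sketch, written in Python with boto3-style calls purely for illustration (the provider is written in Go, and this is not its code). The function names are hypothetical, ram_client is assumed to be a boto3 "ram" client for the receiving account, and pagination of ListResources is left out for brevity.

```python
def accepter_attributes(share, resources):
    """Map a RAM resource share (receiving side) to the accepter attributes."""
    return {
        "status": share["status"],
        "sender_account_id": share["owningAccountId"],
        "share_arn": share["resourceShareArn"],
        "share_name": share["name"],
        "resources": [resource["arn"] for resource in resources],
    }


def read_attributes(ram_client, share_arn):
    """Return the attributes for share_arn, or None if the share is not found."""
    shares = ram_client.get_resource_shares(
        resourceOwner="OTHER-ACCOUNTS",
        resourceShareArns=[share_arn],
    ).get("resourceShares", [])
    if not shares:
        return None

    # Only the first page of resources is read in this sketch.
    resources = ram_client.list_resources(
        resourceOwner="OTHER-ACCOUNTS",
        resourceShareArns=[share_arn],
    ).get("resources", [])
    return accepter_attributes(shares[0], resources)
```

Note that receiver_account_id, invitation_arn and share_id do not appear in the mapping, which matches the three removals proposed above.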

I also found a bug in the Delete function, where resourceAwsRamResourceShareStateRefreshFunc is called with d.Id(), which is the resource_share_arn, while this function actually expects a resource_share_invitation_arn. So I assume that the deletion does not wait for the real resource deletion and exits prematurely. Moreover, since the invitation is not always present at deletion time (it could have disappeared if older than 7 days), this check often could not be made even if the right parameter were passed. I would hence consider switching this check to the resource_share itself instead of the invitation, for both the creation and the deletion operations.
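
A wait that polls the resource share rather than the invitation might be sketched as follows, in Python with boto3-style calls for illustration only. The function name, the timeout and interval defaults, and the decision to treat a missing share or a DELETED status as "gone" are all assumptions; error handling (for example for an unknown ARN) is omitted.

```python
import time


def wait_until_share_gone(ram_client, share_arn, timeout=300.0, interval=5.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll the receiving side until the share is no longer visible.

    Returns True once the share is absent or DELETED, False on timeout.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        shares = ram_client.get_resource_shares(
            resourceOwner="OTHER-ACCOUNTS",
            resourceShareArns=[share_arn],
        ).get("resourceShares", [])
        # Assumption: a share that is missing or marked DELETED counts as gone.
        remaining = [s for s in shares if s.get("status") != "DELETED"]
        if not remaining:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Because the loop keys on the share ARN, it works the same way whether or not the invitation is still listed.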

Any feedback greatly appreciated.

Thanks @benoit74 for generating a PR for this! I have an immediate use for this, so hoping the PR gets accepted quickly.

Re. the delete option, I thought there was some prior discussion that the accepter should act like other accepters in that deletion is effectively a no-op. Personally, if I destroy my side of a RAM share, I don't have any issue w/ the behavior of actually deleting the share from my account. I'm not sure what the consensus is for the desired behavior, but it would probably be preferable that accepters have a uniform behavior to reduce confusion.

In the initial development, we weren't thinking of the accepter as a continuing part of the config or that it would be used 7- (we thought 15-) days after creation. Clearly, that was a mistake.

My thoughts on the issues:

  1. Considering the VPC peering connection accepter as an example, that resource/interaction does not involve an invitation. I like the simplicity of _not_ having an invitation (AWS could have made this work similarly). Thus, I love the idea of abstracting away the invitation completely here. It simplifies things for the user. Carrying the concept through to the implementation, the invitation should be irrelevant if the share is already in place. In other words, if one year after establishing a share the receiver applies or imports the accepter, it should just check for the appropriate existing share and then do nothing, with no invitation needed.
  2. As for delete, the VPC peering connection accepter does nothing when the resource is deleted. This would suggest we do the same here. But, is there a way for the receiver to get rid of the share? Maybe this isn't the place for that anyway.
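
The "check for the existing share and then do nothing" behaviour from point 1 can be sketched as an idempotent accept step. This is an illustration in Python with boto3-style calls, not the provider's Go code; the function name and return values are hypothetical, and ram_client is assumed to be a boto3 "ram" client for the receiving account.

```python
def ensure_share_accepted(ram_client, share_arn):
    """Accept the share only if it is not already in place.

    Returns "already-shared" when an ACTIVE share exists (no invitation is
    needed), "accepted" after accepting a PENDING invitation, and raises
    LookupError when there is neither a share nor a pending invitation.
    """
    shares = ram_client.get_resource_shares(
        resourceOwner="OTHER-ACCOUNTS",
        resourceShareArns=[share_arn],
    ).get("resourceShares", [])
    if any(share.get("status") == "ACTIVE" for share in shares):
        return "already-shared"

    invitations = ram_client.get_resource_share_invitations(
        resourceShareArns=[share_arn],
    ).get("resourceShareInvitations", [])
    for invitation in invitations:
        if invitation.get("status") == "PENDING":
            ram_client.accept_resource_share_invitation(
                resourceShareInvitationArn=invitation["resourceShareInvitationArn"],
            )
            return "accepted"

    raise LookupError("no active share or pending invitation for " + share_arn)
```

Running this a year after the share was established would take the first branch and never touch the invitation API.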

Thank you for your insights!

  1. To be honest, I don't get why the VPC peering connection accepter does
    not support the deletion. It means that the accepter's account can never
    get rid of the corresponding VPC peering connection (at least without
    manual / complex manipulations). It is an issue for me, since the requester
    and accepter accounts are not necessarily managed by one single entity (I
    have scenarios where the two accounts are owned by different companies,
    which agree to work together at some point in time but might decide to
    stop at some point in the future). As an accepter, I would like to be able
    to delete a peering connection for whatever reason.

  2. The simplest way for a receiver to get rid of a share (or a peering
    connection) would be to delete the corresponding accepter, and then it's
    the job of the provider to delete what needs to be (abstracting AWS
    complexity behind this, invitations or something else).

  3. The current behavior of the accepters seems awkward to me since they
    behave like a procedural language while terraform is declarative.

So my suggestion would be to support the deletion in all accepters, since
it is a lean way for a receiver to get rid of something he previously
accepted.

Does it make sense to you?

I personally think it makes more sense for the accepter to actively delete the accepter's side of a share, so I'm receptive to your position @benoit74 . It probably makes sense for that to be a uniform behavior across all resource accepters. But I'm not familiar enough with the other *_accepter functions to say whether there's a particular counter-example.

@benoit74 Looking back at the comments on the original aws_vpc_peering_connection_accepter PR (and as I recall), it was an effectively arbitrary decision for delete to not remove the peering connection.
In retrospect, and given that later resources such as aws_ec2_transit_gateway_vpc_attachment_accepter and aws_ec2_transit_gateway_peering_attachment_accepter do delete the attachment on delete, that may not have been the correct decision.
We would have to add a new attribute like force_delete or somesuch to ensure backwards compatibility if this was changed now.

@benoit74 Was testing successful after the invitation expired? Anxiously awaiting acceptance of this PR! :-)

Yes, it is working. Sorry for the delay for testing.

Regarding the delete, this resource was already deleting resource_shares, so no need to add a force_delete for this resource just like transit_gateway's accepters.

The fix for this has been merged and will release with version 2.50.0 of the Terraform AWS Provider, Thursday this week. Thanks to @benoit74 and everyone above for the investigation and implementation work. 👍

This has been released in version 2.50.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!

Thanks @benoit74 and team! I really appreciate the quick fix!

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!
