If my STS token in ~/.aws/credentials is expired, when I invoke terraform apply, it will seemingly hang and become unresponsive, requiring two SIGINTs to quit. Trace logs show that it's repeatedly calling sts:GetCallerIdentity which resulting in 403 Forbidden with an ExpiredToken code.
...
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: Action=GetCallerIdentity&Version=2011-06-15
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: -----------------------------------------------------
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: 2017/08/04 21:14:40 [DEBUG] [aws-sdk-go] DEBUG: Response sts/GetCallerIdentity Details:
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: ---[ RESPONSE ]--------------------------------------
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: HTTP/1.1 403 Forbidden
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: Connection: close
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: Content-Length: 297
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: Content-Type: text/xml
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: Date: Sat, 05 Aug 2017 04:14:39 GMT
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: X-Amzn-Requestid: 99b535db-7994-11e7-8d9e-e17db6dd7b22
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4:
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4:
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: -----------------------------------------------------
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: 2017/08/04 21:14:40 [DEBUG] [aws-sdk-go] <ErrorResponse xmlns="https://sts.amazonaws.com/doc/2011-06-15/">
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: <Error>
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: <Type>Sender</Type>
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: <Code>ExpiredToken</Code>
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: <Message>The security token included in the request is expired</Message>
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: </Error>
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: <RequestId>99b535db-7994-11e7-8d9e-e17db6dd7b22</RequestId>
2017/08/04 21:14:40 [DEBUG] plugin: terraform-provider-aws_v0.1.3_x4: </ErrorResponse>
...
Terraform v0.10.0
N/A
provider "aws" {
region = "us-east-1"
}
resource "aws_security_group" "default" {
# doesn't matter which resource(s) are used
name = "foo"
}
See above. I can generate a full trace log if necessary.
N/A
What should have happened?
The authentication process should check for a ExpiredToken response code and either return an error or emit some message. I found that if I swap in a fresh unexpired token while terraform apply is in this (seemingly unresponsive) loop, it'll work. If the token expires in the middle of a command, it would be nice to allow for the user to replace the token (i.e., by polling like it does now), but if the very first auth results in an ExpiredToken, then perhaps it would be appropriate to abort the command?
What actually happened?
Terraform seemed to hang, requiring two Ctrl-Cs to abort.
terraform applyI think this behaviour is kind of intended because Terraform does not have a way to lease a new (valid) token and expects the user to do it and carry on once it does.
With that said we could (and probably should) give the user some feedback in the UI when this occurs instead of blindly retrying behind the scenes.
This would be a nice enhancement. I've been working around this by prefixing my terraform commands with an aws cli command to verify my credentials have not expired yet because aws cli will throw a 255 exit code when credentials expire:
aws s3 ls > /dev/null ; echo $?
An error occurred (ExpiredToken) when calling the ListBuckets operation: The provided token has expired.
255
aws sts get-caller-identity may be more lightweight in that context and for that purpose.
The expiry of the token causes a bigger problem if terraform hangs and cannot progress - and cannot save its state to the S3 bucket that you're using for backend state storage because you don't have permission... because your token has expired. The only workaround that I've found is to ensure that before any terraform operation, get a new token. And hope that it won't take more than an hour to do the operation.
@charles-at-geospock Thanks for sharing feedback from that angle. Do you have any suggestions for solutions in mind?
From my own experience token is quite an abstract thing in AWS as it may come from different sources (sts/GetSessionToken, plain sts/AssumeRole, sts/AssumeRoleWithSAML, sts/AssumeRoleWithWebIdentity or sts/GetFederationToken) and therefore the process of refreshing the token may differ significantly depending on your environment.
In some cases we can't really refresh the token at all without prompting the user for some extra input (e.g. login details for SAML/OAuth) and that would require some significant changes in both core schema helper and aws provider to deal with this. Also, Terraform was always designed mainly as a CLI tool which has capability to do things w/out user's interaction, where possible so I'd be hesitant about adding such complex functionality into the UI as it can make scripting more difficult.
Ain't saying it's impossible or that we won't do it - I'm merely sharing my viewpoint.
I'm honestly not sure - the go SDK appears to support the AWS_PROFILE variable that the command line tools use, but I couldn't see how to make it work with Terraform to use that, or whether that would be able to be handled by the SDK itself for renewal.
I've only used the AssumeRole method, so I'm not sure of the others - Looking at the ARNs returned, there may be some way to handle this if there is a consistent form that could be interpreted, eg *:sts:<account>:<mechanism>/<parameters>
where
* 'assume-role' => `<role>/<label>` (and the original user is given by the tail of the user-id)
though how you might recover the original credentials, I'm not sure - for me, at the command line, I use revert to the default profile then use the extracted parameters to re-request a token, but that won't work if the token came from elsewhere.
The more I look at this (and say to myself 'huh, I've had a quite naive approach'), the more I think that the only way to usefully deal with renewing the token is if TF itself gained it (using the necessary parameters) such that it can do so again when an expiry error happens. Maybe trying to get some support off Amazon to find out what they might expect for this sort of thing - presumably they have similar constraints when it comes to CloudFormation. I haven't spoken to Amazon about such things, though.
I wholeheartedly agree that requiring the user to interact in the middle of the session probably isn't a workable solution - I see one place where TF wins its place is in the CI/CD workflow, being the same mechanism, used to test the system during testing as for deployment. In such cases, just as you hope for production use, you hope to leave it alone and let it get on with the job :-(
One other 'interesting' real world case of problems that I had was that the token expired after requesting the building of a number of expensive EC2 systems... and then because it couldn't store the results in the S3 bucket (and I didn't think to look at the backup files), it never recorded the instance ids. There were just a few extra EC2 systems running when I came to review the state of an account a few days later. Because the expiry happened before the tags were asserted (as the tag assignment is done after the creation for some operations), finding the reason those systems had been created was slightly more difficult - they didn't have names, or associated 'purpose' or other tags that we use for tracking.
I hope it's not felt that I've hijacked a thread with a variation on the issue, but seeing that someone else suffered from the same things I have spurred me to give my own experience in the hope that someone knows a way to go.
I see the same issue except that Ctrl-C isn't sufficient - I seem to have to kill -9 the process 馃槙
If I lean on Ctrl-C, here's what I see in the debug log:
^C2017-10-26T13:13:53.406+0100 [DEBUG] plugin.terraform-provider-template_v1.0.0_x4: 2017/10/26 13:13:53 [DEBUG] plugin: received interrupt signal (count: 33). Ignoring.
2017-10-26T13:13:53.406+0100 [DEBUG] plugin.terraform-provider-aws_v1.1.0_x4: 2017/10/26 13:13:53 [DEBUG] plugin: received interrupt signal (count: 33). Ignoring.
^C2017-10-26T13:13:53.426+0100 [DEBUG] plugin.terraform-provider-aws_v1.1.0_x4: 2017/10/26 13:13:53 [DEBUG] plugin: received interrupt signal (count: 34). Ignoring.
2017-10-26T13:13:53.426+0100 [DEBUG] plugin.terraform-provider-template_v1.0.0_x4: 2017/10/26 13:13:53 [DEBUG] plugin: received interrupt signal (count: 34). Ignoring.
This is on OS X 10.12.6 with terraform 0.10.8. Any suggestions why ctrl-c doesn't work here?
I too see this issue very often. I think terraform should just exit when it receives token expiry error. When we do SIGINT the state files are left with incomplete states, these are caused because some of the states where not updated in the state file because the creation complete response is not received by terraform yet.
I also experience this issue, and as a result use the following workaround in my ~/.bash_profile:
terraform () {
aws sts get-caller-identity > /dev/null && /usr/local/bin/terraform "$@"
}
in my opinion this is the expected behaviour. any workaround would open a security risk
also, even if TF fails , the state is kept in ram, till you rerun apply and then it's pushed to s3.
on other way, is to use aws-vault with --server mode, which will renew the credentials for you
@FernandoMiguel How is it a security risk?
@lvh long running credentials
Gotcha. I was confused because the main workaround/bug fix I see being discussed right now is having terraform error out when the credentials become invalid instead of just hanging, that doesn't increase credential lifetime, it just improves error messages. How would you otherwise recover from a hanging TF process? (I agree the state is kept in RAM there, but that doesn't seem very useful as it continues to bash its head against the AWS SDK with an expired credential :))
@lvh I will agree with you there.
I get sometimes hit by this issue too, and seeing tf keep retrying is just silly
I know I said this was by design...
But since I keep getting hit by this when running from my laptop (not ci obviously),
Is there a way for the time out of the credentials to be lower?
No reason to opaquely retry. Fail per principle of least astonishment. I can re-auth, push the erred tfstate file and get on with my life.
lamazoidius
It would be really nice if this was handled simply by exiting and displaying the expired token message. Currently I have to kill the process and forcibly unlock the Terraform cloud workspace which is a lot of manual work for an error which could trivially be detected with no side-effects.
Reported as https://support.hashicorp.com/hc/en-us/requests/24845
Most helpful comment
I too see this issue very often. I think terraform should just exit when it receives token expiry error. When we do SIGINT the state files are left with incomplete states, these are caused because some of the states where not updated in the state file because the creation complete response is not received by terraform yet.