On aws_instance (and presumably for other providers) there doesn't seem to be a way to wait for cloud-config to finish before moving on to other resources. If I have a cloud-config runcmd that creates a directory which is then used on a remote-exec, the remote-exec will fail because the resource gets run right after the creation of the instance and not after its cloud-config is completed.
On CloudFormation, you can send a signal and have that be caught by ResourceSignal, which changes the status from Pending to Complete.
I'm also looking for a clean way to solve for this. I feel like there's something in the null_resource area that can be hacked up, but I'm not sure that's the best path forward.
My "hack" at the moment around this is to treat a file as a resource signal, and have the remote-exec block the rest of the execution until that file exists.
cloud-init.yml:

    runcmd:
      - mkdir -p /etc/consul.d
      - touch /tmp/signal

consul.tf:

    resource "null_resource" "consul-config" {
      ...
      provisioner "remote-exec" {
        inline = [
          "while [ ! -f /tmp/signal ]; do sleep 2; done",
          ...
        ]
      }
      ...
    }
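One weakness of the loop above is that it waits forever if cloud-init fails before touching the file. A minimal sketch of the same idea with a timeout might look like this. The function name wait_for_file and its default values are made up for illustration and are not part of Terraform or cloud-init.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait_for_file PATH [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls until PATH exists as a regular file.
# Returns 0 once the file is there, 1 if the timeout elapses first.
wait_for_file() {
  local path="$1"
  local timeout="${2:-300}"   # assumed default: give up after 5 minutes
  local interval="${3:-2}"    # same 2-second poll as the original loop
  local waited=0

  while [ ! -f "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for ${path}" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}
```

If the function is uploaded as part of a script, the remote-exec step could call something like wait_for_file /tmp/signal 300 2, so a broken cloud-config fails the apply instead of hanging it.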
Well, that’s the cleanest version of this ugly hack I’ve seen yet. :-)
On Jan 14, 2016, at 5:15 PM, Calvin Leung Huang wrote:
My "hack" at the moment around this is to treat a file as a resource signal, and have the remote-exec block the rest of the execution until that file exists.
Thanks, hope that helps :)
@cleung2010 could you share an example of how this looks in CloudFormation? I'm not too familiar with it so I'd like to try to understand a bit better how it solves this case and thus how/whether that solution might be used by Terraform.
Okay, so I think I'm understanding better the CloudFormation workflow:

- Use cloud-init to eventually run the cfn-signal program.
- The cfn-signal program calls SignalResource to tell CloudFormation that the initialization either succeeded or failed.

The key difference between CloudFormation and Terraform here is that of course Terraform doesn't have a server that the instance can contact to signal its success. However, as you noticed you can use provisioners in conjunction with state outside of Terraform (in your case, a file showing up on disk) to approximate the same thing.
If we frame the problem as having the instance send a signal somewhere and having Terraform listen for that signal, then there are a number of different signalling mechanisms that Terraform could hypothetically support via provisioners, and which can be implemented in the meantime using remote-exec scripts:

- A remote-exec script that returns only once the instance shows up in the Consul registry, or once its checks are healthy.
- A remote-exec script that polls the ELB's instance table until the instance in question switches to the InService state.
- Create a FIFO on the instance and have cloud-init write a byte to it. Then use a remote-exec script that reads from the FIFO. This is basically the same thing as your solution of polling for a file, except that the FIFO avoids the need to poll because FIFO operations block until both a writer and a reader are present. This is actually a two-way synchronization, unlike the other approaches here: Terraform's provisioner will block on the user-data write, and the user-data write itself will block on Terraform's provisioner.

Alternatively, Terraform has an aws_cloudformation_stack resource which you can use to delegate the creation of instances to CloudFormation, and then you can use the cfn-signal mechanism; AFAIK the aws_cloudformation_stack resource is not considered complete until CloudFormation is satisfied that the stack is complete.
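The FIFO variant described above can be sketched roughly as follows. This is only an illustration: the function names signal_ready and wait_ready are made up, and the sketch ignores ownership and permissions of the FIFO, which matter if user-data runs as root and the provisioner connects as a different user.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a FIFO-based ready signal.

# Create the FIFO if it does not exist yet. Tolerates the other side
# creating it at the same moment: a failed mkfifo is fine as long as
# a FIFO is present afterwards.
ensure_fifo() {
  mkfifo "$1" 2>/dev/null || [ -p "$1" ]
}

# Writer side, meant to run at the end of user-data.
# Opening the FIFO for writing blocks until a reader has opened it.
signal_ready() {
  local fifo="$1"
  ensure_fifo "$fifo" || return 1
  printf 'x' > "$fifo"
}

# Reader side, meant to run from a remote-exec script.
# Opening the FIFO for reading blocks until a writer has opened it,
# then a single byte is consumed and discarded.
wait_ready() {
  local fifo="$1"
  ensure_fifo "$fifo" || return 1
  head -c 1 "$fifo" > /dev/null
}
```

Neither side polls: whichever arrives first simply blocks in the open until the other shows up. As with the file-based loop, wrapping wait_ready in a timeout is probably wise so that a failed boot does not block the apply indefinitely.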
I found this issue because we have a similar need to migrate some existing CloudFormation templates to Terraform.
I think we will use the workaround with the null_resource, but instead of using a file on the server and a remote-exec to check for it, we will use an S3 key and local-exec (in some cases we do not have SSH access to the servers, and simply need to know that the service they provide is ready before continuing).
(I think CloudFormation also relies on S3: http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-waitcondition.html)
Regardless of the backend where you store the information (S3, consul, dynamo,...), I think it would be very convenient to have a generic wait-signal mechanism which manages things like unique id (to identify the resource sending the signal), retries and timeouts for instance.
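Until something like that exists, a backend-agnostic wait can be approximated with a small retry wrapper that a local-exec or remote-exec script calls. This is a sketch only: wait_until is a made-up name, and the S3 bucket and key in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0. Returns 1 if TIMEOUT_SECONDS pass first.
wait_until() {
  local timeout="$1"
  local interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))

  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Example use from a local-exec script (bucket and key are placeholders):
#   wait_until 600 10 aws s3api head-object \
#     --bucket my-signal-bucket --key "signals/${INSTANCE_ID}"
```

The check command is the only backend-specific part, so the same wrapper could wait on an S3 key, a Consul query, or a plain file test. Including the instance ID in the key is one way to get the unique identifier mentioned above.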
Thanks for opening this feature request @calvn, and thanks to everyone else for the great discussion. Sorry we let this sit here idle for so long.
After some reflection, it seems like this is not a feature that Terraform can easily support natively since it requires somewhere to send the notification that the instance has booted and Terraform is not a hosted service.
Therefore we (the Terraform team) recommend pursuing alternative approaches such as the ones I enumerated in my earlier comment above, each of which makes use of a specific system outside of Terraform to maintain the necessary state. Since we don't have any near-term plans to work on this, I'm going to close this as part of our effort to prune some stale issues that don't have short-term action plans.
Thanks again for the discussion here!
I use this:
    provisioner "remote-exec" {
      inline = [
        "/bin/bash -c \"timeout 300 sed '/finished-user-data/q' <(tail -f /var/log/cloud-init-output.log)\""
      ]
    }
with touch /tmp/finished-user-data in the user-data script and set -euxo pipefail at the top.

Had a similar problem. I was using "runcmd" to create a file and write some content to it. I sometimes got errors if I didn't wait long enough. I didn't want to solve it by waiting in the instance creation script; as @justinclayton mentioned, that is not a clean solution.
I solved it using "write_files".
I no longer get "file not found" errors.
Cloud-init has a status wait command:

    provisioner "remote-exec" {
      inline = [
        "cloud-init status --wait"
      ]
    }
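A slightly more defensive variant of the same command, for instances where cloud-init could hang, might bound the wait with coreutils timeout. This is a sketch: wait_for_cloud_init and its 600-second default are made up, and it assumes both cloud-init and timeout are on the PATH of the user that remote-exec connects as.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around "cloud-init status --wait".
# Usage: wait_for_cloud_init [LIMIT_SECONDS]
# Returns the exit status of cloud-init, 124 if the limit was hit
# (the coreutils timeout convention), or 127 if cloud-init is missing.
wait_for_cloud_init() {
  local limit="${1:-600}"   # assumed default: 10 minutes

  if ! command -v cloud-init >/dev/null 2>&1; then
    echo "cloud-init not found on PATH" >&2
    return 127
  fi

  # A non-zero status from cloud-init means it reported a problem,
  # which makes the provisioner (and the apply) fail.
  timeout "$limit" cloud-init status --wait
}
```

The one-line form in the provisioner above is enough in most cases; the wrapper only adds an upper bound and a clearer error when the image has no cloud-init.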
I have some issues with the ECS agent, I guess. Does anyone have documentation on how to use a CIS CentOS Linux 7 image to create an AMI with Docker and the ECS agent installed (https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-install.html#ecs-agent-install-nonamazonlinux)? After providing the AMI ID to CFN and deploying it, the instances aren't running the tasks.
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.