Terraform: Waiting for cloud-config/user_data completion

Created on 13 Jan 2016  ·  14 Comments  ·  Source: hashicorp/terraform

On aws_instance (and presumably for other providers) there doesn't seem to be a way to wait for cloud-config to finish before moving on to other resources. If I have a cloud-config runcmd that creates a directory which is then used by a remote-exec provisioner, the remote-exec will fail because the provisioner runs right after the instance is created and not after its cloud-config has completed.

In CloudFormation, you can send a signal and have it caught by ResourceSignal, which changes the status from Pending to Complete.

enhancement provider/aws

All 14 comments

I'm also looking for a clean way to solve for this. I feel like there's something in the null_resource area that can be hacked up, but I'm not sure that's the best path forward.

My "hack" at the moment around this is to treat a file as a resource signal, and have the remote-exec block the rest of the execution until that file exists.

cloud-init.yml:

runcmd:
  - mkdir -p /etc/consul.d
  - touch /tmp/signal

consul.tf:

resource "null_resource" "consul-config" {
  ...
  provisioner "remote-exec" {
    inline = [
      "while [ ! -f /tmp/signal ]; do sleep 2; done",
      ...
    ]
  }
  ...
}
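One weakness of the bare while loop above is that it spins forever if cloud-init fails before touching the file. A minimal sketch of the same poll with a timeout follows; the function name wait_for_file and its default values are assumptions made for illustration and do not come from the thread.

```shell
#!/bin/sh
# Hypothetical helper: wait_for_file PATH [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls until PATH exists. Returns 0 once the file is there, 1 on timeout.
wait_for_file() {
  path=$1
  timeout=${2:-300}   # assumed default: give up after 5 minutes
  interval=${3:-2}    # same 2-second sleep as the original loop
  waited=0
  while [ ! -f "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s waiting for $path" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}
```

In a remote-exec script this could be called as wait_for_file /tmp/signal 300 2, so that a failed boot surfaces as a provisioner error instead of a hung apply.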

Well, that’s the cleanest version of this ugly hack I’ve seen yet. :-)

Thanks, hope that helps :)

@cleung2010 could you share an example of how this looks in CloudFormation? I'm not too familiar with it so I'd like to try to understand a bit better how it solves this case and thus how/whether that solution might be used by Terraform.

@apparentlymart There is an example on the use of cfn-signal here, and also in the official AWS guide for setting up a Consul cluster on ECS here

Okay, so I think I'm understanding better the CloudFormation workflow:

  • CloudFormation is configured to start an EC2 instance with user-data that will cause cloud-init to eventually run the cfn-signal program.
  • The cfn-signal program calls SignalResource to tell CloudFormation that the initialization either succeeded or failed.
  • CloudFormation waits for that call and then uses it to decide what to do next.

The key difference between CloudFormation and Terraform here is that of course Terraform doesn't have a server that the instance can contact to signal its success. However, as you noticed you can use provisioners in conjunction with state outside of Terraform (in your case, a file showing up on disk) to approximate the same thing.

If we frame the problem as having the instance send a signal somewhere and having Terraform listen for that signal, then there are a number of different signalling mechanisms that Terraform could hypothetically support via provisioners, and which can be implemented in the meantime using remote-exec scripts:

  • Run Consul Agent on the instances, and ensure that the instances join a Consul cluster once they've successfully booted. Then the remote-exec script returns only once the instance shows up in the Consul registry, or once its checks are healthy.
  • Put an ELB in front of your instances, and then use a remote-exec script that polls the ELB's instance table until the instance in question switches to the InService state.
  • If you don't want to use an external service to transport the signal, you could put a FIFO (named pipe) in a predictable place on the filesystem in the AMI, and then make cloud-init write a byte to it. Then use a remote-exec script that reads from the FIFO. This is basically the same thing as your solution of polling for a file, except that the FIFO avoids the need to poll because FIFO operations block until both a writer and a reader are present. This is actually a two-way synchronization, unlike the other approaches here: Terraform's provisioner will block on the user-data write, and the user-data write itself will block on Terraform's provisioner.
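The FIFO variant from the last bullet can be sketched roughly as below. This assumes the FIFO was created with mkfifo when the AMI was built; the function names are made up for illustration, and the sketch writes a short "ok" line instead of a single byte so the reader can check what it received.

```shell
#!/bin/sh
# Hypothetical helpers for the FIFO handshake. The FIFO itself is assumed to
# exist already (created with mkfifo at image build time).

# Writer side, run as the last cloud-init step.
# Opening the FIFO for writing blocks until a reader has opened it.
signal_ready() {
  printf 'ok\n' > "$1"
}

# Reader side, run from the remote-exec provisioner.
# Opening the FIFO for reading blocks until the writer shows up.
# Returns 0 only if the expected "ok" line arrives.
wait_ready() {
  IFS= read -r status < "$1" && [ "$status" = "ok" ]
}
```

Because both sides block on each other, there is no timeout here. If cloud-init never reaches signal_ready, the provisioner waits until Terraform's own connection timeout ends it, so wrapping the reader in a time limit may be worth considering.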

Alternatively, Terraform has an aws_cloudformation_stack resource which you can use to delegate the creation of instances to CloudFormation, and then you can use the cfn-signal mechanism; AFAIK the aws_cloudformation_stack resource is not considered complete until CloudFormation is satisfied that the stack is complete.

I found this issue because we have a similar need to migrate some existing CloudFormation templates to Terraform.

I think we will use the workaround with the null_resource, but instead of using a file on the server and a remote-exec to check for it, we will use an S3 key and local-exec (in some cases we do not have SSH access to the servers and simply need to know that the service they provide is ready before continuing).

(I think CloudFormation also relies on S3: http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-waitcondition.html)

Regardless of the backend where you store the information (S3, Consul, DynamoDB, ...), I think it would be very convenient to have a generic wait-signal mechanism that manages things like a unique ID (to identify the resource sending the signal), retries, and timeouts.
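Until something like that exists natively, the retry and timeout part can be approximated with a small backend-agnostic wrapper in a local-exec or remote-exec script. This is only a sketch: wait_until is a hypothetical name, and the probe command passed to it decides which backend is being checked.

```shell
#!/bin/sh
# Hypothetical helper: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0. Returns 1 if the deadline passes first.
wait_until() {
  timeout=$1
  interval=$2
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  # The probe's own output is discarded; only its exit status matters.
  until "$@" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
  return 0
}
```

For the S3 case, the probe could be something like wait_until 600 10 aws s3api head-object --bucket my-signal-bucket --key "signals/$INSTANCE_ID", where the bucket name, key layout, and INSTANCE_ID variable are placeholders. Embedding the instance ID in the key is one way to get the unique identifier mentioned above.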

Thanks for opening this feature request @calvn, and thanks to everyone else for the great discussion. Sorry we let this sit here idle for so long.

After some reflection, it seems like this is not a feature that Terraform can easily support natively since it requires somewhere to send the notification that the instance has booted and Terraform is not a hosted service.

Therefore we (the Terraform team) recommend pursuing alternative approaches such as the ones I enumerated in my earlier comment above, each of which makes use of a specific system outside of Terraform to maintain the necessary state. Since we don't have any near-term plans to work on this, I'm going to close this as part of our effort to prune some stale issues that don't have short-term action plans.

Thanks again for the discussion here!

I use this:

provisioner "remote-exec" {
    inline = [
      "/bin/bash -c \"timeout 300 sed '/finished-user-data/q' <(tail -f /var/log/cloud-init-output.log)\""
    ]
}

  • The last line in the user_data.sh script runs touch /tmp/finished-user-data.
  • The user_data bash script has set -euxo pipefail at the top, so it never reaches the touch if any earlier part of the user_data failed. The -x trace is also what echoes the touch command into cloud-init-output.log, which is the line sed matches.
  • This method also prints the cloud-init-output.log file to the screen, which saves you from having to SSH to the instance to see why it failed booting.

I had a similar problem. I was using "runcmd" to create a file and write some content to it, and I sometimes got errors if I didn't wait long enough. I didn't want to solve it by waiting in the instance creation script, which, as @justinclayton mentioned, is not a clean solution.
I solved it by using "write_files" instead.

I no longer get "file not found" errors.

Cloud init has a status wait command

provisioner "remote-exec" {
  inline = [
    "cloud-init status --wait"
  ]
}
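If the wait should not be open-ended, the same command can be wrapped with the coreutils timeout utility used in an earlier comment. The sketch below is an assumption-laden illustration: wait_for_cloud_init and the CLOUD_INIT override are hypothetical names, and it relies on cloud-init status --wait exiting non-zero when cloud-init reports an error, which is worth confirming for the cloud-init version on your image.

```shell
#!/bin/sh
# Hypothetical wrapper: wait_for_cloud_init [TIMEOUT_SECONDS]
# CLOUD_INIT can be overridden to point at a different binary (assumption,
# added so the function can be tried on a machine without cloud-init).
wait_for_cloud_init() {
  limit=${1:-600}   # assumed default: 10 minutes
  rc=0
  timeout "$limit" "${CLOUD_INIT:-cloud-init}" status --wait || rc=$?
  case $rc in
    0)
      return 0
      ;;
    124)
      # timeout(1) exits with 124 when the time limit was hit.
      echo "cloud-init did not finish within ${limit}s" >&2
      return 124
      ;;
    *)
      echo "cloud-init status exited with code $rc" >&2
      return "$rc"
      ;;
  esac
}
```

A non-zero return from the wrapper fails the remote-exec step, so dependent resources are not created on top of a half-configured instance.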

I guess I have some issues with the ECS agent. Does anyone have documentation on how to use a CIS CentOS Linux 7 image to create an AMI with Docker and the ECS agent installed (https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-install.html#ecs-agent-install-nonamazonlinux)? After I provide the AMI ID to the CFN template and deploy it, the instances aren't running the tasks.

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
