Terraform-provider-aws: Heavy Cpu Load

Created on 1 Dec 2017

Terraform Version

Terraform v0.10.8

Affected Resource(s)

general terraform issue
the aws plugin takes ~16% CPU and the terraform master process takes another 16%

Terraform Configuration Files

Debug Output

Please provide a link to a GitHub Gist containing the complete debug output: https://www.terraform.io/docs/internals/debugging.html.

Panic Output

If Terraform produced a panic, please provide a link to a GitHub Gist containing the output of the crash.log.

Expected Behavior

Terraform should not be taking 30% of my CPU just to start a couple of instances on AWS.
This seems like excessive CPU load.
If I understand correctly, Terraform is a wrapper around the AWS API, with a bunch of sophisticated logic on top.
My CPU is an Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
RAM: 16GB

Actual Behavior

very high (37% peak) cpu load

Steps to Reproduce

Please list the steps required to reproduce the issue, for example:

  1. terraform apply

Important Factoids

Very basic AWS configuration, with a handful of remote-exec provisioners to set up Puppet. Booting 3 instances using count variables. One instance has an attached EBS volume.



Hi @jgrammen-agilitypr!

I agree that this CPU usage seems excessive. Unfortunately, with the information given it's hard to guess what aspect of Terraform's work here was causing all this CPU work. If you're able to share a small config that causes this problem for you, we may be able to reproduce it and understand via profiling which aspect of the AWS provider and Terraform is using the CPU here.

Providing what I can, with names changed to protect the innocent.
As far as I am concerned this is a pretty bog standard terraform setup for aws instances

variable "run_master" { default = true }
# master -- jenkins master
resource "aws_instance" "master" {
        key_name = "${var.keypair}"
        ami = "${var.ami_xenial}"
        instance_type = "${var.server_type["master"]}"
        subnet_id = "${var.subnet_production}"
        private_ip = "${var.server_ip["master"]}"
        vpc_security_group_ids = "${var.sg_production}"
        root_block_device {
            volume_type = "gp2"
            volume_size = "${var.root_volsize["master"]}"
        }
        lifecycle {
                ignore_changes = ["tags"]
        }
        count = "${var.master}"
        user_data = "#cloud-config\nhostname: master.domain\nfqdn: master.domain"

        depends_on = ["aws_instance.slave"]
        tags {
                type = "jenkins"
                role = "master"
                name = "master"
        }

        provisioner "remote-exec" {
                inline = [
                        "printf '\n${var.server_ip["master"]} master.domain' | sudo tee -a /etc/hosts",
                        "wget --quiet -O /tmp/puppetlabs-release-pc1-$(lsb_release -sc).deb https://apt.puppetlabs.com/puppetlabs-release-pc1-$(lsb_release -sc).deb",
                        "sudo dpkg -i /tmp/puppetlabs-release-pc1-$(lsb_release -sc).deb",
                        "sudo apt-get update",
                        "sudo apt-get install puppet-agent -y",
                        "sudo /opt/puppetlabs/bin/puppet resource service puppet ensure=stopped enable=false",
                        "nohup sudo /opt/puppetlabs/bin/puppet agent -t --waitforcert 180 &",
                        "sleep 1" # trick to get terraform to finish above command before closing ssh conn
                        # command || true # force command to exit and allow terraform to continue
                ]
                connection {
                        type = "ssh"
                        user = "ubuntu"
                }
        }
}

resource "aws_ebs_volume" "master_data" {
  availability_zone = "${var.az_production}"
  # standard = magnetic , gp2 = standard ssd
  type = "gp2"
  size = "${var.data_volsize["master"]}"
  tags {
    Name = "master-data"
  }

  count = "1"
}

resource "aws_volume_attachment" "master_data_vole" {
  # new kernels will rename /dev/sdf to /dev/xvdf anyway
  device_name = "/dev/xvdf"
  volume_id = "${aws_ebs_volume.master_data.id}"
  instance_id = "${aws_instance.master.id}"
  provisioner "remote-exec" {
        inline = [
                "sudo umount -A /dev/xvdf || true",
        ]
        when = "destroy"
        connection {
                type = "ssh"
                host = "${aws_instance.master.private_ip}"
                user = "ubuntu"
        }
  }

  count = "1"
}

resource "aws_instance" "slave" {
        key_name = "${var.keypair}"
        ami = "${var.ami_xenial}"
        instance_type = "${var.server_type["slave"]}"
        subnet_id = "${var.subnet_production}"
        private_ip = "${var.ent_ip[count.index + 1]}"
        vpc_security_group_ids = "${var.sg_production}"
        root_block_device {
            volume_type = "gp2"
            volume_size = "${var.root_volsize["slave"]}"
        }
        lifecycle {
                ignore_changes = ["tags"]
        }
        count = "${var.slave_count["slave"]}"
        user_data = "#cloud-config\nhostname: slave0${count.index + 1}.domain\nfqdn: slave0${count.index + 1}.domain"
        tags {
                type = "jenkins"
                role = "slave"
                sub_role = "slave"
                Name = "slave0${count.index + 1}"
        }
        provisioner "remote-exec" {
                inline = [
                        "wget --quiet -O /tmp/puppetlabs-release-pc1-$(lsb_release -sc).deb https://apt.puppetlabs.com/puppetlabs-release-pc1-$(lsb_release -sc).deb",
                        "sudo dpkg -i /tmp/puppetlabs-release-pc1-$(lsb_release -sc).deb",
                        "sudo apt-get update",
                        "sudo apt-get install puppet-agent -y",
                        "sudo /opt/puppetlabs/bin/puppet resource service puppet ensure=stopped enable=false",
                        "nohup sudo /opt/puppetlabs/bin/puppet agent -t --waitforcert 180 &",
                        "sleep 1" # trick to get terraform to finish above command before closing ssh conn
                        # command || true # force command to exit and allow terraform to continue
                ]
                connection {
                        type = "ssh"
                        user = "ubuntu"
                }
        }
}

Thanks for the additional information, @jgrammen-agilitypr.

When we get a chance we'll try to reproduce your results and, if we can, try to diagnose with a profiler.

Hello everyone.
I also see high CPU load (100%, screenshot attached), though in my case the CPU is not very powerful (i5-7200U).
I use the Ubuntu subsystem on Windows 10.


I am also getting high CPU utilisation running terraform. I am launching it from within WSL Ubuntu on Windows, and I get 100% CPU utilisation on all cores whilst terraform is sitting at the apply prompt.

Plan: 1 to add, 1 to change, 1 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: 

The CPU utilisation appears to be occurring in the kernel (it shows as red in Process Explorer).



Something terraform is doing is causing the kernel to be very busy.

Waiting for user input before doing anything should require ~0 CPU utilisation.

As soon as I answer no or hit ctrl-c the high CPU load stops.

I'm not using the AWS provider, so I'm not sure whether the underlying root cause of the OP's problem is the same or different.
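
The user-versus-kernel split that Process Explorer shows in red can also be checked from inside the Linux environment. The following is a minimal sketch, assuming a Unix-like Python with the standard `resource` module; `cpu_split` is a hypothetical helper name, and the child command is a harmless stand-in rather than terraform itself. It only reports totals after the child exits, so for a command idling at a prompt you would answer the prompt or interrupt it first. Whether WSL accounts for this time accurately is not something this thread establishes.

```python
import resource
import subprocess
import sys


def cpu_split(cmd):
    """Run cmd to completion and return (user_seconds, system_seconds).

    Uses the CPU time accumulated by waited-for child processes, so the
    difference before/after covers exactly the command we just ran.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=False)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


# Stand-in child that just sleeps: an idle process should use almost no CPU.
user, system = cpu_split([sys.executable, "-c", "import time; time.sleep(0.2)"])
print(f"user={user:.3f}s system={system:.3f}s")
```

A process that is truly idle should report both numbers near zero; a large `system` value relative to `user` would be consistent with the time being spent in the kernel.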

! terraform version

Terraform v0.11.3
+ provider.acme (unversioned)
+ provider.azurerm v1.1.1
+ provider.local v1.1.0
+ provider.null v1.0.0
+ provider.template v1.0.0
+ provider.tls v1.0.1

! WSL Ubuntu version

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial

I'm on WSL Ubuntu also.

Here's a strace -c -f for null_resource provider during destroy (so when it's not supposed to do anything).

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0        45        21 read
  0.00    0.000000           0        24           write
  0.00    0.000000           0         1           close
  0.00    0.000000           0         3           rt_sigprocmask
  0.00    0.000000           0        27           sched_yield
  0.00    0.000000           0         1           clone
  0.00    0.000000           0         2           sigaltstack
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         2           gettid
  0.00    0.000000           0       205        22 futex
  0.00    0.000000           0       423           clock_gettime
  0.00    0.000000           0   2724378           epoll_wait
  0.00    0.000000           0         1           epoll_ctl
  0.00    0.000000           0         3         2 unlinkat
  0.00    0.000000           0       175           pselect6
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000               2725291        45 total

The same goes when waiting for user confirmation: all the provider seems to be doing is spinning on epoll_wait.
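
To put a number on "spinning on epoll_wait", the call counts in the strace summary above can be tallied. This is a small sketch: the rows are copied from that summary, while the parsing rules (whitespace-separated columns, call count in the fourth field, syscall name last) are an assumption about the usual `strace -c` layout, and the function names are made up for illustration.

```python
# Rows copied from the strace -c summary in this comment.
SUMMARY = """\
0.00 0.000000 0 45 21 read
0.00 0.000000 0 24 write
0.00 0.000000 0 1 close
0.00 0.000000 0 3 rt_sigprocmask
0.00 0.000000 0 27 sched_yield
0.00 0.000000 0 1 clone
0.00 0.000000 0 2 sigaltstack
0.00 0.000000 0 1 arch_prctl
0.00 0.000000 0 2 gettid
0.00 0.000000 0 205 22 futex
0.00 0.000000 0 423 clock_gettime
0.00 0.000000 0 2724378 epoll_wait
0.00 0.000000 0 1 epoll_ctl
0.00 0.000000 0 3 2 unlinkat
0.00 0.000000 0 175 pselect6
100.00 0.000000 2725291 45 total
"""


def parse_summary(text):
    """Return {syscall: calls} for each data row, skipping the total row."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields (no errors column) or 6 (with errors).
        if len(fields) not in (5, 6) or fields[-1] == "total":
            continue
        try:
            counts[fields[-1]] = int(fields[3])
        except ValueError:
            continue  # header or separator line
    return counts


def dominant_syscall(counts):
    """Return (name, share_of_all_calls) for the most frequent syscall."""
    name = max(counts, key=counts.get)
    return name, counts[name] / sum(counts.values())


counts = parse_summary(SUMMARY)
name, share = dominant_syscall(counts)
print(f"{name}: {counts[name]} of {sum(counts.values())} calls ({share:.2%})")
# prints "epoll_wait: 2724378 of 2725291 calls (99.97%)"
```

Nearly every syscall the provider made while nominally idle was an epoll_wait that returned immediately, which is what a busy loop looks like from the outside.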

I will provide more details later, but this appears to be a real problem.
When bringing infrastructure up, terraform appears to be CPU-limited.
A run that takes 15 minutes for me took 1.5 hours for a colleague, who has a slower CPU (2 cores vs. my 4 cores).
As I understand it, Terraform just parses the terraform configuration and invokes provider API calls, so why the crazy CPU usage?

Test terraform project

I built a brand new project just for testing purposes, with the most minimal configuration I can think of, and the CPU utilization is still crazy high for what should be just API calls to AWS.
It seems even stranger that the high CPU usage happens before I even accept the apply.
That means terraform is using tons of CPU just waiting to actually do anything.
This is now impacting our ability to use terraform, because colleagues without really beefy CPUs are unable to use terraform to build our infrastructure.

cpu utilization while terraform is waiting for me to type yes, to accept the apply:



# ca04test01
resource "aws_instance" "ca04test01" {
    key_name = "${var.keypair}"
    ami = "${var.ami_xenial}"
    instance_type = "${var.server_type["ca04test01"]}"
    subnet_id = "${var.subnet_production["ent_app_preprod"]}"
    private_ip = "${var.server_ip["ca04test01"]}"
    # vpc_security_group_ids has to be a list, so enclose in [] to make it a 1 item list
    vpc_security_group_ids = ["${var.sg_production["ent_app_preprod"]}"]

    lifecycle {
        ignore_changes = ["tags"]
    }
    count = "1"
    user_data = "#cloud-config\nhostname: ca04test01.agilitypr.internal\nfqdn: ca04test01.agilitypr.internal"

    tags {
        type = "test"
        role = "test"
        Name = "ca04test01"
    }

    provisioner "remote-exec" {
        inline = [
            "printf '\n${var.server_ip["ca04test01"]} ca04test01.agilitypr.internal' | sudo tee -a /etc/hosts",
            "sleep 1" # trick to get terraform to finish above command before closing ssh conn
            # command || true # force command to exit and allow terraform to continue
        ]
        connection {
            type = "ssh"
            user = "ubuntu"
        }
    }
}

terraform -v

Terraform v0.11.3
+ provider.aws v1.21.0

WSL / ubuntu for windows (on windows 10 build 1709)

Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial

Based on @berney's suggestion, I tried the Windows version of terraform. No high CPU utilization, using the same test project. This strongly suggests that it's a WSL bug, and that everyone else who complained about high CPU in this thread was also running WSL.
I have opened a bug report with WSL: https://github.com/Microsoft/WSL/issues/3276
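
For anyone else landing here with the same symptom, a quick way to check whether a shell is running under WSL is to look for a Microsoft marker in the kernel version string. This is a rough sketch, assuming the commonly observed behaviour that WSL kernels include "Microsoft" in `/proc/version`; the function names are hypothetical and the heuristic is not an official detection API.

```python
def looks_like_wsl(proc_version: str) -> bool:
    """Heuristic: WSL kernel version strings usually mention Microsoft."""
    return "microsoft" in proc_version.lower()


def read_proc_version(path: str = "/proc/version") -> str:
    """Return the kernel version string, or "" if it cannot be read."""
    try:
        with open(path) as fh:
            return fh.read()
    except OSError:
        return ""


if looks_like_wsl(read_proc_version()):
    print("Running under WSL: try the native Windows terraform binary")
else:
    print("No WSL marker found in /proc/version")
```

If the marker is present and terraform pegs the CPU while idle, comparing against the native Windows build, as done above, is the quickest way to confirm the same cause.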


Issue was addressed in WSL, can this be closed?

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

