I run apt-get update on a Linux box, then install supervisor when provisioning, like so:
provisioner "remote-exec" {
inline = [
"sudo apt-get update",
"sudo apt-get install -y supervisor"
]
}
I consistently get the following error:
module.elb_eu.aws_instance.elb (remote-exec): Reading package lists... 0%
module.elb_eu.aws_instance.elb (remote-exec): Reading package lists... 0%
module.elb_eu.aws_instance.elb (remote-exec): Reading package lists... 2%
module.elb_eu.aws_instance.elb (remote-exec): Reading package lists... 35%
module.elb_eu.aws_instance.elb (remote-exec): Reading package lists... 35%
module.elb_eu.aws_instance.elb (remote-exec): Reading package lists... 49%
module.elb_eu.aws_instance.elb (remote-exec): Reading package lists... 49%
module.elb_eu.aws_instance.elb (remote-exec): Reading package lists... 78%
module.elb_eu.aws_instance.elb (remote-exec): Reading package lists... 78%
module.elb_eu.aws_instance.elb (remote-exec): Reading package lists... 84%
module.elb_eu.aws_instance.elb (remote-exec): Reading package lists... 84%
module.elb_eu.aws_instance.elb (remote-exec): Reading package lists... 85%
module.elb_eu.aws_instance.elb (remote-exec): Reading package lists... 85%
module.elb_eu.aws_instance.elb (remote-exec): Reading package lists... Done
module.elb_eu.aws_instance.elb (remote-exec): Building dependency tree... 0%
module.elb_eu.aws_instance.elb (remote-exec): Building dependency tree... 0%
module.elb_eu.aws_instance.elb (remote-exec): Building dependency tree... 50%
module.elb_eu.aws_instance.elb (remote-exec): Building dependency tree... 50%
module.elb_eu.aws_instance.elb (remote-exec): Building dependency tree
module.elb_eu.aws_instance.elb (remote-exec): Reading state information... 0%
module.elb_eu.aws_instance.elb (remote-exec): Reading state information... 1%
module.elb_eu.aws_instance.elb (remote-exec): Reading state information... Done
module.elb_eu.aws_instance.elb (remote-exec): E: Unable to locate package supervisor
module.elb_eu.aws_instance.elb (remote-exec): chown: cannot access ‘/etc/supervisor/conf.d’: No such file or directory
Running the same inline statements by hand on the Linux box does not give me the same issue.
Is Terraform running the commands asynchronously? It's not supervisor-specific. It looks like apt-get update hasn't finished before Terraform tries to run apt-get install supervisor.
Terraform: v0.3.7
Hi there!
Terraform's remote-exec provisioner simply writes the commands you specify to a file called /tmp/script.sh and executes it, so the commands will run in sequence.
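For your inline block above, the generated script is roughly equivalent to this (an illustrative sketch of the behaviour described, not the exact file Terraform writes):
#!/bin/sh
sudo apt-get update
sudo apt-get install -y supervisor
The install line therefore doesn't start until the update line has returned.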
When trying to reproduce this I did accidentally boot an Ubuntu AMI that was too old to have supervisor available and got the 'Unable to locate package supervisor' message. I know you said this doesn't seem to be a supervisor-specific issue for you - can you provide a few more details on your setup so I can help you debug?
Here's the full config I just threw together that works AOK:
variable "aws_access_key" {}
variable "aws_secret_key" {}
provider "aws" {
access_key = "${var.aws_access_key}"
secret_key = "${var.aws_secret_key}"
region = "us-east-1"
}
resource "aws_security_group" "allow_ssh" {
name = "i-1025-allow-ssh"
description = "Allow ssh inbound"
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_instance" "web" {
ami = "ami-06793a6e"
instance_type = "t1.micro"
key_name = "terraform-testing"
security_groups = [
"default",
"${aws_security_group.allow_ssh.name}"
]
provisioner "remote-exec" {
inline = [
"sudo apt-get update",
"sudo apt-get install -y supervisor"
]
connection {
user = "ubuntu"
key_file = "~/.ssh/terraform-testing.pem"
}
}
}
I'm running an Ubuntu 14.04 box.
Works when I:
Fails:
It's interesting that you're nesting the connection block; I'm not doing that. Attached is my config. It's part of a module, but you can see what I'm doing.
resource "aws_instance" "elb" {
count = 1
instance_type = "${var.elb_instance_type}"
ami = "${lookup(var.elb_ami, var.region)}"
security_groups = ["${var.sec_group}"]
key_name = "${var.key_name}"
connection {
user = "${var.key_user}"
key_file = "${var.key_path}"
}
tags {
Name = "api-elb-${count.index}"
"Type" = "elb"
"Status" = "live"
"slug" = "api-elb-${count.index}"
}
# Create the various file locations for the file provisioner and chown
provisioner "remote-exec" {
inline = [
"sudo mkdir -p /etc/consul/templates",
"sudo mkdir -p /etc/consul/scripts",
"sudo chown -R ubuntu:ubuntu /etc/consul",
"sudo mkdir -p /etc/nginx/sites-enabled",
"sudo mkdir -p /etc/nginx/certs",
"sudo chown -R ubuntu:ubuntu /etc/nginx",
"sudo mkdir -p /var/log/nginx",
"sudo apt-get update",
"sudo apt-get install -y supervisor",
"sudo mkdir -p /etc/supervisor/conf.d",
"sudo mkdir -p /var/log/supervisor",
"sudo mkdir -p /var/log/consul-templates",
"sudo chown -R ubuntu:ubuntu /var/log/consul-templates",
"sudo chown -R ubuntu:ubuntu /etc/supervisor"
]
}
# Install the nginx configurations
provisioner "file" {
source = "${path.module}/configs/elb/elb.ctmpl"
destination = "/etc/consul/templates/elb.ctmpl"
}
# Install the ssl chain
provisioner "file" {
source = "${path.module}/configs/elb/chain.pem"
destination = "/etc/nginx/certs/chain.pem"
}
# Install the ssl key
provisioner "file" {
source = "${path.module}/configs/elb/ssl.key"
destination = "/etc/nginx/certs/ssl.key"
}
# Install the consul service file
provisioner "file" {
source = "${path.module}/configs/consul/elb/service.json"
destination = "/etc/consul/elb-service.json"
}
# Install a consul template file
provisioner "file" {
source = "${path.module}/configs/consul/elb/consul-template.conf"
destination = "/etc/supervisor/conf.d/consul-template.conf"
}
  # Install the config definition
provisioner "file" {
source = "${path.module}/configs/consul/config.json"
destination = "/etc/consul/config.json"
}
# Start nginx
# Associate with the consul master
# Enable consul templates
provisioner "remote-exec" {
inline = [
# Set some variables we need
"PRIVATE_IP=$(ip addr | grep 'state UP' -A2 | tail -n1 | awk '{print $2}' | cut -f1 -d'/')",
"PUBLIC_IP=$(wget -qO- http://ipecho.net/plain) && sleep 5",
# Run docker files
"sudo docker run -d -h api-elb --name api-elb -v /etc/consul:/etc/consul -v /var/run/docker.sock:/var/run/docker.sock -p $PRIVATE_IP:8300:8300 -p $PRIVATE_IP:8301:8301 -p $PRIVATE_IP:8301:8301/udp -p $PRIVATE_IP:8302:8302 -p $PRIVATE_IP:8302:8302/udp -p $PRIVATE_IP:8400:8400 -p $PRIVATE_IP:8500:8500 -p 172.17.42.1:53:53/udp progrium/consul -advertise $PUBLIC_IP -dc ${var.region} -log-level debug -join ${var.algalon_ip} -config-dir /etc/consul",
"sudo docker run -d -p 80:80 -p 443:443 --name nginx -v /etc/nginx/sites-enabled:/etc/nginx/sites-enabled -v /var/log/nginx:/var/log/nginx -v /etc/nginx/certs:/etc/nginx/certs dockerfile/nginx"
]
}
}
Closing as no one else can reproduce. I'll look into it when I have time.
FWIW, I've been sporadically running into this with Terraform. Interestingly, the culprit also happens to be supervisor. Unfortunately, it's not consistent. Sometimes it works, sometimes it doesn't, and the only difference is changing the instance size to force a reprovision of the resource.
Without digging further into it, I'd be 50:50 on whether this is a supervisor packaging bug or a Terraform bug (or even apt-get), but finding this issue seems to lean it towards some unusual combination of the two.
Running 0.3.7.
Hey, so here's a question - what are your AMI sources? I've been building images in us-east-1 with Packer, then copying them to other regions. I'm booting everything in eu-west-1 at the moment.
There's a funny thing going on with apt-get update the first time it gets run after a copy (judging by strace) - it seems to be updating us-east sources. Then the apt-get install immediately following this correctly identifies itself as being in eu-west-1, tries to install from it, and finds no supervisor package.
Running apt-get update twice here seems to work around the bug, but it might be the same thing affecting you.
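In Terraform terms the workaround is just the extra update in the inline list; a sketch using the same package as the original report:
provisioner "remote-exec" {
  inline = [
    "sudo apt-get update",
    "sudo apt-get update",
    "sudo apt-get install -y supervisor"
  ]
}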
Damn! I was starting to think I was crazy. I ended up building supervisor into the AMIs... which is not ideal.
I'll give it a try and see if a double update works.
I'm having the same issue when trying to install SaltStack via apt-get install after running an apt-get update.
It's something to do with apt-get update either not running correctly or not finishing correctly, so it's very possibly an Ubuntu issue.
Perhaps apt-get update returns an exit code before it has actually finished (some file operations may still be running in the background, or still to be completed).
This sounds like a common issue I've come across with any continuous-integration AMI build chain. At times, packages will simply be unavailable and I'll have to do multiple rebuilds before they just start working again. I haven't had the time to diagnose further.
I've had this issue with autoconf, ec2-ami-tools and supervisor.
I think that most people experiencing this issue are seeing interference from cloud-init and its configuration of the apt sources. It's definitely not a Terraform issue in these cases, other than the HashiCorp crew getting provisioning happening earlier than other tools do. @duggan's workaround succeeds because the apt sources are complete by the time the second update runs.
Place this in your provisioning script before the apt-get update and you should be all set.
until [[ -f /var/lib/cloud/instance/boot-finished ]]; do
  sleep 1
done
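Folded into a remote-exec provisioner it might look like this (a sketch; the loop is collapsed onto one line and uses the POSIX [ -f ... ] test, assuming the inline commands run under /bin/sh):
provisioner "remote-exec" {
  inline = [
    "until [ -f /var/lib/cloud/instance/boot-finished ]; do sleep 1; done",
    "sudo apt-get update",
    "sudo apt-get install -y supervisor"
  ]
}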
genius :+1:
That is going to save a lot of headaches! :+1:
Hi @phinze,
I am also getting an error when running commands with "remote-exec" in the provisioner section.
Error:
aws_instance.test: Creating...
  ami: "" => "ami-97d490fd"
  availability_zone: "" => "<computed>"
  ebs_block_device.#: "" => "<computed>"
  ephemeral_block_device.#: "" => "<computed>"
  instance_type: "" => "t1.micro"
  key_name: "" => "newkvp"
  placement_group: "" => "<computed>"
  private_dns: "" => "<computed>"
  private_ip: "" => "<computed>"
  public_dns: "" => "<computed>"
  public_ip: "" => "<computed>"
  root_block_device.#: "" => "<computed>"
  security_groups.#: "" => "<computed>"
  source_dest_check: "" => "1"
  subnet_id: "" => "<computed>"
  tags.#: "" => "1"
  tags.Name: "" => "test_server"
  tenancy: "" => "<computed>"
  vpc_security_group_ids.#: "" => "1"
  vpc_security_group_ids.1458656584: "" => "sg-4d5b1f2a"
aws_instance.test: Provisioning with 'file'...
aws_instance.test: Provisioning with 'remote-exec'...
aws_instance.test (remote-exec): Connecting to remote host via SSH...
aws_instance.test (remote-exec): Host: 54.86.117.120
aws_instance.test (remote-exec): User: root
aws_instance.test (remote-exec): Password: false
aws_instance.test (remote-exec): Private key: true
aws_instance.test (remote-exec): SSH Agent: false
aws_instance.test (remote-exec): Connected!
Error applying plan:
1 error(s) occurred:
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
Script:
provider "aws" {
access_key = "aws_access_key"
secret_key = "aws_secret_key"
region = "us-east-1"
}
resource "aws_instance" "test" {
ami = "ami-97d490fd"
instance_type = "t1.micro"
vpc_security_group_ids = ["sg-4d5b1f2a"]
key_name = "newkvp"
provisioner "file" {
source = "script.sh"
destination = "/tmp/script.sh"
connection {
agent = false
user = "root"
key_file = "newkvp.pem"
}
}
provisioner "remote-exec" {
inline = [
"echo 1"
]
connection {
agent = false
user = "root"
key_file = "newkvp.pem"
}
}
tags {
Name = "test_server"
}
}
The AMI I am using was built with Packer on CentOS 6.5, and I have also tried this code with different versions of Terraform (v0.6.6, v0.6.4, v0.6.0).
Please suggest a solution, as I am unable to execute a single command on the remote resource with Terraform.
Hi @chin2chavda, can you open this as a fresh GitHub issue? We'll take a look and see what's up!
Sure @phinze
Here is a link to the new issue:
https://github.com/hashicorp/terraform/issues/4186
This is still an issue with Terraform 0.12.1.
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.