Terraform: EC2 auto recovery

Created on 1 Mar 2016 · 12 comments · Source: hashicorp/terraform

I'm trying to set up auto recovery for an EC2 instance. I have the ARN and properties defined in the alarm, but I can't see a way to tie the alarm to the instance.

resource "aws_cloudwatch_metric_alarm" "autorecover" {
  alarm_name = "ec2-autorecover"
  namespace = "AWS/EC2"
  evaluation_periods = "2"
  period = "60"
  alarm_description = "This metric auto recovers EC2 instances"
  alarm_actions = ["arn:aws:automate:${var.region}:ec2:recover"]
}

Is auto recovery possible, and if so, what am I missing?

bug · provider/aws

toddrosner

It looks like this might not be possible, and that these alarms are really only useful with auto scaling policies. After trying a few things, I get the following error:

Errors:

  * aws_cloudwatch_metric_alarm.instance: "threshold": required field is not set
  * aws_cloudwatch_metric_alarm.instance: "comparison_operator": required field is not set
  * aws_cloudwatch_metric_alarm.instance: "metric_name": required field is not set
  * aws_cloudwatch_metric_alarm.instance: "statistic": required field is not set

These properties aren't available when setting up auto recovery in the console, so I assume that setting up auto recovery in Terraform is not possible at this time. Someone please correct me if I'm wrong.

toddrosner on 1 Mar 2016

Hi @toddrosner! Thanks for opening this issue. I think this is related to https://github.com/hashicorp/terraform/issues/5390 also, but since this one is (just about!) earlier, I'm going to keep this one open for future discussion. It looks like this is not supported right now in Terraform, but I'd imagine it should be possible, so I'll tag this as an enhancement.

jen20 on 3 Mar 2016

@jen20 Thanks for the update. Hopefully we'll see this implemented soon.

toddrosner on 6 Mar 2016

I have gotten an autorecover CloudWatch alarm created as follows:

resource "aws_cloudwatch_metric_alarm" "autorecover" {
  alarm_name          = "ec2-autorecover"
  namespace           = "AWS/EC2"
  evaluation_periods  = 2
  period              = 60
  alarm_description   = "This metric auto recovers EC2 instances"
  alarm_actions       = ["arn:aws:automate:${var.aws_region}:ec2:recover"]
  statistic           = "Minimum"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0
  metric_name         = "StatusCheckFailed_System"
  dimensions = {
    InstanceId = aws_instance.app.id
  }
}

I replicated the CloudFormation definition for an EC2 autorecover alarm from the Amazon documentation. It seems you just need to supply values for the parameters that the console does not prompt for.

I'm still unable to test that the autorecover alarm is actually working, but at least the resource is created!
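The recover action ARN has to name the region the instance runs in, and the snippets in this thread either hard-code it or pass it in through a separate variable (`var.region`, `var.aws_region`, or the literal `us-west-2`). A minimal sketch, assuming the AWS provider's `aws_region` and `aws_partition` data sources, that derives the ARN from the provider configuration instead; the local name `ec2_recover_action_arn` is hypothetical:

```hcl
# Look up the region and partition the AWS provider is configured for.
data "aws_region" "current" {}

data "aws_partition" "current" {}

locals {
  # Hypothetical local name. In the standard partition this resolves to
  # something like arn:aws:automate:us-west-2:ec2:recover.
  ec2_recover_action_arn = "arn:${data.aws_partition.current.partition}:automate:${data.aws_region.current.name}:ec2:recover"
}
```

The alarm would then use `alarm_actions = [local.ec2_recover_action_arn]`, so the action cannot drift out of sync with the provider's region. Newer provider releases may expose the region as `region` rather than `name` on `aws_region`; check the attribute against your provider version.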

craigwatson on 26 May 2016

Oddly enough, @craigwatson's solution is correct, even though I still see the same behavior as in #5390. That is, I cannot create the autorecover action in the console (the option is greyed out, claiming an unsupported instance type even though the instance is EBS-only), yet creating the alarm via Terraform seems to succeed. As mentioned, though, there's no way to test auto recovery in AWS, so we'll all have to wait for somebody's instance to die and for them to report back :)

br0ch0n on 19 Jul 2016

Hi folks

I am going to close this out. I was able to get this working as follows:

provider "aws" {
  region = "us-west-2"
}

resource "aws_cloudwatch_metric_alarm" "autorecover" {
  alarm_name          = "ec2-autorecover"
  namespace           = "AWS/EC2"
  evaluation_periods  = 2
  period              = 60
  alarm_description   = "This metric auto recovers EC2 instances"
  alarm_actions       = ["arn:aws:automate:us-west-2:ec2:recover"]
  statistic           = "Minimum"
  comparison_operator = "GreaterThanThreshold"
  # StatusCheckFailed_System is 0 (pass) or 1 (fail), so with Minimum and
  # GreaterThanThreshold the threshold must be 0 for the alarm to ever fire.
  threshold   = 0
  metric_name = "StatusCheckFailed_System"
  dimensions = {
    InstanceId = aws_instance.app.id
  }
}

resource "aws_internet_gateway" "foo" {
  vpc_id = aws_vpc.foo.id
  tags = {
    bar = "baz"
  }
}

resource "aws_vpc" "foo" {
  cidr_block = "10.50.0.0/16"
}

resource "aws_subnet" "foo" {
  cidr_block = "10.50.1.0/24"
  vpc_id     = aws_vpc.foo.id
}

resource "aws_instance" "app" {
  ami           = "ami-5fe5423f"
  instance_type = "m3.medium"
  subnet_id     = aws_subnet.foo.id
}

Hope this helps

Paul

stack72 on 26 Oct 2016

@br0ch0n
Sadly, our AWS support guy tells us it won't work :( Once #8455 is done, it should resolve this issue.

timonwong on 5 Dec 2016

Even with #8455, I don't seem to be able to create the auto-recovery alarm in the AWS console for an EC2 instance created like this:

resource "aws_instance" "instance" {
  ami                    = var.ami
  instance_type          = var.instance_type
  subnet_id              = var.subnet_id
  vpc_security_group_ids = var.security_group_ids
  private_ip             = var.private_ip
  key_name               = var.key_name

  root_block_device {
    volume_type = "gp2"
    volume_size = 8
  }

  ephemeral_block_device {
    device_name = "/dev/sdb"
    no_device   = true
  }

  tags = {
    Name = var.name
  }
}

I got the error:

The EC2 'Recover' Action is not valid for the associated instance. Please remove or change to a different EC2 action.

I tried this with an m3.medium instance, and confirmed that no instance storage was mounted (whereas if I left out the ephemeral_block_device, there _was_ instance storage mounted).

I created an _identical_ instance in the console (e.g. running a diff between the results of aws ec2 describe-instances on the two instances only shows the expected differences like instance ID and IP address), and I _was_ able to create the alarm for that one.

borsboom on 24 Feb 2017

@borsboom ~I think you can only use it with EBS-only instance types.~

Some AMIs contain more than one ephemeral block device definition, and you need to exclude them all. They are visible in neither the console nor the API for the instance; you can only find them by querying the AMI and looking at its "Block Devices" section.

timonwong on 25 Feb 2017

@timonwong: thank you! Once I added no_device ephemeral_block_device entries for _both_ devices defined by the AMI, I was able to create the auto-recovery alarm in the AWS console. That gives me much more confidence that a Terraform-created auto-recovery alarm will work as well.

For anyone finding this issue in the future, you can find the block devices for an AMI using something like this:

$ aws ec2 describe-images --image-ids=ami-a58d0dc5
{
    "Images": [
        {
            [...snip...]
            "BlockDeviceMappings": [
                [...snip...]
                {
                    "DeviceName": "/dev/sdb",
                    "VirtualName": "ephemeral0"
                },
                {
                    "DeviceName": "/dev/sdc",
                    "VirtualName": "ephemeral1"
                }
            ],
            [...snip...]
        }
    ]
}

This translates into the following ephemeral_block_device blocks:

resource "aws_instance" "instance" {
  [...snip...]
  ephemeral_block_device {
    device_name = "/dev/sdb"
    no_device = true
  }
  ephemeral_block_device {
    device_name = "/dev/sdc"
    no_device = true
  }
}
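Instead of copying each mapping from `describe-images` by hand, the same suppression could be derived from the AMI itself. A minimal sketch, assuming the AWS provider's `aws_ami` data source and its `block_device_mappings` attribute (with `device_name` and `virtual_name` fields) plus a Terraform version that supports `dynamic` blocks; the variables `ami`, `ami_owner` and `instance_type` are hypothetical:

```hcl
# Hypothetical inputs: the AMI to launch, its owner, and the instance type.
variable "ami" {
  type = string
}

variable "ami_owner" {
  type = string
}

variable "instance_type" {
  type = string
}

# Look up the AMI so its block device mappings can be inspected.
data "aws_ami" "selected" {
  owners = [var.ami_owner]

  filter {
    name   = "image-id"
    values = [var.ami]
  }
}

resource "aws_instance" "instance" {
  ami           = data.aws_ami.selected.id
  instance_type = var.instance_type

  # Emit one no_device block per instance-store mapping (ephemeral0,
  # ephemeral1, ...) declared by the AMI. can() turns the error regex()
  # raises on a non-match or null virtual_name into false.
  dynamic "ephemeral_block_device" {
    for_each = [
      for m in data.aws_ami.selected.block_device_mappings : m.device_name
      if can(regex("^ephemeral[0-9]+$", m.virtual_name))
    ]
    content {
      device_name = ephemeral_block_device.value
      no_device   = true
    }
  }
}
```

For the AMI above, this should produce the same two blocks for `/dev/sdb` and `/dev/sdc` that were written out by hand. It only covers mappings declared on the AMI, which, per @timonwong's note, are the ones that don't show up on the instance.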

borsboom on 25 Feb 2017

Thanks for all the info. I went through similar issues mentioned in this post and finally came up with the full implementation here: https://github.com/cxmcc/tf_aws_ec2_auto_recovery Please let me know if you have any feedback.

cxmcc on 18 Aug 2017

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.