I'm trying to setup auto recovery for an EC2 instance. I have the ARN and properties defined in the alarm, but can't see a way to tie that to the instance.
resource "aws_cloudwatch_metric_alarm" "autorecover" {
alarm_name = "ec2-autorecover"
namespace = "AWS/EC2"
evaluation_periods = "2"
period = "60"
alarm_description = "This metric auto recovers EC2 instances"
alarm_actions = ["arn:aws:automate:${var.region}:ec2:recover"]
}
Is auto recovery possible, and if so, what am I missing?
Looks like this might not be possible and that the alarms are really only useful with auto scaling polices. After trying a few things, I get the following error:
Errors:
* aws_cloudwatch_metric_alarm.instance: "threshold": required field is not set
* aws_cloudwatch_metric_alarm.instance: "comparison_operator": required field is not set
* aws_cloudwatch_metric_alarm.instance: "metric_name": required field is not set
* aws_cloudwatch_metric_alarm.instance: "statistic": required field is not set
These properties aren't available when setting up auto recovery in the console, so I assume that setting up auto recovery in Terraform is not possible at this time. Someone please correct me if I'm wrong.
Hi @toddrosner! Thanks for opening this issue. I think this is related to https://github.com/hashicorp/terraform/issues/5390 also, but since this one is (just about!) earlier, I'm going to keep this one open for future discussion. It looks like this is not supported right now in Terraform, but I'd imagine it should be possible, so I'll tag this as an enhancement.
@jen20 Thanks for the update. Hopefully we'll see this implemented soon.
I have gotten an autorecover CloudWatch alarm created as follows:
resource "aws_cloudwatch_metric_alarm" "autorecover" {
alarm_name = "ec2-autorecover"
namespace = "AWS/EC2"
evaluation_periods = "2"
period = "60"
alarm_description = "This metric auto recovers EC2 instances"
alarm_actions = ["arn:aws:automate:${var.aws_region}:ec2:recover"]
statistic = "Minimum"
comparison_operator = "GreaterThanThreshold"
threshold = "0"
metric_name = "StatusCheckFailed_System"
dimensions {
InstanceId = "${aws_instance.app.id}"
}
}
I replicated the CloudFormation definition for an EC2 autorecover alarm from the Amazon documentation - seems you just need to pass dummy values for the missing parameters.
I'm still unable to test that the autorecover alarm is actually working, but at least the resource is created!
Oddly enough, @craigwatson 's solution is correct, even though I still see the same behavior from #5390. That is, I cannot create the autorecover action in the console (greyed out, claiming unsupported instance type even though EBS only, etc) yet, creating the alarm via TF seems to have succeeded. As mentioned though, there's no way to test autorecovery in AWS so we'll all have to wait for somebody's instance to die and for them to report back :)
Hi folks
I am going to close this out. I was able to get this working as follows:
provider "aws" {
region = "us-west-2"
}
resource "aws_cloudwatch_metric_alarm" "autorecover" {
alarm_name = "ec2-autorecover"
namespace = "AWS/EC2"
evaluation_periods = "2"
period = "60"
alarm_description = "This metric auto recovers EC2 instances"
alarm_actions = ["arn:aws:automate:us-west-2:ec2:recover"]
statistic = "Minimum"
comparison_operator = "GreaterThanThreshold"
threshold = "1"
metric_name = "StatusCheckFailed_System"
dimensions {
InstanceId = "${aws_instance.app.id}"
}
}
resource "aws_internet_gateway" "foo" {
vpc_id = "${aws_vpc.foo.id}"
tags {
bar = "baz"
}
}
resource "aws_vpc" "foo" {
cidr_block = "10.50.0.0/16"
}
resource "aws_subnet" "foo" {
cidr_block = "10.50.1.0/24"
vpc_id = "${aws_vpc.foo.id}"
}
resource "aws_instance" "app" {
ami = "ami-5fe5423f"
instance_type = "m3.medium"
subnet_id = "${aws_subnet.foo.id}"
}
Hope this helps
Paul
@br0ch0n
Sadly our AWS support guy tells us it won't work :( When #8455 is done, it can resolve this issue.
Even with #8455, I don't seem to be able to create the auto-recovery alarm in the AWS console for an EC2 instance created like this:
resource "aws_instance" "instance" {
ami = "${var.ami}"
instance_type = "${var.instance_type}"
subnet_id = "${var.subnet_id}"
vpc_security_group_ids = ["${var.security_group_ids}"]
private_ip = "${var.private_ip}"
key_name = "${var.key_name}"
root_block_device {
volume_type = "gp2"
volume_size = 8
}
ephemeral_block_device {
device_name = "/dev/sdb"
no_device = true
}
tags {
Name = "${var.name}"
}
I got the error:
The EC2 'Recover' Action is not valid for the associated instance. Please remove or change to a different EC2 action.
I tried this with an m3.medium instance, and confirmed that no instance storage was mounted (whereas if I left out the ephemeral_block_device, there _was_ instance storage mounted).
I created an _identical_ instance in the console (e.g. running a diff between the results of aws ec2 describe-instances on the two instances only shows the expected differences like instance ID and IP address), and I _was_ able to create the alarm for that one.
@borsboom ~I think you can only use it with EBS-only instance types.~
Some AMIs contains more than one ephemeral block devices definitions, you need to exclude them all, they are not visible in neither the Console nor the API. You can only query the AMIs, and find out them in the "Block Devices" section.
@timonwong: thank you! Once I add no_device ephemeral_block_devices for _both_ that were defined by the AMI, I was able to create the auto-recovery alarm in the AWS console. That gives me much more confidence that a Terraform-created auto-recovery alarm will work as well.
For anyone finding this issue in the future, you can find the block devices for an AMI using something like this:
$ aws ec2 describe-images --image-ids=ami-a58d0dc5
{
"Images": [
{
[...snip...]
"BlockDeviceMappings": [
[...snip...]
{
"DeviceName": "/dev/sdb",
"VirtualName": "ephemeral0"
},
{
"DeviceName": "/dev/sdc",
"VirtualName": "ephemeral1"
}
],
[...snip...]
}
]
}
This translates into the following ephemeral_block_devices:
resource "aws_instance" "instance" {
[...snip...]
ephemeral_block_device {
device_name = "/dev/sdb"
no_device = true
}
ephemeral_block_device {
device_name = "/dev/sdc"
no_device = true
}
}
Thanks for all the info. I went through similar issues mentioned in this post and finally came up with the full implementation here: https://github.com/cxmcc/tf_aws_ec2_auto_recovery Please let me know if you have any feedback.
I'm going to lock this issue because it has been closed for _30 days_ โณ. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Most helpful comment
I have gotten an autorecover CloudWatch alarm created as follows:
I replicated the CloudFormation definition for an EC2 autorecover alarm from the Amazon documentation - seems you just need to pass dummy values for the missing parameters.
I'm still unable to test that the autorecover alarm is actually working, but at least the resource is created!