Terraform v0.7.2
I'm using Terraform to create a VPC with a number of subnets, routes, etc. Here is the code for the problematic route:
```hcl
resource "aws_route" "nat" {
  count                  = "${var.num_availability_zones}"
  route_table_id         = "${element(aws_route_table.private-app.*.id, count.index)}"
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = "${element(aws_nat_gateway.nat.*.id, count.index)}"
}
```
I should be able to run terraform apply without errors.
2 error(s) occurred:
* aws_route.nat.0: Error finding route after creating it: error finding matching route for Route table (rtb-7b48ec1d) and destination CIDR block (0.0.0.0/0)
* aws_route.nat.2: Error finding route after creating it: error finding matching route for Route table (rtb-7f48ec19) and destination CIDR block (0.0.0.0/0)
terraform apply

Ever since the upgrade to Terraform 0.7.x, I have been seeing a series of intermittent (probably eventual-consistency) errors with VPCs, subnets, route tables, etc. For related bugs, see:
Also, I had previously hit this sort of issue with a more complicated VPC setup and found a workaround, as reported in https://github.com/hashicorp/terraform/issues/7527, but the workaround doesn't help with Terraform 0.7.x, and I'm seeing these issues even with simpler setups.
Update: I've found, through trial and error and copying code examples I found online, that most of the related issues listed in the "Important Factoids" section above are resolved by adding two depends_on entries to each aws_route resource: one that points to the Internet Gateway in the VPC and one that points to the corresponding aws_route_table resource.
```hcl
resource "aws_route" "internet" {
  route_table_id         = "${aws_route_table.public.id}"
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = "${aws_internet_gateway.main.id}"

  # A workaround for a series of eventual consistency bugs in Terraform. For a list of the errors, see the related
  # bugs described in this issue: https://github.com/hashicorp/terraform/issues/8542. The workaround is based on:
  # https://github.com/hashicorp/terraform/issues/5335 and https://charity.wtf/2016/04/14/scrapbag-of-useful-terraform-tips/
  depends_on = ["aws_internet_gateway.main", "aws_route_table.public"]
}
```
I have no idea why that helps, but it gets rid of _most_ issues. The only one it does NOT get rid of is the "Error finding route after creating it" error described in this issue. That problem still appears from time to time, and once you hit it, there is no way to recover due to https://github.com/hashicorp/terraform/issues/7993#issuecomment-243187837.
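For completeness, here is what the same workaround would look like applied to the NAT route from the top of this issue. This is an untested sketch: the resource names come from the snippet above, and pointing `depends_on` at the NAT gateway and private route table (by analogy with the Internet Gateway case) is my assumption, not something I've verified fixes the error:

```hcl
resource "aws_route" "nat" {
  count                  = "${var.num_availability_zones}"
  route_table_id         = "${element(aws_route_table.private-app.*.id, count.index)}"
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = "${element(aws_nat_gateway.nat.*.id, count.index)}"

  # Assumed analogue of the workaround above: explicitly wait for the NAT
  # gateway and the private route table before creating the route.
  depends_on = ["aws_nat_gateway.nat", "aws_route_table.private-app"]
}
```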
Digging through the Terraform code, I found that this error message comes from the resourceAwsRouteCreate function, in this snippet:
```go
var route *ec2.Route
err = resource.Retry(15*time.Second, func() *resource.RetryError {
	route, err = findResourceRoute(conn, d.Get("route_table_id").(string), d.Get("destination_cidr_block").(string))
	return resource.RetryableError(err)
})
if err != nil {
	return fmt.Errorf("Error finding route after creating it: %s", err)
}
```
In other words, the resourceAwsRouteCreate function creates the route and then tries to fetch that route back from the AWS API, retrying for up to 15 seconds before giving up and returning the "Error finding route after creating it" error message. Is it possible that under certain circumstances, it takes more than 15 seconds for the route to propagate everywhere in AWS?
In particular, I've noticed that I get this error more often than another developer on my team. We are both testing in us-east-1, but he's located in the US, while I'm in Europe. Perhaps his AWS API read traffic is being routed directly to us-east-1, whereas mine is being routed to replicas in Europe. Those far-away replicas would take longer to "catch up" than the closer ones, which would explain why I see these errors so much more often.
Is there any reason not to wait longer, like 1 or 2 minutes? If things are working normally, you lose nothing, as it'll succeed long before then; if things are delayed, you _need_ to wait that long anyway; and if things are actually broken, then waiting an extra minute or two to find out isn't a big deal. If that sounds reasonable, I'm happy to submit a PR.
Same thing is happening to me, and it feels completely random. Looking in the AWS console the route is created fine, but Terraform still fails with the error above.
As an Ops guy, I would love for this problem to be addressed. I have pretty much accepted the random nature of my Terraform deployments due to this issue. Fortunately, the routes get created even though they aren't committed to the state file, so my deployment is operational; I just don't get a clean terraform exit code and have to manually remove the blackhole routes after a destroy. Frustrating...
@jen20 @stack72 something for 0.7.4? :)
Thanks for all your research on this, @brikis98. Would also like to see this issue get some priority. Running into this in 0.6.16 too, and a 15-second timeout seems overly optimistic to me.
+1 struggling with the same here, version 0.7.4
+1, Terraform v0.7.5
@mitchellh Any word on this? This causes perpetual errors in run output, and there's no way of using import on aws_route as a workaround.
Hey all – I've bumped the post-create read timeout to 2 minutes, up from 15 seconds, in https://github.com/hashicorp/terraform/commit/3fbf01ea1b89cc6d16bf0981ce1355f0f5f9a9c8 . I hope that's sufficient for this issue, but please let me know if you feel I need to make further adjustments to the timeouts. I'm going to close this for now, thanks!
@catsby Thank you! I hope that'll do the trick. I'll try out the new version and report back if I hit this again.
Thank you, @catsby! What version will this commit be released under?
@ti-mo sorry I can't say more than "the next release" :/
We're trying to keep a ~2 week pace of releases, which would put it at next week, but I make no promises 😄
I just ran into this exact same issue today:
aws_route.internal.0: Error finding route after creating it: Unable to find matching route for Route Table (rtb-xxxxxxxx) and destination CIDR block (0.0.0.0/0).
I was creating a VPC from the segment.io stack project. It of course left the VPC irrecoverable via terraform. I had to destroy it and try again. The second time everything came up fine. I guess this wouldn't be a problem when changing attributes of an already functioning VPC provisioned with terraform. Still, not a very good first experience with building VPCs this way.
Maybe we should increase the timeout again? 10 minutes?
Hitting this error as well. Our Terraform runs in eu-central-1 and deploys to us-east-1. We haven't seen this issue when Terraform both runs in and deploys to eu-central-1.
Ran into this issue two times in 24 hours, while only doing maybe 10 terraform apply runs.
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.