Terraform v0.7.2
I'm using Terraform to create a VPC with a number of subnets, routes, etc. Here is the code for the problematic route:
```hcl
resource "aws_route" "nat" {
  count                  = "${var.num_availability_zones}"
  route_table_id         = "${element(aws_route_table.private-app.*.id, count.index)}"
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = "${element(aws_nat_gateway.nat.*.id, count.index)}"
}
```
I should be able to run terraform apply without errors.
2 error(s) occurred:
* aws_route.nat.0: Error finding route after creating it: error finding matching route for Route table (rtb-7b48ec1d) and destination CIDR block (0.0.0.0/0)
* aws_route.nat.2: Error finding route after creating it: error finding matching route for Route table (rtb-7f48ec19) and destination CIDR block (0.0.0.0/0)
terraform apply

Ever since the upgrade to Terraform 0.7.x, I have been seeing a series of intermittent (probably eventual-consistency) errors with VPCs, subnets, route tables, etc. For related bugs, see:
Also, I had previously hit this sort of issue with a more complicated VPC setup and found a workaround, as reported in https://github.com/hashicorp/terraform/issues/7527, but the workaround doesn't help with Terraform 0.7.x, and I'm seeing these issues even with simpler setups.
Update: I've found, through trial and error and copying code examples I found online, that most of the related issues listed in the "Important Factoids" section above are resolved by adding two depends_on entries to each aws_route resource: one that points to the Internet Gateway in the VPC and one that points to the corresponding aws_route_table resource.
```hcl
resource "aws_route" "internet" {
  route_table_id         = "${aws_route_table.public.id}"
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = "${aws_internet_gateway.main.id}"

  # A workaround for a series of eventual consistency bugs in Terraform. For a list of the errors, see the related
  # bugs described in this issue: https://github.com/hashicorp/terraform/issues/8542. The workaround is based on:
  # https://github.com/hashicorp/terraform/issues/5335 and https://charity.wtf/2016/04/14/scrapbag-of-useful-terraform-tips/
  depends_on = ["aws_internet_gateway.main", "aws_route_table.public"]
}
```
I have no idea why that helps, but it gets rid of _most_ issues. The only one it does NOT get rid of is the "Error finding route after creating it" error described in this issue. That problem still appears from time to time, and once you hit it, there is no way to recover due to https://github.com/hashicorp/terraform/issues/7993#issuecomment-243187837.
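For completeness, here is what the same workaround would look like applied to the NAT route from the top of this issue. This is an untested sketch: the resource names come from the snippet above, and pointing `depends_on` at the NAT gateway and private route table (by analogy with the Internet Gateway case) is my assumption, not something I've verified fixes the error:

```hcl
resource "aws_route" "nat" {
  count                  = "${var.num_availability_zones}"
  route_table_id         = "${element(aws_route_table.private-app.*.id, count.index)}"
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = "${element(aws_nat_gateway.nat.*.id, count.index)}"

  # Assumed analogue of the workaround above: explicitly wait for the NAT
  # gateway and the private route table before creating the route.
  depends_on = ["aws_nat_gateway.nat", "aws_route_table.private-app"]
}
```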
Digging through the Terraform code, I found that this error message comes from the resourceAwsRouteCreate function, in this snippet:
```go
var route *ec2.Route
err = resource.Retry(15*time.Second, func() *resource.RetryError {
	route, err = findResourceRoute(conn, d.Get("route_table_id").(string), d.Get("destination_cidr_block").(string))
	return resource.RetryableError(err)
})
if err != nil {
	return fmt.Errorf("Error finding route after creating it: %s", err)
}
```
In other words, the resourceAwsRouteCreate function creates the route and then tries to fetch that route back from the AWS API, retrying for up to 15 seconds before giving up and returning the "Error finding route after creating it" error message. Is it possible that under certain circumstances, it takes more than 15 seconds for the route to propagate everywhere in AWS?
In particular, I've noticed that I get this error more often than another developer on my team. We are both testing in us-east-1, but he's located in the US, while I'm in Europe. Perhaps his AWS API read traffic is being routed directly to us-east-1, whereas mine is being routed to replicas in Europe. Those far-away replicas would take longer to "catch up" than the closer ones, which would explain why I see these errors so much more often.
Is there any reason not to wait longer, like 1 or 2 minutes? If things are working normally, you lose nothing, as it'll succeed long before then; if things are delayed, you _need_ to wait that long anyway; and if things are actually broken, then waiting an extra minute or two to find out isn't a big deal. If that sounds reasonable, I'm happy to submit a PR.
Same thing is happening to me, and it feels completely random. Looking in the AWS console the route is created fine, but Terraform still fails with the error above.
As an Ops guy, I would love for this problem to be addressed. I have pretty much accepted the random nature of my Terraform deployments due to this issue. Fortunately, the routes get created even though they aren't committed to the state file, so my deployment is operational; I just don't get a clean terraform exit code and have to manually remove the blackhole routes after a destroy. Frustrating...
@jen20 @stack72 something for 0.7.4? :)
Thanks for all your research on this, @brikis98. Would also like to see this issue get some priority. Running into this in 0.6.16 too, and a 15-second timeout seems overly optimistic to me.
+1 struggling with the same here, version 0.7.4
+1, Terraform v0.7.5
@mitchellh Any word on this? This causes perpetual errors in run output, and there's no way of using import on aws_route as a workaround.
Hey all – I've bumped the post-create read timeout to 2 minutes, up from 15 seconds, in https://github.com/hashicorp/terraform/commit/3fbf01ea1b89cc6d16bf0981ce1355f0f5f9a9c8 . I hope that's sufficient for this issue, but please let me know if you feel I need to make further adjustments to the timeouts. I'm going to close this for now, thanks!
@catsby Thank you! I hope that'll do the trick. I'll try out the new version and report back if I hit this again.
Thank you, @catsby! What version will this commit be released under?
@ti-mo sorry I can't say more than "the next release" :/
We're trying to keep a ~2 week pace of releases, which would put it at next week, but I make no promises 😄
I just ran into this exact same issue today:
aws_route.internal.0: Error finding route after creating it: Unable to find matching route for Route Table (rtb-xxxxxxxx) and destination CIDR block (0.0.0.0/0).
I was creating a VPC from the segment.io stack project. It of course left the VPC irrecoverable via terraform. I had to destroy it and try again. The second time everything came up fine. I guess this wouldn't be a problem when changing attributes of an already functioning VPC provisioned with terraform. Still, not a very good first experience with building VPCs this way.
Maybe we should increase the timeout again? 10 minutes?
Hitting this error as well. Our Terraform runs in eu-central-1 and deploys to us-east-1. We haven't seen this issue when Terraform both runs in and deploys to eu-central-1.
Ran into this issue two times in 24 hours, while only doing maybe 10 terraform apply runs.
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.