Terraform-provider-google: GCP Peering does not work

Created on 12 Feb 2019 · 7 Comments · Source: hashicorp/terraform-provider-google

Terraform Version

Terraform v0.11.11
+ provider.google v1.20.0

Affected Resource(s)

google_compute_network_peering

Terraform Configuration Files

// Peer the infra vpc with the dev vpc.
resource "google_compute_network_peering" "infra_dev" {
  name         = "infra-dev"
  network      = "${module.infra_network.network_self_link}"
  peer_network = "${module.dev_network.network_self_link}"
}

// Peer the dev vpc with the infra vpc.
resource "google_compute_network_peering" "dev_infra" {
  name         = "dev-infra"
  network      = "${module.dev_network.network_self_link}"
  peer_network = "${module.infra_network.network_self_link}"
  depends_on   = ["google_compute_network_peering.infra_dev"]
}

Debug Output

Error: Error applying plan:

1 error(s) occurred:

* google_compute_network_peering.dev_infra: 1 error(s) occurred:

* google_compute_network_peering.dev_infra: Error adding network peering: googleapi: Error 400: There is a route operation in progress on the local or peer network. Try again later., badRequest

Expected Behavior

The infra and the dev vpc should be peered.

Actual Behavior

See the error above. Peering is effectively broken: the depends_on has no effect. Using depends_on in dev_infra should make it WAIT until the first peering operation completes, thereby satisfying the GCP API requirement of one peering operation at a time.

Steps to Reproduce

terraform apply

bug

Source

dgarstang

👍5

Most helpful comment

@emilymye this is because you need more networks and peerings to reproduce the race condition.

We have a lot of different networks in GCP using a shared VPC. Each service lives in its own separate network, and we need to peer the networks to allow communication between the relevant services.

We hit the race condition every single time, and without a dependency hack using inputs/outputs it would take something like 20 plan/apply iterations to create all the peerings from scratch.

The Terraform team seems to want Terraform itself to stay dumb about parallelism (dumb in a good way) and to let each provider take care of provider-specific implementation details, like the parallelism issue we have here, i.e. "In GCP it is not possible to peer a network with several other networks at the same time".

Our solution is to reproduce the dependency graph of the peerings using inputs/outputs:

1) We have a module that peers two networks in both directions, and we use depends_on to create the two peerings in sequence:

# Terraform module: gcp/google/vpc_network/network_peering
# Peer a network with another.

# Note: a network cannot be peered to multiple networks simultaneously.
# We have to create the peering sequentially thus you'll notice some hacks
# to be able to do so with Terraform 0.12

resource "google_compute_network_peering" "network" {
  name         = "${var.network_name}-${var.peered_network_name}"
  network      = "${var.network_link}"
  peer_network = "${var.peered_network_link}"
}

resource "google_compute_network_peering" "peered_network" {
  depends_on = ["google_compute_network_peering.network"]

  name         = "${var.peered_network_name}-${var.network_name}"
  network      = "${var.peered_network_link}"
  peer_network = "${var.network_link}"
}

2) This module declares its own network link inputs as outputs:

# Outputs for gcp/google/vpc_network/network_peering module

# Modules dependency hack as of Terraform 0.12
# We use the network variable to define a chain of dependencies between the
# different calls of this module.
# Note that the values seem to be reversed but this is expected as we use the
# google_compute_network_peering.peered_network resource which is the last one
# to be created.
# Inspired from:
# https://github.com/hashicorp/terraform/issues/1178#issuecomment-207369534
output "network_link"        { value = "${google_compute_network_peering.peered_network.peer_network}" }
output "peered_network_link" { value = "${google_compute_network_peering.peered_network.network}" }

3) The module callers can then reproduce the dependency graph as follows (note that we have another module that actually creates each network; those modules are named <name>_network):

module "peering_A_B" {
  source = "../../vpc_network/network_peering"

  network_name        = module.A_network.project_name
  network_link        = module.A_network.network_link
  peered_network_name = module.B_network.project_name
  peered_network_link = module.B_network.network_link
}

module "peering_B_C" {
  source = "../../vpc_network/network_peering"

  network_name        = module.B_network.project_name
  network_link        = module.peering_A_B.peered_network_link
  peered_network_name = module.C_network.project_name
  peered_network_link = module.C_network.network_link
}

module "peering_A_C" {
  source = "../../vpc_network/network_peering"

  network_name        = module.A_network.project_name
  network_link        = module.peering_A_B.network_link
  peered_network_name = module.C_network.project_name
  peered_network_link = module.peering_B_C.peered_network_link
}

The example above reproduces the following graph:

A -> B -> C
|__________^

This effectively peers A-B, then B-C, then A-C, in sequence. Without it, Terraform attempts all three peerings at the same time, which fails twice and requires three apply iterations to complete. On the first apply, the B-C and A-C peerings fail because A-B is being peered. On the second, A-C fails because B-C is being peered. On the third, A-C is created.
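For readers on newer Terraform versions, the same A-B, B-C, A-C ordering could probably be written without the input/output hack. The following is a minimal sketch, assuming Terraform 0.13 or later (where module blocks accept depends_on) and reusing the module path and output names from the example above. It has not been verified against this issue, and sequencing alone may not be enough if GCP still reports a route operation in progress.

```hcl
# Sketch only: assumes Terraform 0.13+ (module-level depends_on).
# Module names, source path and outputs are taken from the example above.

module "peering_A_B" {
  source = "../../vpc_network/network_peering"

  network_name        = module.A_network.project_name
  network_link        = module.A_network.network_link
  peered_network_name = module.B_network.project_name
  peered_network_link = module.B_network.network_link
}

module "peering_B_C" {
  source = "../../vpc_network/network_peering"

  network_name        = module.B_network.project_name
  network_link        = module.B_network.network_link
  peered_network_name = module.C_network.project_name
  peered_network_link = module.C_network.network_link

  # Wait for both A-B peerings before starting B-C.
  depends_on = [module.peering_A_B]
}

module "peering_A_C" {
  source = "../../vpc_network/network_peering"

  network_name        = module.A_network.project_name
  network_link        = module.A_network.network_link
  peered_network_name = module.C_network.project_name
  peered_network_link = module.C_network.network_link

  # Wait for B-C (and, transitively, A-B) before starting A-C.
  depends_on = [module.peering_B_C]
}
```

With this form the network links come straight from the <name>_network modules, so the module outputs are no longer needed to carry the ordering.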

So it would be ultra-supra-mega cool if the Google provider could handle this for us. One possible way would be for the provider to allow only one peering operation to run at any given time. It would be slower, but it would work in one pass and we could use count in our peering resources, saving a lot of management burden: the example above is simple, but in a real use case the graph becomes much harder to maintain.

syl20bnr on 31 May 2019

👍8

All 7 comments

Hi @dgarstang - is this a different issue than https://github.com/terraform-providers/terraform-provider-google/issues/3026?

emilymye on 12 Feb 2019

I will close #3026. I think this ticket describes the situation more clearly.

dgarstang on 12 Feb 2019

Hmm, I can't seem to recreate this issue with the following config:

// Peer the infra vpc with the dev vpc.
resource "google_compute_network_peering" "infra_dev" {
  name         = "infra-dev"
  network      = "${google_compute_network.infra_network.self_link}"
  peer_network = "${google_compute_network.dev_network.self_link}"
}

// Peer the dev vpc with the infra vpc.
resource "google_compute_network_peering" "dev_infra" {
  name         = "dev-infra"
  network      = "${google_compute_network.dev_network.self_link}"
  peer_network = "${google_compute_network.infra_network.self_link}"
  depends_on   = ["google_compute_network_peering.infra_dev"]
}

resource "google_compute_network" "infra_network" {
  name                    = "prodfoobar"
  auto_create_subnetworks = "false"
}

resource "google_compute_network" "dev_network" {
  name                    = "devfoobar"
  auto_create_subnetworks = "false"
}

Do you mind running with the full debug logs? i.e. TF_LOG="DEBUG"

emilymye on 12 Feb 2019

I have been working around this with a null resource:

resource "google_compute_network_peering" "to" {
  name         = "to"
  network      = "network-2"
  peer_network = "network-1"
}

resource "google_compute_network_peering" "from" {
  name         = "from"
  network      = "network-1"
  peer_network = "network-2"

  // only one operation at a time for network peering, so we need an explicit serialization
  depends_on = ["null_resource.force_networks_in_order"]
}

resource "null_resource" "force_networks_in_order" {
  provisioner "local-exec" {
    command = "echo ${google_compute_network_peering.to.id}"
  }
}

JackDavidson on 16 Mar 2019

An easy way to reproduce this is to use google_compute_network_peering with a count of networks. Setting up a hub and spoke network with counts causes this error every single time.

variable "organization_id" {
  description = "The organization where the projects and folders should be created"
  type        = "string"
}

variable "billing_account_id" {
  description = "The ID of the billing account resources should be created under (XXXXXX-XXXXX-XXXXXX)"
  type        = "string"
}

variable "labels" {
  description = "Map of labels that will be applied to all resources that have labels"
  type        = "map"
}

variable "number_of_spokes" {
  description = "How many VPCs should be created and peered with the hub"
  type        = "string"
  default     = 4
}

resource "google_project" "compute_project" {
  name                = "compute-project"
  project_id          = "project-${random_id.compute_project.hex}"
  org_id              = "${var.organization_id}"
  billing_account     = "${var.billing_account_id}"
  labels              = "${var.labels}"
  auto_create_network = false
}

resource "random_id" "compute_project" {
  byte_length = 4
}

resource "google_compute_network" "hub_network" {
  name                            = "hub-network"
  project                         = "${google_project.compute_project.id}"
  auto_create_subnetworks         = false
  delete_default_routes_on_create = true
}

resource "google_compute_subnetwork" "hub_subnetwork" {
  provider         = "google-beta"
  name             = "hub-subnetwork"
  project          = "${google_project.compute_project.id}"
  ip_cidr_range    = "10.1.1.0/24"
  region           = "us-central1"
  network          = "${google_compute_network.hub_network.self_link}"
  enable_flow_logs = true
  log_config {
    aggregation_interval = "INTERVAL_10_MIN"
    flow_sampling        = 0.5
    metadata             = "INCLUDE_ALL_METADATA"
  }
}

resource "google_compute_firewall" "ingress" {
  provider = "google-beta"

  name           = "hub-firewall"
  network        = "${google_compute_network.hub_network.name}"
  project        = "${google_project.compute_project.id}"
  enable_logging = true

  allow {
    protocol = "tcp"
    ports = [
      "80",  //http
      "443", //https
      "22"   //ssh
    ]
  }
}

resource "google_compute_route" "internet" {
  name    = "hub-network"
  project = "${google_project.compute_project.id}"

  dest_range       = "0.0.0.0/0"
  network          = "${google_compute_network.hub_network.name}"
  next_hop_gateway = "default-internet-gateway"
  priority         = 1
}

resource "google_compute_network" "vpc_network" {
  count = "${var.number_of_spokes}"

  name                            = "spoke-network-${count.index}"
  project                         = "${google_project.compute_project.id}"
  auto_create_subnetworks         = false
  delete_default_routes_on_create = true

  depends_on = ["google_compute_subnetwork.hub_subnetwork"]
}

resource "random_id" "vpc_network" {
  count       = "${var.number_of_spokes}"
  byte_length = 4
}

resource "google_compute_subnetwork" "vpc_subnetwork" {
  count = length(google_compute_network.vpc_network)

  provider         = "google-beta"
  name             = "spoke-subnetwork-${count.index}"
  project          = "${google_project.compute_project.id}"
  ip_cidr_range    = "${cidrsubnet("10.1.1.0/16", 8, count.index + 2)}"
  region           = "us-central1"
  network          = "${element(google_compute_network.vpc_network.*.self_link, count.index)}"
  enable_flow_logs = true
  log_config {
    aggregation_interval = "INTERVAL_10_MIN"
    flow_sampling        = 0.5
    metadata             = "INCLUDE_ALL_METADATA"
  }
}

resource "google_compute_network_peering" "hub_to_peer" {
  count = length(google_compute_network.vpc_network)

  name         = "hub-to-peer-${count.index}"
  network      = "${google_compute_network.hub_network.self_link}"
  peer_network = "${element(google_compute_network.vpc_network.*.self_link, count.index)}"

  depends_on = ["google_compute_subnetwork.vpc_subnetwork", "google_compute_subnetwork.hub_subnetwork"]
}

resource "google_compute_network_peering" "peer_to_hub" {
  count = length(google_compute_network.vpc_network)

  name         = "peer-to-hub-${count.index}"
  network      = "${element(google_compute_network.vpc_network.*.self_link, count.index)}"
  peer_network = "${google_compute_network.hub_network.self_link}"

  depends_on = ["google_compute_subnetwork.vpc_subnetwork", "google_compute_subnetwork.hub_subnetwork"]
}

I also agree that it would be really nice if this worked. Creating a wrapper resource just to fulfill this is pretty painful.
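Without a wrapper module, one way to force an order is to give up count for the peerings and chain them by hand. The following is a minimal sketch for two spokes, assuming Terraform 0.12 syntax and the hub_network, vpc_network and subnetwork resources from the config above. The four peering resource names are hypothetical and would replace hub_to_peer and peer_to_hub. It only shows the ordering and is not a verified fix.

```hcl
# Sketch only: Terraform 0.12+ syntax, two spokes written out instead of count.
# The resource names below are hypothetical.

resource "google_compute_network_peering" "hub_to_spoke_0" {
  name         = "hub-to-spoke-0"
  network      = google_compute_network.hub_network.self_link
  peer_network = google_compute_network.vpc_network[0].self_link

  # Same subnetwork dependencies as in the count-based config.
  depends_on = [
    google_compute_subnetwork.vpc_subnetwork,
    google_compute_subnetwork.hub_subnetwork,
  ]
}

resource "google_compute_network_peering" "spoke_0_to_hub" {
  name         = "spoke-0-to-hub"
  network      = google_compute_network.vpc_network[0].self_link
  peer_network = google_compute_network.hub_network.self_link

  # Start only after the previous peering has been created.
  depends_on = [google_compute_network_peering.hub_to_spoke_0]
}

resource "google_compute_network_peering" "hub_to_spoke_1" {
  name         = "hub-to-spoke-1"
  network      = google_compute_network.hub_network.self_link
  peer_network = google_compute_network.vpc_network[1].self_link

  depends_on = [google_compute_network_peering.spoke_0_to_hub]
}

resource "google_compute_network_peering" "spoke_1_to_hub" {
  name         = "spoke-1-to-hub"
  network      = google_compute_network.vpc_network[1].self_link
  peer_network = google_compute_network.hub_network.self_link

  depends_on = [google_compute_network_peering.hub_to_spoke_1]
}
```

The obvious cost is that every extra spoke needs two more hand-written resources, which is exactly the maintenance burden count was meant to avoid.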

I also don't think anyone has yet mentioned the easiest workaround, which is using:
terraform apply -parallelism=1
That nicely sidesteps the issue, at the expense of deployment time increasing.

bruceharrison1984 on 6 Aug 2019

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!