Is there a way of configuring timeouts?
azurerm_subnet.subnet: Creating...
address_prefix: "" => "10.0.2.0/24"
ip_configurations.#: "" => "<computed>"
name: "" => "bach"
network_security_group_id: "" => "<computed>"
resource_group_name: "" => "bach"
route_table_id: "" => "<computed>"
virtual_network_name: "" => "bach"
azurerm_subnet.subnet: Still creating... (10s elapsed)
azurerm_subnet.subnet: Still creating... (20s elapsed)
azurerm_subnet.subnet: Still creating... (30s elapsed)
[...]
azurerm_subnet.subnet: Still creating... (9m50s elapsed)
azurerm_subnet.subnet: Still creating... (10m0s elapsed)
azurerm_subnet.subnet: Still creating... (10m10s elapsed)
Hey @mooperd
As you're seeing above, the time it takes to provision resources in Azure can vary wildly, and thus the Azure SDK keeps polling for completion until either the resource is created or an error occurs. There are several resources in Azure which can take a considerable amount of time to provision (e.g. Storage Accounts can take up to 30m, or Virtual Network Gateways up to 2 hours).
Within Terraform it's possible to specify a custom timeout for each resource; however, each resource needs to opt in to this, and as such we've not yet hooked this up for the Azure resources. Is this a particular problem you're seeing consistently with the Subnet resource?
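For reference, on resources that have opted in, Terraform's general syntax for custom timeouts looks like the following sketch (the resource and durations here are purely illustrative; as noted above, the Azure resources had not opted in at the time of this comment, so this block would not yet be honoured):

```hcl
resource "azurerm_subnet" "subnet" {
  # ... usual subnet arguments ...

  # Per-operation timeouts; only honoured once the provider
  # has opted the resource in to custom timeouts.
  timeouts {
    create = "30m"
    delete = "30m"
  }
}
```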
Thanks!
Hi,
I was having issues with it all afternoon. I was also playing around with parameterising and environments, so maybe something in Terraform was broken. I was able to create the subnet using the az CLI whilst waiting for Terraform, which I think supports the broken-Terraform hypothesis.
I'll investigate the issues I was having some more, but I think it would nevertheless be very useful to be able to set timeouts, as I'm intending on using Terraform in our CI/CD pipelines. Never-ending processes are somewhat annoying in that context.
Cheers,
Andrew
Some debug from my failing terraform apply: https://gist.github.com/anonymous/458beb2cf154ec20ba6d1c1430a06919
and the .tf:
provider "azurerm" {
  subscription_id = "-"
  client_id       = "-"
  client_secret   = "-"
  tenant_id       = "-"
}

# Create a resource group
resource "azurerm_resource_group" "resource_group" {
  name     = "${var.resource_group}"
  location = "West US"
}

resource "azurerm_virtual_network" "virtual_network" {
  name                = "${azurerm_resource_group.resource_group.name}"
  address_space       = ["10.0.0.0/16"]
  location            = "West US"
  resource_group_name = "${azurerm_resource_group.resource_group.name}"
}

resource "azurerm_subnet" "subnet" {
  name                 = "${azurerm_resource_group.resource_group.name}"
  resource_group_name  = "${azurerm_resource_group.resource_group.name}"
  virtual_network_name = "${azurerm_virtual_network.virtual_network.name}"
  address_prefix       = "10.0.2.0/24"
}

resource "azurerm_public_ip" "testing123" {
  name                         = "testing123"
  location                     = "${azurerm_resource_group.resource_group.location}"
  resource_group_name          = "${azurerm_resource_group.resource_group.name}"
  public_ip_address_allocation = "dynamic"
}

resource "azurerm_network_interface" "test" {
  name                = "acctni"
  location            = "West US"
  resource_group_name = "${azurerm_resource_group.resource_group.name}"

  ip_configuration {
    name                          = "testconfiguration1"
    subnet_id                     = "${azurerm_subnet.subnet.id}"
    private_ip_address_allocation = "dynamic"
    public_ip_address_id          = "${azurerm_public_ip.testing123.id}"
  }
}

resource "azurerm_virtual_machine" "test" {
  name                  = "acctvm"
  location              = "West US"
  resource_group_name   = "${azurerm_resource_group.resource_group.name}"
  network_interface_ids = ["${azurerm_network_interface.test.id}"]
  vm_size               = "Standard_A2_v2"

  storage_os_disk {
    name          = "myosdisk1"
    image_uri     = "https://factor3packer.blob.core.windows.net/system/Microsoft.Compute/Images/images/packer-osDisk.828b9190-5bc0-462f-87aa-a80367c2959e.vhd"
    vhd_uri       = "https://factor3packer.blob.core.windows.net/images/need_random_value.vhd"
    os_type       = "linux"
    caching       = "ReadWrite"
    create_option = "FromImage"
  }

  os_profile {
    computer_name  = "hostname"
    admin_username = "centos"
    admin_password = "X9deiX9dei"
  }

  os_profile_linux_config {
    disable_password_authentication = false
  }

  tags {
    environment = "staging"
  }
}
Hey @mooperd
Thanks for posting your Terraform config.
I've taken a look and using your config I've been able to replicate this on Terraform 0.9.11 - and from what I can see this has been fixed in #6 which has been merged and is available in Terraform 0.10-rc1. I've also tested this config on Terraform 0.10-rc1 and can confirm the deadlock issue you're seeing is no longer present :)
The other issue regarding not being able to set timeouts on individual resources still stands however - and as such I'm going to make this issue an enhancement request for those - which we'll investigate adding in the near future :)
Thanks!
I need a configurable timeout on azurerm_virtual_machine
resources. I'm trying to spin up a bunch at once, and a lot of them fail due to timeout. But they're actually created in Azure so my state just gets out of sync.
Any chance this is happening soon?
EDIT: this appears to be because the virtual machine creations are queued for a while waiting on the first "batch" (parallelism) of machines to finish creation. They don't even start "Creating..." before they end up timing out.
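One mitigation for the queuing behaviour described above (not a fix for the underlying per-resource timeout) is to lower Terraform's concurrency so fewer machines sit queued behind the first batch; -parallelism is a standard Terraform CLI flag:

```shell
# Create fewer resources concurrently (the default is 10), so queued
# VMs don't burn their timeout budget waiting on an earlier batch.
terraform apply -parallelism=3
```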
+1
would be great for many resources ;-)
Error: Error applying plan:
28 error(s) occurred:
* azurerm_managed_disk.wf_disk_page[0] (destroy): 1 error(s) occurred:
* azurerm_managed_disk.wf_disk_page.0: azure#WaitForCompletion: context has been cancelled: StatusCode=204 -- Original Error: context deadline exceeded
...
@tombuildsstuff could you take another look at this? Did timeout values change in newer Terraform versions? We have a custom script extension that runs for longer than an hour, and it started failing on us with timeouts. This isn't even the regular CSE timeout. I think the Azure default timeouts should be the minimum. I tried the timeout value; it isn't opted in.
@tombuildsstuff I'm also seeing this trying to create an App Service Environment via template deployment. Times out after 1h (as you know, ASEs take 90-120 minutes).
[...]
module.ase-internal.azurerm_template_deployment.ase_template: Still creating... (59m51s elapsed)
module.ase-internal.azurerm_template_deployment.ase_template: Still creating... (1h0m1s elapsed)
Error: Error applying plan:
1 error(s) occurred:
* module.ase-internal.azurerm_template_deployment.ase_template: 1 error(s) occurred:
* azurerm_template_deployment.ase_template: Error creating deployment: azure#WaitForCompletion: context has been cancelled: StatusCode=200 -- Original Error: context deadline exceeded
If I try to add a timeouts block
timeouts {
  create = "3h"
  delete = "3h"
}
to my template resource, I get
Error: Error refreshing state: 2 error(s) occurred:
* module.ase-internal.azurerm_template_deployment.ase_template: 1 error(s) occurred:
* module.ase-internal.azurerm_template_deployment.ase_template: [ERR] Error decoding timeout: Timeout Key (delete) is not supported
* module.ase-external.azurerm_template_deployment.ase_template: 1 error(s) occurred:
* module.ase-external.azurerm_template_deployment.ase_template: [ERR] Error decoding timeout: Timeout Key (create) is not supported
To deploy this template, I either need a longer default timeout, or the ability to specify my own timeout.
Thanks (again!)
Not sure why we don't default to the regular API timeouts, or at least help us out by letting us define timeouts.
I would like to see timeouts for azurerm_virtual_machine myself as well. For me, we're using custom Windows images and it's exceeding the default 10m timeout period to deploy even just one VM :(
@Neutrollized Is there a possibility that the VM is stuck on boot? That can happen if you don't wait for sysprep's registry flag to change.
Also, we had a problem where we installed a bunch of prerequisites as silent installs; those installs apparently needed one clean boot after sysprepping.
Check the Azure portal: if it's still flagging your VM as creating for that long, that might be an Azure issue. Well, it depends how big your image is :) maybe the timeout is your problem.
@pixelicous unfortunately not. I created a VM from the same image via the Azure portal and it took just over 10 minutes to deploy. My current workaround for this is to make the VM size larger (4CPU/16GB mem instead of 2CPU/8GB); it then deployed in under 6 min and was OK.
@Neutrollized As suspected, we don't even get the provider API's default timeouts, but Terraform's. I think the API's default timeout should be the baseline, with the option to change it for any resource.
@pixelicous @miat-asowers this is/was a bug in the way that the Azure SDK handled polling (where it used the default polling delay returned from the service, rather than what was specified, as in previous versions); now that #825 has been resolved we should be able to support custom timeouts on resources.
For me, we're using custom Windows images and it's exceeding the default 10m timeout period to deploy even just one VM
@Neutrollized out of interest, which timeout are you referring to? Azure (HyperV) has a 10 minute boot timeout, after which a hard error is raised and the machine enters the Failed state.
Thanks!
@tombuildsstuff while you guys are at it, please also don't forget filtering by tag names ;)
Thanks again for all the support!
Yes, I face this issue as well. Terraform does have a tendency to complain about request timeouts and, as a result, I have to retry provisioning via code.
Any news regarding this? I need this to unblock ASE deployments (until ASE becomes a native resource in terraform-provider-azurerm). Is there at least a workaround?
@rahulkp220 @justinbarias nothing since; still waiting as well.
azurerm_virtual_machine_extension also needs a timeout.
Error: Error running plan: 1 error(s) occurred:
* azurerm_virtual_machine_extension.deploy: 1 error(s) occurred:
* azurerm_virtual_machine_extension.deploy: [ERR] Error decoding timeout: Timeout Key (create) is not supported
It seems the default armClient polling duration is hardcoded here at 60 minutes. Any chance we can configure this via the provider config? @tombuildsstuff
func (c *ArmClient) configureClient(client *autorest.Client, auth autorest.Authorizer) {
    setUserAgent(client)
    client.Authorizer = auth
    client.Sender = autorest.CreateSender(withRequestLogging())
    client.SkipResourceProviderRegistration = c.skipProviderRegistration
    client.PollingDuration = 60 * time.Minute
}
Edit: OK, I've taken a local branch and extended the provider to take in a config for the ARM client timeout, "arm_client_timeout =
I'll probably fork it for now, but if anyone else might need this I'll consider putting up an official PR.
@bingosummer the funny thing is that the default timeout for this action from Azure is longer than what Terraform allows you; I don't understand why. You can see from the post by @justinbarias where it clearly shows it's hardcoded in the product and not the provider. In my opinion we should be able to override default provider timeout values if we want, but when not overridden, Terraform should wait until it receives a timeout from the API itself.
@pixelicous it's actually provider code. The init of the ARM client (from the Azure Go SDK) accepts a PollingDuration config. I've just tested it out on my machine and I can deploy stuff that takes >1hr now. I'll push the fork up tomorrow for anyone who needs it.
@justinbarias that value's actually overridden in a lot of cases by the value returned from the API (so it's basically a last resort), which returns the polling interval and timeout duration that should be used as part of a Long Running Operation, which the SDK handles; as such, that timeout's only used in a couple of resources. In addition, a selection of the APIs in Azure (e.g. VMs/Container Service/AKS) have hard-coded timeouts on their end (for things like machine boots), which we've got no way of working around, so this'll fail in either case.
Support for timeouts now exists within the Go SDK and as such we plan to support this feature in the near future, via the timeouts block; however, we want to do this across all resources simultaneously rather than on a subset of resources.
Thanks!
@tombuildsstuff
Will the timeout block be supported by the Azure provider? Similar to https://www.terraform.io/docs/configuration/resources.html#timeouts
@bingosummer yes, apologies: I meant to write timeouts rather than lifecycle above; I'll update that!
@tombuildsstuff awesome! I'd be happy to help out if needed.
@justinbarias Yes, that's what I said: these are timeout issues on Terraform's side, not Azure's. Azure's timeout values are higher; that is exactly what I complained about.
@tombuildsstuff I didn't quite get what you said, but bottom line: if you run an Azure custom script extension from an ARM template, it can run longer than it can in Terraform. Terraform restricts the timeout even more!
Same issue for azurerm_kubernetes_cluster after 1 hour of K8s upgrading.
Error: Error applying plan:
1 error(s) occurred:
* azurerm_kubernetes_cluster.xxxxx: 1 error(s) occurred:
* azurerm_kubernetes_cluster.xxxxx: Error waiting for completion of Managed Kubernetes Cluster "xxxxx" (Resource Group "rg-xxxx"): Future#WaitForCompletion: context has been cancelled: StatusCode=200 -- Original Error: context deadline exceeded
re-opening since #2744 only added the groundwork for this
Would love to see this feature for "azurerm_template_deployment" as well; I am trying to provision an Azure Managed SQL Instance that runs into a timeout after one hour. (Initial deployment can take up to six hours at the moment):
Error: Error applying plan:
1 error(s) occurred:
* azurerm_template_deployment.mssql: 1 error(s) occurred:
* azurerm_template_deployment.mssql: Error waiting for deployment: Future#WaitForCompletion: context has been cancelled: StatusCode=200 -- Original Error: context deadline exceeded
We need this so badly :(
I have to stop working with Terraform for CSEs because Azure allows a longer timeout than Terraform does; this is not fun :(
Is there no way this can be added in the next release of the provider (or the current one ;) )?
Would also love to see this for azurerm_template_deployments. I'm provisioning Integration Service Environments (ISEs) which take 1.5-2hrs to complete.
Currently running into timeout issues on azurerm_shared_image_version. The images we're targeting are taking anywhere from 58-65 minutes to apply, and heftier images would potentially take longer.
Any chance of an option to raise the timeout soon? Or at least some workaround?
@fpytloun we're working on it, but there's a larger dependency chain than we first thought, unfortunately. It turns out that to implement this the way we need to, we need to replace the Storage SDK, since we can't use the replacement (which I'm working on at the moment). This feature is planned to ship as part of 2.0.
Upgrading the version of my AKS cluster took longer than an hour and thus I got the same error. All nodes were eventually updated, but the build failed. Waiting for this feature to become available so that I can set timeouts for my AKS clusters' create/update operations.
Many of the Azure APIs with long running operations have been switched to respond with a 202 Response Code.
@BenMitchell1979 Terraform/the Azure SDK already accounts for those by polling on them; this issue's tracking support for services with extremely long provisioning times (e.g. SQL Managed Instance) by allowing users to specify a custom timeout, which should resolve this.
To give an update here: we've started working on 2.0, and as such support for this will be added to Terraform in the not-too-distant future. We don't have a timeline just yet, unfortunately; when we do, we'll post it in the meta issue for 2.0: #2807
Thanks!
Would love to see this feature for "azurerm_template_deployment" as well; I am trying to provision an Azure Managed SQL Instance that runs into a timeout after one hour. (Initial deployment can take up to six hours at the moment):
Error: Error applying plan:
1 error(s) occurred:
* azurerm_template_deployment.mssql: 1 error(s) occurred:
* azurerm_template_deployment.mssql: Error waiting for deployment: Future#WaitForCompletion: context has been cancelled: StatusCode=200 -- Original Error: context deadline exceeded
I received a Terraform timeout after 3h, but it takes SQL Managed Instance longer than that to provision. Would be curious to see if there are any current workarounds for this?
@mjazwiecki unfortunately there isn't a workaround for this using the azurerm_template_deployment resource at this time; you could instead use the Azure CLI via a local-exec provisioner. Native support for custom timeouts will be available once 2.0 is released.
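A rough sketch of the local-exec workaround mentioned above, assuming the Azure CLI is installed and authenticated on the machine running Terraform (the resource group name and template file are placeholders):

```hcl
resource "null_resource" "ase_template" {
  # Hand the long-running deployment to the Azure CLI, which waits
  # for completion without Terraform's timeout applying to it.
  provisioner "local-exec" {
    command = "az group deployment create --resource-group example-rg --template-file ase_template.json"
  }
}
```

Note that Terraform won't track the deployed resources in state this way; it only records that the command ran.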
Over the past few months we've been working on the functionality coming in version 2.0 of the Azure Provider (outlined in #2807). We've just released version 1.43 of the Azure Provider which allows you to opt-in to the Beta of these upcoming features, including the ability to set Custom Timeouts on Resources.
More details on how to opt-into the Beta can be found in the Beta guide - however please note that this is only supported in Version 1.43 of the Azure Provider. You can upgrade to this version by updating your Provider block like so:
provider "azurerm" {
  version = "=1.43.0"
}
and then running terraform init -upgrade, which will download this version of the Azure Provider.
Once you've opted into the Beta you can specify a timeouts block on Resources (as shown below) to override the default timeouts for each resource, which can be found at the bottom of each page in the documentation (example).
resource "azurerm_resource_group" "test" {
  name     = "example-resources"
  location = "West Europe"

  timeouts {
    create = "60m"
    delete = "2h"
  }
}
Note: Certain Azure APIs also have hard-coded timeouts within the Azure API (for example, the Compute APIs have a hard timeout for starting a Virtual Machine, at which point the machine is considered "Failed"), which it's not possible to override.
Custom Timeouts will be going GA with Version 2.0 of the Azure Provider in the coming weeks. If you've tried the Beta and have feedback, please open a GitHub Issue using the special Beta Feedback category and we'll take a look.
Thanks!
Custom Timeouts have been enabled by default in #5705 which will ship in version 2.0 of the Azure Provider - as such I'm going to close this issue for the moment. If you're looking to use this in the interim you should be able to use the Beta link above to opt-into the Custom Timeouts Beta.
Thanks!
This has been released in version 2.0.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:
provider "azurerm" {
  version = "~> 2.0.0"
}

# ... other configuration ...
Hi all,
Sorry, but timeouts seem not to be working when creating resources like "azurerm_kubernetes_cluster".
Before upgrading to azurerm 2.0 I hadn't needed to specify any timeout.
Since updating to azurerm 2.0 I keep receiving this error 3-4 minutes after provisioning of the resource starts:
"Error waiting for creation of Managed Kubernetes Cluster "XXXXXX" (Resource Group"XXXXXXXX"): Future#WaitForCompletion: the number of retries has been exceeded: StatusCode=404 -- Original Error: Code="NotFound" Message="The entity was not found.""
I can see that the resource gets provisioned without problems on Azure.
I removed the resource and tried to recreate it, specifying the timeouts section with values of 30m.
No luck!
The worst thing is that I can't roll back to a previous azurerm; when I do, I receive the following error:
"rpc error: code = Unavailable desc = transport is closing"
Not even at the deployment stage, but in the planning one!
Any help please?
Regards,
I am also having issues with this using AzureRM >= 2.0. I am receiving the same error:
Future#WaitForCompletion: the number of retries has been exceeded: StatusCode=202
I am creating an App Service Plan on an App Service Environment. I've added a timeouts block with 120m, but it still times out in TFE after 26 minutes.
I'm going to lock this issue because it has been closed for 30 days. This helps our maintainers find and focus on the active issues.
If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error, please reach out to my human friends at [email protected]. Thanks!