Terraform: Error: Failed to instantiate provider "azurerm" to obtain schema: timeout while waiting for plugin to start

Created on 4 Jun 2020  路  22Comments  路  Source: hashicorp/terraform

Opening an issue here as requested in here

Getting an issue on some Windows 10 workstations where they get the error when trying to run terraform.

Error: Failed to instantiate provider "azurerm" to obtain schema: timeout while waiting for plugin to start

This does not occur when running the same code on Mac or Linux.

Terraform Version

0.12.24
Tried azurerm providers 2.7.0 - through 2.12.0 with same results. Code gives error on 2.6.0 provide because we are using "address_prefixes" in the Azurerm_subnet resource.

Terraform Configuration Files

Happening with all modules. The same modules run fine on Linux and Mac OS.

Debug Output

Do not have at this time.

Crash Output

Do not have at this time.

Expected Behavior

Terraform plan would run normally

Actual Behavior

Receive the error.

Error: Failed to instantiate provider "azurerm" to obtain schema: timeout while waiting for plugin to start

Steps to Reproduce

Run terraform plan

Additional Context

The issue can be hit or miss. It is happening on some workstations but not on all.

References

https://github.com/terraform-providers/terraform-provider-azurerm/issues/6690

bug core explained v0.12 windows

Most helpful comment

Thanks for that extra context, @sai-ns and @drdamour!

With the lead about Trend Micro I found golang/go#36492, where some folks are discussing a situation where the system call interception in Trend Micro causes a deadlock between two threads. The effect of that would be to hang the process, which seems to agree with the behavior we're seeing here in the Azure provider too.

If the Azure provider behavior we're seeing here has the same root cause as golang/go#36492 then unfortunately it seems to be deep inside the Go runtime and not something we on the Terraform team can directly influence, but it does seem like the participants in that issue are investigating possible approaches to avoid it and so hopefully there will be a fix available in a future version of Go. The investigation work on that issue might also yield some possible workarounds other than disabling Trend Micro altogether, but we'll need to wait and see where that discussion leads.

This behavior is in the Azure provider rather than in Terraform Core and so technically this issue would be better placed in the Azure provider repository (Terraform Core is not using Go 1.14 yet), though since it could presumably apply to _any_ Go program built with Go 1.14 -- including all of the Terraform providers and Terraform Core itself -- I think we'll keep this open here for now to track the issue in the broad sense and then if we find more information later that suggests a specific solution that the Azure provider team could take we can consider opening an issue over in their repository once we'd have something more concrete to share in it.

Thanks again, all! If you do learn anything else in your investigations with Trend Micro, please share! (The Go team might also appreciate hearing about what you learn in that issue, though it's probably best to discuss here first so that we can help figure out which parts of the information are generic to all Go programs and which are specific to Terraform before sharing those details.)

All 22 comments

Hi @jf781,

Unfortunately without the full debug logs there's quite a large space of possibilities for what might cause this behavior, but the one we've encountered most commonly in the past is when folks are running Terraform on Windows machines with software firewalls installed.

Terraform provider plugins are child processes which Terraform Core interacts with via sockets on the local loopback interface, and some software firewalls prevent packets from reaching a local listen socket, either blocking it altogether or requiring a manual approval to do so.

Do you have any firewall software on your Windows system that might be blocking requests like these?

@apparentlymart did u see the original ticket this just started happening with a specific version of azurerm provider. It def looks firewall related

Are there details about which ports the providers are trying to open? Its all gRPC still yes? I saw some old posts about a env cookie to pass to try to see the ports but didnt seem to work.

Hi @drdamour,

By default the plugin processes ask the OS to allocate a listen port automatically, so there is no predictable port number. However, it looks like the underlying library has environment variables PLUGIN_MIN_PORT and PLUGIN_MAX_PORT which you can set together to constrain it to try to use a particular range of ports.

If you set them both to the same value then I think that would force it to try that specific port number, assuming it's available, in case having that constrained is helpful for debugging. (Though if your configuration calls for multiple plugins you'll need at least enough distinct port numbers in the range for them all to listen at once, so if you're using this debug strategy it'd be best to do it with a very minimal configuration that only uses one plugin.)

Hello!

I've tried to reproduce this issue, and confirmed for myself that on a Windows 10 VM with Terraform 0.12.26, I'm able to use Terraform and the AzureRM provider to create and destroy a resource group.

We exclusively use GitHub issues for tracking bugs and enhancements, rather than for questions. While we can sometimes help with certain simple problems here, it's better to use the community forum where there are more people ready to help.

Based on the information you've provided, it looks like this doesn't represent a specific bug or feature request, even though I am empathetic to what sounds like a very difficult troubleshooting situation, and so I'm going to close it. I hope that you and @apparentlymart can continue troubleshooting in the community forum - please open a thread there and feel free to post that back here after it's closed to link them together. I don't want to shut down this conversation.

@danieldreier there are multiple people experiencing this issue and only with a recent version of azurerm i think this should remain open

It never happens with azure rm 2.6 and always happens with later versions

Sure! I'll re-open it.

@apparentlymart I have uploaded 2 trace logs to https://github.com/sai-ns/terraformlogs if that is what you are waiting for. terraform plan works like 1 in 100 times but fails when I run terraform apply.
Just like everyone else, I started seeing this issues since azurerm 2.7. I was waiting for few bugs to be fixed in 2.7 and 2.8 and when I am ready to test it, terraform itself starting throwing these errors.

Terraform team, any update on this, please?

The relevant part of the trace log output seems to be this part:

2020/06/19 17:05:11 [TRACE] terraform.NewContext: starting
2020/06/19 17:05:11 [TRACE] terraform.NewContext: resolving provider version selections
2020/06/19 17:05:12 [TRACE] terraform.NewContext: loading provider schemas
2020/06/19 17:05:12 [TRACE] LoadSchemas: retrieving schema for provider type "azurerm"
2020-06-19T17:05:12.119-0500 [INFO]  plugin: configuring client automatic mTLS
2020-06-19T17:05:12.156-0500 [DEBUG] plugin: starting plugin: path="C:\Users\Sai Nooli\.terraform\plugins\windows_amd64\terraform-provider-azurerm_v2.15.0_x5.exe" args=["C:\Users\Sai Nooli\gitrepos\kvtest\.terraform\plugins\windows_amd64\terraform-provider-azurerm_v2.15.0_x5.exe"]
2020-06-19T17:05:12.162-0500 [DEBUG] plugin: plugin started: path="C:\Users\Sai Nooli\.terraform\plugins\windows_amd64\terraform-provider-azurerm_v2.15.0_x5.exe" pid=1116
2020-06-19T17:05:12.162-0500 [DEBUG] plugin: waiting for RPC address: path="C:\Users\Sai Nooli\.terraform\plugins\windows_amd64\terraform-provider-azurerm_v2.15.0_x5.exe"
2020-06-19T17:05:12.298-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Data Sources for "Advisor"..
2020-06-19T17:05:12.298-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Resources for "Advisor"..
2020-06-19T17:05:12.298-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Data Sources for "Analysis Services"..
2020-06-19T17:05:12.298-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Resources for "Analysis Services"..
2020-06-19T17:05:12.299-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Data Sources for "API Management"..
2020-06-19T17:05:12.299-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Resources for "API Management"..
2020-06-19T17:05:12.301-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Data Sources for "App Configuration"..
2020-06-19T17:05:12.301-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Resources for "App Configuration"..
2020-06-19T17:05:12.301-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Data Sources for "App Platform"..
2020-06-19T17:05:12.301-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Resources for "App Platform"..
2020-06-19T17:05:12.301-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Data Sources for "Application Insights"..
2020-06-19T17:05:12.301-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Resources for "Application Insights"..
2020-06-19T17:05:12.301-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Data Sources for "Authorization"..
2020-06-19T17:05:12.301-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Resources for "Authorization"..
2020-06-19T17:05:12.301-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Data Sources for "Automation"..
2020-06-19T17:05:12.302-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Resources for "Automation"..
2020-06-19T17:05:12.303-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Data Sources for "Batch"..
2020-06-19T17:05:12.303-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Resources for "Batch"..
2020-06-19T17:05:12.303-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Data Sources for "Blueprints"..
2020-06-19T17:05:12.303-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Resources for "Blueprints"..
2020-06-19T17:05:12.303-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Data Sources for "Bot"..
2020-06-19T17:05:12.303-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Resources for "Bot"..
2020-06-19T17:05:12.303-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Data Sources for "CDN"..
2020-06-19T17:05:12.303-0500 [DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: [DEBUG] Registering Resources for "CDN"..
2020/06/19 17:06:12 [TRACE] statemgr.Filesystem: removing lock metadata file .terraform.tfstate.lock.info
2020/06/19 17:06:12 [TRACE] statemgr.Filesystem: unlocked by closing terraform.tfstate
2020-06-19T17:06:12.168-0500 [WARN]  plugin: plugin failed to exit gracefully
2020-06-19T17:06:12.175-0500 [DEBUG] plugin: plugin process exited: path="C:\Users\Sai Nooli\.terraform\plugins\windows_amd64\terraform-provider-azurerm_v2.15.0_x5.exe" pid=1116 error="exit status 1"

From this output we can see that the provider plugin was indeed launched correctly, because we can see it running and producing its own debug messages.

The next thing that's supposed to happen is the response to the log line plugin: waiting for RPC address. If this were working correctly then the next log lines should look something this:

[INFO]  plugin.terraform-provider-azurerm_v2.15.0_x5.exe configuring server automatic mTLS: timestamp=2020-07-07T16:47:59.771-0700
[DEBUG] plugin: using plugin: version=5
[DEBUG] plugin.terraform-provider-azurerm_v2.15.0_x5.exe: plugin address: address=127.0.0.1:45345 network=tcp timestamp=2020-07-07T16:47:59.804-0700

The fact that we are not seeing messages like these seems to suggest that the provider plugin is not initializing correctly. Normally when we've seen behavior like this caused by firewall problems we'd see messages like the above and then Terraform would fail trying to connect to (for example) 127.0.0.1:45345, but this situation seems different in that the provider doesn't seem to even be completing the plugin handshake.

I'm not sure what exactly to investigate next, but hopefully we can identify something that changed in the provider between versions 2.6 and 2.7 that would give a clue as to why the provider started behaving differently. The fact that relatively few people seem to have encountered this problem (based on the number of comments and votes on this issue compared to the number of folks out there using Terraform on Windows) makes me suspect that this is still caused by something environment-specific, but clearly it's something a little different than the typical firewall interactions we've seen in the past.

I used rsc.io/goversion to try to see what Go runtime versions and Go modules the 2.6.0 and 2.7.0 builds of the AzureRM provider.

Working with the official Windows builds of each I see the following:

  • 2.6.0 is using go1.12.6
  • 2.7.0 is using go1.14.2

Unfortunately it seems that goversions wasn't able to extract module version information from the Windows executables. I'm not sure why that is (I've not tried this tool with Windows executables before) but to get as much information as possible I compared the module versions from the Linux executables because I know that the official provider release process cross-compiles all of the executables from the same sources and thus the modules used in each should be the same:

--- 2.6.0
+++ 2.7.0
@@ -1,9 +1,10 @@
-2.6.0/terraform-provider-azurerm_v2.6.0_x5 go1.12.6
+2.7.0/terraform-provider-azurerm_v2.7.0_x5 unknown Go version
    path  github.com/terraform-providers/terraform-provider-azurerm
    mod   github.com/terraform-providers/terraform-provider-azurerm  (devel)
-   dep
    dep   cloud.google.com/go                                        v0.45.1
    dep   github.com/Azure/azure-sdk-for-go                          v41.2.0+incompatible
+   dep   github.com/Azure/go-autorest/autorest                      v0.10.0
+   =>    github.com/tombuildsstuff/go-autorest/autorest             v0.10.1-0.20200416184303-d4e299a3c04a
    dep   github.com/Azure/go-autorest/autorest/adal                 v0.8.2
    dep   github.com/Azure/go-autorest/autorest/azure/cli            v0.3.0
    dep   github.com/Azure/go-autorest/autorest/date                 v0.2.0
@@ -76,4 +77,4 @@
    dep   google.golang.org/api                                      v0.9.0
    dep   google.golang.org/genproto                                 v0.0.0-20190819201941-24fa4b261c55
    dep   google.golang.org/grpc                                     v1.23.0
-   dep   gopkg.in/yaml.v2                                           v2.2.2
+   dep   gopkg.in/yaml.v2                                           v2.2.4

None of the modules that changed seem like that should affect how the Terraform plugin handshake behaves, so I'm going to dismiss module changes as a possible cause for now and focus on the possibility that Go runtime changes might've affected the behavior somehow, because AzureRM provider 2.7 was built with a Go two minor versions later than AzureRM provider 2.6 and so there may have been notable behavior changes to the Go runtime between those two releases.


I see one change recorded in the Go 1.13 release notes which alludes to the possibility that system call behavior could've changed, though I don't know enough details to decide whether this is relevant to the behavior we're discussing here:

The Windows version specified by internally-linked Windows binaries is now Windows 7 rather than NT 4.0. This was already the minimum required version for Go, but can affect the behavior of system calls that have a backwards-compatibility mode. These will now behave as documented. Externally-linked binaries (any program using cgo) have always specified a more recent Windows version.

I don't see anything else in the 1.13 and 1.14 release notes that stands out as something that could affect the behavior of plugins running on Windows.

I have run out of time to work on this for today, and unfortunately we have no new leads. If someone else picks up this issue in future, a possible next direction could be to try to understand in more detail which backward-compatibility behaviors on Windows are controlled by the binaries specifying Windows 7 rather than NT 4.0 as the target version, to see if any of them seem like they might affect system calls that the Terraform plugin system would be using on Windows systems.

@apparentlymart thank you very much for your time and detailed explanation. I was earlier concentrating on troubleshooting from the network perspective but looks like it is something else. I agree this is environment specific since no one else in our org are seeing this behavior. I will try to troubleshoot from my side as well and keep you all posted.

all the devs in our env who terraform (~50) are experiencing it..so it's def something to do with our draconian security setup. it's funny that there's no issue with WSL though...

Update- I recently started seeing issues with running GO(v1.14.4) in my laptop and based on @apparentlymart research, it does seem too be related. I've had my desktop support team uninstall/disable antivirus(trend micro in my case) from my machine and everything started working just fine.
We are now trying to see what is being blocked and what can be done on the antivirus to have this working.

ah we run trend too, so i bet that's the culprit...

Thanks for that extra context, @sai-ns and @drdamour!

With the lead about Trend Micro I found golang/go#36492, where some folks are discussing a situation where the system call interception in Trend Micro causes a deadlock between two threads. The effect of that would be to hang the process, which seems to agree with the behavior we're seeing here in the Azure provider too.

If the Azure provider behavior we're seeing here has the same root cause as golang/go#36492 then unfortunately it seems to be deep inside the Go runtime and not something we on the Terraform team can directly influence, but it does seem like the participants in that issue are investigating possible approaches to avoid it and so hopefully there will be a fix available in a future version of Go. The investigation work on that issue might also yield some possible workarounds other than disabling Trend Micro altogether, but we'll need to wait and see where that discussion leads.

This behavior is in the Azure provider rather than in Terraform Core and so technically this issue would be better placed in the Azure provider repository (Terraform Core is not using Go 1.14 yet), though since it could presumably apply to _any_ Go program built with Go 1.14 -- including all of the Terraform providers and Terraform Core itself -- I think we'll keep this open here for now to track the issue in the broad sense and then if we find more information later that suggests a specific solution that the Azure provider team could take we can consider opening an issue over in their repository once we'd have something more concrete to share in it.

Thanks again, all! If you do learn anything else in your investigations with Trend Micro, please share! (The Go team might also appreciate hearing about what you learn in that issue, though it's probably best to discuss here first so that we can help figure out which parts of the information are generic to all Go programs and which are specific to Terraform before sharing those details.)

@apparentlymart well we DID open it there, and then they told us to open it here https://github.com/terraform-providers/terraform-provider-azurerm/issues/6690#issuecomment-638739832 by rules of first, it MUST be here ;)

but really, thanks a LOT for that investigation that's exactly the type of data i need to go back to my security team about trend!

I can fully confirm that it is about TrendMicro. As soon as I disable this I don't have any issues anymore (I did first reported in terraform-providers/terraform-provider-azurerm#8466).
I will also create a case at TrendMicro.

Hi All,
How about this issue on Microsoft hosted ubuntu agent? Anyone could fix this issue?

@blindzero u have any luck with trend ticket?

Yes and no, it is really hard to reproduce because not occuring all the time.
It could be reproduced, did not occur at all when deactivating most TM features. But exclusions did not work because of missing variable exclusion

Was this page helpful?
0 / 5 - 0 ratings