Nomad v0.9.5 ('0.9.5')
Nomad is running on Alpine Linux with the Docker, exec, and raw_exec task drivers enabled. It is connected to a Consul cluster and a Vault cluster, each with three nodes. The issue occurred when testing the integration with Vault (I have not yet tested Consul with consul-template, but the Nomad instances find each other via Consul, so I assume this is working). Both Consul and Vault use server certificates and require client certificates for HTTPS. Nomad was compiled with Go 1.13 (Alpine package 1.13-r0).
$ uname -a
Linux ... 4.19.76-0-virt #1-Alpine SMP Tue Oct 1 09:34:00 UTC 2019 x86_64 Linux
Any interaction with Vault fails after some time (e.g. renewing tokens, deleting tokens, creating tokens for allocations). The root cause is the Go HTTP/2 connection pool (see logs). Searching for the error suggests that it was fixed at some point in the past. Due to time constraints, I'm unsure whether the vendored golang.org/x/net/http2 simply needs to be updated to fix the issue.
2019-10-07T16:23:08.637+0200 [WARN ] nomad.vault: failed to revoke tokens. Will reattempt until TTL: error="failed to revoke token (alloc: "c3ee58b4-3548-4eed-28a7-0b35534bed19", node: "9c66dbd2-3dc1-f859-7bee-7e6516bf5a36", task: "..."): Post https://active.vault.service.consul:8200/v1/auth/token/revoke-accessor: http2: no cached connection was available"
2019-10-07T16:38:44.047+0200 [ERROR] nomad.client: Vault token creation for alloc failed: alloc_id=fe0b827d-7594-eab2-8e4f-3b9334b2b473 error="failed to create an alloc vault token: Post https://active.vault.service.consul:8200/v1/auth/token/create/nomad-cluster: http2: no cached connection was available"
https://github.com/golang/go/issues/16582
https://github.com/kubernetes/kubernetes/issues/74412
Thanks for the report, @mtneug!
A quick workaround seems to be setting the following environment variable on the nomad servers:
GODEBUG=http2client=0
@nvx thanks! I thought I tried that out, but I will try again and report back.
I played around with this a bit more, and it looks like the GODEBUG trick didn't work after all.
However, updating the golang.org/x/net/http2 dependency and rebuilding (against the 0.10.3 tag) did the trick. I used the following command to update the dependency:
govendor fetch golang.org/x/net/http2
Looks like it should be a pretty easy fix.
I can confirm it happens with Nomad 0.10.4 too and, as @nvx said, updating golang.org/x/net/http2 did the trick.
This is a complete showstopper for me for using Vault and Nomad together. Nomad consistently fails to renew its token.
Given that this is a serious bug that prevents two of your flagship products from working together, and that it is apparently a low-effort fix, can we please get the change integrated?
Thank you, folks! Sorry that this has slipped our attention for so long. The fix here will be out in 0.11.1!