Vault: Vault client slow with CNAME

Created on 18 Dec 2017 · 7 Comments · Source: hashicorp/vault

Environment:

  • Vault Version: v0.9.0
  • NOTE: Vault is running under the official docker container
  • Operating System/Architecture: Kubernetes 1.8.5

Vault Config File:
backend "consul" {
  address = "127.0.0.1:8500"
  path = "vault/"
  token = "xxxx"
  disable_registration = "true"
}
default_lease_ttl = "168h"
max_lease_ttl = "720h"
listener "tcp" {
  address = "0.0.0.0:8200"
  tls_cert_file = "/vault/ssl/vault.pem"
  tls_key_file = "/vault/ssl/vault.key"
  tls_client_ca_file = "/vault/ssl/ca.pem"
}
listener "tcp" {
  address = "127.0.0.1:9000"
  tls_disable = 1
}

Startup Log Output:
==> Vault server started! Log data will stream in below:

2017/12/13 16:03:29.676774 [INFO ] core: vault is unsealed
2017/12/13 16:03:29.676828 [INFO ] core: entering standby mode
2017/12/13 16:03:30.334405 [INFO ] core: acquired lock, enabling active operation
2017/12/13 16:03:30.375861 [INFO ] core: post-unseal setup starting
2017/12/13 16:03:30.378161 [INFO ] core: loaded wrapping token key
2017/12/13 16:03:30.378177 [INFO ] core: successfully setup plugin catalog: plugin-directory=
2017/12/13 16:03:30.395767 [INFO ] core: successfully mounted backend: type=kv path=secret/
2017/12/13 16:03:30.395885 [INFO ] core: successfully mounted backend: type=system path=sys/
2017/12/13 16:03:30.395956 [INFO ] core: successfully mounted backend: type=pki path=pki/primary.prod/auth/
2017/12/13 16:03:30.395983 [INFO ] core: successfully mounted backend: type=aws path=prod.aws/
2017/12/13 16:03:30.396039 [INFO ] core: successfully mounted backend: type=pki path=pki/dev.prod/auth/
2017/12/13 16:03:30.396094 [INFO ] core: successfully mounted backend: type=pki path=pki/primary.prod/namespaces/support-case-triage/tiller/
2017/12/13 16:03:30.396114 [INFO ] core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2017/12/13 16:03:30.396284 [INFO ] core: successfully mounted backend: type=identity path=identity/
2017/12/13 16:03:30.416959 [INFO ] expiration: restoring leases
2017/12/13 16:03:30.417026 [INFO ] rollback: starting rollback manager
2017/12/13 16:03:30.423445 [INFO ] identity: entities restored
2017/12/13 16:03:30.425594 [INFO ] identity: groups restored
2017/12/13 16:03:30.427821 [INFO ] core: post-unseal setup complete
2017/12/13 16:03:30.427843 [INFO ] core/startClusterListener: starting listener: listener_address=0.0.0.0:8201
2017/12/13 16:03:30.427918 [INFO ] core/startClusterListener: serving cluster requests: cluster_listen_address=[::]:8201
2017/12/13 16:03:30.427941 [INFO ] core/startClusterListener: starting listener: listener_address=127.0.0.1:9001
2017/12/13 16:03:30.427979 [INFO ] core/startClusterListener: serving cluster requests: cluster_listen_address=127.0.0.1:9001
2017/12/13 16:03:30.462812 [INFO ] expiration: lease restore complete

Expected Behavior:
The vault client cli should be responsive whether or not the server address is a CNAME.

Actual Behavior:
When using the vault client cli with VAULT_ADDR containing a CNAME, any command takes roughly a minute to complete. The command will eventually succeed. Even vault status takes a significant amount of time to return. When calling the vault api with curl, the command returns almost instantly. Running vault under strace shows that the client is blocking a number of times. Pointing VAULT_ADDR to the target of the CNAME and setting VAULT_TLS_SERVER_NAME is an effective workaround.

Steps to Reproduce:
export VAULT_ADDR=https://xxx.yyy.zzz
where xxx.yyy.zzz is a CNAME to internal-12345.us-east-1.elb.amazonaws.com.
Run vault status. Observe that it takes from 20 seconds to several minutes for the command to eventually succeed. Now set:
export VAULT_ADDR=https://internal-12345.us-east-1.elb.amazonaws.com
export VAULT_TLS_SERVER_NAME=xxx.yyy.zzz
The vault command succeeds immediately.
Run curl https://xxx.yyy.zzz/v1/sys/seal-status. Observe that the server provides an immediate response.

All 7 comments

If the Vault CLI is blocking, that almost certainly means it's waiting on DNS resolution. Unfortunately that's happening at the Go level so it's not something we can easily manage.

If you can build Vault on your own you may want to try playing with the netdns setting -- see https://golang.org/pkg/net/#hdr-Name_Resolution -- and try the cgo option, which uses system DNS resolution instead of Go's built-in resolution. If that solves the problem, you're likely hitting some pathological case in Go's DNS library and we need to get an issue filed there.

I have the same issue with CentOS 7 running on Fusion. No issue on the host OS.
From a packet capture, it appears to time out 4 times (5s each) on the SRV request for _http._tcp.{CNAME}.localdomain
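The SRV timeouts described above can be measured outside of Vault. The following is a minimal sketch, not Vault's own code: it assumes the lookup uses service "http" and protocol "tcp" (taken from the `_http._tcp` name in the packet capture), and the helper names `srvQueryName` and `timeSRVLookup` are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"time"
)

// srvQueryName returns the DNS name that an SRV lookup asks for,
// e.g. "_http._tcp.xxx.yyy.zzz". (Hypothetical helper for illustration.)
func srvQueryName(service, proto, host string) string {
	return "_" + service + "._" + proto + "." + host
}

// timeSRVLookup runs an SRV lookup for _http._tcp.<host> with a deadline
// and reports how long the resolver took, whether or not it succeeded.
func timeSRVLookup(host string, timeout time.Duration) (time.Duration, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	start := time.Now()
	_, _, err := net.DefaultResolver.LookupSRV(ctx, "http", "tcp", host)
	return time.Since(start), err
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: srvtime <hostname>")
		return
	}
	host := os.Args[1]

	fmt.Println("querying", srvQueryName("http", "tcp", host))
	elapsed, err := timeSRVLookup(host, 30*time.Second)
	// A "no such host" error is expected when no SRV record exists;
	// the interesting number is how long the resolver took to say so.
	fmt.Printf("elapsed=%v err=%v\n", elapsed, err)
}
```

If the elapsed time is in the tens of seconds for the CNAME hostname but near zero for other names, that would be consistent with the repeated 5s timeouts seen in the capture.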

Closing due to lack of feedback. If this is still an issue with 0.10, please write back (new version of Go, and it may have changed/fixed this). But the next steps after that would be to try a dynamic build using system DNS resolution to see if it helps.

I too am having an issue similar to this, but it seems to only affect Macs in our environment.

I am running Vault v0.10.1 ('756fdc4587350daf1c65b93647b2cc31a6f119cd') on my Mac, and 0.10.0 on the servers in AWS. Our Vault systems, like @adamnoll's, also sit behind an ELB, with an A record (vault.company.com) tied to the ELB's CNAME. On a CentOS 7 VM on my Mac, which is NAT'd on my Mac's network, the vault client works fine. On a co-worker's Windows 10 machine, he doesn't have this timeout issue either. It seems it's only related to the Mac version of the Vault binary.

IIRC, on Mac, Go programs are always compiled dynamically, not statically (it also might be that they're only cross-compiled dynamically, I forget, but we do our releases via cross-compiling). If it's a dynamic binary Go defaults to using system DNS libraries for resolution as opposed to internal name lookup logic. Check out https://golang.org/pkg/net/#hdr-Name_Resolution but basically try setting export GODEBUG=netdns=go and see if this solves your issue.
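To compare the two resolvers without rebuilding Vault, a small standalone Go program can be run once with `GODEBUG=netdns=go` and once with `GODEBUG=netdns=cgo`. This is only a sketch under assumptions: `netdnsSetting` is a hypothetical helper that reports what the environment requests, and the actual resolver choice is made by Go's net package, which may fall back depending on how the binary was built.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
	"time"
)

// netdnsSetting extracts the netdns value from a GODEBUG string such as
// "netdns=go" or "http2debug=1,netdns=cgo+1". It returns "" when netdns
// is not set, meaning Go picks its default resolver.
func netdnsSetting(godebug string) string {
	for _, kv := range strings.Split(godebug, ",") {
		kv = strings.TrimSpace(kv)
		if strings.HasPrefix(kv, "netdns=") {
			return strings.TrimPrefix(kv, "netdns=")
		}
	}
	return ""
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: GODEBUG=netdns=go resolvetime <hostname>")
		return
	}
	host := os.Args[1]

	setting := netdnsSetting(os.Getenv("GODEBUG"))
	if setting == "" {
		setting = "(default)"
	}
	fmt.Println("netdns setting:", setting)

	// Time a plain host lookup so the two resolver modes can be compared.
	start := time.Now()
	addrs, err := net.LookupHost(host)
	fmt.Printf("elapsed=%v addrs=%v err=%v\n", time.Since(start), addrs, err)
}
```

A large difference in elapsed time between the two runs would point at the resolver rather than at Vault itself.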

My guess is the delay is caused by this line of code https://github.com/hashicorp/vault/blob/master/api/client.go#L560 which does a SRV lookup if the port is not specified.

@nickmaccarthy can you try specifying your Vault address including the port, e.g. :443, and see if it's quicker?

Hi @deverton, that worked! Specifying :443 at the end of VAULT_ADDR solved it. I had worked around this by building a Linux VM with Vault. Thank you for finding that!
