We are going to use Vault in a production environment. We have 3 Vault servers, say vault1, vault2, vault3, each running under a domain name mapped one-to-one to 3 Consul servers. To give the applications that use Vault a single address, we configured HAProxy in round-robin fashion, so that if one Vault server goes down the applications can easily reach another. With this setup we found that a Vault server that is not the master, instead of forwarding the request, redirects it with a 307 response. So in practice we have to give every application the DNS mappings of all the Vault servers, and if the mapping of any Vault server changes in the future, we have to make that change in every application that uses Vault.
So, is there any way to make the Vault servers (other than the master) forward requests instead of redirecting them?
Hi @ashishrathore1,
Vault does not support proxying client requests, but I also don't understand your conclusion. Vault gives the client a redirect with the active node's address, so I don't see how this then results in clients having to maintain a list of Vault instances. This active node address could either be DNS, which the client can resolve, or an IP address, which the client can connect to directly.
This is a problem for me. I run Vault in Kubernetes, and expose my pods via a Load Balancer. Each pod has a private IP address that is not externally accessible. When I hit the LB, if I don't end up routing to master, the request will fail because that node is redirecting me to a private IP that is not externally accessible.
@devth You can use health checks to inform your LB which node is active at any given time. See https://www.vaultproject.io/docs/http/sys-health.html
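For example, here is a minimal sketch of what such a health check sees (the hostname is hypothetical; by default the endpoint returns 200 on the active node and 429 on an unsealed standby, the code mentioned later in this thread):

# Probe a node's health endpoint and print only the HTTP status code.
# 200 = initialized, unsealed, active; 429 = unsealed standby.
curl -s -o /dev/null -w '%{http_code}\n' https://vault1.example.com:8200/v1/sys/health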
Thanks. So that essentially means there's only 1 node behind the LB at a given time. Do you know how long failover would take? I assume some requests will be dropped during that process. Any recommendations for how often I should health check? 5s?
How long failover takes depends on various things, including the session time for locks with your HA backend. But, I see no reason not to make the health check from your LB fairly rapid -- every 500ms or second. Especially if it can reuse the connection.
This doesn't work too well with K8s. A failing health check will remove a pod from a service, but it also prevents rolling updates from progressing. I'm looking into a way to dynamically set labels from within the pod, based on whether that pod is the leader, in order to add or remove it from the LB, but it's not very simple. A rough sketch of the idea follows below.
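A rough sketch of that idea, assuming kubectl is available inside the pod with permission to label it, that VAULT_ADDR points at the local node, and that the Service selects on a hypothetical vault-active label:

#!/bin/sh
# Periodically label this pod based on sys/health so that a Service
# selecting vault-active=true only routes traffic to the current leader.
# The pod name equals the container hostname by default.
while true; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "${VAULT_ADDR:-http://127.0.0.1:8200}/v1/sys/health")
  if [ "$code" = "200" ]; then
    kubectl label pod "$(hostname)" vault-active=true --overwrite
  else
    kubectl label pod "$(hostname)" vault-active=false --overwrite
  fi
  sleep 5
done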
Closing due to lack of progress -- the initial ask is not planned to be satisfied.
@jefferai I think the documentation could use some clarity on this issue. I just spent a decent amount of time trying to figure out how to set up a Vault cluster behind a single ELB (for DNS purposes). I only realized that just one of my nodes should be "InService" at a time after I read this thread, with failover happening when leadership changes.
In fact I even tried to use the /sys/health endpoint for my health check, but I changed it to /sys/leader when the health endpoint returned a 429 for non-leader nodes, not realizing that the "failure" was very much intentional.
@tylerFowler sys/health has customizable return codes -- see the docs to see how to configure that. You don't need only one node InService at a time if you're using request forwarding...one of the whole points of implementing that was to fix bad behavior with ELBs.
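As a sketch of what that customization can look like (the hostname is hypothetical; standbycode is one of the query parameters documented for sys/health), a standby can be told to answer 200 so an ELB health check keeps it in service while request forwarding handles routing to the active node:

# Ask a standby to report 200 instead of the default 429 so the
# load balancer does not pull it out of service.
curl -s -o /dev/null -w '%{http_code}\n' 'https://vault2.example.com:8200/v1/sys/health?standbycode=200'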
That's the other thing - the documentation is not very clear on exactly how to set up request forwarding vs. client redirection. It seems as though you should just omit the redirect_addr config, but then Vault just guesses one for you (a private IP) rather than defaulting to request forwarding. Or maybe I'm just misunderstanding something, but it seemed like no matter what I did, clients would still get redirected.
It's covered quite extensively on https://www.vaultproject.io/docs/concepts/ha.html and https://www.vaultproject.io/docs/config/index.html#file
If you have suggestions for improving it please bring them forward.
The documentation assumes a certain level of understanding of the product and infrastructure. It is obviously written by someone who knows the system well and is not written from the perspective that many new users will be coming from.
I am trying to get the forwarding working as well. My cluster has 3 nodes and all of them are currently running version 0.6.5.
Here is my /etc/vault.hcl config file:
backend "consul" {
  address = "172.0.0.25:8500"
  // redirect_addr = "https://vault-c.domain.net:8200" // removed to see if that helps
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_disable   = 0
  tls_cert_file = "/etc/ssl/certs/domain/cert"
  tls_key_file  = "/etc/ssl/certs/domain/key"
}

cluster_name = "TheCoolCluster"
Here is a curl session trying to read a value from a slave node:
[10:47:15] SunSparc@themoon:~ $ curl -sv --resolve vault.domain.net:8200:1.2.3.4 -H "X-Vault-Token: 12345678-1234-adad-adad-12ab12ab12ab" -X GET https://vault.domain.net:8200/v1/secret/test
* Added vault.domain.net:8200:1.2.3.4 to DNS cache
* Hostname vault.domain.net was found in DNS cache
* Trying 1.2.3.4...
* TCP_NODELAY set
* Connected to vault.domain.net (1.2.3.4) port 8200 (#0)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate: domain.com
* Server certificate: COMODO RSA Domain Validation Secure Server CA
* Server certificate: COMODO RSA Certification Authority
> GET /v1/secret/test HTTP/1.1
> Host: vault.domain.net:8200
> User-Agent: curl/7.51.0
> Accept: */*
> X-Vault-Token: 12345678-1234-adad-adad-12ab12ab12ab
>
< HTTP/1.1 307 Temporary Redirect
< Cache-Control: no-store
< Location: https://172.0.0.25:8200/v1/secret/test
< Date: Thu, 23 Mar 2017 17:30:29 GMT
< Content-Length: 0
< Content-Type: text/plain; charset=utf-8
<
* Curl_http_done: called premature == 0
* Connection #0 to host vault.domain.net left intact
[]
[11:30:30] SunSparc@themoon:~ $
Notice the timestamps. It took 3 minutes to return a 307, and obviously did not forward. I see nothing in the documentation that, at least from my perspective, has yet helped me resolve this issue.
I noticed the comment from @tylerFowler about removing the redirect_addr parameter and so I did that, restarted the cluster, and above is the result, which is the same as it was with the redirect address being specified. Perhaps the cluster needs to be wiped and brought up from scratch?
The comment about removing redirect_addr is incorrect -- you should always have it. You also should specify cluster_addr if it's not being detected correctly.
Full logs in trace mode would help, including the information printed at the beginning.
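For reference, a sketch of one way to capture those logs, assuming the -log-level flag is available on this version and using the config path from above:

# Run the server with trace-level logging; the startup output includes the
# detected redirect and cluster addresses.
vault server -config=/etc/vault.hcl -log-level=trace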
For clarity, my comment was about it being ambiguous whether the behavior I was going for required omitting the redirect address -- which is definitely not what you want to do.
Redirect addresses are always required for HA, but some backends will attempt to detect it automatically if you don't set it. There is never any harm in setting a redirect address manually, and it changes no forwarding behavior to do so.
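For reference, here is a minimal sketch of the config above with both addresses set explicitly, following the same placement inside the backend stanza (hostnames are the ones from the earlier example; the cluster port is assumed to be the API port plus one, Vault's default):

backend "consul" {
  address = "172.0.0.25:8500"
  // Address clients are redirected to when this node is a standby.
  redirect_addr = "https://vault-c.domain.net:8200"
  // Address other Vault nodes use for server-to-server request forwarding.
  cluster_addr = "https://vault-c.domain.net:8201"
}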
@jefferai, thanks for the feedback. Also, you asked for ideas on making the documentation better. My favorite documentation has lots of examples to go along with the detailed write-ups.
The term 'request forwarding' is ambiguous if a redirection occurs regardless.
Fallback behavior is documented: https://www.vaultproject.io/docs/concepts/ha.html
Is a load balancer mandatory if the standby will always redirect to the Active vault node?