Describe the bug
A Vault cluster is setup with the Raft storage backend (using the vault-operator). The first node does the init and unseal. The second node, tells me that it is not initialized when the status says the contrary. Hence the vault-unsealer helper is stuck trying to init an already initialized vault.
/ # vault operator init -status
Vault is not initialized
/ # vault status
Key Value
--- -----
Seal Type shamir
Initialized true <------------------------------ MEH!?
Sealed true
Total Shares 4
Threshold 2
Unseal Progress 0/2
Unseal Nonce n/a
Version 1.5.0
HA Enabled true
To Reproduce
Steps to reproduce the behavior:
Expected behavior
vault operator init -status to return true.
Environment:
vault status): 1.5.0vault version): 1.5.0Vault server configuration file(s):
listener "tcp" {
address = "0.0.0.0:8200"
cluster_address = "0.0.0.0:8201"
tls_cert_file = "/etc/vault/tls/tls.crt"
tls_key_file = "/etc/vault/tls/tls.key"
}
storage "raft" {
path = "/mnt/vault/data"
retry_join {
leader_api_addr = "https://vault-0.vault-internal:8200"
leader_ca_cert_file = "/etc/vault/tls/cacert.crt"
leader_client_cert_file = "/etc/vault/tls/tls.crt"
leader_client_key_file = "/etc/vault/tls/tls.key"
}
retry_join {
leader_api_addr = "https://vault-1.vault-internal:8200"
leader_ca_cert_file = "/etc/vault/tls/cacert.crt"
leader_client_cert_file = "/etc/vault/tls/tls.crt"
leader_client_key_file = "/etc/vault/tls/tls.key"
}
retry_join {
leader_api_addr = "https://vault-2.vault-internal:8200"
leader_ca_cert_file = "/etc/vault/tls/cacert.crt"
leader_client_cert_file = "/etc/vault/tls/tls.crt"
leader_client_key_file = "/etc/vault/tls/tls.key"
}
}
telemetry {
statsd_address = "0.0.0.0:9125"
}
Additional context
Some logs
2020-07-29T08:16:06.974Z [INFO] core: security barrier not initialized
2020-07-29T08:16:06.974Z [INFO] core: attempting to join possible raft leader node: leader_addr=https://vault-0.vault-internal:8200
2020-07-29T08:16:06.981Z [INFO] core: join attempt failed: error="waiting for unseal keys to be supplied"
2020-07-29T08:16:06.981Z [INFO] core: security barrier not initialized
2020-07-29T08:16:06.981Z [INFO] core: attempting to join possible raft leader node: leader_addr=https://vault-1.vault-internal:8200
2020-07-29T08:16:06.992Z [INFO] core: join attempt failed: error="error during raft bootstrap init call: Error making API request.
URL: PUT https://vault-1.vault-internal:8200/v1/sys/storage/raft/bootstrap/challenge
Code: 503. Errors:
* Vault is sealed"
2020-07-29T08:16:06.992Z [INFO] core: security barrier not initialized
2020-07-29T08:16:06.992Z [INFO] core: attempting to join possible raft leader node: leader_addr=https://vault-2.vault-internal:8200
2020-07-29T08:16:06.996Z [INFO] core: join attempt failed: error="error during raft bootstrap init call: Put "https://vault-2.vault-internal:8200/v1/sys/storage/raft/bootstrap/challenge": dial tcp: lookup vault-2.vault-internal on 10.96.0.10:53: no such host"
2020-07-29T08:16:06.996Z [ERROR] core: failed to retry join raft cluster: retry=2s
/cc @tongpu
Here's an example that explains how to reproduce it using the vault-helm Helm chart. An interesting fact is that the initialized status changes when I run vault operator init on one of the other nodes.
helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update
helm upgrade --install vault hashicorp/vault -f vault-ha-raft.yaml
kubectl exec vault-0 -- vault operator init -key-shares=1 -key-threshold=1 -format=json | tee unseal.json
kubectl exec vault-0 -- vault operator unseal $(jq -r '.unseal_keys_hex[0]' unseal.json)
sleep 5s
kubectl exec vault-0 -- vault status
sleep 5s
kubectl exec vault-0 -- vault status
for i in {1..2}; do
echo "--- vault-$i"
kubectl exec vault-$i -- vault status
kubectl exec vault-$i -- vault operator init -status
done
--- vault-1
Key Value
--- -----
Seal Type shamir
Initialized true
Sealed true
Total Shares 1
Threshold 1
Unseal Progress 0/1
Unseal Nonce n/a
Version 1.5.0
HA Enabled true
command terminated with exit code 2
Vault is not initialized
command terminated with exit code 2
--- vault-2
Key Value
--- -----
Seal Type shamir
Initialized true
Sealed true
Total Shares 1
Threshold 1
Unseal Progress 0/1
Unseal Nonce n/a
Version 1.5.0
HA Enabled true
command terminated with exit code 2
Vault is not initialized
command terminated with exit code 2
vault operator init on vault-1:kubectl exec vault-1 -- vault operator init -key-shares=1 -key-threshold=1 -format=json | tee unseal-vault-1.json
kubectl exec vault-1 -- vault status
kubectl exec vault-1 -- vault operator init -status
$ kubectl exec vault-1 -- vault operator init -key-shares=1 -key-threshold=1 -format=json | tee unseal-vault-1.json
{
"unseal_keys_b64": [
"xxx"
],
"unseal_keys_hex": [
"xxx"
],
"unseal_shares": 1,
"unseal_threshold": 1,
"recovery_keys_b64": [],
"recovery_keys_hex": [],
"recovery_keys_shares": 5,
"recovery_keys_threshold": 3,
"root_token": "s.xxx"
}
$ kubectl exec vault-1 -- vault status
Key Value
--- -----
Seal Type shamir
Initialized false
Sealed true
Total Shares 0
Threshold 0
Unseal Progress 0/0
Unseal Nonce n/a
Version n/a
HA Enabled true
command terminated with exit code 2
$ kubectl exec vault-1 -- vault operator init -status
Error checking init status: Error making API request.
URL: GET http://127.0.0.1:8200/v1/sys/init
Code: 500. Errors:
* core: barrier reports initialized but no seal configuration found
command terminated with exit code 1
kubectl exec vault-1 -- vault operator unseal $(jq -r '.unseal_keys_hex[0]' unseal.json)
kubectl exec vault-1 -- vault operator unseal $(jq -r '.unseal_keys_hex[0]' unseal-vault-1.json)
Somehow vault-1 is now completely broken.
$ kubectl exec vault-1 -- vault operator unseal $(jq -r '.unseal_keys_hex[0]' unseal.json)
Error unsealing: Error making API request.
URL: PUT http://127.0.0.1:8200/v1/sys/unseal
Code: 500. Errors:
* core: barrier reports initialized but no seal configuration found
command terminated with exit code 2
$ kubectl exec vault-1 -- vault operator unseal $(jq -r '.unseal_keys_hex[0]' unseal-vault-1.json)
Error unsealing: Error making API request.
URL: PUT http://127.0.0.1:8200/v1/sys/unseal
Code: 500. Errors:
* core: barrier reports initialized but no seal configuration found
command terminated with exit code 2
Fixed in the context of above PR (making note for my own reference).
Is this issue resolved? If yes, in which vault version?
I'm also wondering if this is resolved or not... it looks like this breaks our automation that uses the Vault API to unseal nodes when trying to use integrated storage.
Most helpful comment
Here's an example that explains how to reproduce it using the vault-helm Helm chart. An interesting fact is that the initialized status changes when I run
vault operator initon one of the other nodes.Vault HA with internal Raft storage
Deploy Vault using the HashiCorp Helm Chart
Wait until one of the pods is in running state and initialize and unseal Vault on vault-0:
Verify that the intialized status is not matching on the other nodes:
Output
Run
vault operator initon vault-1:Output
Try to unseal Vault on vault-1 using either unseal key
Somehow vault-1 is now completely broken.
Output