Describe the bug
Vault crashes with a seg fault immediately upon unseal when using raft as ha_storage.
To Reproduce
Steps to reproduce the behavior:
Configure raft as ha_storage and unseal Vault. Log output:
2020-01-21T21:56:07.762Z [DEBUG] core: unseal key supplied
2020-01-21T21:56:07.762Z [DEBUG] core: cannot unseal, not enough keys: keys=1 threshold=3 nonce=0f9a3758-3419-e3ae-d784-ddc95dd7e44f
2020-01-21T21:56:12.610Z [DEBUG] core: unseal key supplied
2020-01-21T21:56:12.610Z [DEBUG] core: cannot unseal, not enough keys: keys=2 threshold=3 nonce=0f9a3758-3419-e3ae-d784-ddc95dd7e44f
2020-01-21T21:56:17.835Z [DEBUG] core: unseal key supplied
2020-01-21T21:56:17.836Z [DEBUG] core: starting cluster listeners
2020-01-21T21:56:17.836Z [INFO] core.cluster-listener: starting listener: listener_address=127.0.0.2:8201
2020-01-21T21:56:17.836Z [INFO] core.cluster-listener: serving cluster requests: cluster_listen_address=127.0.0.2:8201
2020-01-21T21:56:17.836Z [TRACE] storage.raft: setting up raft cluster
2020-01-21T21:56:17.837Z [INFO] storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:4d47c7fa-1d37-0c8c-c198-04f3e63b9779 Address:127.0.0.2:8201}]"
2020-01-21T21:56:17.837Z [INFO] core: vault is unsealed
2020-01-21T21:56:17.837Z [INFO] storage.raft: entering follower state: follower="Node at 127.0.0.2:8201 [Follower]" leader=
2020-01-21T21:56:17.838Z [INFO] core: entering standby mode
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x4c pc=0x22f7203]
goroutine 188 [running]:
github.com/hashicorp/vault/vendor/github.com/hashicorp/raft.(*raftState).getState(...)
/go/src/github.com/hashicorp/vault/vendor/github.com/hashicorp/raft/state.go:78
github.com/hashicorp/vault/vendor/github.com/hashicorp/raft.(*Raft).State(...)
/go/src/github.com/hashicorp/vault/vendor/github.com/hashicorp/raft/api.go:942
github.com/hashicorp/vault/physical/raft.(*RaftLock).Lock(0xc0005b15c0, 0xc0005c01e0, 0xc0005d9d18, 0x40c5b8, 0x30)
/go/src/github.com/hashicorp/vault/physical/raft/raft.go:967 +0x63
github.com/hashicorp/vault/vault.(*Core).acquireLock(0xc0000e4580, 0x3aa15c0, 0xc0005b15c0, 0xc0005c01e0, 0x24)
/go/src/github.com/hashicorp/vault/vault/ha.go:868 +0x56
github.com/hashicorp/vault/vault.(*Core).waitForLeadership(0xc0000e4580, 0x0, 0xc000095bc0, 0xc0005c01e0)
/go/src/github.com/hashicorp/vault/vault/ha.go:418 +0x13c
github.com/hashicorp/vault/vault.(*Core).runStandby.func7(0x0, 0x0)
/go/src/github.com/hashicorp/vault/vault/ha.go:373 +0x45
github.com/hashicorp/vault/vendor/github.com/oklog/run.(*Group).Run.func1(0xc0006a9bc0, 0xc0005b1590, 0xc0005f6680)
/go/src/github.com/hashicorp/vault/vendor/github.com/oklog/run/group.go:38 +0x27
created by github.com/hashicorp/vault/vendor/github.com/oklog/run.(*Group).Run
/go/src/github.com/hashicorp/vault/vendor/github.com/oklog/run/group.go:37 +0xbe
Expected behavior
Vault should allow raft to be used as ha_storage and operate with normal Vault HA semantics.
Environment:
Vault server version (from vault status): 1.2.3 and 1.3.1
Vault CLI version (from vault version): 1.2.3 and 1.3.1
Vault server configuration file(s):
First configuration (MySQL storage, raft ha_storage):
log_level="trace"
ha_storage "raft" {
path = "/tmp/vault"
ha_enabled = "true"
}
storage "mysql" {
username = "vault"
password = "[redacted]"
address = "127.0.0.1"
ha_enabled = "true"
}
listener "tcp" {
address = "127.0.0.2:8200"
tls_disable = 1
}
api_addr="https://127.0.0.2:8200"
cluster_addr="https://127.0.0.2:8201"

Second configuration (raft for both storage and ha_storage):
log_level="trace"
ha_storage "raft" {
path = "/tmp/vaultha"
ha_enabled = "true"
}
storage "raft" {
path = "/tmp/vault"
ha_enabled = "true"
}
listener "tcp" {
address = "127.0.0.2:8200"
tls_disable = 1
}
api_addr="https://127.0.0.2:8200"
cluster_addr="https://127.0.0.2:8201"
Additional context
The Vault core documentation says that any storage backend that supports HA may be used in the ha_storage stanza of the config.
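For context on what an ha_storage backend has to provide: Vault's HA backends implement a lock contract (the physical.HABackend and physical.Lock interfaces), and the trace above shows standby mode reaching RaftLock.Lock through Core.acquireLock. The sketch below re-declares simplified copies of those interfaces and backs them with a hypothetical in-memory, single-slot backend. The names memHA, memLock and the key "core/lock" are illustrative and are not Vault code. It shows the acquire, give-up, and release cycle that an active node and a standby node go through.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Lock and HABackend mirror the shape of Vault's physical.Lock and
// physical.HABackend interfaces (re-declared here so the sketch is self-contained).
type Lock interface {
	Lock(stopCh <-chan struct{}) (<-chan struct{}, error)
	Unlock() error
	Value() (bool, string, error)
}

type HABackend interface {
	LockWith(key, value string) (Lock, error)
	HAEnabled() bool
}

// memHA is a hypothetical in-process HA backend with a single lock slot;
// the key passed to LockWith is ignored because there is only one slot.
type memHA struct {
	slot   chan struct{} // capacity 1: holding a token means holding the lock
	mu     sync.Mutex    // guards holder
	holder string
}

func newMemHA() *memHA { return &memHA{slot: make(chan struct{}, 1)} }

func (b *memHA) HAEnabled() bool { return true }

func (b *memHA) LockWith(key, value string) (Lock, error) {
	return &memLock{b: b, value: value}, nil
}

type memLock struct {
	b     *memHA
	value string
	lost  chan struct{}
}

// Lock blocks until the slot is free or stopCh is closed. On success it returns
// a channel that is closed when the lock is given up (here, only on Unlock).
func (l *memLock) Lock(stopCh <-chan struct{}) (<-chan struct{}, error) {
	select {
	case l.b.slot <- struct{}{}:
		l.b.mu.Lock()
		l.b.holder = l.value
		l.b.mu.Unlock()
		l.lost = make(chan struct{})
		return l.lost, nil
	case <-stopCh:
		return nil, nil // gave up without acquiring the lock
	}
}

func (l *memLock) Unlock() error {
	if l.lost == nil {
		return errors.New("lock not held")
	}
	l.b.mu.Lock()
	l.b.holder = ""
	l.b.mu.Unlock()
	close(l.lost) // tell the former holder it is no longer active
	l.lost = nil
	<-l.b.slot // free the slot for the next contender
	return nil
}

// Value reports whether any node holds the lock, and that node's value.
func (l *memLock) Value() (bool, string, error) {
	l.b.mu.Lock()
	defer l.b.mu.Unlock()
	return l.b.holder != "", l.b.holder, nil
}

func main() {
	var ha HABackend = newMemHA()
	active, _ := ha.LockWith("core/lock", "node-a")
	if _, err := active.Lock(nil); err != nil {
		fmt.Println("node-a:", err)
	}

	// node-b gives up immediately because its stop channel is already closed.
	stop := make(chan struct{})
	close(stop)
	standby, _ := ha.LockWith("core/lock", "node-b")
	lost, _ := standby.Lock(stop)
	held, holder, _ := standby.Value()
	fmt.Println("node-b acquired:", lost != nil, "held:", held, "by:", holder)
}
```

In this model, the lock is only meaningful if the thing behind it (here a channel, in the raft case a running raft node) actually exists when Lock is called.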
Hi @drawks
This isn't currently supported, but for now we should update the logic so that it returns an error instead of crashing. We can also update the documentation.
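The trace above ends in a nil pointer dereference inside (*Raft).State, called from RaftLock.Lock. That suggests the lock was used while its raft handle was nil. A minimal sketch of the "error instead of crash" approach is to check the handle before dereferencing it. The sketch uses hypothetical stand-in types (raftNode, haLock, errRaftNotReady) rather than Vault's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// raftNode is a hypothetical stand-in for *raft.Raft.
type raftNode struct {
	state string
}

// State reads a field through the receiver, so calling it on a nil
// *raftNode dereferences nil and panics, as in the reported trace.
func (r *raftNode) State() string {
	return r.state
}

// errRaftNotReady is a hypothetical error returned instead of panicking.
var errRaftNotReady = errors.New("raft HA backend is not initialized")

// haLock is a hypothetical stand-in for physical/raft.RaftLock.
type haLock struct {
	node *raftNode // nil if no raft node was ever set up for this lock
}

// Lock checks for a nil node before touching it and returns an error,
// so the caller can report the problem instead of the process crashing.
func (l *haLock) Lock() (string, error) {
	if l.node == nil {
		return "", errRaftNotReady
	}
	return l.node.State(), nil
}

func main() {
	unready := &haLock{}
	if _, err := unready.Lock(); err != nil {
		fmt.Println("lock error:", err)
	}

	ready := &haLock{node: &raftNode{state: "Follower"}}
	state, err := ready.Lock()
	fmt.Println(state, err)
}
```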
I'll leave this issue open to track the actual feature request here, which is supporting raft as an ha_storage option.
It seems to me an odd design decision to not allow the internal raft storage mechanism as ha_storage. Is there a specific reason you'd choose to "fix" this by raising an error, instead of letting raft's leader election handle the HA component while a different storage backend, chosen for some site- or business-specific reason, handles storage?
Imagine a shop that has all the mechanisms in place for running MariaDB at scale, with all the trimmings: sane replication, backup/restore runbooks, and a strong understanding of all the administrative details. Forcing raft storage just to use the raft election throws the baby out with the bathwater.
It is something we want to support, but it's a big lift. So in the meantime I want to fix the panic, return an error, and update documentation to improve the UX.
In the longer run we do indeed want to fix this by allowing raft to be used as a ha_storage option.
I've opened #8239 to patch this and document the limitation.
Hello, this has been "fixed" in #8239, meaning you can no longer attempt to use Raft for HA coordination alone.
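A startup-time configuration check would surface this limitation before unseal rather than after it. The sketch below assumes the rule is simply "reject ha_storage raft". The serverConfig type, the validateHAStorage function and the error text are hypothetical, and the actual check added in #8239 may be structured differently. With storage "raft", HA comes from the integrated storage backend itself, so no separate ha_storage stanza is needed.

```go
package main

import (
	"errors"
	"fmt"
)

// serverConfig is a hypothetical, trimmed-down view of the parsed server
// configuration: only the types of the storage and ha_storage stanzas.
type serverConfig struct {
	StorageType   string // e.g. "mysql" or "raft"
	HAStorageType string // empty when no ha_storage stanza is present
}

// errRaftHAStorage is hypothetical error text for this sketch.
var errRaftHAStorage = errors.New(`ha_storage "raft" is not supported; use storage "raft" to get integrated storage with HA`)

// validateHAStorage rejects raft as ha_storage at startup, so the
// misconfiguration fails fast instead of panicking once the node enters standby.
func validateHAStorage(cfg serverConfig) error {
	if cfg.HAStorageType == "raft" {
		return errRaftHAStorage
	}
	return nil
}

func main() {
	configs := []serverConfig{
		{StorageType: "mysql", HAStorageType: "raft"}, // like the first config in this report
		{StorageType: "raft", HAStorageType: "raft"},  // like the second config in this report
		{StorageType: "raft"},                         // raft storage providing HA on its own
	}
	for _, cfg := range configs {
		fmt.Printf("%+v -> %v\n", cfg, validateHAStorage(cfg))
	}
}
```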
Going to re-open to continue to track the feature request to support ha_storage and raft.