vault crashes on unseal when raft is used as ha_storage

Created on 21 Jan 2020 · 6 Comments · Source: hashicorp/vault

Describe the bug
Vault crashes with a segfault immediately upon unseal when raft is configured as ha_storage.
To Reproduce
Steps to reproduce the behavior:

  1. Configure vault with raft specified as ha_storage
  2. Initialize vault
  3. Unseal vault
  4. Observe crash
2020-01-21T21:56:07.762Z [DEBUG] core: unseal key supplied
2020-01-21T21:56:07.762Z [DEBUG] core: cannot unseal, not enough keys: keys=1 threshold=3 nonce=0f9a3758-3419-e3ae-d784-ddc95dd7e44f
2020-01-21T21:56:12.610Z [DEBUG] core: unseal key supplied
2020-01-21T21:56:12.610Z [DEBUG] core: cannot unseal, not enough keys: keys=2 threshold=3 nonce=0f9a3758-3419-e3ae-d784-ddc95dd7e44f
2020-01-21T21:56:17.835Z [DEBUG] core: unseal key supplied
2020-01-21T21:56:17.836Z [DEBUG] core: starting cluster listeners
2020-01-21T21:56:17.836Z [INFO]  core.cluster-listener: starting listener: listener_address=127.0.0.2:8201
2020-01-21T21:56:17.836Z [INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=127.0.0.2:8201
2020-01-21T21:56:17.836Z [TRACE] storage.raft: setting up raft cluster
2020-01-21T21:56:17.837Z [INFO]  storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:4d47c7fa-1d37-0c8c-c198-04f3e63b9779 Address:127.0.0.2:8201}]"
2020-01-21T21:56:17.837Z [INFO]  core: vault is unsealed
2020-01-21T21:56:17.837Z [INFO]  storage.raft: entering follower state: follower="Node at 127.0.0.2:8201 [Follower]" leader=
2020-01-21T21:56:17.838Z [INFO]  core: entering standby mode
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x4c pc=0x22f7203]

goroutine 188 [running]:
github.com/hashicorp/vault/vendor/github.com/hashicorp/raft.(*raftState).getState(...)
    /go/src/github.com/hashicorp/vault/vendor/github.com/hashicorp/raft/state.go:78
github.com/hashicorp/vault/vendor/github.com/hashicorp/raft.(*Raft).State(...)
    /go/src/github.com/hashicorp/vault/vendor/github.com/hashicorp/raft/api.go:942
github.com/hashicorp/vault/physical/raft.(*RaftLock).Lock(0xc0005b15c0, 0xc0005c01e0, 0xc0005d9d18, 0x40c5b8, 0x30)
    /go/src/github.com/hashicorp/vault/physical/raft/raft.go:967 +0x63
github.com/hashicorp/vault/vault.(*Core).acquireLock(0xc0000e4580, 0x3aa15c0, 0xc0005b15c0, 0xc0005c01e0, 0x24)
    /go/src/github.com/hashicorp/vault/vault/ha.go:868 +0x56
github.com/hashicorp/vault/vault.(*Core).waitForLeadership(0xc0000e4580, 0x0, 0xc000095bc0, 0xc0005c01e0)
    /go/src/github.com/hashicorp/vault/vault/ha.go:418 +0x13c
github.com/hashicorp/vault/vault.(*Core).runStandby.func7(0x0, 0x0)
    /go/src/github.com/hashicorp/vault/vault/ha.go:373 +0x45
github.com/hashicorp/vault/vendor/github.com/oklog/run.(*Group).Run.func1(0xc0006a9bc0, 0xc0005b1590, 0xc0005f6680)
    /go/src/github.com/hashicorp/vault/vendor/github.com/oklog/run/group.go:38 +0x27
created by github.com/hashicorp/vault/vendor/github.com/oklog/run.(*Group).Run
    /go/src/github.com/hashicorp/vault/vendor/github.com/oklog/run/group.go:37 +0xbe
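
The trace above ends in (*raftState).getState, reached from RaftLock.Lock, and the fault address (0x4c) is a small offset, which is consistent with reading a struct field through a nil pointer. The sketch below reproduces that shape in isolation and shows a guard that returns an error instead. It assumes the raft handle is the nil pointer, which the trace alone does not confirm. The names raftNode, haBackend, and errRaftNotReady are hypothetical and are not Vault or hashicorp/raft identifiers.

```go
package main

import (
	"errors"
	"fmt"
)

// raftNode is a hypothetical stand-in for a raft instance; it is NOT the
// hashicorp/raft type.
type raftNode struct {
	state uint32
}

// State reads a field through the receiver, so it panics on a nil *raftNode.
func (r *raftNode) State() uint32 { return r.state }

// haBackend is a hypothetical backend whose raft handle is only populated
// once the raft cluster has been set up.
type haBackend struct {
	raft *raftNode
}

// errRaftNotReady is a hypothetical error value for this sketch.
var errRaftNotReady = errors.New("raft is not initialized for this backend")

// unguardedState has the same shape as the crashing call path: it calls
// through the handle without checking it.
func (b *haBackend) unguardedState() uint32 { return b.raft.State() }

// guardedState checks the handle first and reports an error instead of
// dereferencing nil.
func (b *haBackend) guardedState() (uint32, error) {
	if b.raft == nil {
		return 0, errRaftNotReady
	}
	return b.raft.State(), nil
}

// didPanic reports whether f panicked.
func didPanic(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	b := &haBackend{} // raft handle deliberately left nil
	fmt.Println("unguarded call panicked:", didPanic(func() { b.unguardedState() }))
	_, err := b.guardedState()
	fmt.Println("guarded call error:", err)
}
```

A guard like this only turns the crash into an error; it does not make raft usable as ha_storage.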

Expected behavior
Vault should allow raft to be used as ha_storage and operate with normal Vault HA semantics:

  1. use raft consensus to elect a leader to act as the active node
  2. standby nodes should forward queries to the active node
  3. when a client specifies that it does not want query forwarding, standby nodes should respond with a 307 redirect, with the active node's address in the Location response header
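
The redirect behavior in item 3 can be illustrated with a small, self-contained simulation. The sketch below does not talk to Vault: standbyHandler is a hypothetical stand-in for a standby node, activeAddr and the request path are made-up values, and the X-Vault-No-Request-Forwarding header is used on the assumption that it is how a client opts out of forwarding. The client disables automatic redirect following so the 307 and its Location header can be inspected.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// activeAddr is a hypothetical API address for the active node.
const activeAddr = "https://127.0.0.3:8200"

// noForwardHeader is assumed to be the header a client sets to opt out of
// request forwarding.
const noForwardHeader = "X-Vault-No-Request-Forwarding"

// standbyHandler simulates a standby node: when the client opts out of
// forwarding it answers 307 with the active node in Location; otherwise it
// pretends the request was forwarded and answers 200.
func standbyHandler(w http.ResponseWriter, r *http.Request) {
	if r.Header.Get(noForwardHeader) != "" {
		w.Header().Set("Location", activeAddr+r.URL.Path)
		w.WriteHeader(http.StatusTemporaryRedirect)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// probeRedirect sends one GET with forwarding disabled and returns the status
// code and Location header without following the redirect.
func probeRedirect(url string) (int, string, error) {
	client := &http.Client{
		// Return the 3xx response itself instead of following it.
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, "", err
	}
	req.Header.Set(noForwardHeader, "true")
	resp, err := client.Do(req)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	return resp.StatusCode, resp.Header.Get("Location"), nil
}

// runDemo starts the simulated standby and probes it once.
func runDemo() (int, string, error) {
	srv := httptest.NewServer(http.HandlerFunc(standbyHandler))
	defer srv.Close()
	return probeRedirect(srv.URL + "/v1/secret/data/example")
}

func main() {
	code, loc, err := runDemo()
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("status:", code)
	fmt.Println("location:", loc)
}
```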

Environment:

  • Vault Server Version (retrieve with vault status): 1.2.3 and 1.3.1
  • Vault CLI Version (retrieve with vault version): 1.2.3 and 1.3.1
  • Server Operating System/Architecture: CentOS 6

Vault server configuration file(s):

log_level="trace"

ha_storage "raft" {
    path        = "/tmp/vault"
    ha_enabled  = "true"
}
storage "mysql" { 
    username    = "vault"
    password    = "[redacted]"
    address     = "127.0.0.1"
    ha_enabled  = "true"
}

listener "tcp" {
    address = "127.0.0.2:8200"
    tls_disable = 1
}

api_addr="https://127.0.0.2:8200"
cluster_addr="https://127.0.0.2:8201"

log_level="trace"

ha_storage "raft" {
    path        = "/tmp/vaultha"
    ha_enabled  = "true"
}

storage "raft" {
    path        = "/tmp/vault"
    ha_enabled  = "true"
}

listener "tcp" {
    address = "127.0.0.2:8200"
    tls_disable = 1 
}

api_addr="https://127.0.0.2:8200"
cluster_addr="https://127.0.0.2:8201"

Additional context
The vault core documentation says that any storage backend which supports HA may be used in the ha_storage clause of the config.

All 6 comments

Hi @drawks

This isn't currently supported, but we should update the logic so it errors instead of crashes for now. Also we can update the documentation.

I'll leave this issue open to track the actual feature request here, which is supporting raft as an ha_storage option.
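
As a rough illustration of "errors instead of crashes", a startup check could reject the configuration before unseal is ever attempted. This is a minimal sketch under stated assumptions, not the actual patch: storageStanza, serverConfig, validateHAStorage, and the error text are all hypothetical and are not taken from the Vault source.

```go
package main

import (
	"errors"
	"fmt"
)

// storageStanza is a hypothetical, simplified view of a parsed storage or
// ha_storage block; only the backend type matters here.
type storageStanza struct {
	Type string
}

// serverConfig is a hypothetical, simplified server configuration.
type serverConfig struct {
	Storage   *storageStanza
	HAStorage *storageStanza
}

// errRaftHAStorage is a hypothetical error; the wording is illustrative only.
var errRaftHAStorage = errors.New(`raft cannot be used as a separate ha_storage backend`)

// validateHAStorage rejects any configuration that names raft in an
// ha_storage stanza, so the server can fail at startup with a clear message.
func validateHAStorage(cfg serverConfig) error {
	if cfg.HAStorage != nil && cfg.HAStorage.Type == "raft" {
		return errRaftHAStorage
	}
	return nil
}

func main() {
	// The two configurations from the report, reduced to backend types.
	mysqlWithRaftHA := serverConfig{
		Storage:   &storageStanza{Type: "mysql"},
		HAStorage: &storageStanza{Type: "raft"},
	}
	raftWithRaftHA := serverConfig{
		Storage:   &storageStanza{Type: "raft"},
		HAStorage: &storageStanza{Type: "raft"},
	}
	// Raft as the only storage backend, with no ha_storage stanza.
	raftOnly := serverConfig{Storage: &storageStanza{Type: "raft"}}

	fmt.Println("mysql + raft ha_storage:", validateHAStorage(mysqlWithRaftHA))
	fmt.Println("raft + raft ha_storage:", validateHAStorage(raftWithRaftHA))
	fmt.Println("raft storage only:", validateHAStorage(raftOnly))
}
```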

It seems to me an odd design decision not to allow the internal raft storage mechanism as ha_storage. Is there a specific reason you'd choose to "fix" this by raising an error, rather than letting raft leadership election handle the HA component while a different storage backend, chosen for some site- or business-specific reason, holds the data?

Imagine a shop that has all the mechanisms in place for running MariaDB at scale, with sane replication, backup/restore runbooks, and a strong understanding of the administrative details. Forcing raft storage in order to use raft election ends up throwing the baby out with the bathwater.

It is something we want to support, but it's a big lift. So in the meantime I want to fix the panic, return an error, and update documentation to improve the UX.

In the longer run we do indeed want to fix this by allowing raft to be used as a ha_storage option.

I've opened #8239 to patch this and document the limitation.

Hello - this has been "fixed" in #8239, meaning you can no longer attempt to use Raft for HA coordination alone.

Going to re-open to continue to track the feature request to support ha_storage and raft.
