Environment:
Consul Version: Consul v0.8.1
Vault Version: Vault v0.10.0 ('5dd7f25f5c4b541f2da62d70075b6f82771a650d') but the issue originally occurred with the "stock" Vault v0.7.2 ('d28dd5a018294562dbc9a18c95554d52b5d12390')
Vault Config File:

```hcl
storage "consul" {
  address = "10.0.0.1:8500"
  path    = "vault"
  token   = "ZZZ239ZZ-ZZZZ-ZZZZ-bZZZ-ZZ0cb69ZZZZZ"
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = 1
}
```

Startup Log Output:
Vault log:
==> Vault server configuration:
Api Address: http://10.0.0.1:8200
Cgo: disabled
Cluster Address: https://10.0.0.1:8201
Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "0.0.0.0:8201", tls: "disabled")
Log Level: trace
Mlock: supported: true, enabled: true
Storage: consul (HA available)
Version: Vault v0.10.0
Version Sha: 5dd7f25f5c4b541f2da62d70075b6f82771a650d
==> Vault server started! Log data will stream in below:
2018-04-23T23:27:57.996Z [DEBUG] storage.consul: config path set: path=vault
2018-04-23T23:27:57.996Z [WARN ] storage.consul: appending trailing forward slash to path
2018-04-23T23:27:57.996Z [DEBUG] storage.consul: config disable_registration set: disable_registration=false
2018-04-23T23:27:57.996Z [DEBUG] storage.consul: config service set: service=vault
2018-04-23T23:27:57.996Z [DEBUG] storage.consul: config service_tags set: service_tags=
2018-04-23T23:27:57.996Z [DEBUG] storage.consul: config service_address set: service_address=<nil>
2018-04-23T23:27:57.996Z [DEBUG] storage.consul: config address set: address=10.0.0.1:8500
2018-04-23T23:27:57.996Z [DEBUG] storage.consul: config token set
2018-04-23T23:27:57.997Z [DEBUG] storage.cache: creating LRU cache: size=0
2018-04-23T23:27:58.007Z [DEBUG] cluster listener addresses synthesized: cluster_addresses=[0.0.0.0:8201]
2018-04-23T23:28:14.636Z [DEBUG] core: cannot unseal, not enough keys: keys=1 threshold=3 nonce=9a5c1e39-b9c8-7863-8ba0-399a45378e67
2018-04-23T23:28:14.672Z [DEBUG] core: cannot unseal, not enough keys: keys=2 threshold=3 nonce=9a5c1e39-b9c8-7863-8ba0-399a45378e67
2018-04-23T23:28:14.708Z [INFO ] core: vault is unsealed
2018-04-23T23:28:14.708Z [INFO ] core: entering standby mode
2018-04-23T23:28:14.723Z [INFO ] core: acquired lock, enabling active operation
2018-04-23T23:28:14.728Z [TRACE] core: generating cluster private key
2018-04-23T23:28:14.741Z [DEBUG] core: generating local cluster certificate
2018-04-23T23:28:14.758Z [INFO ] core: post-unseal setup starting
2018-04-23T23:28:14.758Z [DEBUG] core: clearing forwarding clients
2018-04-23T23:28:14.758Z [DEBUG] core: done clearing forwarding clients
2018-04-23T23:28:14.759Z [INFO ] core: loaded wrapping token key
2018-04-23T23:28:14.759Z [INFO ] core: successfully setup plugin catalog: plugin-directory=
2018-04-23T23:28:14.762Z [INFO ] core: successfully mounted backend: type=generic path=secret/
2018-04-23T23:28:14.762Z [INFO ] core: successfully mounted backend: type=system path=sys/
2018-04-23T23:28:14.763Z [INFO ] core: successfully mounted backend: type=identity path=identity/
2018-04-23T23:28:14.763Z [INFO ] core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2018-04-23T23:28:14.773Z [INFO ] core: restoring leases
2018-04-23T23:28:14.773Z [INFO ] rollback: starting rollback manager
2018-04-23T23:28:14.773Z [DEBUG] expiration: collecting leases
2018-04-23T23:28:14.777Z [DEBUG] audit: adding reload function: path=file/
2018-04-23T23:28:14.777Z [DEBUG] audit: file backend options: path=file/ file_path=/tmp/vault_audit.log
2018-04-23T23:28:14.777Z [DEBUG] identity: loading entities
2018-04-23T23:28:14.779Z [DEBUG] identity: entities collected: num_existing=0
2018-04-23T23:28:14.780Z [INFO ] identity: entities restored
2018-04-23T23:28:14.780Z [DEBUG] identity: identity loading groups
2018-04-23T23:28:14.782Z [DEBUG] identity: groups collected: num_existing=0
2018-04-23T23:28:14.782Z [INFO ] identity: groups restored
2018-04-23T23:28:14.783Z [DEBUG] expiration: leases collected: num_existing=5
2018-04-23T23:28:14.784Z [DEBUG] core: starting cluster listeners
2018-04-23T23:28:14.784Z [DEBUG] core: cluster listener setup function
2018-04-23T23:28:14.784Z [DEBUG] core: clearing forwarding clients
2018-04-23T23:28:14.784Z [DEBUG] core: done clearing forwarding clients
2018-04-23T23:28:14.784Z [DEBUG] core: leaving cluster listener setup function
2018-04-23T23:28:14.784Z [INFO ] core: post-unseal setup complete
2018-04-23T23:28:14.784Z [INFO ] core: core/startClusterListener: starting listener: listener_address=0.0.0.0:8201
2018-04-23T23:28:14.784Z [INFO ] core: core/startClusterListener: serving cluster requests: cluster_listen_address=[::]:8201
2018-04-23T23:28:14.785Z [ERROR] expiration: error restoring leases: error="failed to read lease entry: decryption failed: cipher: message authentication failed"
2018-04-23T23:28:14.785Z [ERROR] core: shutting down
2018-04-23T23:28:14.785Z [DEBUG] core: shutdown called
2018-04-23T23:28:14.785Z [DEBUG] core: shutdown initiating internal seal
2018-04-23T23:28:14.785Z [DEBUG] core: shutdown running internal seal
2018-04-23T23:28:14.785Z [DEBUG] core: marked as sealed
2018-04-23T23:28:14.785Z [DEBUG] core: clearing forwarding clients
2018-04-23T23:28:14.785Z [DEBUG] core: done clearing forwarding clients
2018-04-23T23:28:14.785Z [DEBUG] core: finished triggering standbyStopCh for runStandby
2018-04-23T23:28:14.785Z [INFO ] core: pre-seal teardown starting
2018-04-23T23:28:14.785Z [INFO ] core: stopping cluster listeners
2018-04-23T23:28:14.785Z [DEBUG] core: waiting for success notification while stopping cluster listeners
2018-04-23T23:28:14.785Z [INFO ] core: shutting down forwarding rpc listeners
2018-04-23T23:28:14.786Z [INFO ] core: forwarding rpc listeners stopped
2018-04-23T23:28:15.285Z [INFO ] core: rpc listeners successfully shut down
2018-04-23T23:28:15.285Z [INFO ] core: cluster listeners successfully shut down
2018-04-23T23:28:15.285Z [DEBUG] audit: removing reload function: path=file/
2018-04-23T23:28:15.285Z [DEBUG] expiration: stop triggered
2018-04-23T23:28:15.285Z [DEBUG] expiration: finished stopping
2018-04-23T23:28:15.285Z [INFO ] rollback: stopping rollback manager
2018-04-23T23:28:15.285Z [INFO ] core: pre-seal teardown complete
2018-04-23T23:28:15.296Z [DEBUG] core: stop channel triggered in runStandby
2018-04-23T23:28:15.296Z [DEBUG] core: closed periodic key rotation checker stop channel
2018-04-23T23:28:15.296Z [DEBUG] core: closed periodic leader refresh stop channel
2018-04-23T23:28:15.296Z [DEBUG] core: periodic leader refresh returned
2018-04-23T23:28:15.296Z [DEBUG] core: runStandby done
2018-04-23T23:28:15.296Z [DEBUG] core: sealing barrier
2018-04-23T23:28:15.296Z [INFO ] core: vault is sealed
^C2018-04-23T23:28:16.313Z [INFO ] storage.consul: shutting down consul backend
==> Vault shutdown triggered
2018-04-23T23:28:16.313Z [DEBUG] core: shutdown called
2018-04-23T23:28:16.313Z [DEBUG] core: shutdown initiating internal seal
2018-04-23T23:28:16.313Z [DEBUG] core: shutdown running internal seal
Consul log:
2018/04/19 01:34:27 [INFO] snapshot: Creating new snapshot at /opt/consul/data/raft/snapshots/38176-3886900-152410166ZZZZ.tmp
2018/04/19 01:34:27 [ERR] snapshot: Failed to make snapshot directory: mkdir /opt/consul/data/raft/snapshots/38176-3886900-152410166ZZZZ.tmp: no space left on device
2018/04/19 01:34:27 [ERR] raft: Failed to create snapshot to install: mkdir /opt/consul/data/raft/snapshots/38176-3886900-152410166ZZZZ.tmp: no space left on device
2018/04/19 01:34:27 [ERR] raft-net: Failed to decode incoming command: unknown rpc type 129
2018/04/19 01:34:28 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.0.0.1:8200:vault-sealed-check?note=Vault+Unsealed, error: failed persisting state for check "vault:10.0.0.1:8200:vault-sealed-check": failed writing temp file "/opt/consul/data/checks/state/022f1120b0a8fa292a4e8b2a700aYYYY.tmp": write /opt/consul/data/checks/state/022f1120b0a8fa292a4e8b2a700aYYYY.tmp: no space left on device from=10.0.0.1:39628
2018/04/19 01:34:29 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.0.0.1:8200:vault-sealed-check?note=Vault+Unsealed, error: failed persisting state for check "vault:10.0.0.1:8200:vault-sealed-check": failed writing temp file "/opt/consul/data/checks/state/022f1120b0a8fa292a4e8b2a700aYYYY.tmp": write /opt/consul/data/checks/state/022f1120b0a8fa292a4e8b2a700aYYYY.tmp: no space left on device from=10.0.0.1:39628
Expected Behavior:
Vault should boot successfully, run as the active node (even in a reduced-functionality mode), and respond to requests made with the root token.
Actual Behavior:
Vault appears to read a corrupt lease entry and then shuts down. It briefly elects itself leader, but `vault status` never shows the state change from standby to active.
Steps to Reproduce:
Important Factoids:
I inherited this setup; it had been running for 322 days without issue before this occurred. It is a cluster of three Consul servers, each also running Vault. Vault stores perhaps less than 128 KB of user data in total and is used only 3-5 times a week. After the restart, the server has 3.4 GB free. We have critical data in the system and at this point are just looking for a way to read/export the data or temporarily bypass the lease restore. I appear to have 5 entries in VAULT/SYS/EXPIRE/ID/AUTH/TOKEN/CREATE/, which I presume are the leases, and they are all non-zero length.
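Those entries can be enumerated through Consul's KV HTTP API (`GET /v1/kv/<prefix>?keys`). The sketch below is only an illustration: the `list_lease_keys` helper, the fake key suffixes, and the in-process stub agent are all made up so the example runs without a live cluster; the `vault/` prefix comes from the `path` in the config above.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Lease prefix from this report (Consul stores the keys lowercased).
LEASE_PREFIX = "vault/sys/expire/id/auth/token/create/"

def list_lease_keys(consul_addr, prefix=LEASE_PREFIX):
    """List every KV key under `prefix` via GET /v1/kv/<prefix>?keys."""
    url = "http://%s/v1/kv/%s?keys" % (consul_addr, prefix)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# --- stub "Consul agent" so the sketch is runnable without a real cluster ---
FAKE_KEYS = [LEASE_PREFIX + suffix for suffix in ("1a2b", "3c4d", "5e6f")]

class StubConsul(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps(FAKE_KEYS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):  # keep the example's output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), StubConsul)
threading.Thread(target=server.serve_forever, daemon=True).start()

keys = list_lease_keys("127.0.0.1:%d" % server.server_port)
print("lease entries found:", len(keys))
server.shutdown()
```

Against a real agent you would use the address from the `storage` stanza and, with ACLs enabled, supply the storage token; the reporter would expect 5 keys here rather than the 3 in the stub.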
References:
This topic describes a very similar issue, although in their case they were using file storage rather than Consul: https://groups.google.com/forum/#!topic/vault-tool/l6EZUYlu3Vw Their solution was to delete the corrupt files, but again, that is a different backend.
If the storage saved corrupt data, there's no recovering it, so you can't read the lease. But you could manually delete that lease entry from the underlying storage. The danger there is just that Vault will no longer track what that lease was tracking, so something could live longer than it should (as in forever).
Thanks for the quick reply. Can I safely delete all leases but still read all data with the root token? Is this the correct path? VAULT/SYS/EXPIRE/ID/AUTH/TOKEN/CREATE/
Deleting all the child values in VAULT/SYS/EXPIRE/ID/AUTH/TOKEN/CREATE/ resolved the issue. Closing, thanks!
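For the record, the fix described above can be scripted. This is only a sketch, assuming the standard Consul KV HTTP API (`GET ?recurse` to fetch entries, `DELETE ?recurse` to remove them); `backup_and_delete` and the in-memory stub agent are invented for the example, and a real agent returns values base64-encoded. Always keep the backup: Consul's recursive delete is irreversible.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

LEASE_PREFIX = "vault/sys/expire/id/auth/token/create/"

def backup_and_delete(consul_addr, prefix, backup_path):
    """Save all entries under `prefix` to a JSON file, then delete them."""
    base = "http://%s/v1/kv/%s?recurse" % (consul_addr, prefix)
    with urllib.request.urlopen(base) as resp:   # GET ?recurse: full entries
        entries = json.load(resp)
    with open(backup_path, "w") as f:            # keep a copy, just in case
        json.dump(entries, f)
    req = urllib.request.Request(base, method="DELETE")  # DELETE ?recurse
    with urllib.request.urlopen(req) as resp:
        return len(entries), resp.read().strip() == b"true"

# --- stub agent backed by a dict (real values would be base64-encoded) ---
STORE = {LEASE_PREFIX + s: "opaque" for s in ("1a2b", "3c4d", "5e6f")}

class StubConsul(BaseHTTPRequestHandler):
    def _send(self, body):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        self._send(json.dumps(
            [{"Key": k, "Value": v} for k, v in STORE.items()]).encode())

    def do_DELETE(self):
        STORE.clear()
        self._send(b"true")

    def log_message(self, fmt, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), StubConsul)
threading.Thread(target=server.serve_forever, daemon=True).start()

deleted, ok = backup_and_delete("127.0.0.1:%d" % server.server_port,
                                LEASE_PREFIX, "lease_backup.json")
print("deleted %d entries, server ack: %s" % (deleted, ok))
server.shutdown()
```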
I ran into this as well.
I think our server running Vault got hard powered off, which corrupted it.
I tried the steps above, but the only thing that worked was a full restore :[
We ran into this twice this week because our Vault servers were patched (they are Windows).
I followed your steps of deleting all the child values under:
VAULT\SYS\EXPIRE\ID\AUTH\TOKEN\CREATE
But I was still unable to unseal the vault.
After deleting all the child values under:
VAULT\SYS\EXPIRE\ID\AUTH\LDAP\LOGIN
I was able to unseal the vault.
Before:
2018-12-06T12:15:34.272-0500 [INFO ] core: vault is unsealed
2018-12-06T12:15:34.273-0500 [INFO ] core: post-unseal setup starting
2018-12-06T12:15:34.274-0500 [INFO ] core: loaded wrapping token key
2018-12-06T12:15:34.274-0500 [INFO ] core: successfully setup plugin catalog: plugin-directory=
2018-12-06T12:15:34.275-0500 [INFO ] core: successfully mounted backend: type=kv path=secret/
2018-12-06T12:15:34.276-0500 [INFO ] core: successfully mounted backend: type=system path=sys/
2018-12-06T12:15:34.277-0500 [INFO ] core: successfully mounted backend: type=identity path=identity/
2018-12-06T12:15:34.279-0500 [INFO ] core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2018-12-06T12:15:34.281-0500 [INFO ] rollback: starting rollback manager
2018-12-06T12:15:34.281-0500 [INFO ] core: restoring leases
2018-12-06T12:15:34.284-0500 [INFO ] identity: entities restored
2018-12-06T12:15:34.284-0500 [INFO ] identity: groups restored
2018-12-06T12:15:34.285-0500 [INFO ] core: post-unseal setup complete
2018-12-06T12:15:34.285-0500 [ERROR] expiration: error restoring leases: error="failed to read lease entry: decryption failed: cipher: message authentication failed"
2018-12-06T12:15:34.286-0500 [ERROR] core: shutting down
2018-12-06T12:15:34.286-0500 [INFO ] core: pre-seal teardown starting
2018-12-06T12:15:34.287-0500 [INFO ] core: cluster listeners not running
2018-12-06T12:15:34.287-0500 [INFO ] rollback: stopping rollback manager
2018-12-06T12:15:34.287-0500 [INFO ] core: pre-seal teardown complete
2018-12-06T12:15:34.288-0500 [INFO ] core: vault is sealed
After:
2018-12-06T12:16:47.470-0500 [INFO ] core: vault is unsealed
2018-12-06T12:16:47.471-0500 [INFO ] core: post-unseal setup starting
2018-12-06T12:16:47.472-0500 [INFO ] core: loaded wrapping token key
2018-12-06T12:16:47.472-0500 [INFO ] core: successfully setup plugin catalog: plugin-directory=
2018-12-06T12:16:47.473-0500 [INFO ] core: successfully mounted backend: type=kv path=secret/
2018-12-06T12:16:47.473-0500 [INFO ] core: successfully mounted backend: type=system path=sys/
2018-12-06T12:16:47.474-0500 [INFO ] core: successfully mounted backend: type=identity path=identity/
2018-12-06T12:16:47.476-0500 [INFO ] core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2018-12-06T12:16:47.477-0500 [INFO ] rollback: starting rollback manager
2018-12-06T12:16:47.477-0500 [INFO ] core: restoring leases
2018-12-06T12:16:47.479-0500 [INFO ] expiration: lease restore complete
2018-12-06T12:16:47.480-0500 [INFO ] identity: entities restored
2018-12-06T12:16:47.480-0500 [INFO ] identity: groups restored
2018-12-06T12:16:47.481-0500 [INFO ] core: post-unseal setup complete
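The two reports in this thread ended up deleting different prefixes (`.../AUTH/TOKEN/CREATE` vs `.../AUTH/LDAP/LOGIN`), so the unreadable entry can sit under any auth mount. Given a flat key listing of everything under `vault/sys/expire/` (e.g. the output of `GET /v1/kv/vault/sys/expire/?keys`), a small helper can show which lease prefixes are actually populated; `lease_prefixes` and the sample keys below are invented for illustration.

```python
from collections import Counter

def lease_prefixes(keys, depth=7):
    """Group flat KV keys by their first `depth` path segments
    (vault/sys/expire/id/auth/<mount>/<op>/ is 7 segments deep)."""
    return Counter("/".join(k.split("/")[:depth]) + "/" for k in keys)

# Hypothetical key listing, shaped like the paths quoted in this thread.
sample = [
    "vault/sys/expire/id/auth/token/create/1a2b",
    "vault/sys/expire/id/auth/token/create/3c4d",
    "vault/sys/expire/id/auth/ldap/login/5e6f",
]

for prefix, count in sorted(lease_prefixes(sample).items()):
    print(count, prefix)
```

Working through the populated prefixes one at a time (backing each up before deleting it) narrows down which one holds the corrupt lease.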