Vault: Panic due to concurrent map writes

Created on 16 May 2018 · 6 Comments · Source: hashicorp/vault

Environment:

  • Vault Version: v0.10.0
  • Operating System/Architecture: Ubuntu 16.04.3 LTS

Vault Config File:

ui = true

backend "consul" {
  address = "127.0.0.1:8500"
  path = "vault"
  redirect_addr = "http://10.101.22.237:8200"
  token = "xxxxxx"
}

ha_backend "consul" {
  address = "127.0.0.1:8500"
  path = "vault"
  redirect_addr = "http://10.101.22.237:8200"
  token = "xxxxx"
}

listener "tcp" {
  address = "127.0.0.1:8200"
  tls_key_file = "/etc/vault/vault.key"
  tls_cert_file = "/etc/vault/vault.cer"
}

listener "tcp" {
  address = "10.101.22.237:8200"
  tls_key_file = "/etc/vault/vault.key"
  tls_cert_file = "/etc/vault/vault.cer"
}

listener "tcp" {
  address = "10.101.22.237:1867"
  tls_key_file = "/etc/vault/vault.service.key"
  tls_cert_file = "/etc/vault/vault.service.cer"
}

telemetry {
 statsd_address = "127.0.0.1:8125"
}

Startup Log Output:

May  2 21:28:35 ip-10-101-22-237 vault[5440]: ==> Vault server configuration:
May  2 21:28:35 ip-10-101-22-237 vault[5440]:               HA Storage: consul
May  2 21:28:35 ip-10-101-22-237 vault[5440]:              Api Address: http://10.101.22.237:8200
May  2 21:28:35 ip-10-101-22-237 vault[5440]:                      Cgo: disabled
May  2 21:28:35 ip-10-101-22-237 vault[5440]:          Cluster Address: https://10.101.22.237:8201
May  2 21:28:35 ip-10-101-22-237 vault[5440]:               Listener 1: tcp (addr: "127.0.0.1:8200", cluster address: "127.0.0.1:8201", tls: "enabled")
May  2 21:28:35 ip-10-101-22-237 vault[5440]:               Listener 2: tcp (addr: "10.101.22.237:8200", cluster address: "10.101.22.237:8201", tls: "enabled")
May  2 21:28:35 ip-10-101-22-237 vault[5440]:               Listener 3: tcp (addr: "10.101.22.237:1867", cluster address: "10.101.22.237:1868", tls: "enabled")
May  2 21:28:35 ip-10-101-22-237 vault[5440]:                Log Level: info
May  2 21:28:35 ip-10-101-22-237 vault[5440]:                    Mlock: supported: true, enabled: true
May  2 21:28:35 ip-10-101-22-237 vault[5440]:                  Storage: consul
May  2 21:28:35 ip-10-101-22-237 vault[5440]:                  Version: Vault v0.10.0
May  2 21:28:35 ip-10-101-22-237 vault[5440]:              Version Sha: 5dd7f25f5c4b541f2da62d70075b6f82771a650d
May  2 21:28:35 ip-10-101-22-237 vault[5440]: ==> Vault server started! Log data will stream in below:

We have encountered this only once and have not been able to reproduce it, but Vault crashed with a panic caused by concurrent writes to a map, and we figured it was worth reporting. A link to the crash log gist is provided under References.

Important Factoids:
The instance that crashed was the active instance in an HA cluster. Since the panic appears to be in ACL-related code, it is worth noting that we use AWS IAM auth to retrieve tokens for Lambdas. We also use Vault to manage credentials for RabbitMQ through the RabbitMQ secrets engine, credentials for MySQL through the database secrets engine, and IAM credentials through the AWS secrets engine.

References:
Crash log: https://gist.github.com/jonsabados/712c915c992c925a66f4d4f66e7fbd68

All 6 comments

Many thanks for reporting this -- any panic is a bug that should be fixed, and providing the log is super duper helpful. We'll update when we know more.

Hi there,

I think I've figured out the issue, but it involves a specific set of circumstances that arise when all policies given to a token are evaluated together, namely:

  • Three or more policy path statements with the same path
  • Two or more of those path statements containing allowed_parameters (again, for the same path)

Can you confirm/deny?

Would a wildcard plus two exact matches count as "three or more policy path statements with the same path"? If so, then yup, we have a policy roughly along the lines of

path "sys/*" {
  policy = "deny"
}

path "sys/leases/lookup" {
  capabilities = ["update"],
  allowed_parameters = {
    "lease_id" = ["database/creds/someservice*"],
  }
}

path "sys/leases/lookup" {
  capabilities = ["update"],
  allowed_parameters = {
    "lease_id" = ["aws/sts/someservice*"],
  }
}

given to tokens that would have been in use right around the time of the crash.

Are you using the default policy? The current version of that policy contains:

# Allow looking up lease properties. This requires knowing the lease ID ahead
# of time and does not divulge any sensitive information.
path "sys/leases/lookup" {
    capabilities = ["update"]
}

If you are, it would indeed mean that you have three exact matches.
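
The conditions described above can be checked mechanically. What follows is a minimal Go sketch, not Vault code: `PathStatement` and `MatchingPaths` are hypothetical names introduced only for illustration, and paths are compared as literal strings, so `sys/*` does not count toward `sys/leases/lookup`. It reports any path that has three or more statements, of which two or more set allowed_parameters.

```go
package main

import "fmt"

// PathStatement is a hypothetical, simplified view of one `path` block
// in a policy. It is not a type from the Vault code base.
type PathStatement struct {
	Path                 string
	HasAllowedParameters bool
}

// MatchingPaths returns, in first-seen order, every path that has three or
// more statements, at least two of which set allowed_parameters.
func MatchingPaths(stmts []PathStatement) []string {
	total := map[string]int{}
	withParams := map[string]int{}
	var order []string
	for _, s := range stmts {
		if total[s.Path] == 0 {
			order = append(order, s.Path)
		}
		total[s.Path]++
		if s.HasAllowedParameters {
			withParams[s.Path]++
		}
	}
	var out []string
	for _, p := range order {
		if total[p] >= 3 && withParams[p] >= 2 {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	// The statements from this thread: the custom policy plus the
	// default policy's sys/leases/lookup entry.
	stmts := []PathStatement{
		{Path: "sys/*"},
		{Path: "sys/leases/lookup", HasAllowedParameters: true},
		{Path: "sys/leases/lookup", HasAllowedParameters: true},
		{Path: "sys/leases/lookup"}, // from the default policy
	}
	fmt.Println(MatchingPaths(stmts)) // prints [sys/leases/lookup]
}
```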

Indeed we are, so it looks like we're a match for the circumstances you described.

Great, fix will be in 0.10.2! Thanks!
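
This thread does not show the Vault code involved or the actual fix. As general background only: the Go runtime aborts a program when two goroutines write to the same map without synchronization. One common way to avoid that, sketched below under the assumption that parameter maps are shared between requests, is to merge into a fresh copy rather than into a shared map. `mergeAllowed` is a hypothetical helper, not a Vault function.

```go
package main

import (
	"fmt"
	"sync"
)

// mergeAllowed returns a new map holding the values of both inputs.
// It only reads from a and b, so concurrent callers sharing the same
// inputs never write to a shared map or a shared slice.
func mergeAllowed(a, b map[string][]string) map[string][]string {
	out := make(map[string][]string, len(a)+len(b))
	for k, v := range a {
		// Copy the slice so later appends cannot touch a's backing array.
		out[k] = append([]string(nil), v...)
	}
	for k, v := range b {
		out[k] = append(out[k], v...)
	}
	return out
}

func main() {
	// Values taken from the policy quoted earlier in the thread.
	shared := map[string][]string{"lease_id": {"database/creds/someservice*"}}
	extra := map[string][]string{"lease_id": {"aws/sts/someservice*"}}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			merged := mergeAllowed(shared, extra) // reads only; safe to run concurrently
			_ = merged
		}()
	}
	wg.Wait()

	// The shared input is unchanged after all merges.
	fmt.Println(len(shared["lease_id"])) // prints 1
}
```

Writing the merged values directly into `shared` from those goroutines would instead be an unsynchronized map write. Running a program with `go run -race` is one way to surface such accesses before they cause a crash.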
