Consul: Consul fails to register vault-sealed-check for vault

Created on 7 Mar 2019  ·  23 Comments  ·  Source: hashicorp/consul

Overview of the Issue

I am currently running a Vault server on the same hosts as my Consul servers. After upgrading from Consul 1.4.0 to Consul 1.4.3, Vault fails to register its sealed check with Consul 1.4.3, so the sealed status is never reported to Consul, and I'm getting the following log messages piling up in my Consul and Vault logs.

Reproduction Steps

1) Install Consul server 1.4.3 on a host as a server
2) Configure and install Vault server 1.0.2 on the same host
3) Set up the Consul backend for Vault
4) Unseal Vault
5) The seal-status checks start failing, trying to report a status to Consul for a check that does not exist
6) Roll back to Consul 1.4.0 and the seal-status checks work correctly
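As a quick way to confirm step 5, you can check whether the sealed check actually exists on the agent. This is a minimal sketch, assuming a local Consul agent on 127.0.0.1:8500; the `vault_check_id` helper and the addresses are illustrative, not from the report (Vault derives the check ID from its API address and port):

```shell
#!/usr/bin/env bash

# Hypothetical helper: build the check ID Vault registers for its sealed
# check ("vault:<api_addr_host>:<port>:vault-sealed-check"), then query
# the local agent to see whether that check is actually registered.
vault_check_id() {
  local host="$1" port="$2"
  echo "vault:${host}:${port}:vault-sealed-check"
}

check_id="$(vault_check_id 192.168.28.63 8200)"
echo "$check_id"

# Uncomment to query the local agent (requires curl and jq):
# curl -s http://127.0.0.1:8500/v1/agent/checks | jq --arg id "$check_id" 'has($id)'
```

If the query returns `false` while Vault keeps calling `/v1/agent/check/pass/...`, you get exactly the "Unknown check" 500s shown in the logs below.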

Operating system and Environment details

Official Docker container for consul 1.4.3 and vault 1.0.2

Log Fragments

Vault Logs

```
Vault server configuration:

         Api Address: https://192.168.28.63:8200
                 Cgo: disabled
     Cluster Address: https://192.168.28.63:8201
          Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "192.168.28.63:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "enabled")
           Log Level: (not set)
               Mlock: supported: true, enabled: true
             Storage: consul (HA available)
             Version: Vault v1.0.2
         Version Sha: 37a1dc9c477c1c68c022d2084550f25bf20cac33

==> Vault server started! Log data will stream in below:

2019-03-07T15:26:05.882Z [WARN] no api_addr value specified in config or in VAULT_API_ADDR; falling back to detection if possible, but this value should be manually set
2019-03-07T15:26:08.434Z [INFO] core: vault is unsealed
2019-03-07T15:26:08.434Z [INFO] core: entering standby mode
2019-03-07T15:26:27.479Z [WARN] storage.consul: check unable to talk with Consul backend: error="Unexpected response code: 500 (Unknown check "vault:192.168.28.63:8200:vault-sealed-check")"

```

Consul Logs

```

2019/03/07 15:25:58 [INFO] agent: Deregistered service "vault:192.168.28.63:8200"
2019/03/07 15:25:58 [INFO] agent: Deregistered check "e408baff1455ac4cab95892718cd7494f61693ff"
2019/03/07 15:25:58 [INFO] agent: Deregistered check "mem-util"
2019/03/07 15:25:58 [INFO] agent: Deregistered check "dsk-util"
2019/03/07 15:25:58 [INFO] agent: Deregistered check "a75809917d97ead0eaebb52cfeabe012dc47abc7"
2019/03/07 15:25:58 [INFO] agent: Deregistered check "vault:192.168.28.63:8200:vault-sealed-check"
2019/03/07 15:26:05 [INFO] agent: Synced service "vault:192.168.28.63:8200"
2019/03/07 15:26:05 [INFO] agent: Synced check "vault:192.168.28.63:8200:vault-sealed-check"
2019/03/07 15:26:08 [INFO] agent: Synced check "vault:192.168.28.63:8200:vault-sealed-check"
2019/03/07 15:26:20 [INFO] agent: Synced check "e408baff1455ac4cab95892718cd7494f61693ff"
2019/03/07 15:26:20 [INFO] agent: Synced check "a75809917d97ead0eaebb52cfeabe012dc47abc7"
2019/03/07 15:26:22 [INFO] agent: Deregistered service "_nomad-server-acfqsqjxbgap2cvwlfedkvyx55twlbsl"
2019/03/07 15:26:22 [INFO] agent: Deregistered check "vault:192.168.28.63:8200:vault-sealed-check"
2019/03/07 15:26:22 [INFO] agent: Deregistered check "e408baff1455ac4cab95892718cd7494f61693ff"
2019/03/07 15:26:22 [INFO] agent: Deregistered check "a75809917d97ead0eaebb52cfeabe012dc47abc7"
2019/03/07 15:26:22 [INFO] agent: Deregistered service "_nomad-server-fwtmi5j5ajxekyuajonqk4pwlauj3dko"
2019/03/07 15:26:22 [INFO] agent: Deregistered service "_nomad-server-p4pxc2srsurolx43mc7lztdb7fwnwbh3"
2019/03/07 15:26:22 [ERR] http: Request PUT /v1/agent/check/deregister/a75809917d97ead0eaebb52cfeabe012dc47abc7, error: Unknown check "a75809917d97ead0eaebb52cfeabe012dc47abc7" from=127.0.0.1:51522
2019/03/07 15:26:22 [ERR] http: Request PUT /v1/agent/check/deregister/e408baff1455ac4cab95892718cd7494f61693ff, error: Unknown check "e408baff1455ac4cab95892718cd7494f61693ff" from=127.0.0.1:51522
2019/03/07 15:26:25 [WARN] agent: Check "e408baff1455ac4cab95892718cd7494f61693ff" socket connection failed: dial tcp 0.0.0.0:4648: connect: connection refused
2019/03/07 15:26:27 [ERR] http: Request PUT /v1/agent/check/pass/vault:192.168.28.63:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:192.168.28.63:8200:vault-sealed-check" from=127.0.0.1:51576
2019/03/07 15:26:27 [WARN] agent: Check "vault:192.168.28.63:8200:vault-sealed-check" missed TTL, is now critical
2019/03/07 15:26:28 [ERR] http: Request PUT /v1/agent/check/pass/vault:192.168.28.63:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:192.168.28.63:8200:vault-sealed-check" from=127.0.0.1:51576
2019/03/07 15:26:29 [ERR] http: Request PUT /v1/agent/check/pass/vault:192.168.28.63:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:192.168.28.63:8200:vault-sealed-check" from=127.0.0.1:51576
2019/03/07 15:26:30 [ERR] http: Request PUT /v1/agent/check/pass/vault:192.168.28.63:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:192.168.28.63:8200:vault-sealed-check" from=127.0.0.1:51576
2019/03/07 15:26:31 [ERR] http: Request PUT /v1/agent/check/pass/vault:192.168.28.63:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:192.168.28.63:8200:vault-sealed-check" from=127.0.0.1:51576
2019/03/07 15:26:32 [ERR] http: Request PUT /v1/agent/check/pass/vault:192.168.28.63:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:192.168.28.63:8200:vault-sealed-check" from=127.0.0.1:51576
2019/03/07 15:26:33 [ERR] http: Request PUT /v1/agent/check/pass/vault:192.168.28.63:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:192.168.28.63:8200:vault-sealed-check" from=127.0.0.1:51576
2019/03/07 15:26:34 [ERR] http: Request PUT /v1/agent/check/pass/vault:192.168.28.63:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:192.168.28.63:8200:vault-sealed-check" from=127.0.0.1:51576

```

Labels: needs-investigation, theme/consul-vault


All 23 comments

Same issue here. It happens right after stopping/restarting the Nomad agent connected to the same Consul node. I'm not sure which app is to blame: Nomad, Consul, or Vault.

Vault manages these checks itself via its Consul storage-backend integration. However, this could be a bug introduced by a change to the Consul APIs. This means a fix, if this is proven to be a bug, would likely end up in Vault.

The issue is fixed in Consul 1.4.4; see GH-5456.

I'm using vault 1.1.2 and consul 1.5.3

I'm still seeing these errors; what could I have done wrong?
:8200:vault-sealed-check" missed TTL, is now critical

I have the same issue with consul 1.6.0 and vault 1.1.3.

Hello everyone,

I'm facing the same issue. I already have a Consul cluster deployed in Kubernetes (with ACLs), and now I'm trying to deploy Vault into the same cluster.

This is my Vault config:

      storage "consul" {
        address = "<CONSUL_SERVICE_NAME>:8500"
        token = "<CONSUL_TOKEN>"
        scheme = "http"
        path = "vault/"
      }

      listener "tcp" {
        address          = "0.0.0.0:8200"
        tls_disable      = "true"
      }

      ui = true
      log_level = "Info"

      api_addr = "https://<CONSUL_POD_IP>:8200"
      cluster_addr = "https://<CONSUL_POD_IP>:8201"

And these are my Vault logs:

    2019-10-31T11:06:21.317Z [WARN]  storage.consul: check unable to talk with Consul backend: error="Unexpected response code: 500 (Unknown check "vault:10.244.1.165:8200:vault-sealed-check")"
    2019-10-31T11:06:22.321Z [WARN]  storage.consul: check unable to talk with Consul backend: error="Unexpected response code: 500 (Unknown check "vault:10.244.1.165:8200:vault-sealed-check")"
    2019-10-31T11:06:23.325Z [WARN]  storage.consul: check unable to talk with Consul backend: error="Unexpected response code: 500 (Unknown check "vault:10.244.1.165:8200:vault-sealed-check")"
    2019-10-31T11:06:24.336Z [WARN]  storage.consul: check unable to talk with Consul backend: error="Unexpected response code: 500 (Unknown check "vault:10.244.1.165:8200:vault-sealed-check")"

And below you can find my Consul logs:

    2019/10/31 11:07:22 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.244.1.165:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:10.244.1.165:8200:vault-sealed-check" from=10.244.1.77:43020
    2019/10/31 11:07:23 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.244.1.165:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:10.244.1.165:8200:vault-sealed-check" from=10.244.1.77:43020
    2019/10/31 11:07:24 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.244.1.165:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:10.244.1.165:8200:vault-sealed-check" from=10.244.1.77:43020

What am i missing?

//Vault version: 1.2.3
//Consul version: 1.6.0

--- UPDATE ---
Ignoring those logs, everything seems to work fine. I just created a test KV, and it works:

~/infrastructure/kubernetes/vault master ⇡2 !6 ?1 ❯ vault list cubbyhole/                                                                                        
Keys
----
first

~/infrastructure/kubernetes/vault master ⇡2 !6 ?1 ❯ vault kv get cubbyhole/first/                                                                               
====== Data ======
Key         Value
---         -----

Consul 1.6.1
Vault 1.2.3
I have a similar issue in Docker Swarm: after some time with a Consul cluster of 3 server agents, 0 client agents, and a single Vault instance, the health check passes for at least a day before going critical. Pasting the docker-compose snippets:

Vault logs:

...
{"@level":"warn","@message":"check unable to talk with Consul backend","@module":"storage.consul","@timestamp":"2019-10-31T14:21:07.108749Z","error":"Unexpected response code: 500 (Unknown check \"vault:10.0.27.177:8200:vault-sealed-check\")"}
{"@level":"warn","@message":"check unable to talk with Consul backend","@module":"storage.consul","@timestamp":"2019-10-31T14:21:08.110733Z","error":"Unexpected response code: 500 (Unknown check \"vault:10.0.27.177:8200:vault-sealed-check\")"}
{"@level":"warn","@message":"check unable to talk with Consul backend","@module":"storage.consul","@timestamp":"2019-10-31T14:21:09.117084Z","error":"Unexpected response code: 500 (Unknown check \"vault:10.0.27.177:8200:vault-sealed-check\")"}
{"@level":"warn","@message":"check unable to talk with Consul backend","@module":"storage.consul","@timestamp":"2019-10-31T14:21:10.119469Z","error":"Unexpected response code: 500 (Unknown check \"vault:10.0.27.177:8200:vault-sealed-check\")"}
{"@level":"warn","@message":"check unable to talk with Consul backend","@module":"storage.consul","@timestamp":"2019-10-31T14:21:11.121813Z","error":"Unexpected response code: 500 (Unknown check \"vault:10.0.27.177:8200:vault-sealed-check\")"}
{"@level":"warn","@message":"check unable to talk with Consul backend","@module":"storage.consul","@timestamp":"2019-10-31T14:21:12.123746Z","error":"Unexpected response code: 500 (Unknown check \"vault:10.0.27.177:8200:vault-sealed-check\")"}
{"@level":"warn","@message":"check unable to talk with Consul backend","@module":"storage.consul","@timestamp":"2019-10-31T14:21:13.125820Z","error":"Unexpected response code: 500 (Unknown check \"vault:10.0.27.177:8200:vault-sealed-check\")"}
...

Consul logs:

...
2019/10/31 14:21:30 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.0.27.177:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:10.0.27.177:8200:vault-sealed-check" from=10.0.27.186:57246
2019/10/31 14:21:31 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.0.27.177:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:10.0.27.177:8200:vault-sealed-check" from=10.0.27.186:57246
2019/10/31 14:21:32 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.0.27.177:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:10.0.27.177:8200:vault-sealed-check" from=10.0.27.186:57246
2019/10/31 14:21:33 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.0.27.177:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:10.0.27.177:8200:vault-sealed-check" from=10.0.27.186:57246
2019/10/31 14:21:34 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.0.27.177:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:10.0.27.177:8200:vault-sealed-check" from=10.0.27.186:57246
2019/10/31 14:21:35 [ERR] http: Request PUT /v1/agent/check/pass/vault:10.0.27.177:8200:vault-sealed-check?note=Vault+Unsealed, error: Unknown check "vault:10.0.27.177:8200:vault-sealed-check" from=10.0.27.186:57246
...

Vault:

compose:
- version: '3.7'
  secrets:
    vault_config.hcl:
      external: true      
  networks: 
    consul:
      external: true
    traefik:
      external: true
    vault:
      external: true
  services:
    server:
      image: vault:1.2.3
      command: server -config=/run/secrets/vault_config.hcl
      secrets:
        - vault_config.hcl
      networks:
        - consul
        - traefik
        - vault

Consul:

compose: 
- version: '3.7'
  secrets:
    consul_config.hcl:
      external: true
  networks:
    consul:
      external: true
    traefik:
      external: true
  services:
    server:
      image: consul:1.6.1
      networks:
        traefik:
        consul:
          aliases:
            - consul
      command: 'agent -config-file=/run/secrets/consul_config.hcl -rejoin'
      hostname: '{% raw %}{{ .Node.Hostname }}.consul.netsoc.co{% endraw %}'
      volumes:
        - /netsoc-neo/docker-data/consul:/consul/data
      environment:
        - CONSUL_BIND_INTERFACE=eth0
      secrets:
        - consul_config.hcl
      deploy:
        endpoint_mode: dnsrr # Needed to get cluster to not rely on pre-known IPs
        mode: global

Vault config:

ui = true
log_format = "json"
cluster_name = "main"

listener "tcp" {
    address = "0.0.0.0:8200"
    tls_disable = 1
}

storage "consul" {
    address = "consul:8500"
    path = "hashicorp-vault/"
    token = "{{ consul_vault_token }}"
}

I have the same issue with Consul 1.6.1 and Vault 1.2.3.

2019/11/11 22:47:11 [WARN] agent: Check "vault:127.0.0.1:8200:vault-sealed-check" missed TTL, is now critical
2019/11/11 22:47:30 [WARN] agent: Check "vault:127.0.0.1:8200:vault-sealed-check" missed TTL, is now critical
2019/11/11 22:48:18 [WARN] agent: Check "vault:127.0.0.1:8200:vault-sealed-check" missed TTL, is now critical

Same issue here with consul 1.5 and vault 1.2.

Solved my problem by deploying a single Consul client agent outside of Swarm (one per host) and keeping a cluster of Consul server agents inside Swarm. I have two networks(ish): one for the Consul server instances and one for the Consul clients (one per host, so in effect n+1 networks, where n have the same name). Services are added to the local Consul client network and register with Consul through there, rather than being added to the Consul server network.

Any updates on why this is happening? (consul 1.6.2 here)

Can confirm with consul v1.4.0 and vault 1.3.1.

same issue

I'm also seeing this issue when using the Vault and Consul helm charts from the Hashicorp repo.

Yup, me too.
Vault: 1.3.2
Consul: 1.6.2

the same =(

The warning logs appear only on the standby Vault pod; the active Vault pod does not have these warnings. Vault 1.3.1 and Consul 1.5.3.

Vault 1.3.2 and Consul 1.6.2, and it's happening on all three nodes (one active and two standby).

I am also having the same issue. Any updates on this?

I ended up finding a solution for my case. I had 12 dead checks, but the checks for the current active nodes were passing. I decided to take an outage window and completely deregister the vault service from Consul (if you are using the Consul KV store for Vault, that data stays intact).

Steps to solve the problem in my situation:

  1. Stop Vault on all nodes
  2. Deregister the services (see the script below; I had to run it several times)
  3. Start Vault and unseal it

script:

#!/usr/bin/env bash

# List every Vault service instance registered in Consul's catalog,
# then print the command that would deregister each one (dry run).
consul_url="https://consul.service.aws.prd:8501/v1/catalog/service/vault"
vault_service_ids=$(curl -s -k "$consul_url" | jq -r '.[] | .ServiceID')
consul_deregister_command="consul services deregister -id="

for id in $vault_service_ids
do
    echo "$consul_deregister_command$id"
    # UNCOMMENT THIS LINE IF YOU REALLY WANT TO DEREGISTER THE VAULT SERVICES
    # $consul_deregister_command$id
done
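The same cleanup can be done per check rather than per service: the agent HTTP API has a deregister endpoint for individual checks, which is the counterpart of the `/v1/agent/check/pass/...` calls failing in the logs above. A minimal sketch (not from the thread), assuming a local agent on 127.0.0.1:8500 and using the check ID from the original report as a placeholder:

```shell
#!/usr/bin/env bash

# Build the deregister request for one stale check; dry run by default.
check_id="vault:192.168.28.63:8200:vault-sealed-check"
agent_url="http://127.0.0.1:8500"
deregister_url="${agent_url}/v1/agent/check/deregister/${check_id}"

echo "PUT ${deregister_url}"
# Uncomment to actually send the request (requires curl; add a token
# header if ACLs are enabled):
# curl -s -X PUT "$deregister_url"
```

Note this must be sent to the agent that owns the check, not an arbitrary server.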


But do we know the reason for these error messages? The services are also listed under failing service checks.

I've encountered this issue in k8s with Consul and Vault and believe I have a working solution. The documentation suggests that Vault should always communicate with a local Consul agent, not directly with the server. I think the issue is that Vault is looking for a Consul agent locally (local to the node) and not finding one. This would explain the sporadic nature of the error: if the Vault pods landed on a node with Consul, great; if not, the issue would appear.

To fix this I added affinity to both Vault and Consul: node affinity and pod affinity such that my Vault and Consul pods are always on the same nodes.

In the Vault chart, this is a working configuration (depending on your specific environment labeling):

  affinity: |
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/node-role
            operator: In
            values:
            - management
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: {{ template "vault.name" . }}
              app.kubernetes.io/instance: "{{ .Release.Name }}"
              component: server
          topologyKey: kubernetes.io/hostname
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              component: consul
          topologyKey: kubernetes.io/hostname

EDIT:
Following up on this comment. If you happen to be running Vault with the Consul chart from the helm/stable repo, you'll likely run into this error. The helm/stable chart is a simple standalone-server implementation of Consul; it does not include configuration for running Consul in agent mode.
The official HashiCorp Consul chart does include an agent daemonset.
A full fix in my case was to redeploy Consul with the server statefulset and agent daemonset. Once that is working, deploy Vault with a configuration pointing at the daemonset in the connection config.

storage "consul" {
    path    = "vault"
    address = "HOST_IP:8500"
}

I can confirm @jdeprin's fix.
After changing Vault's config to use the local consul-agent, I stopped getting the vault-sealed-check errors, and the state is displayed correctly in Consul.

