Consul: Client agent not starting when auto_encrypt.tls enabled

Created on 26 Aug 2019 · 11 Comments · Source: hashicorp/consul

Overview of the Issue

I am deploying a new server cluster on Azure using a virtual machine scale set with 3 server nodes, following the documentation (HashiCorp Learn guide). Cloud auto-join is configured for the scale set, and gossip encryption and TLS encryption are enabled. My servers are up and running.

Additionally, I am trying to run a client agent with auto_encrypt.tls = true, but the agent fails to start.
When the client starts, the following error is displayed:

Aug 26 20:34:44 consul-ui consul[19188]: ==> Starting Consul agent...
Aug 26 20:34:44 consul-ui consul[19188]: Version: 'v1.5.3'
Aug 26 20:34:44 consul-ui consul[19188]: Node ID: '7585ef50-fba4-4aca-1fd1-30b8561dcab3'
Aug 26 20:34:44 consul-ui consul[19188]: Node name: 'consul-ui'
Aug 26 20:34:44 consul-ui consul[19188]: Datacenter: 'dc1' (Segment: '')
Aug 26 20:34:44 consul-ui consul[19188]: Server: false (Bootstrap: false)
Aug 26 20:34:44 consul-ui consul[19188]: Client Addr: [0.0.0.0] (HTTP: -1, HTTPS: 8501, gRPC: -1, DNS: 8600)
Aug 26 20:34:44 consul-ui consul[19188]: Cluster Addr: 10.1.2.4 (LAN: 8301, WAN: 8302)
Aug 26 20:34:44 consul-ui consul[19188]: Encrypt: Gossip: true, TLS-Outgoing: false, TLS-Incoming: true, Auto-Encrypt-TLS: true
Aug 26 20:34:44 consul-ui consul[19188]: ==> Log data will now stream in as it occurs:
Aug 26 20:34:44 consul-ui consul[19188]: ==> Error starting agent: VerifyIncoming set, and no Cert/Key pair provided!
Aug 26 20:34:44 consul-ui consul[19188]: 2019/08/26 20:34:44 [INFO] agent: Exit code: 1
Aug 26 20:34:44 consul-ui consul[19188]: agent: Exit code: 1
Aug 26 20:34:44 consul-ui systemd[1]: consul.service: Main process exited, code=exited, status=1/FAILURE
Aug 26 20:34:44 consul-ui systemd[1]: consul.service: Failed with result 'exit-code'.
Aug 26 20:34:44 consul-ui systemd[1]: consul.service: Service hold-off time over, scheduling restart.
Aug 26 20:34:44 consul-ui systemd[1]: consul.service: Scheduled restart job, restart counter is at 5.
Aug 26 20:34:44 consul-ui systemd[1]: Stopped "HashiCorp Consul - A service mesh solution".
Aug 26 20:34:44 consul-ui systemd[1]: consul.service: Start request repeated too quickly.
Aug 26 20:34:44 consul-ui systemd[1]: consul.service: Failed with result 'exit-code'.
Aug 26 20:34:44 consul-ui systemd[1]: Failed to start "HashiCorp Consul - A service mesh solution".

Note that with verify_incoming and verify_outgoing set to false and ports.http set to 8500 in the client configuration, the client runs successfully.
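
A minimal sketch of that working fallback, assuming only the three settings named above are changed and every other key in the client configuration posted below stays as is. It serves the client's HTTP API in plaintext on port 8500, so it is shown for comparison with the failing configuration, not as a fix:

```hcl
# Sketch of the fallback described above, for comparison only.
# Assumption: all other keys in the client configuration are unchanged.
verify_incoming = false
verify_outgoing = false
ports = {
    # Plain HTTP API re-enabled on Consul's default HTTP port, per the report.
    http = 8500
}
```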

Reproduction Steps

Steps to reproduce this issue:

  1. Create a cluster with 1 client node and 3 server nodes
  2. Enable gossip encryption and TLS for RPC communication
  3. Configure each server as below; each .hcl configuration is a different file (consul, server, agent, tls)
  4. Configure the client as below
  5. View error

Consul info / configuration for both Client and Server


Client Configuration

server = false
datacenter = "dc1"
data_dir = "/opt/consul"
encrypt = "ommited"
retry_join = ["provider=azure subscription_id=ommited tenant_id=ommited client_id=ommited secret_access_key=ommited resource_group=consul vm_scale_set=consul"]
acl = {
    tokens = {
        agent = "ommited"
    }
}
enable_syslog = true
leave_on_terminate = true
log_level = "INFO"
verify_incoming = true
#verify_outgoing = false
ca_file = "/etc/consul.d/consul-agent-ca.pem"
ports = {
    http = -1
    https = 8501
}
auto_encrypt = {
    tls = true
}
ui = true
client_addr = "0.0.0.0"
enable_script_checks = false
disable_remote_exec = true


Client folder files (/etc/consul.d)

-rw-rw-r--  1 consul     consul     1245 Aug 26 14:55 consul-agent-ca.pem
-rw-r-----  1 consul     consul      785 Aug 26 20:34 consul.hcl
-rw-rw-r--  1 azure-user azure-user  227 Aug 26 15:12 dc1-cli-consul-0-key.pem
-rw-rw-r--  1 azure-user azure-user 1082 Aug 26 15:12 dc1-cli-consul-0.pem


Server Configuration (the same on 3 nodes)

#consul.hcl

datacenter = "dc1"
data_dir = "/datadisks/disk1/consul"
encrypt = "ommited"
retry_join = ["provider=azure subscription_id=ommited tenant_id=ommited client_id=ommited secret_access_key=ommited resource_group=consul vm_scale_set=consul"]
performance {
  raft_multiplier = 1
}

#server.hcl

server = true
bootstrap_expect = 3
log_level = "INFO"


# agent.hcl
acl = {
    enabled = true
    default_policy = "deny"
    enable_token_persistence = true
    tokens = {
        agent = "ommited"
    }
}

# tls.hcl
verify_incoming = true
verify_outgoing = true
verify_server_hostname = true
auto_encrypt = {
    allow_tls = true
}
ca_file = "/etc/consul.d/consul-agent-ca.pem"
cert_file = "/etc/consul.d/dc1-server-consul-0.pem"
key_file = "/etc/consul.d/dc1-server-consul-0-key.pem"
ports = {
    http = -1,
    https = 8501
}


Server folder files (/etc/consul.d)

-rw-r-----  1 consul     consul      169 Aug 23 21:52 agent.hcl
-rw-rw-r--  1 consul     consul     1245 Aug 23 22:45 consul-agent-ca.pem
-rw-r-----  1 consul     consul      407 Aug 23 20:54 consul.hcl
-rw-rw-r--  1 azure-user azure-user  227 Aug 23 23:33 dc1-cli-consul-0-key.pem
-rw-rw-r--  1 azure-user azure-user 1078 Aug 23 23:33 dc1-cli-consul-0.pem
-rw-r-----  1 consul     consul      227 Aug 23 22:47 dc1-server-consul-0-key.pem
-rw-r-----  1 consul     consul     1139 Aug 23 22:47 dc1-server-consul-0.pem
-rw-r-----  1 consul     consul       54 Aug 23 20:53 server.hcl
-rw-r-----  1 consul     consul      313 Aug 26 19:47 tls.hcl


consul info (server)

agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 0
build:
        prerelease =
        revision = a42ded47
        version = 1.5.3
consul:
        acl = enabled
        bootstrap = false
        known_datacenters = 1
        leader = true
        leader_addr = 10.1.0.4:8300
        server = true
raft:
        applied_index = 31765
        commit_index = 31765
        fsm_pending = 0
        last_contact = 0
        last_log_index = 31765
        last_log_term = 64
        last_snapshot_index = 16825
        last_snapshot_term = 11
        latest_configuration = [{Suffrage:Voter ID:6b29900f-bbc2-95eb-6a17-629d74c5c487 Address:10.1.0.4:8300} {Suffrage:Voter ID:41fb9c98-7695-76b2-bf25-9658fb806ae0 Address:10.1.0.6:8300} {Suffrage:Voter ID:62e57d8a-0d74-21d3-de58-9b35a91a0827 Address:10.1.0.7:8300}]
        latest_configuration_index = 31420
        num_peers = 2
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Leader
        term = 64
runtime:
        arch = amd64
        cpu_count = 2
        goroutines = 110
        max_procs = 2
        os = linux
        version = go1.12.1
serf_lan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 16
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 76
        members = 3
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 38
        members = 3
        query_queue = 0
        query_time = 1

Operating system and Environment details

Azure Virtual Machine Scale Set, Ubuntu 18.04 LTS


All 11 comments

@luanbon thanks for reporting. I think this is a bug - it should work since you provided an extra CA that could be used to verify the connection. I will look into it.

I can confirm this. Same issue on the 1.6.0 version

Got the same bug in the 1.6.0 release.
This seems to be related to verify_incoming in the client configuration, because at boot the agent doesn't yet have a certificate (see https://github.com/hashicorp/consul/blob/master/tlsutil/config.go#L329).

A workaround can be to set verify_incoming: false in the client configuration. My configuration is:

  • Client configuration:
{
  "verify_server_hostname": true,
  "ca_file": "/etc/consul.d/consul-agent-ca.pem",
  "ports": {
    "http": -1,
    "https": 8501
  },
  "auto_encrypt": {
    "tls": true
  },
  "connect": {
    "enabled": true,
    "ca_config": {
      "private_key_type": "ec",
      "private_key_bits": 256
    }
  }
}

Note: the connect stanza is a workaround until 1.6.1 is released, see #6391

  • Server configuration:
{
  "verify_incoming": true,
  "verify_outgoing": true,
  "verify_server_hostname": true,
  "ca_file": "/etc/consul.d/consul-agent-ca.pem",
  "cert_file": "/etc/consul.d/dc1-server-consul-0.pem",
  "key_file": "/etc/consul.d/dc1-server-consul-0-key.pem",
  "ports": {
    "http": -1,
    "https": 8501
  },
  "auto_encrypt": {
    "allow_tls": true
  }
}

From my understanding of the encryption doc, there is no point in setting "verify_incoming": true in a client's configuration anyway, because this check is only performed on servers, right?
Or could it introduce a vulnerability, for example if you aren't using ACLs and someone calls the API on the agent without a valid TLS cert?

From my understanding of the encryption doc, there is no point in setting "verify_incoming": true in a client's configuration anyway, because this check is only performed on servers, right?

That's not correct. Clients also need verify_incoming for their API endpoints; they are insecure otherwise.

Confirmed in 1.6.1

@i0rek Can you confirm that this fix has been included in consul enterprise pro v1.6.1? Or point me to someone who can? I'm still receiving the error described by @luanbon . Thanks!

Same issue here in 1.6.1

Thanks for the patience everybody. I have made up my mind on how to approach this issue now.

This issue is not a bug. Contrary to what I thought before, this is exactly how it is supposed to work: verify_incoming enforces a TLS connection, which cannot be established because there is a CA but no cert.
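
To make the failing condition concrete, here is a sketch of a client TLS configuration that does provide a cert/key pair. It assumes the dc1-cli-consul-0 files listed in the reporter's /etc/consul.d were issued by the same CA and are readable by the consul user. How a manually provided pair interacts with auto_encrypt is not covered in this thread, so the auto_encrypt stanza is left out:

```hcl
# Sketch only: verify_incoming needs a local certificate to serve TLS with.
# Assumption: these files are a valid client cert/key signed by the CA in ca_file.
verify_incoming = true
ca_file   = "/etc/consul.d/consul-agent-ca.pem"
cert_file = "/etc/consul.d/dc1-cli-consul-0.pem"
key_file  = "/etc/consul.d/dc1-cli-consul-0-key.pem"
```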

There is a related PR https://github.com/hashicorp/consul/pull/6489 which configures auto_encrypt certs for listeners on clients as well. This will enable setting up (insecure) HTTPS connections to the client's https endpoint.

The missing piece here is the ability to export auto_encrypt certs, which could then be used to query client HTTPS endpoints. Only then does it make sense to enable verify_incoming_https. Right now it can never work, because there is no way to export such a cert. Corresponding issue: https://github.com/hashicorp/consul/issues/6791.

Do you have any thoughts or questions? Would that work for you?

I created another PR for this: https://github.com/hashicorp/consul/pull/6811 which also has the doc changes you rightfully mentioned.

I would also like to ask everyone to go to #6811 and tell me about your use case for verify_incoming on Consul clients, because we were wondering in which cases it is necessary to turn it on, apart from having been told to do so by the docs.

Thanks!

Closing now. Feel free to chime in on #6811 or create a new issue if there is something you would like us to address/consider.
