I am deploying a new server cluster on Azure using a virtual machine scale set, with 3 server nodes set up according to the documentation (HashiCorp Learn guide): cloud auto-join configured for the scale set, gossip encryption, and TLS encryption are all done. My servers are up and running.
Additionally, I am trying to run a client agent with auto_encrypt.tls = true, but I am facing problems.
When the client starts, the following error is displayed:
Aug 26 20:34:44 consul-ui consul[19188]: ==> Starting Consul agent...
Aug 26 20:34:44 consul-ui consul[19188]: Version: 'v1.5.3'
Aug 26 20:34:44 consul-ui consul[19188]: Node ID: '7585ef50-fba4-4aca-1fd1-30b8561dcab3'
Aug 26 20:34:44 consul-ui consul[19188]: Node name: 'consul-ui'
Aug 26 20:34:44 consul-ui consul[19188]: Datacenter: 'dc1' (Segment: '')
Aug 26 20:34:44 consul-ui consul[19188]: Server: false (Bootstrap: false)
Aug 26 20:34:44 consul-ui consul[19188]: Client Addr: [0.0.0.0] (HTTP: -1, HTTPS: 8501, gRPC: -1, DNS: 8600)
Aug 26 20:34:44 consul-ui consul[19188]: Cluster Addr: 10.1.2.4 (LAN: 8301, WAN: 8302)
Aug 26 20:34:44 consul-ui consul[19188]: Encrypt: Gossip: true, TLS-Outgoing: false, TLS-Incoming: true, Auto-Encrypt-TLS: true
Aug 26 20:34:44 consul-ui consul[19188]: ==> Log data will now stream in as it occurs:
Aug 26 20:34:44 consul-ui consul[19188]: ==> Error starting agent: VerifyIncoming set, and no Cert/Key pair provided!
Aug 26 20:34:44 consul-ui consul[19188]: 2019/08/26 20:34:44 [INFO] agent: Exit code: 1
Aug 26 20:34:44 consul-ui consul[19188]: agent: Exit code: 1
Aug 26 20:34:44 consul-ui systemd[1]: consul.service: Main process exited, code=exited, status=1/FAILURE
Aug 26 20:34:44 consul-ui systemd[1]: consul.service: Failed with result 'exit-code'.
Aug 26 20:34:44 consul-ui systemd[1]: consul.service: Service hold-off time over, scheduling restart.
Aug 26 20:34:44 consul-ui systemd[1]: consul.service: Scheduled restart job, restart counter is at 5.
Aug 26 20:34:44 consul-ui systemd[1]: Stopped "HashiCorp Consul - A service mesh solution".
Aug 26 20:34:44 consul-ui systemd[1]: consul.service: Start request repeated too quickly.
Aug 26 20:34:44 consul-ui systemd[1]: consul.service: Failed with result 'exit-code'.
Aug 26 20:34:44 consul-ui systemd[1]: Failed to start "HashiCorp Consul - A service mesh solution".
It is important to note that with verify_incoming and verify_outgoing set to false and ports.http set to 8500 in the client configuration, the client runs successfully.
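As a rough sketch of that working variant, only the settings named above change relative to the client configuration listed under "Client Configuration". This assumes every other setting (ca_file, auto_encrypt, the https port, and so on) stays exactly as in that listing, which the report does not state explicitly:

```hcl
# Sketch only: the settings that differ in the variant that starts.
# All other client settings are assumed to be unchanged.
verify_incoming = false
verify_outgoing = false
ports = {
  http = 8500
}
```

With verify_incoming disabled, the agent no longer requires its own cert/key pair at startup, which is why the "no Cert/Key pair provided" check does not trigger.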
Steps to reproduce this issue:
Client Configuration
server = false
datacenter = "dc1"
data_dir = "/opt/consul"
encrypt = "omitted"
retry_join = ["provider=azure subscription_id=omitted tenant_id=omitted client_id=omitted secret_access_key=omitted resource_group=consul vm_scale_set=consul"]
acl = {
  tokens = {
    agent = "omitted"
  }
}
enable_syslog = true
leave_on_terminate = true
log_level = "INFO"
verify_incoming = true
#verify_outgoing = false
ca_file = "/etc/consul.d/consul-agent-ca.pem"
ports = {
  http = -1
  https = 8501
}
auto_encrypt = {
  tls = true
}
ui = true
client_addr = "0.0.0.0"
enable_script_checks = false
disable_remote_exec = true
Client folder files (/etc/consul.d)
-rw-rw-r-- 1 consul consul 1245 Aug 26 14:55 consul-agent-ca.pem
-rw-r----- 1 consul consul 785 Aug 26 20:34 consul.hcl
-rw-rw-r-- 1 azure-user azure-user 227 Aug 26 15:12 dc1-cli-consul-0-key.pem
-rw-rw-r-- 1 azure-user azure-user 1082 Aug 26 15:12 dc1-cli-consul-0.pem
Server Configuration (the same on 3 nodes)
# consul.hcl
datacenter = "dc1"
data_dir = "/datadisks/disk1/consul"
encrypt = "omitted"
retry_join = ["provider=azure subscription_id=omitted tenant_id=omitted client_id=omitted secret_access_key=omitted resource_group=consul vm_scale_set=consul"]
performance {
  raft_multiplier = 1
}
# server.hcl
server = true
bootstrap_expect = 3
log_level = "INFO"
# agent.hcl
acl = {
  enabled = true
  default_policy = "deny"
  enable_token_persistence = true
  tokens = {
    agent = "omitted"
  }
}
# tls.hcl
verify_incoming = true
verify_outgoing = true
verify_server_hostname = true
auto_encrypt = {
  allow_tls = true
}
ca_file = "/etc/consul.d/consul-agent-ca.pem"
cert_file = "/etc/consul.d/dc1-server-consul-0.pem"
key_file = "/etc/consul.d/dc1-server-consul-0-key.pem"
ports = {
  http = -1,
  https = 8501
}
Server folder files (/etc/consul.d)
-rw-r----- 1 consul consul 169 Aug 23 21:52 agent.hcl
-rw-rw-r-- 1 consul consul 1245 Aug 23 22:45 consul-agent-ca.pem
-rw-r----- 1 consul consul 407 Aug 23 20:54 consul.hcl
-rw-rw-r-- 1 azure-user azure-user 227 Aug 23 23:33 dc1-cli-consul-0-key.pem
-rw-rw-r-- 1 azure-user azure-user 1078 Aug 23 23:33 dc1-cli-consul-0.pem
-rw-r----- 1 consul consul 227 Aug 23 22:47 dc1-server-consul-0-key.pem
-rw-r----- 1 consul consul 1139 Aug 23 22:47 dc1-server-consul-0.pem
-rw-r----- 1 consul consul 54 Aug 23 20:53 server.hcl
-rw-r----- 1 consul consul 313 Aug 26 19:47 tls.hcl
consul info (server)
agent:
    check_monitors = 0
    check_ttls = 0
    checks = 0
    services = 0
build:
    prerelease =
    revision = a42ded47
    version = 1.5.3
consul:
    acl = enabled
    bootstrap = false
    known_datacenters = 1
    leader = true
    leader_addr = 10.1.0.4:8300
    server = true
raft:
    applied_index = 31765
    commit_index = 31765
    fsm_pending = 0
    last_contact = 0
    last_log_index = 31765
    last_log_term = 64
    last_snapshot_index = 16825
    last_snapshot_term = 11
    latest_configuration = [{Suffrage:Voter ID:6b29900f-bbc2-95eb-6a17-629d74c5c487 Address:10.1.0.4:8300} {Suffrage:Voter ID:41fb9c98-7695-76b2-bf25-9658fb806ae0 Address:10.1.0.6:8300} {Suffrage:Voter ID:62e57d8a-0d74-21d3-de58-9b35a91a0827 Address:10.1.0.7:8300}]
    latest_configuration_index = 31420
    num_peers = 2
    protocol_version = 3
    protocol_version_max = 3
    protocol_version_min = 0
    snapshot_version_max = 1
    snapshot_version_min = 0
    state = Leader
    term = 64
runtime:
    arch = amd64
    cpu_count = 2
    goroutines = 110
    max_procs = 2
    os = linux
    version = go1.12.1
serf_lan:
    coordinate_resets = 0
    encrypted = true
    event_queue = 0
    event_time = 16
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 76
    members = 3
    query_queue = 0
    query_time = 1
serf_wan:
    coordinate_resets = 0
    encrypted = true
    event_queue = 0
    event_time = 1
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 38
    members = 3
    query_queue = 0
    query_time = 1
Azure Virtual Machine Scale Set, Ubuntu 18.04 LTS
@luanbon thanks for reporting. I think this is a bug - it should work since you provided an extra CA that could be used to verify the connection. I will look into it.
I can confirm this. Same issue on the 1.6.0 version
Got the same bug in 1.6.0 release.
This seems to be related to verify_incoming in the client's configuration, because at boot the agent doesn't have a certificate yet (see https://github.com/hashicorp/consul/blob/master/tlsutil/config.go#L329).
A workaround is to set verify_incoming to false in the client configuration. My client configuration is:
{
  "verify_server_hostname": true,
  "ca_file": "/etc/consul.d/consul-agent-ca.pem",
  "ports": {
    "http": -1,
    "https": 8501
  },
  "auto_encrypt": {
    "tls": true
  },
  "connect": {
    "enabled": true,
    "ca_config": {
      "private_key_type": "ec",
      "private_key_bits": 256
    }
  }
}
Note: the connect stanza is a workaround until 1.6.1 is released, see #6391. The corresponding server configuration is:
{
  "verify_incoming": true,
  "verify_outgoing": true,
  "verify_server_hostname": true,
  "ca_file": "/etc/consul.d/consul-agent-ca.pem",
  "cert_file": "/etc/consul.d/dc1-server-consul-0.pem",
  "key_file": "/etc/consul.d/dc1-server-consul-0-key.pem",
  "ports": {
    "http": -1,
    "https": 8501
  },
  "auto_encrypt": {
    "allow_tls": true
  }
}
From my understanding of the encryption doc, there is no point in setting "verify_incoming": true in a client's configuration anyway, because this check is only performed on servers, right?
Or could leaving it off introduce a vulnerability, for example if you aren't using ACLs and someone tries to call the API on the agent without a valid TLS cert?
> From my understanding of the encryption doc, there is no point to set "verify_incoming": true in a client's configuration anyway because this check is only performed on servers, right ?
That's not correct: clients also need verify_incoming for their API endpoints. They are insecure otherwise.
Confirmed in 1.6.1
@i0rek Can you confirm that this fix has been included in Consul Enterprise Pro v1.6.1? Or point me to someone who can? I'm still receiving the error described by @luanbon. Thanks!
Same issue here in 1.6.1
Thanks for the patience everybody. I have made up my mind on how to approach this issue now.
This issue is not a bug. Contrary to what I thought before, it is exactly how it is supposed to work: verify_incoming enforces a TLS connection, which cannot be established because there is a CA but no cert.
There is a related PR https://github.com/hashicorp/consul/pull/6489 which configures auto_encrypt certs for listeners on clients as well. This will enable setting up (insecure) HTTPS connections to the client's https endpoint.
The missing piece here is the ability to export auto_encrypt certs, which can then be used to query client HTTPS endpoints. Only then does it make sense to enable verify_incoming_https, because right now it clearly can never work: there is no way to export such a cert. Corresponding issue: https://github.com/hashicorp/consul/issues/6791.
Do you have any thoughts or questions? Would that work for you?
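For illustration, the startup check discussed above passes once the client has its own certificate and key alongside the CA. The following is a minimal sketch, assuming the client certificate is created and distributed manually (for example with `consul tls cert create -client`) rather than through auto_encrypt. The dc1-client-consul-0 file paths are hypothetical and do not appear in the reporter's /etc/consul.d listing, and this approach gives up the automatic certificate distribution that auto_encrypt is meant to provide:

```hcl
# Sketch only: client with a manually provisioned certificate,
# so that verify_incoming has a cert/key pair to work with.
verify_incoming = true
verify_outgoing = true
verify_server_hostname = true
ca_file = "/etc/consul.d/consul-agent-ca.pem"
# Hypothetical paths; adjust to wherever the client cert and key are placed.
cert_file = "/etc/consul.d/dc1-client-consul-0.pem"
key_file = "/etc/consul.d/dc1-client-consul-0-key.pem"
ports = {
  http = -1
  https = 8501
}
```

Anyone calling this client's HTTPS endpoint would then need to present a certificate signed by the same CA.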
I created another PR for this: https://github.com/hashicorp/consul/pull/6811 which also has the doc changes you rightfully mentioned.
And I would like to ask everyone to go to #6811 and tell me about your use case for verify_incoming on Consul clients, because we were wondering in which cases it is necessary to turn that on, apart from the fact that the docs told you to.
Thanks!
Closing now. Feel free to chime in on #6811 or create a new issue if there is something you would like us to address/consider.