Hi,
Some context:
I am using Nomad 0.10.0 and Consul 1.6.1. Both Nomad and Consul are running with TLS and ACLs enabled.
I am trying to get my Nomad jobs running with Connect, but the logs always show these error messages:
2019-10-30T20:34:42.894Z [ERROR] client.alloc_runner.task_runner.task_hook.envoy_bootstrap: error creating bootstrap configuration for Connect proxy sidecar: alloc_id=4660d74d-c834-9219-e8ee-c0fbd6911732 task=connect-proxy-test error="exit status 1" stderr="==> Failed looking up sidecar proxy info for _nomad-task-4660d74d-c834-9219-e8ee-c0fbd6911732-group-test_group-test-1313: Unexpected response code: 400 (Client sent an HTTP request to an HTTPS server.
Then, trying to understand more, I noticed that Nomad runs this command, which fails:
consul connect envoy -grpc-addr unix://alloc/tmp/consul_grpc.sock -http-addr endpoint.local.compuscene.net:8500 -bootstrap -sidecar-for _nomad-task-4660d74d-c834-9219-e8ee-c0fbd6911732-group-test_group-test-131
Running it manually fails with exactly the same error message.
But if I put https:// in front of endpoint.local.compuscene.net:8500, the command works fine.
It seems Nomad does not take its own configuration into account, in particular the "ssl": true option:
"consul": {
  "address": "endpoint.local.compuscene.net:8500",
  "auto_advertise": true,
  "checks_use_advertise": true,
  "ssl": true,
  "token": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
},
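The expected behaviour can be sketched in a few lines of shell: with ssl set to true, the address from the stanza should get an https:// scheme before being handed to the bootstrap command. A minimal sketch, using the values from the config above:

```shell
# Sketch: derive the scheme for the HTTP address from the consul
# stanza's ssl option. Values are copied from the configuration above.
CONSUL_ADDRESS="endpoint.local.compuscene.net:8500"
CONSUL_SSL=true

if [ "$CONSUL_SSL" = "true" ]; then
  HTTP_ADDR="https://${CONSUL_ADDRESS}"
else
  HTTP_ADDR="http://${CONSUL_ADDRESS}"
fi

echo "$HTTP_ADDR"   # https://endpoint.local.compuscene.net:8500
```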
Moreover, when I dig into the Nomad code, I see no reference to the Consul ssl option where the Connect classes are created; only the address is used.
I hope this is clear. If you have any questions, don't hesitate to ask for more details.
Vincent
What a coincidence! I was about to create a ticket for this, since I'm running into the same issue. Like @vvanholl says: Nomad currently assumes the local Consul agent is available over plain HTTP. Our configuration has TLS enabled on the Consul clients and servers, and we don't expose a plain HTTP endpoint on the Consul agent.
The problem is that Nomad starts the Consul Envoy proxy without any HTTP/TLS flags: https://github.com/hashicorp/nomad/blob/master/client/allocrunner/taskrunner/envoybootstrap_hook.go#L89
As a result, the Consul proxy fails to connect to the local Consul agent: https://github.com/hashicorp/consul/blob/cc9a6f79934a6da58b7aec63c057681d82aded5a/command/connect/proxy/proxy.go#L221
What Nomad should do is take the Consul client configuration (the consul stanza in the Nomad config) and pass the TLS settings along when starting the Consul proxy binary; the latter already accepts these settings.
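Concretely, the bootstrap invocation would then look something like this (a sketch only: the certificate paths and the sidecar ID are placeholders, and -ca-file/-client-cert/-client-key are Consul's standard CLI TLS flags):

```shell
consul connect envoy \
  -grpc-addr unix://alloc/tmp/consul_grpc.sock \
  -http-addr https://endpoint.local.compuscene.net:8500 \
  -ca-file /path/to/cacert.pem \
  -client-cert /path/to/clientcert.pem \
  -client-key /path/to/clientkey.pem \
  -bootstrap \
  -sidecar-for <sidecar-service-id>
```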
Thanks for reporting this @vvanholl and @rkettelerij !
As of right now, Consul ACL support is one of the known limitations of our implementation, but it is in the works. For TLS, I do see that we have an open issue for testing that properly (https://github.com/hashicorp/nomad/issues/6502), but this looks like a bug in how we look up the Consul address.
There is a short-term workaround that _could be used_: provide the necessary Consul values as environment variables in your init script/systemd unit. I was able to work around this by adding the following values to the Nomad systemd unit on my Nomad client:
Environment="CONSUL_HTTP_SSL=true"
Environment="CONSUL_CACERT=/path/to/cacert.pem"
Environment="CONSUL_CLIENT_CERT=/path/to/clientcert.pem"
Environment="CONSUL_CLIENT_KEY=/path/to/clientkey.pem"
Replace the paths above with the paths to your actual certificates.
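If you'd rather keep the unit file itself untouched, the same workaround can live in a systemd drop-in (the file path below is just a conventional choice, not something Nomad requires):

```
# /etc/systemd/system/nomad.service.d/consul-tls.conf
[Service]
Environment="CONSUL_HTTP_SSL=true"
Environment="CONSUL_CACERT=/path/to/cacert.pem"
Environment="CONSUL_CLIENT_CERT=/path/to/clientcert.pem"
Environment="CONSUL_CLIENT_KEY=/path/to/clientkey.pem"
```

Run `systemctl daemon-reload` and restart Nomad afterwards so the new environment takes effect.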
There is still an issue with Nomad Consul Connect jobs when Consul has TLS enabled:
https://github.com/hashicorp/nomad/issues/7715
These are my environment variables:
export DATACENTER=dc1
export VAULT_CACERT=/var/vault/config/ca.crt.pem
export VAULT_CLIENT_CERT=/var/vault/config/server.crt.pem
export VAULT_CLIENT_KEY=/var/vault/config/server.key.pem
export VAULT_ADDR=https://${HOST_IP}:8200
export NOMAD_ADDR=https://${HOST_IP}:4646
export NOMAD_CACERT=/var/vault/config/ca.crt.pem
export NOMAD_CLIENT_CERT=/var/vault/config/server.crt.pem
export NOMAD_CLIENT_KEY=/var/vault/config/server.key.pem
export CONSUL_SCHEME=https
export CONSUL_PORT=8500
export CONSUL_HTTP_ADDR=${CONSUL_SCHEME}://${HOST_IP}:${CONSUL_PORT}
export CONSUL_CACERT=/var/vault/config/ca.crt.pem
export CONSUL_CLIENT_CERT=/var/vault/config/server.crt.pem
export CONSUL_CLIENT_KEY=/var/vault/config/server.key.pem
export CONSUL_HTTP_SSL=true
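For reference, the composed CONSUL_HTTP_ADDR expands as follows (10.0.0.1 is only an example value, since HOST_IP depends on the actual host):

```shell
# Illustration of how the exports above compose; 10.0.0.1 stands in
# for whatever HOST_IP resolves to on the real host.
HOST_IP=10.0.0.1
CONSUL_SCHEME=https
CONSUL_PORT=8500
CONSUL_HTTP_ADDR=${CONSUL_SCHEME}://${HOST_IP}:${CONSUL_PORT}

echo "$CONSUL_HTTP_ADDR"   # https://10.0.0.1:8500
```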
I enabled TLS on Consul and I am also seeing this problem. I've ensured that I have the following in /etc/sysconfig/nomad:
Environment="CONSUL_HTTP_SSL=true"
Environment="CONSUL_CACERT=/path/to/cacert.pem"
Environment="CONSUL_CLIENT_CERT=/path/to/clientcert.pem"
Environment="CONSUL_CLIENT_KEY=/path/to/clientkey.pem"
I also have this in my systemd unit file:
[Service]
EnvironmentFile=-/etc/sysconfig/nomad
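One generic way to check whether those variables actually reached the running Nomad process (this assumes a single nomad process on the host):

```shell
# Inspect the environment of the live nomad process; the CONSUL_* lines
# should appear if the EnvironmentFile was picked up.
sudo cat /proc/"$(pidof nomad)"/environ | tr '\0' '\n' | grep '^CONSUL'
```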
Nomad = 0.11.1
Consul = 1.7.2
@spuder, if you're talking about the deployment issue that Crizstian mentioned, I'd encourage you to head over to #7715 and chime in there. If you are experiencing something else, you might want to open a fresh issue.
As an aside: as of Nomad 0.11, you do not need to provide the Consul SSL environment variables. That workaround is only necessary for Nomad 0.10.4.