Nomad: Nomad with Consul TLS not working - bad certificate

Created on 17 Nov 2016 · 7 Comments · Source: hashicorp/nomad

Nomad version

Nomad v0.5.0

Operating system and Environment details

Ubuntu 14.04.5 LTS

Issue

I can't get nomad to communicate with consul over https. It works just fine using http.

Reproduction steps

With the following config:

consul {
  address = "127.0.0.1:8550" # I have setup consul to listen on 0.0.0.0:8550 for https
  ssl = true

  ca_file = "/etc/ssl/ssc/ca.crt"
  cert_file = "/etc/ssl/ssc/consul.crt"
  key_file = "/etc/ssl/ssc/consul.key"
}

I get the following error:

2016/11/17 12:13:44.658049 [DEBUG] consul.syncer: error in syncing: 1 error(s) occurred:

* server.consul: unable to query Consul datacenters: Get https://127.0.0.1:8550/v1/catalog/datacenters: remote error: tls: bad certificate

It seems that the certificates aren't handled properly. Fetching the same endpoint with curl works just fine:

curl -v -L --key /etc/ssl/ssc/consul.key --cert /etc/ssl/ssc/consul.crt --cacert /etc/ssl/ssc/ca.crt https://127.0.0.1:8550/v1/catalog/datacenters

* Hostname was NOT found in DNS cache
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8550 (#0)
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/ssc/ca.crt
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server key exchange (12):
* SSLv3, TLS handshake, Request CERT (13):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS handshake, CERT verify (15):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSL connection using ECDHE-RSA-AES256-GCM-SHA384
* Server certificate:
*    subject: CN=*.consul; ST=London; C=GB; O=XXX
*    start date: 2016-11-15 15:51:00 GMT
*    expire date: 2026-11-13 15:51:00 GMT
*    subjectAltName: 127.0.0.1 matched
*    issuer: C=GB; ST=London; L=London; O=XXX; CN=SOMECA
*    SSL certificate verify ok.
> GET /v1/catalog/datacenters HTTP/1.1
> User-Agent: curl/7.35.0
> Host: 127.0.0.1:8550
> Accept: */*
>
< HTTP/1.1 200 OK
< Access-Control-Allow-Origin: *
< Content-Type: application/json
< Date: Thu, 17 Nov 2016 12:15:52 GMT
< Content-Length: 16
<
* Connection #0 to host 127.0.0.1 left intact
["eu-central-1"]

It seems that someone has already encountered the same issue but it wasn't reported here:
TLS on Nomad -> Consul

I will switch to http for now but I hope this gets resolved...

Thanks

theme/config theme/discovery theme/tls type/bug

All 7 comments

Thanks for filing. Will try to replicate and then resolve this soon!

I have the exact same issue here with Nomad v0.4.0 and TLS-enabled Consul v0.7.0.

My TLS configuration for Consul sets verify_incoming, verify_outgoing and verify_server_hostname all to true, and my Nomad configuration is essentially the same as the one reported. I am also able to connect to Consul via curl with the same certificates that I supply to Nomad.

Willing to share more info if needed!

Sorry, our Nomad+Consul TLS story is quite confusing at the moment. I've been trying to sort it out and am taking notes here: https://gist.github.com/schmichael/7394eb8f2686af1a4434a2d64ae7b0f2

Option 1: Use HTTP for Nomad+Consul

I must admit that if you're not worried about rogue processes binding to localhost:8500 then simply using HTTP to communicate from Nomad to Consul is the fastest way to get things working. Since Nomad communicates with Consul over localhost the only protection TLS adds is ensuring the process listening on localhost:8500 is in fact Consul. Encryption rarely has any benefit on localhost.

Option 2: HTTPS Everywhere

First, I'd recommend upgrading Nomad to 0.5.3 (just released) to make sure you're not hitting a fixed bug.

Secondly, the cert_file and key_file configuration values are for mutual TLS authentication and shouldn't be the same certificate as Consul uses. They should be the same certificates as you'd use in Nomad's tls configuration section (and as you can see in my notes: we should probably make that automatic!).

Because Consul only cares that incoming connections use certificates signed by its CA, using the same certificate for Nomad and Consul will probably Just Work though...

...however there's another wrinkle that I think is a bug: if you have consul.verify_ssl=true (the default) in Nomad then it verifies Consul's certificate's names against the address you specify (127.0.0.1 in your case). So Consul's certificate has to include 127.0.0.1 in its Common Name or Alternative Names list.

If you paste the output of the following command we'll be able to verify if that's the issue:

openssl x509 -in /etc/ssl/ssc/consul.crt -noout -text
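
A quicker check than reading the full text dump is to ask openssl directly whether a given IP matches the certificate. The sketch below is self-contained: it creates a throwaway self-signed certificate with no SAN entries and then tests 127.0.0.1 against it. The name server.dc1.consul and the temporary directory are illustrative assumptions, not values from this thread, and the -checkip option assumes OpenSSL 1.0.2 or newer. To test a real certificate, run only the last openssl command against /etc/ssl/ssc/consul.crt.

```shell
#!/bin/sh
set -eu

# Throwaway directory so nothing here touches real certificates.
workdir=$(mktemp -d)

# Self-signed demo certificate with a hostname-style CN and no explicit SAN entries.
# "server.dc1.consul" is a hypothetical name used only for this sketch.
openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
  -subj "/CN=server.dc1.consul" \
  -keyout "$workdir/demo.key" -out "$workdir/demo.crt"

# Ask openssl whether 127.0.0.1 matches the certificate and show its answer.
result=$(openssl x509 -in "$workdir/demo.crt" -noout -checkip 127.0.0.1 2>&1)
echo "$result"
```

If openssl reports that the IP does NOT match, Nomad would be expected to reject the connection as well when consul.address is 127.0.0.1 and verify_ssl is left at its default.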

First of all I just upgraded nomad to v0.5.3 as you suggested and the error message changed:

2017/01/31 09:27:36.660753 [ERR] client.consul: error reaping services in consul: Get https://127.0.0.1:8500/v1/agent/services: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs

and indeed my self-signed demo certificate does not include 127.0.0.1 in its common name or alternative names list.

So, from here there are several solutions:

  • Setting nomad's consul.verify_ssl configuration to false

    • perfectly working but not desired

  • Setting nomad's consul.address to the FQDN (the common name the certificate was signed for) rather than 127.0.0.1

    • also works, but can complicate our cluster config

  • Having 127.0.0.1 in certificate alternate names (in addition to FQDN)

    • works as well, but complicates certificate generation a bit (found this blog post on how to do it: https://bowerstudios.com/node/1007)

    • however it is a one time operation, so this looks like the most promising
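
For the third option, here is a minimal sketch of generating a certificate that carries both a DNS name and 127.0.0.1 as SANs. It is self-signed for brevity, whereas a real setup would sign it with the cluster CA. The name server.dc1.consul and the output paths are assumptions for illustration, and -addext requires OpenSSL 1.1.1 or newer.

```shell
#!/bin/sh
set -eu

# Throwaway directory so nothing here touches real certificates.
workdir=$(mktemp -d)

# Self-signed demo certificate with one DNS SAN and one IP SAN.
# "server.dc1.consul" is a hypothetical role-style name.
openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
  -subj "/CN=server.dc1.consul" \
  -addext "subjectAltName=DNS:server.dc1.consul,IP:127.0.0.1" \
  -keyout "$workdir/consul.key" -out "$workdir/consul.crt"

# Print only the SAN section of the certificate to confirm what was written.
san=$(openssl x509 -in "$workdir/consul.crt" -noout -text \
  | grep -A1 'Subject Alternative Name')
echo "$san"
```

The same grep can be pointed at an existing certificate to list just its SANs.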

So, thanks for the investigation! From here, documenting the behaviour would be enough from my point of view.

PS: my config only includes TLS-enabled Consul, not Nomad (yet), so things may change in full HTTPS mode.

Mind posting what names your certificates have? Our intention across products is to not sign certificates with hostnames but rather roles like client.region1.nomad or server.dc1.consul. This is so certificates work in dynamically scaled clusters with DNS based discovery without having to generate per-node certificates.

Closing this issue since Nomad's Consul support was dramatically refactored for the Nomad 0.6 release.

Please feel free to reopen with the output of openssl x509 -in /etc/ssl/ssc/consul.crt -noout -text (or at least the SANs from that output) if you're still having issues.

Not sure if my problem is related, but I get the error [ERROR] nomad: error looking up Nomad servers in Consul: error="server.nomad: unable to query Consul datacenters: Get https://127.0.0.1:8501/v1/catalog/datacenters: x509: certificate signed by unknown authority" in the nomad logs.

Connecting via the CLI works when I set ca-file (which is also set in nomad/vault); trying to connect with vault shows an error about Unexpected response code: 400.
