Nomad v0.7.0
Consul v1.0.1 (likely relevant)
CentOS 7.4
Nomad is unable to deregister services that have complex tags, such as those used by fabio.
Nomad doesn't produce any relevant log messages, but Consul does; I've included its output below.
Run the job below, and then try to stop it. Nomad fails to deregister the service. This apparently blocks all other service registrations as well.
job "test" {
  datacenters = ["us-west-2"]
  type = "service"
  group "test" {
    count = 1
    task "test" {
      driver = "docker"
      config {
        image = "nginx"
        port_map { http = 80 }
      }
      service {
        name = "test"
        tags = [
          "public-test.ettaviation.com:80/ redirect=302,https://test.ettaviation.com",
          "public-test.ettaviation.com:443/"
        ]
        port = "http"
        check {
          type = "http"
          path = "/"
          interval = "10s"
          timeout = "2s"
        }
      }
      resources {
        cpu = 100
        memory = 10
        network {
          mbits = 1
          port "http" {}
        }
      }
    } #/task test
  } #/group test
} #/job test
Dec 5 19:36:01 ip-10-25-20-137 consul: 2017/12/05 19:36:01 [ERR] http: Request GET /v1/agent/service/deregister/_nomad-executor-44ef9b22-0f24-b414-69b2-c23324197ce5-test-test-public-test.example.com:80/%20redirect=302,https:/test.example.com-public-test.example.com:443/, error: method GET not allowed from=127.0.0.1:42046
What I find odd about the log line above is that it is a GET request, not a PUT which is required by the /agent/service/deregister endpoint. Also I just now noticed that the double-slash after https: became a single slash. Maybe the service ID needs to be URL encoded?
I've narrowed the issue down to any tag containing http:// or https:// causing Consul to return a redirect to the same URL with only 1 slash: http:/ or https:/. Nomad's HTTP client changes the PUT to a GET on redirects, hence the error message.
Consul prior to v1.0.0 has the same behavior but does not error on GET deregistration requests, so Nomad would just try to deregister an invalid service (due to the missing / on redirect).
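The PUT-to-GET switch described above can be observed in isolation. The following is a minimal sketch, not Nomad's or Consul's code: the test server, the `/old` and `/new` paths, and the `methodAfterRedirect` helper are all made up for illustration. It relies on the documented behavior of Go's `net/http` client, which follows 301, 302 and 303 responses with a GET.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// methodAfterRedirect sends a PUT to a throwaway test server that answers
// /old with a 301 redirect to /new, and returns the HTTP method that the
// server sees on /new. Paths and helper name are hypothetical.
func methodAfterRedirect() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/old" {
			http.Redirect(w, r, "/new", http.StatusMovedPermanently)
			return
		}
		// Echo the method back so the caller can inspect it.
		fmt.Fprint(w, r.Method)
	}))
	defer srv.Close()

	req, err := http.NewRequest(http.MethodPut, srv.URL+"/old", nil)
	if err != nil {
		return "", err
	}
	// The default redirect policy follows a 301 with a GET and no body.
	resp, err := srv.Client().Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	method, err := methodAfterRedirect()
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("method seen after redirect:", method)
}
```

A 307 or 308 response would preserve the original method, which is why the status code chosen by the redirecting server matters here.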
URL encoding is definitely the answer. Working on a fix.
tl;dr - We need to path escape service IDs, but I'm not sure we can do it in a backward compatible way.
Did some more digging and discovered the root cause: Go's ServeMux calls path.Clean on incoming request paths, which normalizes foo//bar to foo/bar.
You can download and go run ... this file for a reproducer.
I was all ready to file a bug with Go proper to stop normalizing // to / in URLs, but then I read what I think is the best spec for URLs: https://url.spec.whatwg.org/#url-serializing. URLs are equivalent if their serialized forms are equivalent. Therefore it seems appropriate for Go to serialize incoming paths. This definition of URL paths makes me think eliding consecutive //s in paths is valid:
A URL’s path is a list of zero or more ASCII strings holding data (emphasis mine)
If I'm parsing the spec correctly it's totally appropriate to elide empty path segments (the empty string between two slashes), and therefore the second slash would get dropped just as Go is doing.
Back to the escaping option!
If the goal of Nomad passing these tags through to Consul is to make them available in DNS queries, maybe Nomad should enforce RFC-1035? Consul accepts tags that don't conform to the spec but emits a warning that DNS functionality won't work.
@preetapan Tags are used to configure Fabio routing rules, so we can't really introduce any restrictions on their format.
Nomad does control the Service ID format. The problem is that we tack a concatenated list of tags onto the ID verbatim to ensure ID uniqueness. Even with proper escaping you can easily end up with unreasonably large Service IDs if you're using Fabio, so we should probably consider changing our Service ID format to use a hash of tags.
I like the idea of using a hash of the tags in the service ID instead of tacking them on verbatim. Services could have a large number and variety of tags for as-yet unknown reasons. Using a hash future-proofs it against the unknown.
@ctlajoie We determined that having two slashes, even with URL encoding, causes a redirect. This is due to the default behavior of ServeMux that @schmichael mentions above, combined with the fact that the request path in the URL is already decoded when it sees the //.
This means that besides the fix in PR#3632 to generate a hashed ID to prevent this in the future, you will also have to shut down the agent and manually delete the service file that contains the tags mentioned above. You can find the service files in a subdirectory named services inside your agent's data-dir. There is one file per service, each with a randomly generated 32-character file name. Find the file or files corresponding to this service, delete them, and restart the agent. After that, the service will also be removed from the catalog.
We are sorry for the inconvenience, let us know if you were able to manually delete the file.
@preetapan Thanks! No worries about the inconvenience.
@schmichael Thanks dude
@preetapan Just out of curiosity: nomad is interpreting the tags?
@magiconair nomad only uses the tags to generate a unique service ID. For example the service ID for my example above would be (before the fix) _nomad-executor-44ef9b22-0f24-b414-69b2-c23324197ce5-test-test-public-test.example.com:80/ redirect=302,https://test.example.com-public-test.example.com:443/.
After the fix, service IDs look more like this: _nomad-task-WYN2WJ6K246WAJLRFUJEKLFLFOW4TL2U
@ctlajoie is correct, although in 0.7.1-rc1 we lowercased the service IDs. That's just a UX adjustment with no functional change.