Nomad v0.10.1 (829f9af35c77d564b3dab74454eeba9bf25e2df8)
Linux
When running nomad job run or nomad job plan, there appears to be a validation step that checks for duplicate Consul service names. The problem is that this check doesn't appear to expand placeholders such as NOMAD_TASK_NAME, resulting in false positives about duplicates.
This is a regression from previous versions, although I'm not 100% sure which version the regression appeared in.
Run a job plan on the provided job file:
job "test" {
  datacenters = ["test"]
  type = "service"
  group "group" {
    task "task1" {
      driver = "docker"
      config {
        image = "docker.elastic.co/elasticsearch/elasticsearch:7.4.2"
        port_map {
          http = 9200
        }
      }
      env {
        discovery.type = "single-node"
        TAKE_FILE_OWNERSHIP = "yes"
      }
      resources {
        cpu = 500
        memory = 2048
        network {
          port "http" {}
        }
      }
      service {
        name = "${NOMAD_JOB_NAME}-${NOMAD_TASK_NAME}"
        port = "http"
        address_mode = "host"
        check {
          name = "health"
          type = "http"
          path = "/"
          interval = "5s"
          timeout = "2s"
        }
      }
    }
    task "task2" {
      driver = "docker"
      config {
        image = "docker.elastic.co/elasticsearch/elasticsearch:7.4.2"
        port_map {
          http = 9200
        }
      }
      env {
        discovery.type = "single-node"
        TAKE_FILE_OWNERSHIP = "yes"
      }
      resources {
        cpu = 500
        memory = 2048
        network {
          port "http" {}
        }
      }
      service {
        name = "${NOMAD_JOB_NAME}-${NOMAD_TASK_NAME}"
        port = "http"
        address_mode = "host"
        check {
          name = "health"
          type = "http"
          path = "/"
          interval = "5s"
          timeout = "2s"
        }
      }
    }
  }
}
$ nomad job plan test.nomad
Error during plan: Unexpected response code: 500 (1 error(s) occurred:
* Task group group validation failed: 1 error(s) occurred:
* Task group service validation failed: 1 error(s) occurred:
* Service ${NOMAD_JOB_NAME}-${NOMAD_TASK_NAME} is duplicate)
$
Hi @nvx and thanks for this report! It looks like I may have introduced this regression in c4a45a6bbcb7b036062bf2796d6fe17cf7da74a0 when we added validation for Task Group services. We're checking for collisions between the Task and its Task Group, but none of the tasks have had their environment interpolated yet. This is exposing an existing bug where we were checking for collisions within a task but not between different tasks in a job.
I don't think we have the task environment at this stage of validation to do the interpolation, so the solution may need to be that we allow for possible collisions across tasks at the validation stage and let it bubble up as errors on service registration. I'll look into that.
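To make the false positive concrete, here is a minimal Go sketch, not Nomad's actual code. The task type and the duplicateNames, rawName, and interpolated helpers are hypothetical, and os.Expand is only a stand-in for Nomad's real interpolation. It runs the same duplicate check twice: once on the raw names, as validation sees them, and once after substituting the per-task values.

```go
package main

import (
	"fmt"
	"os"
)

// task is a hypothetical, simplified stand-in for a Nomad task: its name plus
// the raw (uninterpolated) service names from its service stanzas.
type task struct {
	Name     string
	Services []string
}

// duplicateNames returns each service name that occurs more than once across
// all tasks. resolve maps a raw name to the name that gets compared.
func duplicateNames(tasks []task, resolve func(t task, raw string) string) []string {
	seen := make(map[string]int)
	var dups []string
	for _, t := range tasks {
		for _, raw := range t.Services {
			name := resolve(t, raw)
			seen[name]++
			if seen[name] == 2 { // report each duplicate once
				dups = append(dups, name)
			}
		}
	}
	return dups
}

// rawName models validation time: no task environment exists yet, so the
// placeholder string itself is compared.
func rawName(_ task, raw string) string { return raw }

// interpolated models the point where the task environment is available.
// os.Expand is an assumption used for illustration, not Nomad's interpolator.
func interpolated(job string) func(task, string) string {
	return func(t task, raw string) string {
		env := map[string]string{
			"NOMAD_JOB_NAME":  job,
			"NOMAD_TASK_NAME": t.Name,
		}
		return os.Expand(raw, func(key string) string { return env[key] })
	}
}

func main() {
	tasks := []task{
		{Name: "task1", Services: []string{"${NOMAD_JOB_NAME}-${NOMAD_TASK_NAME}"}},
		{Name: "task2", Services: []string{"${NOMAD_JOB_NAME}-${NOMAD_TASK_NAME}"}},
	}
	fmt.Println("raw:", duplicateNames(tasks, rawName))
	fmt.Println("interpolated:", duplicateNames(tasks, interpolated("test")))
}
```

On the raw strings the two services compare equal and are flagged. Once each task's name is substituted they become test-task1 and test-task2, and nothing is flagged.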
Makes sense.
Is it actually invalid to have multiple tasks registering the same service name in Consul though? Logically it might be a bit strange (I'd expect it more across different task groups, or when the group has a count > 1) but I don't think Consul would have an issue with it?
Strictly speaking, no, it's not invalid to have multiple tasks registering the same service name in Consul. But it'd be a very unusual case where someone would intentionally give multiple services the same name within a task group (and it's a common configuration error), so we validate for it.
Now that I've had a chance to work through #6836, which turns out not to be related, I wanted to circle back to the UX of this. The task environment doesn't exist at the time we do the validation. In c4a45a6bbcb7b036062bf2796d6fe17cf7da74a0 we describe the validation as:
// validateServices runs Service.Validate() on group-level services,
// checks that group services do not conflict with task services and that
// group service checks that refer to tasks only refer to tasks that exist.
I'm wondering if it makes any sense for us to check for name conflicts; while it's a _weird_ case, it's not strictly an error and it's something Consul can manage just fine so long as you've got the right checks. Maybe we should drop that part of the validation and only check that the group service checks refer to tasks that exist. @schmichael any thoughts here?
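As a rough illustration of the narrower validation proposed here, the Go sketch below keeps only the "checks refer to tasks that exist" rule and says nothing about repeated service names. The check, service, and group types and the validateCheckTasks function are hypothetical simplifications, not Nomad's real structs or its validateServices implementation.

```go
package main

import "fmt"

// check is a hypothetical group-service check. TaskName is empty when the
// check is not tied to a particular task.
type check struct {
	Name     string
	TaskName string
}

// service is a hypothetical group-level service with its checks.
type service struct {
	Name   string
	Checks []check
}

// group is a hypothetical task group: its task names and group services.
type group struct {
	Tasks    []string
	Services []service
}

// validateCheckTasks reports checks that refer to a task the group does not
// define. It deliberately does not compare service names with each other.
func validateCheckTasks(g group) []error {
	known := make(map[string]struct{}, len(g.Tasks))
	for _, t := range g.Tasks {
		known[t] = struct{}{}
	}
	var errs []error
	for _, s := range g.Services {
		for _, c := range s.Checks {
			if c.TaskName == "" {
				continue // not tied to a task, nothing to verify
			}
			if _, ok := known[c.TaskName]; !ok {
				errs = append(errs, fmt.Errorf(
					"check %q in service %q refers to unknown task %q",
					c.Name, s.Name, c.TaskName))
			}
		}
	}
	return errs
}

func main() {
	g := group{
		Tasks: []string{"task1", "task2"},
		Services: []service{
			// Two services share a name; this sketch accepts that.
			{Name: "redis", Checks: []check{{Name: "alive", TaskName: "task1"}}},
			{Name: "redis", Checks: []check{{Name: "alive", TaskName: "task3"}}},
		},
	}
	for _, err := range validateCheckTasks(g) {
		fmt.Println(err)
	}
}
```

With this shape, the only error reported for the example group is the check that names task3, which the group does not define.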
I would like to chime in on this and say that the service name collision check between tasks in the same group should be removed. It's a hindrance and I've just come upon a use case where I would want multiple tasks in the same group registering separate instances of something as the same service name.
I also have a use case where registering a variant of a service in tasks within the group is required.
In my case, I'm running a Redis Cluster with a master and a slave on the same node, each with different ports, different tags and different host_volumes.
Only when I ran the final version of the job file did I find it blocked by the validation. As @tgross indicates, if this isn't an error, could the validation be removed?
I'm running into a similar case to @davidatkinsondoyle's: same service name, different tags. I also think the validation should be removed.
If you move them out of the group, there is no validation.
I should have followed up here. We are going to remove the duplicate check, but I haven't had a chance to make the change yet.