Nomad: [question] [enhancement] Canary tagging in Nomad 0.6.0

Created on 27 Jul 2017 · 12 comments · Source: hashicorp/nomad

Canaries are now a first-class thing in the new update stanza, and while it is possible within Nomad to check which allocations are at which version, this information does not propagate to the service registration in Consul.

How can the canaries be filtered apart in the Consul catalog? I was hoping for them to have a "canary" tag or something, but I could not find anything in the service catalog that allows me to filter out the canary.

If there is a recommended way of doing this, maybe it could be mentioned in the docs; if there is not, here is a suggestion:

Maybe Nomad could add a pre-defined "canary" tag for services in the canary state, and remove it after promotion. Or that tag could be configured with a new canary_tag keyword in the update stanza.

This would help tremendously in marking the canary instances at the gateways, in order to adjust their weight in the pool or only allow access from a non-public endpoint.
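As a rough illustration of the suggestion above, a hypothetical update stanza might read something like this (canary_tag is not an existing Nomad parameter; it is only the proposed keyword):

    update {
      max_parallel = 1
      canary       = 1
      # Hypothetical: tag applied to the canary allocations' Consul
      # registrations while the deployment awaits promotion, and
      # removed once the deployment is promoted.
      canary_tag   = "canary"
    }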

Labels: stage/thinking, theme/deployments, theme/discovery, type/enhancement

Most helpful comment

Just to update folks, we will be tackling this in the near term. It will hopefully be in a 0.8.X release and at latest in 0.9 (worst case).

All 12 comments

We're just adding a tag with v:${VERSION} to distinguish between them, because your internal application version carries much more information than the Nomad task version ;)
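For context, a minimal sketch of that approach, assuming the application version is supplied via job meta and interpolated into the service tags (the names and the meta key are illustrative, not from the original comment):

    meta {
      # Illustrative: application version supplied at job submit time.
      version = "1.4.2"
    }

    service {
      name = "backend"
      port = "http"
      # Nomad interpolates meta values into service tags, so each Consul
      # registration carries the application version it is running.
      tags = ["v:${NOMAD_META_version}"]
    }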

We were already adding a version tag in our Nomad templates. But the use case I'm describing is being able to tell which instances are canaries regardless of the service version.

If you deploy a new Nomad job version with a canary update strategy but with the same service artefact (let's say you just changed job parameters, like resources, ports, etc.), a version tag will be ambiguous, and yet these changes can dictate the success or failure of the deployment. I agree that this is a stretched corner case, but it's just for the sake of example.

Having a well-known tag like canary, or a custom static one, during the deployment stage (as per the Nomad definition: "the state between two job versions") would make it much easier to filter canaries from the catalog in a more generic way. This tag would obviously be removed after the promote.

According to https://www.nomadproject.io/docs/operating-a-job/update-strategies/blue-green-and-canary-deployments.html the canary parameter is meant to support both Canary deploys (inserting a single upgraded member into a cluster) and blue/green deployments (directing a select part of your load to a separate cluster). I don't see how to achieve the second through Consul given that the canaries show up under any services declared by the tasks, effectively becoming indistinguishable members of the cluster.

Being able to point your staging frontend to a DNS name like canary.backend.service.consul exported by the Consul DNS interface would be really awesome, but it does not give us proper blue/green deployments (without writing scripts), because for that we would need a mechanism to instruct the baseline/production load balancer to direct traffic only to members that don't have the canary tag, something that e.g. Fabio or the Consul DNS interface does not support.

One idea would be to have a CanaryServiceName config within the service declaration that would be used during the canary transition period instead of the normal service name. You could also have an OnCanaries yes/no directive that decides whether canaries join this service; you could then set up canary and non-canary service pairs. Another idea would be to have a TagBlacklist directive in the update stanza which would make sure the canaries do not get a specific tag (e.g. production), so that we can use that tag to identify the baseline members.
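A rough sketch of how those proposed directives might look in a job file (none of these parameters exist in Nomad; they only illustrate the ideas in the previous paragraph):

    service {
      name = "backend"
      # Hypothetical: name used for canary allocations during the
      # transition period instead of the normal service name.
      canary_service_name = "backend-canary"
      # Hypothetical: whether canary allocations register this service at all.
      on_canaries = false
    }

    update {
      canary = 1
      # Hypothetical: tags the canary allocations must NOT receive, so the
      # baseline members can be identified by e.g. the "production" tag.
      tag_blacklist = ["production"]
    }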

For what it's worth, even a rudimentary ability to add a canary tag to Consul service registrations while they are in canary/blue-green deployment mode would go a LONG way. It would then become possible to add prepared queries to Consul to allow various scenarios: filtering those services in or out, having different DNS routes, or doing service lookups against the Consul API from proxies like Traefik, Fabio, or Linkerd, etc.

Looking at the Nomad code, it also seems fairly straightforward to add the extra tag at deployment start and remove it upon promote.

Just a thought: could canary deployments during upgrade get a connection-draining feature at the service level rather than the host level?

This could provide a safe manner of upgrade that is transparent to the user, and possibly help the user achieve the right thing.

Hi @dagar, given how straightforward it looks to add and remove the canary tag, can we add this feature in the 0.8 release? I can raise a pull request if it's OK to add this feature.

Just to update folks, we will be tackling this in the near term. It will hopefully be in a 0.8.X release and at latest in 0.9 (worst case).

While you're thinking about this, please also consider changing the alloc index. If I now add one canary to my job, it gets index 0, so there are two instances with index 0. This seriously screws up metrics. Another thought: since we're not replacing existing instances, but adding new ones, the load per instance on the job decreases (as there are more instances). If you are trying to do performance tuning, this is quite annoying.

I've confirmed this is still an open issue for my hashi-ui.nomad job that "Requires Promotion"

Running

  • Consul v1.4.0-rc1 (1757fbc0a)
  • Nomad v0.8.6 (ab54ebcfcde062e9482558b7c052702d4cb8aa1b+CHANGES)
  • Traefik v1.7.3

Using

        canary_tags = [
          "traefik.enable=true",
          "traefik.frontend.rule=Host:canary-hashiui.localhost",
        ]

        tags = [
          "traefik.enable=true",
          "traefik.frontend.rule=Host:hashiui.localhost",
        ]

where my http://consul.localhost/ui/dc1/services/hashi-ui shows all three allocations tagged with both traefik.frontend.rule values. IMHO only the service instance running as the canary should have the canary tags, and nothing else.
Nor should the original two allocations have any new canary tags added.

Now http://canary-hashiui.localhost/ works, because the final route rule defined is
Host:canary-hashiui.localhost for all three instances, and http://hashiui.localhost/ no longer works.

Latest Deployment
ID          = 2a2de0cb
Status      = running
Description = Deployment is running but requires promotion

Deployed
Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
server      true         false     2        1         1       1        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
5ce80ae4  c24de578  server      7        run      running   30s ago     11s ago
ad129abe  c24de578  server      6        stop     complete  2m46s ago   25s ago
f4898f13  c24de578  server      4        stop     complete  24m58s ago  18m42s ago
7c134bfb  c24de578  server      3        stop     complete  26m5s ago   24m53s ago
7e6a67c9  c24de578  server      5        run      running   2h18m ago   18m31s ago
e862eae4  c24de578  server      5        run      running   2h18m ago   18m31s ago

I would have reopened https://github.com/hashicorp/nomad/issues/3340 but it was closed since it was marked as a duplicate of this ticket.

Maybe it would have to have canary_name as well as canary_tags;
this way I could redefine the service name to canary-hashi,
and hashi-ui.service.consul would not be improperly altered.

nvm ... ignore my comments, I figured it out.

      service {
        port = "http"
        name = "canary-hashi-ui"
        canary_tags = [
          "traefik.enable=true",
          "traefik.frontend.rule=Host:canary-hashiui.localhost",
        ]
        tags = []
      }

      service {
        port = "http"
        name = "hashi-ui"
        tags = [
          "traefik.enable=true",
          "traefik.frontend.rule=Host:hashiui.localhost",
        ]
      }