Consul: Dynamic tags applied like health checks.

Created on 19 Jun 2015 · 50 Comments · Source: hashicorp/consul

In issue #867 I suggested an idea to make tags that depend on the result of scripts, just like health checks.

I run mongo in the cloud with multiple machines all spun up from the same image. On boot they query for mongodb.service.consul and join the cluster. That all works flawlessly. Being a good Ops person, I have a cron job that kills random machines in my infrastructure at random times. It will eventually hit the mongodb master, the system will hiccup and a slave will be promoted automatically. Life is fantastic.

In comes Legacy Software that must connect directly to the master mongodb instance. I would like to have master.mongodb.service.consul resolve to the one IP of the master in the cluster.

Current solution (runs via cron on all machines):

- Get my service definition through the API.
- Check the status of the cluster. This determines whether or not we should have the tag.
- Determine if the service definition's tag list needs to be updated.
- If an update is required, POST the data back to the API.
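
The cron workaround above can be sketched roughly as follows. This is a minimal illustration, not the author's actual script: it assumes a local agent on the default HTTP port 8500 and uses the agent endpoints /v1/agent/services and /v1/agent/service/register. The helper names desired_tags and sync_tag are hypothetical, and whether checks attached to the service survive a re-registration should be verified against your Consul version.

```python
import json
import urllib.request

CONSUL = "http://127.0.0.1:8500"  # assumption: local agent, default HTTP port


def desired_tags(current, tag, should_have):
    """Return the updated tag list, or None when nothing needs to change."""
    has = tag in current
    if should_have and not has:
        return current + [tag]
    if not should_have and has:
        return [t for t in current if t != tag]
    return None


def sync_tag(service_id, tag, should_have):
    """Fetch the local definition and re-register it only when the tag set differs."""
    with urllib.request.urlopen(f"{CONSUL}/v1/agent/services") as resp:
        svc = json.load(resp)[service_id]
    tags = desired_tags(svc.get("Tags") or [], tag, should_have)
    if tags is None:
        return False  # already in the desired state, skip the write
    body = {
        "ID": svc["ID"],
        "Name": svc["Service"],
        "Tags": tags,
        "Address": svc.get("Address", ""),
        "Port": svc.get("Port", 0),
    }
    req = urllib.request.Request(
        f"{CONSUL}/v1/agent/service/register",
        data=json.dumps(body).encode(),
        method="PUT",
    )
    urllib.request.urlopen(req).close()
    return True
```

Keeping the decision in a pure function means the cron job only writes to Consul when the tag list actually changes.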

Ideal solution:

- Set up my service definition with dynamic tags.
- Write a script that returns the status of the cluster, with an exit code of 0 meaning the tag should be applied.
- Let Consul update itself automatically.
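
A tag script of the kind described above might look like the sketch below. It assumes the legacy mongo shell is on PATH and reuses the db.isMaster().ismaster expression that appears later in this thread. The function names are hypothetical, and the exit-code convention (0 applies the tag, anything else removes it) is the one proposed here, not an existing Consul feature.

```python
import subprocess


def exit_code_for(output):
    """Map the mongo shell's output to an exit code: 0 applies the tag, 1 does not."""
    # Only the last line matters; the shell may print a banner before the result.
    return 0 if output.strip().splitlines()[-1:] == ["true"] else 1


def main():
    # Assumption: the `mongo` shell can reach the local instance without credentials.
    result = subprocess.run(
        ["mongo", "--quiet", "--eval", "db.isMaster().ismaster"],
        capture_output=True,
        text=True,
        timeout=5,
    )
    if result.returncode != 0:
        return 1  # shell failed, so do not claim the tag
    return exit_code_for(result.stdout)
```

Installed as /usr/local/bin/mongo-is-master.sh (or a Python equivalent), the script would end with sys.exit(main()) so that the agent sees the exit code.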

Sample JSON (one static tag, one dynamic tag):

{
    "service": {
        "name": "mongodb",
        "tags": [
            "fault-tolerant",
            {
                "name": "master",
                "script": "/usr/local/bin/mongo-is-master.sh",
                "interval": "10s"
            }
        ],
        "address": "127.0.0.1",
        "port": 8000,
        "checks": [
            {
                "script": "/usr/local/bin/mongo-health-check.sh",
                "interval": "10s"
            }
        ]
    }
}
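
To make the mixed tag list concrete, here is a small sketch of how a consumer of this proposed format could separate plain string tags from dynamic tag objects. The function name split_tags and the rule that a dynamic tag needs both "name" and "script" are assumptions for illustration. Consul does not parse tags this way today.

```python
def split_tags(tags):
    """Split a mixed tag list into (static tag names, dynamic tag definitions)."""
    static, dynamic = [], []
    for entry in tags:
        if isinstance(entry, str):
            static.append(entry)
        elif isinstance(entry, dict) and {"name", "script"} <= entry.keys():
            dynamic.append(entry)
        else:
            # Reject anything that is neither a string nor a well-formed object.
            raise ValueError(f"unsupported tag entry: {entry!r}")
    return static, dynamic
```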

This sort of solution could apply to issues #155 and #867, and possibly others.

theme/service-metadata, thinking

fidian

👍41 ❤13

Most helpful comment

Hi,
I've added support for dynamic tags here, branch dynamic-tags.
If you are interested in this feature, please build and test it, any critique is appreciated. If everything is ok, I'll make a PR.
The syntax for service registration is the following:

{
    "service": {
        "name": "mongodb",
        "tags": ["tag1"],
        "dynamictags": [
            {
                "name": "master",
                "script": "/usr/local/bin/mongo-is-master.sh",
                "interval": "10s"
            }
        ],
        "address": "127.0.0.1",
        "port": 8000,
        "checks": [
            {
                "script": "/usr/local/bin/mongo-health-check.sh",
                "interval": "10s"
            }
        ]
    }
}
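
As a rough illustration of the semantics this syntax implies, the sketch below computes the tags a service would expose given the latest exit code of each dynamic tag script. It is not taken from the dynamic-tags branch. The function name effective_tags and the rule that exit code 0 applies the tag follow the original proposal and are assumptions here.

```python
def effective_tags(service, exit_codes):
    """Combine static tags with the dynamic tags whose last script run exited 0."""
    tags = list(service.get("tags", []))
    for dyn in service.get("dynamictags", []):
        name = dyn["name"]
        # A script that has not run yet has no exit code and contributes no tag.
        if exit_codes.get(name) == 0 and name not in tags:
            tags.append(name)
    return tags
```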

avdva on 16 Sep 2016

👍9

All 50 comments

Interesting idea. I think the work-around you mentioned is a decent way of doing this, but I'm going to leave this open as a thought ticket for now. Thanks!

ryanuber on 22 Jun 2015

@fidian with respect to your statement: "On boot they will query for mongodb.service.consul and join the cluster."

Can you describe this a bit more, since I want to set up something similar for a redis cluster? Do you use some handcrafted script (e.g., via consul-template or the REST API) to query mongodb.service.consul and get all registered nodes for that service, or are you relying on the DNS mechanism for that? At least one problem with relying solely on DNS is that if the node registers itself (e.g., with registrator) in the consul cluster before it does the DNS lookup for mongodb.service.consul, it might get back its own IP address, which would not be helpful for joining the cluster... :-)

Kosta-Github on 22 Jun 2015

This would be useful for services like zookeeper, which dynamically elects a leader node every time a node joins or leaves the cluster, and where the leader can be configured to no longer accept client connections. Having dynamic tags like this via a check would let me query consul for the non-leader nodes so that no client tries to connect to the leader at all.

walrusVision on 23 Jun 2015

@Kosta-Github asked how I manage to auto cluster my mongo instances.

Consul is hooked up through dnsmasq.
Consul is started before mongo.
The health check fails unless mongo reports success and mongo is part of a cluster. This second part is vital - the health check fails until mongo is in a cluster.
The init script for mongo queries DNS for other members in the cluster. This will only report mongo instances that are already in a replica set.
- If IPs are found, become a slave and connect to the IP that we found.
- With no IPs, configure as a master and enable the replica set, which then makes the health check pass.

The only snag is that I must start one instance of mongo initially so it will bootstrap the replica set. Once it is running I am able to add and remove instances to my replica set.
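
The role decision in the init script could be sketched like this. It is an illustration only: the function names are hypothetical, and filtering out the node's own IP is an extra safeguard suggested by the earlier question in this thread, not something the description above requires (the health check already keeps unclustered instances out of DNS).

```python
import socket


def resolve(name="mongodb.service.consul"):
    """Return the A records for the service, or [] when no healthy instance is registered."""
    try:
        return socket.gethostbyname_ex(name)[2]
    except OSError:  # covers socket.gaierror and socket.herror
        return []


def choose_role(peer_ips, own_ip):
    """Join an existing member as a slave, or bootstrap as master when none are found."""
    peers = [ip for ip in peer_ips if ip != own_ip]
    if peers:
        return ("slave", peers[0])
    return ("master", None)
```

An init script would call choose_role(resolve(), own_ip) and then configure mongo accordingly.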

fidian on 23 Jun 2015

👍1

@fidian thanks for the explanation; just one more question: what does your dnsmasq config look like? :-)

Kosta-Github on 23 Jun 2015

@Kosta-Github it looks like the following. I'd also answer questions off this issue. Feel free to email me directly at [email protected] so we don't continue to pollute this thread.

server=/consul./127.0.0.1#8600

fidian on 23 Jun 2015

+1 for this feature request

igoratencompass on 13 Jul 2015

eloycoto on 20 Jul 2015

hugochinchilla on 27 Jul 2015

xakraz on 21 Aug 2015

jh409 on 24 Sep 2015

adbourne on 8 Oct 2015

This would be very, very nice. There are all kinds of things for which clients need to connect to the master explicitly. A dynamic tag would be so elegant. So much better than a bunch of add scripts to tweak tags.

memelet on 28 Nov 2015

danielbenzvi on 7 Dec 2015

wyhysj on 8 Dec 2015

+1 tag plus script would be very useful to implement custom DNS response logic

123BLiN on 14 Dec 2015

I currently have to run two 'services' for a similar situation: a "redis" service which includes all nodes in the cluster, then a "redis-master" service.

This has the unfortunate side effect that most of the redis nodes are always 'failing' the health check because they're not the master.

Would definitely appreciate this feature as a way around this.

richard-hulm on 17 Dec 2015

👍2

Consul 0.6 added a "tag override" feature that's useful for implementing schemes like this, though the logic is run outside of Consul, not from Consul itself as suggested here. Here's the issue that brought it in https://github.com/hashicorp/consul/issues/1102.

Here's a bit of the documentation, from https://www.consul.io/docs/agent/services.html:

The enableTagOverride can optionally be specified to disable the anti-entropy feature for this service. If enableTagOverride is set to TRUE then external agents can update this service in the catalog and modify the tags. Subsequent local sync operations by this agent will ignore the updated tags.

This would let an external agent, such as a script working with redis-sentinel, apply the tags to the current master via Consul's catalog API.
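
An external agent of that kind might build and send a catalog registration like the sketch below. It assumes the service was registered on its local agent with enableTagOverride set to true, so that anti-entropy does not revert the tags, and it uses the /v1/catalog/register endpoint. The helper names, node name, and addresses are placeholders.

```python
import json
import urllib.request


def catalog_payload(node, node_address, service_id, name, port, tags):
    """Build the body for /v1/catalog/register with a de-duplicated tag list."""
    return {
        "Node": node,
        "Address": node_address,
        "Service": {
            "ID": service_id,
            "Service": name,
            "Port": port,
            "Tags": sorted(set(tags)),
        },
    }


def register(payload, consul="http://127.0.0.1:8500"):
    """Send the registration to the catalog (assumption: agent on the default port)."""
    req = urllib.request.Request(
        f"{consul}/v1/catalog/register",
        data=json.dumps(payload).encode(),
        method="PUT",
    )
    urllib.request.urlopen(req).close()
```

The external script would call this once for the node that redis-sentinel reports as master, and again without the "master" tag for the node that lost the role.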

slackpad on 19 Dec 2015

+1 Would love to see this instead of the workaround with tag overriding.

mvanderlee on 14 Apr 2016

jcua on 19 Apr 2016

PedroAlvarado on 19 Apr 2016

This is a brilliant idea :), I would also want this for redis cluster!

onnimonni on 11 May 2016

+1 This would give us the ability to determine which application version should receive LB traffic in marathon.

nickwales on 18 May 2016

rafaelcapucho on 18 May 2016

tomwganem on 19 May 2016

Was there ever a pull request for this topic? It still looks like something that is needed.

Techcadia on 28 Nov 2016

👍1

+1 This is much better than the current enableTagOverride or multiple-service workaround imho. Please pull this!

rhamon on 13 Dec 2016

I've merged the master branch from hashicorp/consul into my dynamic-tags branch. If you are interested in this feature, please build and test it.
We've tested it in our environment and it worked. However, I'd like to receive more feedback before I make a PR. Error reports will be highly appreciated.

avdva on 14 Dec 2016

A colleague tried to build it and add it to our internal Debian repo, but apparently got stuck in dependency hell and gave up.

rhamon on 14 Dec 2016

This feature would be great for my use case. I would really like to see this merged in eventually.

andremarianiello on 2 May 2017

+1 Consul DNS even with the two service method takes 15 to 30 minutes to propagate in the UI, API, and DNS.

Sieabah on 23 Jun 2017

@Sieabah that sounds like a function of DNS caching some place - you can adjust the TTL value to maybe improve that. The API/UI shouldn't have any delay.

slackpad on 23 Jun 2017

@slackpad I have all of the DNS caching set to 0. Querying the API directly, bypassing DNS, takes about the same amount of time to resolve.

I'm sure there is something misconfigured as when I monitor the two boxes they're saying "synced service:mongo" and "synced service:primary-mongo". With the current service definition I'm able to get it to 5 minutes. During that time both services actually say they're the primary (in the UI and API) even when in the logs they switch immediately.

{
  "service": {
    "name": "primary-mongo",
    "tags": ["primary", "mongo"],
    "port": 27017,
    "check": {
      "name": "primary",
      "script": "python ~/consul_check_tags.py $(mongo --eval 'db.isMaster().ismaster' | grep 'true')",
      "interval": "5s",
      "timeout": "1s"
    }
  }
}

I've tried re-registering via the API, reloading the config during the health check, and reloading from the API. I don't know what is making it take 5 minutes to propagate to a cluster of 3 servers and 2 clients, other than the anti-entropy sync only running every 1 minute?

Sieabah on 24 Jun 2017

@Sieabah we have a few issues we are looking into like https://github.com/hashicorp/consul/issues/2970, but it may be worthwhile for you to open a new GH issue so we can try to track down what's happening to you. Better to do it on a different issue than this one.

slackpad on 25 Jun 2017

❤1

👍 this would simplify a lot of the "workarounds" we built to get master/slave tags

caquino on 22 Aug 2017

Did we get anywhere with this? I'm looking for something similar at the moment where I have a service that has a master / slave type setup.

adamlc on 6 Nov 2017

Not sure if Prepared Queries can be used to apply such rules. However, dynamic tagging is a good idea. Any plans to get it in ?

ramukima on 29 Nov 2018

This still looks like a great idea, but I see no indication of any traction to having it merged. Anyone care to give us an update?

drawks on 7 Jan 2019

This still looks like a great idea, but I see no indication of any traction to having it merged. Anyone care to give us an update?

nicholasamorim on 11 Oct 2019

👍1

This would be useful for us as well. What are your thoughts on how to design this?
IMO a simple way would be to add a field like "output_as_tag": true to the check declaration struct. When set to true, the check output (as seen in Output in a /v1/health/service/<service> query, for example) would be captured and set as a tag, either on the node for a node check, or on the service for a service check.
If the value changes, the previously set tag would be removed and the new one added.
These tags would also be applied to sidecar services to ensure compatibility with Connect.

There are a few points to address, though:

- If a command goes crazy and outputs 3 KB, we probably don't want that as a tag, so some filtering is probably needed.
- We need to record that a tag comes from a check: if the output changes from "leader" to "follower", we need to remove the "leader" tag and add the "follower" tag. This needs to survive an agent restart and be stored in the agent state files on disk, possibly requiring an update to the file format.
- This would also work for HTTP checks, especially since command checks can be dangerous (we have them disabled in our shop). Do we want the output of 50x responses as tags? I'd argue that we probably don't, since in many cases a 50x returns a default "it's not working" HTML page, so dropping the tag on connection refused or HTTP error codes seems the most sensible to me.
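
The points above can be sketched as a small tag-update routine. This is only an illustration of the proposal, not Consul code: the names tag_from_output and update_tags are hypothetical, and the 64-character cap is an arbitrary assumption since the thread does not propose a number.

```python
MAX_TAG_LEN = 64  # assumption: arbitrary cap, no limit is proposed in the thread


def tag_from_output(output, ok):
    """Return the tag a check result should set, or None to drop the tag."""
    if not ok:
        return None  # failed check (connection refused, HTTP 5xx): no tag
    tag = output.strip()
    # Filter out empty, oversized, or multi-word output.
    if not tag or len(tag) > MAX_TAG_LEN or any(c.isspace() for c in tag):
        return None
    return tag


def update_tags(tags, previous, output, ok):
    """Swap the tag previously set by this check for the one from the new result.

    Returns (new tag list, tag now owned by the check). The second value is what
    an agent would need to persist to survive a restart.
    """
    new = tag_from_output(output, ok)
    kept = [t for t in tags if t != previous]
    if new is not None and new not in kept:
        kept.append(new)
    return kept, new
```

Tracking the previously set tag separately from the tag list is what lets "leader" be removed when the output changes to "follower" without touching statically configured tags.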

ShimmerGlass on 12 Oct 2019

👍3

@Aestek I agree, this kind of thing would be really useful. For now we have lots of services such as:

- mycluster-zookeeper (all green in normal circumstances)
- mycluster-zookeeper-leader (which has the same members, but the tests are not the same, meaning that mycluster-zookeeper-leader always has 4 instances in warning state and 1 in passing)

Having a way to merge those services in 1 single service and just add a tag leader would be great.

I know several systems where checks for this kind of features can also be simple HTTP checks, so limiting it to scripts is a bit less interesting.
I am not convinced by scraping the output of regular checks to get the new tags because:

- If several checks can set this value, what happens when one contradicts another?
- How do you add/remove existing tags?

The https://github.com/hashicorp/consul/issues/1048#issuecomment-247585117 looks like a sensible approach (I mean, not linked to existing checks), because:

- for each dynamic check, it describes explicitly what tag would be added
- it avoids conflicts between the outputs of several checks not agreeing on tags
- its output would not change the output of checks
- it would allow checks to be not only scripts, but also HTTP, TCP and so on

I did not check in detail what has been done in https://github.com/avdva/consul/tree/dynamic-tags but it sounds to me like the right approach. While it limits the ability to have very dynamic things, it would greatly ease implementation (most notably by avoiding conflicts between several checks).

pierresouchay on 12 Oct 2019

👍1

I'll try to resurrect my branch soon, Will see, if it still works.

avdva on 15 Oct 2019

👍4

@avdva we are really interested by this, tell us when you do so ;)

pierresouchay on 18 Oct 2019

👍1

@avdva Did you have time to resurrect your branch? Hope it's not too complicated with all the conflicts there must be since 2016.

ShimmerGlass on 6 Nov 2019

RedStalker on 13 Nov 2019

This is an interesting idea and I could imagine us adding such a feature. The best way to get it in is to create a PR so that we have something to discuss. That would also make it easier to see the impact.

i0rek on 18 Feb 2020

+1. I think @ShimmerGlass's suggestion is great, the tag should come from the script itself. This covers OP's use case but would solve additional ones.
In our case, we have a dynamically generated ID in some of our services (the ID comes from dedicated hardware, and must be generated within the service), and it'd be great if we could propagate this ID to consul. A great way to solve this is to have a periodic script return the tag(s) to be applied.