Consul: SRV records changed to addr instead of node format

Created on 27 Feb 2020 · 5 comments · Source: hashicorp/consul

Overview of the Issue

Previously SRV records looked like:

;; ANSWER SECTION:
debug.packager-api.query.admiral. 1 IN  SRV     1 1 21748 ivy.node.staging-gce-us-east1.admiral.
debug.packager-api.query.admiral. 1 IN  SRV     1 1 16620 quinn.node.staging-gce-us-east1.admiral.

;; ADDITIONAL SECTION:
ivy.node.staging-gce-us-east1.admiral. 1 IN A   10.128.0.8
quinn.node.staging-gce-us-east1.admiral. 1 IN A 10.142.0.40

but with 1.7.1 they now look like:

;; ANSWER SECTION:
debug.packager-api.query.admiral. 1 IN  SRV     1 1 28945 0a800008.addr.staging-gce-us-east1.admiral.
debug.packager-api.query.admiral. 1 IN  SRV     1 1 19605 0a8e0028.addr.staging-gce-us-east1.admiral.

;; ADDITIONAL SECTION:
0a800008.addr.staging-gce-us-east1.admiral. 1 IN A 10.128.0.8
0a8e0028.addr.staging-gce-us-east1.admiral. 1 IN A 10.142.0.40

Reproduction Steps

We had previously relied on the output being in the format <node>.node.<dc>.admiral, but it now looks like it has changed to <random?>.addr.<dc>.admiral. I don't see anything in the CHANGELOG mentioning this. Was it intentional or accidental? Can we choose the behavior?
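For what it's worth, the new label appears to be the service address hex-encoded: 0a800008 decodes to 10.128.0.8 and 0a8e0028 to 10.142.0.40, matching the ADDITIONAL section above. A minimal Go sketch of that decoding (assuming the label is always a hex-encoded IPv4/IPv6 address, which is all we have observed):

package main

import (
	"encoding/hex"
	"fmt"
	"net"
)

// decodeAddrLabel turns the first label of a <hex>.addr.<dc>.<domain>
// SRV target back into an IP address.
func decodeAddrLabel(label string) (net.IP, error) {
	b, err := hex.DecodeString(label)
	if err != nil {
		return nil, err
	}
	if len(b) != net.IPv4len && len(b) != net.IPv6len {
		return nil, fmt.Errorf("unexpected address length %d", len(b))
	}
	return net.IP(b), nil
}

func main() {
	for _, label := range []string{"0a800008", "0a8e0028"} {
		ip, err := decodeAddrLabel(label)
		if err != nil {
			fmt.Println(label, "->", err)
			continue
		}
		fmt.Println(label, "->", ip) // 0a800008 -> 10.128.0.8, 0a8e0028 -> 10.142.0.40
	}
}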

Consul info for both Client and Server


Client info

agent:
        check_monitors = 0
        check_ttls = 71
        checks = 103
        services = 71
build:
        prerelease =
        revision = 2cf0a3c8
        version = 1.7.1
consul:
        acl = disabled
        known_servers = 3
        server = false
runtime:
        arch = amd64
        cpu_count = 4
        goroutines = 204
        max_procs = 4
        os = linux
        version = go1.13.8
serf_lan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 72
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 1197
        members = 5
        query_queue = 0
        query_time = 1


Server info

agent:
        check_monitors = 0
        check_ttls = 6
        checks = 6
        services = 6
build:
        prerelease =
        revision = 2cf0a3c8
        version = 1.7.1
consul:
        acl = disabled
        bootstrap = false
        known_datacenters = 7
        leader = false
        leader_addr = 10.142.15.197:8300
        server = true
raft:
        applied_index = 19366355
        commit_index = 19366355
        fsm_pending = 0
        last_contact = 36.245636ms
        last_log_index = 19366355
        last_log_term = 47
        last_snapshot_index = 19359054
        last_snapshot_term = 47
        latest_configuration = [{Suffrage:Voter ID:5492e882-3ad4-af25-9b37-4e23b2ebf1f5 Address:10.142.15.198:8300} {Suffrage:Voter ID:ab3569f5-6fa5-c69e-94d4-d662d7738261 Address:10.142.15.197:8300} {Suffrage:Voter ID:63f20cf7-dffa-8890-3086-db5382449cf8 Address:10.142.15.199:8300}]
        latest_configuration_index = 0
        num_peers = 2
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Follower
        term = 47
runtime:
        arch = amd64
        cpu_count = 1
        goroutines = 188
        max_procs = 1
        os = linux
        version = go1.13.8
serf_lan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 72
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 1197
        members = 5
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 1297
        members = 21
        query_queue = 0
        query_time = 1

Operating system and Environment details

CentOS 7 on GCE

Log Fragments

N/A


All 5 comments

Agreed, we're running into the same issue here:

root@nomad-compute-i-0943ad3695c5bd2b1 [dev-usw2-dev1] ~ # dig jobs-shawn-postgres.service.consul SRV @127.0.0.1 -p 8600

; <<>> DiG 9.11.3-1ubuntu1.11-Ubuntu <<>> jobs-shawn-postgres.service.consul SRV @127.0.0.1 -p 8600
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43432
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;jobs-shawn-postgres.service.consul. IN SRV

;; ANSWER SECTION:
jobs-shawn-postgres.service.consul. 0 IN SRV    1 1 23258 0a16004b.addr.dev-usw2-core1.consul.

;; ADDITIONAL SECTION:
0a16004b.addr.dev-usw2-core1.consul. 0 IN A 10.22.0.75

;; Query time: 1 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Tue Apr 21 01:12:51 UTC 2020
;; MSG SIZE  rcvd: 134

We leverage unbound and consul-template to cache records, but we can't figure out where 0a16004b is coming from.

This is definitely a breaking change.

I'm seeing the same thing and wondering where the <id>.addr.<datacenter>.consul name is coming from. It's a breaking change for us, too.

For now we're relying on Consul 1.6.5's previous answer format, which includes the node name, to work around it:

nomad-compute-i-0cdc8320aa6b1b1aa.node.dev-usw2-core1.consul
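If anyone else needs the node name while on 1.7.x, another workaround sketch is to bypass DNS and use the HTTP health API, which still returns node names explicitly. This is a minimal example with the official Go client (github.com/hashicorp/consul/api); the service name is the one from the dig output above:

package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Talk to the local agent (default 127.0.0.1:8500).
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// The health endpoint still reports the node explicitly,
	// unlike the 1.7.1 SRV answers.
	entries, _, err := client.Health().Service("jobs-shawn-postgres", "", true, nil)
	if err != nil {
		log.Fatal(err)
	}
	for _, e := range entries {
		fmt.Printf("%s.node.%s.consul -> %s:%d\n",
			e.Node.Node, e.Node.Datacenter, e.Node.Address, e.Service.Port)
	}
}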

Bumping this issue. Is there any way we can help speed up a fix?

I forgot to mention this in the OP: this was broken by https://github.com/hashicorp/consul/pull/6792, and given that the commit message says:

Current implementation returns the node name instead of the service
address.
With this fix when querying for SRV record service address is return in
the SRV record.

This seems intentional.
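Until there's a fix or a toggle upstream, consumers that parse SRV answers probably need to accept both target shapes. A small defensive sketch, assuming the second label ("node" vs "addr") distinguishes the formats and that addr labels are hex-encoded IPs as observed above:

package main

import (
	"encoding/hex"
	"fmt"
	"net"
	"strings"
)

// parseSRVTarget accepts either target shape seen in this thread:
//   <node>.node.<dc>.<domain>  -> returns the node name
//   <hexip>.addr.<dc>.<domain> -> returns the decoded IP
func parseSRVTarget(target string) (string, error) {
	labels := strings.Split(strings.TrimSuffix(target, "."), ".")
	if len(labels) < 2 {
		return "", fmt.Errorf("unexpected target %q", target)
	}
	switch labels[1] {
	case "node":
		return labels[0], nil
	case "addr":
		b, err := hex.DecodeString(labels[0])
		if err != nil || (len(b) != net.IPv4len && len(b) != net.IPv6len) {
			return "", fmt.Errorf("bad addr label %q", labels[0])
		}
		return net.IP(b).String(), nil
	}
	return "", fmt.Errorf("unknown target format %q", target)
}

func main() {
	for _, t := range []string{
		"ivy.node.staging-gce-us-east1.admiral.",
		"0a16004b.addr.dev-usw2-core1.consul.",
	} {
		out, err := parseSRVTarget(t)
		if err != nil {
			fmt.Println(t, "->", err)
			continue
		}
		fmt.Println(t, "->", out) // "ivy", then "10.22.0.75"
	}
}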

