Previously, SRV records looked like:
;; ANSWER SECTION:
debug.packager-api.query.admiral. 1 IN SRV 1 1 21748 ivy.node.staging-gce-us-east1.admiral.
debug.packager-api.query.admiral. 1 IN SRV 1 1 16620 quinn.node.staging-gce-us-east1.admiral.
;; ADDITIONAL SECTION:
ivy.node.staging-gce-us-east1.admiral. 1 IN A 10.128.0.8
quinn.node.staging-gce-us-east1.admiral. 1 IN A 10.142.0.40
but with 1.7.1 they now look like:
;; ANSWER SECTION:
debug.packager-api.query.admiral. 1 IN SRV 1 1 28945 0a800008.addr.staging-gce-us-east1.admiral.
debug.packager-api.query.admiral. 1 IN SRV 1 1 19605 0a8e0028.addr.staging-gce-us-east1.admiral.
;; ADDITIONAL SECTION:
0a800008.addr.staging-gce-us-east1.admiral. 1 IN A 10.128.0.8
0a8e0028.addr.staging-gce-us-east1.admiral. 1 IN A 10.142.0.40
We had previously relied on the output being in the format <node>.node.<dc>.admiral, but it now appears to have changed to <random?>.addr.<dc>.admiral. I don't see anything in the CHANGELOG mentioning this. Was it intentional or accidental? Can we choose the behavior?
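For what it's worth, the label does not appear to be random: comparing the ANSWER and ADDITIONAL sections above, 0a800008 is the hex encoding of 10.128.0.8 and 0a8e0028 is the hex encoding of 10.142.0.40, so the first label seems to be the service address encoded byte-for-byte as lowercase hex. A minimal Go sketch of the apparent decoding (decodeAddrLabel is my own helper, not Consul code):

package main

import (
    "encoding/hex"
    "fmt"
    "net"
    "strings"
)

// decodeAddrLabel converts the first label of an <hex>.addr.<dc>.<domain>
// SRV target back into an IP address. For IPv4 the label is 8 hex digits
// (4 bytes); a 32-digit label would decode to an IPv6 address.
func decodeAddrLabel(target string) (net.IP, error) {
    label := strings.SplitN(target, ".", 2)[0]
    raw, err := hex.DecodeString(label)
    if err != nil {
        return nil, err
    }
    return net.IP(raw), nil
}

func main() {
    for _, t := range []string{
        "0a800008.addr.staging-gce-us-east1.admiral.",
        "0a8e0028.addr.staging-gce-us-east1.admiral.",
    } {
        ip, err := decodeAddrLabel(t)
        if err != nil {
            fmt.Println(err)
            continue
        }
        fmt.Printf("%s -> %s\n", t, ip) // 10.128.0.8, 10.142.0.40
    }
}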
Client info
agent:
    check_monitors = 0
    check_ttls = 71
    checks = 103
    services = 71
build:
    prerelease =
    revision = 2cf0a3c8
    version = 1.7.1
consul:
    acl = disabled
    known_servers = 3
    server = false
runtime:
    arch = amd64
    cpu_count = 4
    goroutines = 204
    max_procs = 4
    os = linux
    version = go1.13.8
serf_lan:
    coordinate_resets = 0
    encrypted = true
    event_queue = 0
    event_time = 72
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 1197
    members = 5
    query_queue = 0
    query_time = 1
Server info
agent:
    check_monitors = 0
    check_ttls = 6
    checks = 6
    services = 6
build:
    prerelease =
    revision = 2cf0a3c8
    version = 1.7.1
consul:
    acl = disabled
    bootstrap = false
    known_datacenters = 7
    leader = false
    leader_addr = 10.142.15.197:8300
    server = true
raft:
    applied_index = 19366355
    commit_index = 19366355
    fsm_pending = 0
    last_contact = 36.245636ms
    last_log_index = 19366355
    last_log_term = 47
    last_snapshot_index = 19359054
    last_snapshot_term = 47
    latest_configuration = [{Suffrage:Voter ID:5492e882-3ad4-af25-9b37-4e23b2ebf1f5 Address:10.142.15.198:8300} {Suffrage:Voter ID:ab3569f5-6fa5-c69e-94d4-d662d7738261 Address:10.142.15.197:8300} {Suffrage:Voter ID:63f20cf7-dffa-8890-3086-db5382449cf8 Address:10.142.15.199:8300}]
    latest_configuration_index = 0
    num_peers = 2
    protocol_version = 3
    protocol_version_max = 3
    protocol_version_min = 0
    snapshot_version_max = 1
    snapshot_version_min = 0
    state = Follower
    term = 47
runtime:
    arch = amd64
    cpu_count = 1
    goroutines = 188
    max_procs = 1
    os = linux
    version = go1.13.8
serf_lan:
    coordinate_resets = 0
    encrypted = true
    event_queue = 0
    event_time = 72
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 1197
    members = 5
    query_queue = 0
    query_time = 1
serf_wan:
    coordinate_resets = 0
    encrypted = true
    event_queue = 0
    event_time = 1
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 1297
    members = 21
    query_queue = 0
    query_time = 1
CentOS 7 on GCE
Agreed, we're running into the same issue here:
root@nomad-compute-i-0943ad3695c5bd2b1 [dev-usw2-dev1] ~ # dig jobs-shawn-postgres.service.consul SRV @127.0.0.1 -p 8600
; <<>> DiG 9.11.3-1ubuntu1.11-Ubuntu <<>> jobs-shawn-postgres.service.consul SRV @127.0.0.1 -p 8600
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43432
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;jobs-shawn-postgres.service.consul. IN SRV
;; ANSWER SECTION:
jobs-shawn-postgres.service.consul. 0 IN SRV 1 1 23258 0a16004b.addr.dev-usw2-core1.consul.
;; ADDITIONAL SECTION:
0a16004b.addr.dev-usw2-core1.consul. 0 IN A 10.22.0.75
;; Query time: 1 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Tue Apr 21 01:12:51 UTC 2020
;; MSG SIZE rcvd: 134
We use unbound and consul-template to cache records, but we can't figure out where 0a16004b is coming from.
This is definitely a breaking change.
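The 0a16004b label fits the same pattern: it is the hex encoding of 10.22.0.75, the very address returned in the ADDITIONAL section above. A short sketch of the apparent encoding direction (addrLabel is a hypothetical helper, not Consul's code):

package main

import (
    "encoding/hex"
    "fmt"
    "net"
)

// addrLabel reproduces the apparent scheme: each byte of the
// IPv4 service address rendered as two lowercase hex digits.
func addrLabel(ip string) string {
    return hex.EncodeToString(net.ParseIP(ip).To4())
}

func main() {
    fmt.Println(addrLabel("10.22.0.75")) // prints: 0a16004b
}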
I'm seeing the same thing and wondering where the <id>.addr.<datacenter>.consul name is coming from. It's a breaking change for us, too.
For now we're working around it by using Consul 1.6.5's previous answer format, which includes the node name:
nomad-compute-i-0cdc8320aa6b1b1aa.node.dev-usw2-core1.consul
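If the node name itself is what you need, another possible workaround is to skip DNS and query the HTTP health endpoint, which still returns the node name alongside the service address and port. A sketch using the official Go client (github.com/hashicorp/consul/api); the service name is taken from the dig example above:

package main

import (
    "fmt"
    "log"

    "github.com/hashicorp/consul/api"
)

func main() {
    client, err := api.NewClient(api.DefaultConfig())
    if err != nil {
        log.Fatal(err)
    }
    // Health().Service returns passing instances with both the node
    // name and the service address/port, unlike the 1.7.1 SRV answer.
    entries, _, err := client.Health().Service("jobs-shawn-postgres", "", true, nil)
    if err != nil {
        log.Fatal(err)
    }
    for _, e := range entries {
        fmt.Printf("node=%s addr=%s port=%d\n",
            e.Node.Node, e.Service.Address, e.Service.Port)
    }
}

Since consul-template's service queries are backed by this same endpoint rather than DNS, templates that render node names should be unaffected by the SRV change.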
Bumping this issue. Is there any way we can help speed up a fix?
I forgot to mention this in the OP: this was broken by https://github.com/hashicorp/consul/pull/6792, and given that the commit message says:
Current implementation returns the node name instead of the service address.
With this fix when querying for SRV record service address is return in the SRV record.
This seems intentional.