I just tried the nice documentation on https://learn.hashicorp.com/consul/datacenter-deploy/deployment-guide. However, systemd can start consul and everything locks fine but the startup process never returns and thus finally systemd runs in a timeout.
Exactly like shown on https://learn.hashicorp.com/consul/datacenter-deploy/deployment-guide
This is for test purposes so I just configured bootstrap_expect = 1, no retry_join, performance or metrics.
Server info
agent:
check_monitors = 0
check_ttls = 0
checks = 0
services = 0
build:
prerelease =
revision = 1200f25e
version = 1.6.2
consul:
acl = disabled
bootstrap = true
known_datacenters = 1
leader = true
leader_addr = 10.2.0.2:8300
server = true
raft:
applied_index = 186
commit_index = 186
fsm_pending = 0
last_contact = 0
last_log_index = 186
last_log_term = 9
last_snapshot_index = 0
last_snapshot_term = 0
latest_configuration = [{Suffrage:Voter ID:f516c909-0079-07ba-cb2a-f2cd3d5a405b Address:10.2.0.2:8300}]
latest_configuration_index = 1
num_peers = 0
protocol_version = 3
protocol_version_max = 3
protocol_version_min = 0
snapshot_version_max = 1
snapshot_version_min = 0
state = Leader
term = 9
runtime:
arch = amd64
cpu_count = 1
goroutines = 80
max_procs = 1
os = linux
version = go1.12.13
serf_lan:
coordinate_resets = 0
encrypted = true
event_queue = 1
event_time = 9
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 1
members = 1
query_queue = 0
query_time = 1
serf_wan:
coordinate_resets = 0
encrypted = true
event_queue = 0
event_time = 1
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 1
members = 1
query_queue = 0
query_time = 1
4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
/usr/local/bin/consul --version
Consul v1.6.2
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
vim /etc/systemd/system/consul.service
[Unit]
Description="HashiCorp Consul - A service mesh solution"
Documentation=https://www.consul.io/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/consul.d/consul.hcl
[Service]
Type=notify
User=consul
Group=consul
ExecStart=/usr/local/bin/consul agent -config-dir=/etc/consul.d/
ExecReload=/usr/local/bin/consul reload
KillMode=process
Restart=on-failure
LimitNOFILE=65536
TimeoutSec=900
[Install]
WantedBy=multi-user.target
Jan 25 19:49:15 test systemd[1]: Starting "HashiCorp Consul - A service mesh solution"...
Jan 25 19:49:15 test consul[10967]: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Jan 25 19:49:15 test consul[10967]: WARNING: LAN keyring exists but -encrypt given, using keyring
Jan 25 19:49:15 test consul[10967]: WARNING: WAN keyring exists but -encrypt given, using keyring
Jan 25 19:49:15 test consul[10967]: bootstrap = true: do not enable unless necessary
Jan 25 19:49:15 test consul[10967]: ==> Starting Consul agent...
Jan 25 19:49:15 test consul[10967]: Version: 'v1.6.2'
Jan 25 19:49:15 test consul[10967]: Node ID: 'f516c909-0079-07ba-cb2a-f2cd3d5a405b'
Jan 25 19:49:15 test consul[10967]: Node name: 'test'
Jan 25 19:49:15 test consul[10967]: Datacenter: 'test' (Segment: '')
Jan 25 19:49:15 test consul[10967]: Server: true (Bootstrap: true)
Jan 25 19:49:15 test consul[10967]: Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
Jan 25 19:49:15 test consul[10967]: Cluster Addr: 10.2.0.2 (LAN: 8301, WAN: 8302)
Jan 25 19:49:15 test consul[10967]: Encrypt: Gossip: true, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
Jan 25 19:49:15 test consul[10967]: ==> Log data will now stream in as it occurs:
Jan 25 19:49:15 test consul[10967]: 2020/01/25 19:49:15 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:f516c909-0079-07ba-cb2a-f2cd3d5a405b Address:10.2.0.2:8300}]
Jan 25 19:49:15 test consul[10967]: 2020/01/25 19:49:15 [INFO] serf: EventMemberJoin: test.test 10.2.0.2
Jan 25 19:49:15 test consul[10967]: 2020/01/25 19:49:15 [INFO] serf: EventMemberJoin: test 10.2.0.2
Jan 25 19:49:15 test consul[10967]: 2020/01/25 19:49:15 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
Jan 25 19:49:15 test consul[10967]: 2020/01/25 19:49:15 [INFO] raft: Node at 10.2.0.2:8300 [Follower] entering Follower state (Leader: "")
Jan 25 19:49:15 test consul[10967]: 2020/01/25 19:49:15 [WARN] serf: Failed to re-join any previously known node
Jan 25 19:49:15 test consul[10967]: 2020/01/25 19:49:15 [WARN] serf: Failed to re-join any previously known node
Jan 25 19:49:15 test consul[10967]: 2020/01/25 19:49:15 [INFO] consul: Adding LAN server test (Addr: tcp/10.2.0.2:8300) (DC: test)
Jan 25 19:49:15 test consul[10967]: 2020/01/25 19:49:15 [INFO] consul: Handled member-join event for server "test.test" in area "wan"
Jan 25 19:49:15 test consul[10967]: 2020/01/25 19:49:15 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
Jan 25 19:49:15 test consul[10967]: 2020/01/25 19:49:15 [INFO] agent: Started HTTP server on 127.0.0.1:8500 (tcp)
Jan 25 19:49:15 test consul[10967]: 2020/01/25 19:49:15 [INFO] agent: started state syncer
Jan 25 19:49:15 test consul[10967]: ==> Consul agent running!
Jan 25 19:49:20 test consul[10967]: 2020/01/25 19:49:20 [WARN] raft: Heartbeat timeout from "" reached, starting election
Jan 25 19:49:20 test consul[10967]: 2020/01/25 19:49:20 [INFO] raft: Node at 10.2.0.2:8300 [Candidate] entering Candidate state in term 7
Jan 25 19:49:20 test consul[10967]: 2020/01/25 19:49:20 [INFO] raft: Election won. Tally: 1
Jan 25 19:49:20 test consul[10967]: 2020/01/25 19:49:20 [INFO] raft: Node at 10.2.0.2:8300 [Leader] entering Leader state
Jan 25 19:49:20 test consul[10967]: 2020/01/25 19:49:20 [INFO] consul: cluster leadership acquired
Jan 25 19:49:20 test consul[10967]: 2020/01/25 19:49:20 [INFO] consul: New leader elected: test
Jan 25 19:49:21 test consul[10967]: 2020/01/25 19:49:21 [INFO] agent: Synced node info
Jan 25 20:04:15 test systemd[1]: consul.service: Start operation timed out. Terminating.
Jan 25 20:04:15 test consul[10967]: 2020/01/25 20:04:15 [INFO] agent: Caught signal: terminated
Jan 25 20:04:15 test consul[10967]: 2020/01/25 20:04:15 [INFO] agent: Graceful shutdown disabled. Exiting
Jan 25 20:04:15 test consul[10967]: 2020/01/25 20:04:15 [INFO] agent: Requesting shutdown
Jan 25 20:04:15 test consul[10967]: 2020/01/25 20:04:15 [INFO] consul: shutting down server
Jan 25 20:04:15 test consul[10967]: 2020/01/25 20:04:15 [WARN] serf: Shutdown without a Leave
Jan 25 20:04:15 test consul[10967]: 2020/01/25 20:04:15 [WARN] serf: Shutdown without a Leave
Jan 25 20:04:15 test consul[10967]: 2020/01/25 20:04:15 [INFO] manager: shutting down
Jan 25 20:04:15 test consul[10967]: 2020/01/25 20:04:15 [INFO] agent: consul server down
Jan 25 20:04:15 test consul[10967]: 2020/01/25 20:04:15 [INFO] agent: shutdown complete
Jan 25 20:04:15 test consul[10967]: 2020/01/25 20:04:15 [INFO] agent: Stopping DNS server 127.0.0.1:8600 (tcp)
Jan 25 20:04:15 test consul[10967]: 2020/01/25 20:04:15 [INFO] agent: Stopping DNS server 127.0.0.1:8600 (udp)
Jan 25 20:04:15 test consul[10967]: 2020/01/25 20:04:15 [INFO] agent: Stopping HTTP server 127.0.0.1:8500 (tcp)
Jan 25 20:04:15 test consul[10967]: 2020/01/25 20:04:15 [INFO] agent: Waiting for endpoints to shut down
Jan 25 20:04:15 test consul[10967]: 2020/01/25 20:04:15 [INFO] agent: Endpoints down
Jan 25 20:04:15 test consul[10967]: 2020/01/25 20:04:15 [INFO] agent: Exit code: 1
Jan 25 20:04:15 test systemd[1]: consul.service: Main process exited, code=exited, status=1/FAILURE
Jan 25 20:04:15 test systemd[1]: consul.service: Failed with result 'timeout'.
Jan 25 20:04:15 test systemd[1]: Failed to start "HashiCorp Consul - A service mesh solution".
Jan 25 20:04:15 test systemd[1]: consul.service: Service hold-off time over, scheduling restart.
Jan 25 20:04:15 test systemd[1]: consul.service: Scheduled restart job, restart counter is at 1.
Jan 25 20:04:15 test systemd[1]: Stopped "HashiCorp Consul - A service mesh solution".
I have the same issue, did you resolve it?
I figure it out, as in the service definition file, there is an option Type under [service] seciton:
[Service]
Type=notify
notify means the service should send a notification message to systemd when it has finished starting up, otherwise systemd will not proceed, see systemd document for details.
for consul, it just send the notification after a LAN join has completed, see consul agent:
When running under systemd on Linux, Consul notifies systemd by sending READY=1 to the $NOTIFY_SOCKET when a LAN join has completed. For this either the join or retry_join option has to be set and the service definition file has to have Type=notify set.
but our configuration is to set up a single node agent (bootstrap_expect = 1, no retry_join), so LAN join will not complete ever, this is why systemd get timeout.
we can solve this by set Type option to "exec", systemd will proceed after consul process started.
but I have other problem, when I run systemctl stop consul, the log always show that consul exits with error:
Feb 22 21:01:52 pyddot-ecs consul[6602]: 2020-02-22T21:01:52.238+0800 [INFO] agent: consul server down
Feb 22 21:01:52 pyddot-ecs consul[6602]: 2020-02-22T21:01:52.238+0800 [INFO] agent: shutdown complete
Feb 22 21:01:52 pyddot-ecs consul[6602]: 2020-02-22T21:01:52.238+0800 [INFO] agent: Stopping server: protocol=DNS address=0.0.0.0:8600 network=tcp
Feb 22 21:01:52 pyddot-ecs consul[6602]: 2020-02-22T21:01:52.238+0800 [INFO] agent: Stopping server: protocol=DNS address=0.0.0.0:8600 network=udp
Feb 22 21:01:52 pyddot-ecs consul[6602]: 2020-02-22T21:01:52.238+0800 [INFO] agent: Stopping server: protocol=HTTP address=[::]:8500 network=tcp
Feb 22 21:01:53 pyddot-ecs consul[6602]: 2020-02-22T21:01:53.238+0800 [WARN] agent: Timeout stopping server: protocol=HTTP address=[::]:8500 network=tcp
Feb 22 21:01:53 pyddot-ecs systemd[1]: consul.service: main process exited, code=exited, status=1/FAILURE
Feb 22 21:01:53 pyddot-ecs systemd[1]: Stopped "HashiCorp Consul - A service mesh solution".
Feb 22 21:01:53 pyddot-ecs systemd[1]: Unit consul.service entered failed state.
Feb 22 21:01:53 pyddot-ecs systemd[1]: consul.service failed.
it seems that consul does not gracefully stop. still don't know why...
Hi @iotdog and @PMacho , thanks for reporting this.
I created an internal issue for our learn site to investigate on this and will comment on this issue as soon as we have updates.
I figure it out, as in the service definition file, there is an option Type under [service] seciton:
[Service]
Type=notifynotify means the service should send a notification message to systemd when it has finished starting up, otherwise systemd will not proceed, see systemd document for details.
for consul, it just send the notification after a LAN join has completed, see consul agent:
When running under systemd on Linux, Consul notifies systemd by sending READY=1 to the $NOTIFY_SOCKET when a LAN join has completed. For this either the join or retry_join option has to be set and the service definition file has to have Type=notify set.
but our configuration is to set up a single node agent (bootstrap_expect = 1, no retry_join), so LAN join will not complete ever, this is why systemd get timeout.
we can solve this by set Type option to "exec", systemd will proceed after consul process started.
but I have other problem, when I run systemctl stop consul, the log always show that consul exits with error:
Feb 22 21:01:52 pyddot-ecs consul[6602]: 2020-02-22T21:01:52.238+0800 [INFO] agent: consul server down
Feb 22 21:01:52 pyddot-ecs consul[6602]: 2020-02-22T21:01:52.238+0800 [INFO] agent: shutdown complete
Feb 22 21:01:52 pyddot-ecs consul[6602]: 2020-02-22T21:01:52.238+0800 [INFO] agent: Stopping server: protocol=DNS address=0.0.0.0:8600 network=tcp
Feb 22 21:01:52 pyddot-ecs consul[6602]: 2020-02-22T21:01:52.238+0800 [INFO] agent: Stopping server: protocol=DNS address=0.0.0.0:8600 network=udp
Feb 22 21:01:52 pyddot-ecs consul[6602]: 2020-02-22T21:01:52.238+0800 [INFO] agent: Stopping server: protocol=HTTP address=[::]:8500 network=tcp
Feb 22 21:01:53 pyddot-ecs consul[6602]: 2020-02-22T21:01:53.238+0800 [WARN] agent: Timeout stopping server: protocol=HTTP address=[::]:8500 network=tcp
Feb 22 21:01:53 pyddot-ecs systemd[1]: consul.service: main process exited, code=exited, status=1/FAILURE
Feb 22 21:01:53 pyddot-ecs systemd[1]: Stopped "HashiCorp Consul - A service mesh solution".
Feb 22 21:01:53 pyddot-ecs systemd[1]: Unit consul.service entered failed state.
Feb 22 21:01:53 pyddot-ecs systemd[1]: consul.service failed.it seems that consul does not gracefully stop. still don't know why...
Indeed solved the problem. Thank you. And yes I also see this error when stopping consul.
I found a workaround solution. Add these two configurations in your Consul Server config file:
bootstrap_expect = 1
retry_join = ["127.0.0.1"]
This can work fine with type=notify.
same issue, it make me
Hi @iotdog and @PMacho , thanks for reporting this.
I created an internal issue for our learn site to investigate on this and will comment on this issue as soon as we have updates.
how's it going?
@Anricx , @PMacho , @iotdog the issue should now be fixed in the learn platform and the new version is live at https://learn.hashicorp.com/consul/datacenter-deploy/deployment-guide#configure-systemd.
We are closing this issue. In case you still get issues with the systemd configuration feel free to re-open this ticket.
Most helpful comment
I figure it out, as in the service definition file, there is an option Type under [service] seciton:
notify means the service should send a notification message to systemd when it has finished starting up, otherwise systemd will not proceed, see systemd document for details.
for consul, it just send the notification after a LAN join has completed, see consul agent:
but our configuration is to set up a single node agent (bootstrap_expect = 1, no retry_join), so LAN join will not complete ever, this is why systemd get timeout.
we can solve this by set Type option to "exec", systemd will proceed after consul process started.
but I have other problem, when I run systemctl stop consul, the log always show that consul exits with error:
it seems that consul does not gracefully stop. still don't know why...