Viewing leader logs after promoting a node to leader through POST to /api/v0/leaders, the node sees leader logs of []. A node started initially as a leader through the command line secrets is able to successfully see the leader logs.
Because the new leader cannot see the logs, it fails to mint any blocks!!
Mandatory Information
jcli --full-version output;jcli 0.8.5 (master-8e82d180+, release, linux [x86_64]) - [rustc 1.40.0 (73528e339 2019-12-16)]jormungandr --full-version output;To Reproduce
Steps to reproduce the behavior:
--secret parameter to start a passive nodeExpected behavior
When a node is promoted to a leader, the leader logs should be available so that the new leader can successfully mint blocks.
Node Started as Leader through command line option
~/cardano on î‚ master! ⌚ 0:36:20
$ jcli rest v0 leaders logs get --host http://127.0.0.1:3100/api
---
- created_at_time: "2020-01-08T08:27:26.838739967+00:00"
enclave_leader_id: 1
finished_at_time: ~
scheduled_at_date: "25.26336"
scheduled_at_time: "2020-01-08T09:51:29+00:00"
status: Pending
wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838754876+00:00"
enclave_leader_id: 1
finished_at_time: ~
scheduled_at_date: "25.35938"
scheduled_at_time: "2020-01-08T15:11:33+00:00"
status: Pending
wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838757070+00:00"
enclave_leader_id: 1
finished_at_time: ~
scheduled_at_date: "25.39925"
scheduled_at_time: "2020-01-08T17:24:27+00:00"
status: Pending
wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838742552+00:00"
enclave_leader_id: 1
finished_at_time: ~
scheduled_at_date: "25.27455"
scheduled_at_time: "2020-01-08T10:28:47+00:00"
status: Pending
wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838749305+00:00"
enclave_leader_id: 1
finished_at_time: ~
scheduled_at_date: "25.34741"
scheduled_at_time: "2020-01-08T14:31:39+00:00"
status: Pending
wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838763111+00:00"
enclave_leader_id: 1
finished_at_time: ~
scheduled_at_date: "25.41961"
scheduled_at_time: "2020-01-08T18:32:19+00:00"
status: Pending
wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838746490+00:00"
enclave_leader_id: 1
finished_at_time: ~
scheduled_at_date: "25.29741"
scheduled_at_time: "2020-01-08T11:44:59+00:00"
status: Pending
wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838755607+00:00"
enclave_leader_id: 1
finished_at_time: ~
scheduled_at_date: "25.37109"
scheduled_at_time: "2020-01-08T15:50:35+00:00"
status: Pending
wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838732303+00:00"
enclave_leader_id: 1
finished_at_time: ~
scheduled_at_date: "25.25935"
scheduled_at_time: "2020-01-08T09:38:07+00:00"
status: Pending
wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838748433+00:00"
enclave_leader_id: 1
finished_at_time: ~
scheduled_at_date: "25.34024"
scheduled_at_time: "2020-01-08T14:07:45+00:00"
status: Pending
wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838757821+00:00"
enclave_leader_id: 1
finished_at_time: ~
scheduled_at_date: "25.40081"
scheduled_at_time: "2020-01-08T17:29:39+00:00"
status: Pending
wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838747462+00:00"
enclave_leader_id: 1
finished_at_time: ~
scheduled_at_date: "25.31421"
scheduled_at_time: "2020-01-08T12:40:59+00:00"
status: Pending
wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838760256+00:00"
enclave_leader_id: 1
finished_at_time: ~
scheduled_at_date: "25.40840"
scheduled_at_time: "2020-01-08T17:54:57+00:00"
status: Pending
wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838754014+00:00"
enclave_leader_id: 1
finished_at_time: ~
scheduled_at_date: "25.35394"
scheduled_at_time: "2020-01-08T14:53:25+00:00"
status: Pending
wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838756328+00:00"
enclave_leader_id: 1
finished_at_time: ~
scheduled_at_date: "25.38086"
scheduled_at_time: "2020-01-08T16:23:09+00:00"
status: Pending
wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838752972+00:00"
enclave_leader_id: 1
finished_at_time: ~
scheduled_at_date: "25.35115"
scheduled_at_time: "2020-01-08T14:44:07+00:00"
status: Pending
wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838741590+00:00"
enclave_leader_id: 1
finished_at_time: ~
scheduled_at_date: "25.27255"
scheduled_at_time: "2020-01-08T10:22:07+00:00"
status: Pending
wake_at_time: ~
Node promoted to leader from passive
~/cardano ⌚ 21:29:23
$ jcli rest v0 leaders get --host http://127.0.0.1:3100/api
---
[]
~/cardano ⌚ 0:42:09
$ jcli rest v0 leaders post -f <mypoolsecretfile>.yaml --host http://127.0.0.1:3100/api
1
~/cardano ⌚ 0:42:33
$ jcli rest v0 leaders get --host http://127.0.0.1:3100/api
---
- 1
~/cardano ⌚ 0:42:42
$ jcli rest v0 leaders logs get --host http://127.0.0.1:3100/api
---
[]
@AndrewWestberg the situation is the following:
The leadership scheduler runs in 2 moments:
When you start the node with the secrets included (lets call it Leader mode), due to (1) it calculates the leadership events for the remaining slots within the existing epoch, since it finds cryptographic material (secrets) available.
When you start the node without the secrets (lets call it Passive mode), the leadership scheduler still runs, but since there is no cryptographic material (secrets included) it does not generate any event and you always have a log (if you have debug mode enabled is logs) like:
DEBG no item scheduled, waiting for next epoch, task: leadership
NB: You may get that same log also in "Leader" mode with secrets included when you are not selected for any slot leader (due to available stake and randomness)
Now the hint is in the ...waiting for next epoch... part of the logs. This means that if you load the secrets to a running Passive node using POST /api/v0/leaders, you won't see any leader events (as you already observed) since leadership scheduler does not run upon that action, only in (1) and (2), and have to wait until next epoch for that node to become active Leader, due to (2).
All this was to explain the reason why you do not have leadership logs in the current epoch in your case and AFAIK this is/was the intended behaviour, meaning that it is not a bug (more like a feature request)
Now, there is one workaround about this available right now that is being (ab)used around, but is not very practical/safe since it requires correct timing from the operators to execute POST and DELETE /api/v0/leaders per each END and START of epoch. Let me know if you need more info. Thank you.
@rinor Is it safe to start all the nodes as a leader (with secrets file) and then immediately execute DELETE when bootstrap completes? The idea is to run multiple passive nodes, monitor them for health and only ever promote one to leader with POST (after successful DELETE of previous leader).
For your #2, does the node have to be an active leader at the beginning of the epoch to absorb slot 0, or can it be in DELETE mode and still have that happen?
@rinor Is it safe to start all the nodes as a leader (with secrets file) and then immediately execute DELETE when bootstrap completes?
Only if you node is not elected for an early block that would occur before you have a chance to delete the leader's secrets.
For your #2, does the node have to be an active leader at the beginning of the epoch to absorb slot 0, or can it be in DELETE mode and still have that happen?
Should be a leader, this has to be achieved by adding leadership at the end of epoch of deleting by first slot of next epoch
I think I understand the issue. Going to go ahead and close this. I'll leave it up to IOHK folks if they want to do an enhancement to how this works in the future.
Most helpful comment
@AndrewWestberg the situation is the following:
The leadership scheduler runs in 2 moments:
When you start the node with the secrets included (lets call it Leader mode), due to (1) it calculates the leadership events for the remaining slots within the existing epoch, since it finds cryptographic material (secrets) available.
When you start the node without the secrets (lets call it Passive mode), the leadership scheduler still runs, but since there is no cryptographic material (secrets included) it does not generate any event and you always have a log (if you have debug mode enabled is logs) like:
DEBG no item scheduled, waiting for next epoch, task: leadershipNB: You may get that same log also in "Leader" mode with secrets included when you are not selected for any slot leader (due to available stake and randomness)
Now the hint is in the ...waiting for next epoch... part of the logs. This means that if you load the secrets to a running Passive node using POST
/api/v0/leaders, you won't see any leader events (as you already observed) since leadership scheduler does not run upon that action, only in (1) and (2), and have to wait until next epoch for that node to become active Leader, due to (2).All this was to explain the reason why you do not have leadership logs in the current epoch in your case and AFAIK this is/was the intended behaviour, meaning that it is not a bug (more like a feature request)
Now, there is one workaround about this available right now that is being (ab)used around, but is not very practical/safe since it requires correct timing from the operators to execute POST and DELETE
/api/v0/leadersper each END and START of epoch. Let me know if you need more info. Thank you.