Jormungandr: REST API: Passive node promoted to Leader through rest api doesn't see leader logs

Created on 8 Jan 2020  Â·  5Comments  Â·  Source: input-output-hk/jormungandr

Viewing leader logs after promoting a node to leader through POST to /api/v0/leaders, the node sees leader logs of []. A node started initially as a leader through the command line secrets is able to successfully see the leader logs.

Because the new leader cannot see the logs, it fails to mint any blocks!!

Mandatory Information

  1. jcli --full-version output;
    jcli 0.8.5 (master-8e82d180+, release, linux [x86_64]) - [rustc 1.40.0 (73528e339 2019-12-16)]
  2. jormungandr --full-version output;
    jormungandr 0.8.5 (master-8e82d180+, release, linux [x86_64]) - [rustc 1.40.0 (73528e339 2019-12-16)]

To Reproduce
Steps to reproduce the behavior:

  1. Start jormungandr without the --secret parameter to start a passive node
  2. Send a POST request to the node with json data to make it a leader. See response that it was made into a leader.
  3. run jcli command for leader logs and see empty array

Expected behavior
When a node is promoted to a leader, the leader logs should be available so that the new leader can successfully mint blocks.

Node Started as Leader through command line option

~/cardano on  master! ⌚ 0:36:20
$ jcli rest v0 leaders logs get --host http://127.0.0.1:3100/api
---
- created_at_time: "2020-01-08T08:27:26.838739967+00:00"
  enclave_leader_id: 1
  finished_at_time: ~
  scheduled_at_date: "25.26336"
  scheduled_at_time: "2020-01-08T09:51:29+00:00"
  status: Pending
  wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838754876+00:00"
  enclave_leader_id: 1
  finished_at_time: ~
  scheduled_at_date: "25.35938"
  scheduled_at_time: "2020-01-08T15:11:33+00:00"
  status: Pending
  wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838757070+00:00"
  enclave_leader_id: 1
  finished_at_time: ~
  scheduled_at_date: "25.39925"
  scheduled_at_time: "2020-01-08T17:24:27+00:00"
  status: Pending
  wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838742552+00:00"
  enclave_leader_id: 1
  finished_at_time: ~
  scheduled_at_date: "25.27455"
  scheduled_at_time: "2020-01-08T10:28:47+00:00"
  status: Pending
  wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838749305+00:00"
  enclave_leader_id: 1
  finished_at_time: ~
  scheduled_at_date: "25.34741"
  scheduled_at_time: "2020-01-08T14:31:39+00:00"
  status: Pending
  wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838763111+00:00"
  enclave_leader_id: 1
  finished_at_time: ~
  scheduled_at_date: "25.41961"
  scheduled_at_time: "2020-01-08T18:32:19+00:00"
  status: Pending
  wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838746490+00:00"
  enclave_leader_id: 1
  finished_at_time: ~
  scheduled_at_date: "25.29741"
  scheduled_at_time: "2020-01-08T11:44:59+00:00"
  status: Pending
  wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838755607+00:00"
  enclave_leader_id: 1
  finished_at_time: ~
  scheduled_at_date: "25.37109"
  scheduled_at_time: "2020-01-08T15:50:35+00:00"
  status: Pending
  wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838732303+00:00"
  enclave_leader_id: 1
  finished_at_time: ~
  scheduled_at_date: "25.25935"
  scheduled_at_time: "2020-01-08T09:38:07+00:00"
  status: Pending
  wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838748433+00:00"
  enclave_leader_id: 1
  finished_at_time: ~
  scheduled_at_date: "25.34024"
  scheduled_at_time: "2020-01-08T14:07:45+00:00"
  status: Pending
  wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838757821+00:00"
  enclave_leader_id: 1
  finished_at_time: ~
  scheduled_at_date: "25.40081"
  scheduled_at_time: "2020-01-08T17:29:39+00:00"
  status: Pending
  wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838747462+00:00"
  enclave_leader_id: 1
  finished_at_time: ~
  scheduled_at_date: "25.31421"
  scheduled_at_time: "2020-01-08T12:40:59+00:00"
  status: Pending
  wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838760256+00:00"
  enclave_leader_id: 1
  finished_at_time: ~
  scheduled_at_date: "25.40840"
  scheduled_at_time: "2020-01-08T17:54:57+00:00"
  status: Pending
  wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838754014+00:00"
  enclave_leader_id: 1
  finished_at_time: ~
  scheduled_at_date: "25.35394"
  scheduled_at_time: "2020-01-08T14:53:25+00:00"
  status: Pending
  wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838756328+00:00"
  enclave_leader_id: 1
  finished_at_time: ~
  scheduled_at_date: "25.38086"
  scheduled_at_time: "2020-01-08T16:23:09+00:00"
  status: Pending
  wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838752972+00:00"
  enclave_leader_id: 1
  finished_at_time: ~
  scheduled_at_date: "25.35115"
  scheduled_at_time: "2020-01-08T14:44:07+00:00"
  status: Pending
  wake_at_time: ~
- created_at_time: "2020-01-08T08:27:26.838741590+00:00"
  enclave_leader_id: 1
  finished_at_time: ~
  scheduled_at_date: "25.27255"
  scheduled_at_time: "2020-01-08T10:22:07+00:00"
  status: Pending
  wake_at_time: ~

Node promoted to leader from passive

~/cardano ⌚ 21:29:23
$ jcli rest v0 leaders get --host http://127.0.0.1:3100/api                                         
---
[]

~/cardano ⌚ 0:42:09
$ jcli rest v0 leaders post -f <mypoolsecretfile>.yaml --host http://127.0.0.1:3100/api
1

~/cardano ⌚ 0:42:33
$ jcli rest v0 leaders get --host http://127.0.0.1:3100/api                                         
---
- 1

~/cardano ⌚ 0:42:42
$ jcli rest v0 leaders logs get --host http://127.0.0.1:3100/api
---
[]
enhancement question

Most helpful comment

@AndrewWestberg the situation is the following:

The leadership scheduler runs in 2 moments:

  1. Once when the node starts, after initial bootstrap is completed, and
  2. At the very beggining of each epoch (absorbing slot 0 as a side effect)
  • When you start the node with the secrets included (lets call it Leader mode), due to (1) it calculates the leadership events for the remaining slots within the existing epoch, since it finds cryptographic material (secrets) available.

  • When you start the node without the secrets (lets call it Passive mode), the leadership scheduler still runs, but since there is no cryptographic material (secrets included) it does not generate any event and you always have a log (if you have debug mode enabled is logs) like:
    DEBG no item scheduled, waiting for next epoch, task: leadership
    NB: You may get that same log also in "Leader" mode with secrets included when you are not selected for any slot leader (due to available stake and randomness)

Now the hint is in the ...waiting for next epoch... part of the logs. This means that if you load the secrets to a running Passive node using POST /api/v0/leaders, you won't see any leader events (as you already observed) since leadership scheduler does not run upon that action, only in (1) and (2), and have to wait until next epoch for that node to become active Leader, due to (2).

All this was to explain the reason why you do not have leadership logs in the current epoch in your case and AFAIK this is/was the intended behaviour, meaning that it is not a bug (more like a feature request)

Now, there is one workaround about this available right now that is being (ab)used around, but is not very practical/safe since it requires correct timing from the operators to execute POST and DELETE /api/v0/leaders per each END and START of epoch. Let me know if you need more info. Thank you.

All 5 comments

@AndrewWestberg the situation is the following:

The leadership scheduler runs in 2 moments:

  1. Once when the node starts, after initial bootstrap is completed, and
  2. At the very beggining of each epoch (absorbing slot 0 as a side effect)
  • When you start the node with the secrets included (lets call it Leader mode), due to (1) it calculates the leadership events for the remaining slots within the existing epoch, since it finds cryptographic material (secrets) available.

  • When you start the node without the secrets (lets call it Passive mode), the leadership scheduler still runs, but since there is no cryptographic material (secrets included) it does not generate any event and you always have a log (if you have debug mode enabled is logs) like:
    DEBG no item scheduled, waiting for next epoch, task: leadership
    NB: You may get that same log also in "Leader" mode with secrets included when you are not selected for any slot leader (due to available stake and randomness)

Now the hint is in the ...waiting for next epoch... part of the logs. This means that if you load the secrets to a running Passive node using POST /api/v0/leaders, you won't see any leader events (as you already observed) since leadership scheduler does not run upon that action, only in (1) and (2), and have to wait until next epoch for that node to become active Leader, due to (2).

All this was to explain the reason why you do not have leadership logs in the current epoch in your case and AFAIK this is/was the intended behaviour, meaning that it is not a bug (more like a feature request)

Now, there is one workaround about this available right now that is being (ab)used around, but is not very practical/safe since it requires correct timing from the operators to execute POST and DELETE /api/v0/leaders per each END and START of epoch. Let me know if you need more info. Thank you.

@rinor Is it safe to start all the nodes as a leader (with secrets file) and then immediately execute DELETE when bootstrap completes? The idea is to run multiple passive nodes, monitor them for health and only ever promote one to leader with POST (after successful DELETE of previous leader).

For your #2, does the node have to be an active leader at the beginning of the epoch to absorb slot 0, or can it be in DELETE mode and still have that happen?

@rinor Is it safe to start all the nodes as a leader (with secrets file) and then immediately execute DELETE when bootstrap completes?

Only if you node is not elected for an early block that would occur before you have a chance to delete the leader's secrets.

For your #2, does the node have to be an active leader at the beginning of the epoch to absorb slot 0, or can it be in DELETE mode and still have that happen?

Should be a leader, this has to be achieved by adding leadership at the end of epoch of deleting by first slot of next epoch

I think I understand the issue. Going to go ahead and close this. I'll leave it up to IOHK folks if they want to do an enhancement to how this works in the future.

Was this page helpful?
0 / 5 - 0 ratings