Jormungandr: [JIP] Easier leader promotion from a passive node

Created on 13 Jan 2020 · 8 Comments · Source: input-output-hk/jormungandr

Is your feature request related to a problem/context? Please describe if applicable.
As the network grew unstable over the last couple of days, pool operators began investigating ways to run multiple nodes and switch leadership between them. Many of them have already implemented scripts for this, with varying success.

However, this process is still far from trivial because of #1522. Because of that issue, we cannot start with a single leader and, in case of failure, promote another passive node to leadership. Instead, workarounds are required, e.g. starting the redundant node in leadership mode as well and then quickly demoting it (which takes several seconds right after bootstrap finishes).

Because of this complicated process, it happens (rarely, but still) that operators unintentionally make multiple blocks for the same slot.

Describe the solution you'd like
The scheduler for leadership events is currently triggered:

  1. when the node starts, after initial bootstrap is completed
  2. at the very beginning of each epoch

I propose two possible solutions:

  1. Make a third scheduler trigger, right after calling jcli rest v0 leaders post.
  2. Add a new REST API endpoint, that will be used for manually triggering the scheduler.

Labels: enhancement, Priority - Medium
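
To make the gap concrete, here is a minimal sketch of the two existing triggers plus the proposed third one. It is a toy model, not Jormungandr's code: `LeadershipScheduler`, `toy_elect`, and the `reschedule_on_new_secret` flag are hypothetical names invented for this illustration, and the election rule is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical stand-in for the real slot election: given a secret id and an
# epoch, return the slots that secret would lead in that epoch.
ElectFn = Callable[[str, int], List[int]]


@dataclass
class LeadershipScheduler:
    elect: ElectFn
    epoch: int = 0
    secrets: Set[str] = field(default_factory=set)
    schedule: Dict[int, str] = field(default_factory=dict)  # slot -> secret id
    reschedule_on_new_secret: bool = False  # the proposed third trigger

    def _reschedule(self) -> None:
        # Rebuild the schedule for the current epoch from all loaded secrets.
        self.schedule = {
            slot: secret
            for secret in sorted(self.secrets)
            for slot in self.elect(secret, self.epoch)
        }

    def on_bootstrap_complete(self, epoch: int) -> None:
        # Existing trigger 1: node start, after initial bootstrap.
        self.epoch = epoch
        self._reschedule()

    def on_epoch_start(self, epoch: int) -> None:
        # Existing trigger 2: the very beginning of each epoch.
        self.epoch = epoch
        self._reschedule()

    def add_secret(self, secret_id: str) -> None:
        # Models a secret being posted to the leaders endpoint.
        self.secrets.add(secret_id)
        if self.reschedule_on_new_secret:
            self._reschedule()


def toy_elect(secret_id: str, epoch: int) -> List[int]:
    # Toy rule for illustration only: every secret leads two fixed slots.
    return [epoch * 10 + 3, epoch * 10 + 7]
```

With the flag off, a secret added in the middle of epoch 5 produces no schedule until `on_epoch_start(6)` runs; with the flag on, the schedule for epoch 5 is available immediately after `add_secret`.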

Most helpful comment

@tstdin thank you for the proposal. To expand a bit, there is already an internal discussion about something like this, which follows your logic with a small difference. Here is a short version of that proposal, in a general form, for handling live movement of cryptographic data (pool secrets).

  1. Add a leadership scheduler trigger for the current epoch when a new secret is added through the API endpoint. This is the same as what you proposed in (1).

  2. Add a new cold/hot node mode parameter, e.g. LeaderMode (on/off), available from command line arguments and the config file, and also changeable from the REST API. This differs from your proposal in (2).

A simplified graphical representation of the flow (within an epoch) may look like the following. The boxes with a filled background color are the possible new additions to the current flow.

[Image: LeaderMode flow chart]
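
Since the image does not survive in text form, the core idea of the hot/cold switch can be sketched roughly as follows. This is only an illustration under assumptions: the `Node` class, its fields, and `on_slot` are hypothetical, and the real node logic is more involved.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Node:
    # Slots this node's secrets are scheduled to lead in the current epoch.
    # The schedule is kept regardless of the LeaderMode setting.
    scheduled_slots: Set[int] = field(default_factory=set)
    # Hypothetical LeaderMode switch: False = cold, True = hot.
    leader_mode: bool = False

    def set_leader_mode(self, on: bool) -> None:
        # Would be reachable from CLI args, the config file, or the REST API.
        self.leader_mode = on

    def on_slot(self, slot: int) -> Optional[str]:
        # A block is produced only if the slot is scheduled AND the node is hot.
        if slot not in self.scheduled_slots:
            return None
        if not self.leader_mode:
            return None
        return f"block@{slot}"
```

Because both nodes hold the same schedule, switching leadership reduces to turning LeaderMode off on one node and on on the other; no restart or rescheduling is needed in this model.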

Depending on the implementation, the second part of your proposal

  2. Add a new REST API endpoint, that will be used for manually triggering the scheduler.

may or may not be needed.

Thank you.

All 8 comments

IMO the protocol should allow a list of nodes to share leadership for redundancy purposes and ensure that only one of them is active when the pool is responsible for minting a block. The way that is done could differ from this suggestion, but it would also be nice to have this suggestion right now.

@rinor Thank you for sharing the idea, I really like it. Having the secrets loaded with the LeaderMode on/off switch would also fix losing track of the upcoming leader events during the epoch rollover.

The "2. Add a new REST API endpoint, that will be used for manually triggering the scheduler." would not be needed in this scenario.

The flow chart looks good. Is the function that generates the logs idempotent? If so, it would be useful to run that step after LeaderMode is toggled on.

This is excellent! Thank you for sharing, @rinor.

I was about to write a story to put onto the backlog for this very thing. The concept I had in mind is almost identical to what's described in the flow chart. Basically, add a third state for the node to be able to take on. Currently, there are effectively two states: passive and leader. The third state would be "promotable". A promotable node would be started just like a leader node, but with the --promotable command line switch. Since a promotable node is loaded with the secrets, it is able to generate the leader logs, but it is not allowed to act on them by creating blocks. Once a promotable node is promoted to leader, it is given permission to produce blocks during scheduled slots. (One assumes that the pool operator has done the right thing and demoted the previous leader, simultaneous with promoting the new leader.)
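
A rough sketch of the three states and what each is allowed to do might look like this. The names are hypothetical, and the transition rules (promotable to leader and back, nothing else) are an assumption made for the illustration rather than something specified above.

```python
from enum import Enum


class NodeState(Enum):
    PASSIVE = "passive"        # no secrets loaded
    PROMOTABLE = "promotable"  # secrets loaded, block production forbidden
    LEADER = "leader"          # secrets loaded, block production allowed


def can_generate_leader_logs(state: NodeState) -> bool:
    # Leader logs require the secrets, which both non-passive states hold.
    return state in (NodeState.PROMOTABLE, NodeState.LEADER)


def can_produce_blocks(state: NodeState) -> bool:
    return state is NodeState.LEADER


# Assumed transitions: promote and demote between the two secret-holding states.
_ALLOWED = {
    (NodeState.PROMOTABLE, NodeState.LEADER),
    (NodeState.LEADER, NodeState.PROMOTABLE),
}


def transition(current: NodeState, target: NodeState) -> NodeState:
    if (current, target) not in _ALLOWED:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target
```

In this model a passive node cannot be promoted directly, because it was never started with the secrets.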

The only thing that would make this proposal even better is if we could incorporate a message sent from the node that is about to be promoted to the existing leader, so that the leadership swap could be done with a single command. This would ensure that an operator isn't able to make two nodes active leaders simultaneously. The message to do this could be a direct message from one node id to another, which would require that the nodes be assigned static node ids when they are launched. The REST call could even require that the node operator supply the node id of the current leader, or else the call would fail, and the leaders would stay as they are until the call is made correctly.

Even when the network stability issues are resolved, this will become important for performing regular maintenance (software upgrades, hardware upgrades, etc.) on stake pools. Thus, it should be a story in the backlog of the Haskell team as well.
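
The single-command swap could be sketched like this. It is a single-process toy model under assumptions: `PoolNode`, `swap_leadership`, and `SwapRejected` are hypothetical names, and the real thing would involve a message between two separate nodes rather than a shared dictionary.

```python
from dataclasses import dataclass
from typing import Dict


class SwapRejected(Exception):
    """Raised when a swap request is refused and nothing is changed."""


@dataclass
class PoolNode:
    node_id: str  # assumed to be a static id assigned at launch
    is_leader: bool = False


def swap_leadership(
    nodes: Dict[str, PoolNode], candidate_id: str, claimed_leader_id: str
) -> None:
    # The caller must name the current leader correctly, or the call fails.
    leaders = [n.node_id for n in nodes.values() if n.is_leader]
    if leaders != [claimed_leader_id]:
        raise SwapRejected("supplied node id is not the sole current leader")
    if candidate_id not in nodes or candidate_id == claimed_leader_id:
        raise SwapRejected("candidate must be a different, known node")
    # All checks passed: demote first, then promote, so that there are never
    # two active leaders at any point.
    nodes[claimed_leader_id].is_leader = False
    nodes[candidate_id].is_leader = True
```

All validation happens before any state is touched, so a rejected call leaves the leaders as they are.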

Again, thank you for making this a priority!

@rinor Love the graphic. Pardon my ignorance. What stops a bad actor from having multiple cloned leaders with configs turned ON at the start of the boundary from doing the same thing that is going on now? We get one node_ID propagating one block multiple times with multiple hashes, causing more forks.

@JSCSJSCS the topic of addressing adversarial forks belongs in #1503.

@NicolasDP, any possibility of this making it into next Wednesday's release that Charles referred to in his 2/3/2020 update?
