Js-ipfs: offline boot: does not (re)bind

Created on 22 Jan 2019 · 13Comments · Source: ipfs/js-ipfs

When using js-ipfs v0.34.2 and websocket-star-multi with the ignore_no_online option, I'm able to start an js-ipfs embedded node while I have no internet connection.

But, if I regain connectivity, the transport doesn't try to bind to any of the given ws-star servers, rendering the current process unreachable.

What I would like to happen is to be able to bind (and rebind) if connectivity is intermitted, in spite of the node starting offline.

Is this something that should be handled by a) the app (and how?) b) libp2p or c) the ws-star-multi transport?

/cc @parkan

P1 dihard kinbug

Source

pgte

All 13 comments

seems like this would need some combination of an ononline event listener and a periodic connectivity check with backoff

for whatever reason, healing connections for web apps seems to be a Hard Problem -- gmail, slack, etc all rely heavily on the manual checks (Retrying in 3s... Try now), which makes me suspect the navigator.isOnline/online events don't work very well

parkan on 22 Jan 2019

the API that I would like would be a libp2p level facility that imitates the browser online status, which the app can rely on regardless of the transport

overall, this feels like a libp2p concern, and the real fix probably requires implementing the optional addresses at that level as per https://github.com/ipfs/js-ipfs/pull/1793#issuecomment-454391734

parkan on 22 Jan 2019

another idea would be to just periodically manually retry binding from inside the app as an interim solution, though I don't think there's an ipfs swarm command for doing that online right now

parkan on 22 Jan 2019

cc @mkg20001

alanshaw on 22 Jan 2019

I'm unsure whether the underlying logic for this should go into libp2p or ws-star-multi itself, ~~but I feel like the app would still need to trigger it instead of it automatically happening~~ but the app would still need a way of "forcing" some behaviours such as a manual .reconnect() function if the user is clicking a Reconnect Now button or a .cancelFurtherReconnects() for a Offline-Mode [×] toggle

Possible solutions:

ws-star-multi instance could emit reconnect(nullOrError, nullOrNextReconnectTime) events so the app could display disconnections from the ws-star network while automatically being handled by ws-star-multi itself
the above just in libp2p. existing close and listening events could be reused: libp2p would simply start reconnection if a listener would send a close event and libp2p isn't expecting this (when it's not trying to close the listener anyways) it would trigger the above re-connection cycle.
the above just in IPFS?

I'd go for either putting it into ws-star-multi (this could enable re-listening on servers that previously were down when the initial listen happened, but would require a hack to update the addresses in the peer-info) or putting it into libp2p core (possibly by mixing this with the proposal to make some addresses optional, so if a TCP address for e.x. is already used by another instance it would automatically start to listen on that address once the other instance is closed)

Opinions?

mkg20001 on 25 Jan 2019

So the pseudo-interface I made up so far:

class libp2p {
  // event(per-listener) reconnect(Error?: error, Number?: nextReconnectTime)
  // event(global) reconnect(Object?{[listener.address]: Error?}: errors, Number?: nextReconnectTime)
  set reconnectsEnabled (bool) {
    this._reconnectsEnabled = bool = Boolean(bool)
    if (bool) {
      recheckAndReconnect()
    } else {
      stopAllReconnects()
    }
  }

  get reconnectsEnabled() {
    return this._reconnectsEnabled
  }

  function recheckAndReconnect() {
    if (this._scheudledReconnect) {
      clearTimeout(this._scheudledReconnect)
      this._scheudledReconnect = null
      this._scheudledReconnectAt = null
    }
    iterateOverListeners(listener => {
      if (listener.needReconnect) {
        // somehow iterate over all listeners as key-value where address is key and do async listen
        // so it yields an object in the form of {"/ip4": Error("Something...")}
      }
    })
  }

  function _reconnectListener(listener) {
    listener.listen(listener.address, (err) => {
          if (err) {
            this._reconnectErrors[listener.address] = err // store errors for ui updates
            if (!this._scheudledReconnect) {
              const reconnectAt = Date.now() + this._reconnectCooldown // exponential growth? static? user specifiable growth function?
              this._scheudledReconnect = setTimeout(this.recheckAndReconnect.bind(this), reconnectAt) // scheudle reconnect
              this._scheudledReconnectTime = reconnectAt
            }

            this.emit('reconnect', this._reconnectErrors, this._scheudledReconnectTime) // notify app
            listener.emit('reconnect', err, this._scheudledReconnectTime) // notify per-listener watcher
          } else {
            delete this._reconnectErrors[listener.address]
            this.emit('reconnect', this._reconnectErrors, this._scheudledReconnectTime) // notify app
            listener.emit('reconnect') // notify per-listener watcher
          }
        })
  }

  /*
  listener.on('close', () => {
    if (!libp2pIsStopping) {
      if (this._reconnectsEnabled && !this._scheudledReconnect) {
        listener.needReconnect = true
      } else {
        this._reconnectListener(listener)
      }
    }
  })
  */
}

An app would hook it like this (but with actual react code, and not some random JSX-returning functions):

function offlineMode (enabled) {
  libp2p._reconnectsEnabled = enabled
}

libp2p.on('reconnect', (errors, nextReconnect) {
  if (errors) {
    return (<div>
      <h2>Reconnecting in {(nextReconnect - Date.now()) / 1000}...</h2>
      {Object.keys(errors).map(address => (<h4>{address} couldn't be reconnected: {errors[address]}))}
    </div>)
  } else {
    return (<div>Connected</div>)
  }
})

(Noticed that I missed events for when it begins reconnecing. This could either be a seperate event (reconnecting) or just a reconnect without nextReconnect variable set, addressess that are currently being dialed could have the error variable set to true to indicate a dial in progress - Also note that this was written without any type-checking just as a quick idea)

mkg20001 on 26 Jan 2019

I think libp2p should try to do a good effort of keeping the peer connected:

Delegating this to user-land makes app-development harder.
Delegating this to each transport yields each transport more complex and error-prone.

Implementing this in libp2p would avoid these 2 problems, implementing reconnect logic in a generic way.

Also, I think that the reconnecting behaviour should be the default behaviour and not require special configuration. IMO, opting out of this behaviour should be the exception.

Here is to better offline / intermittent-connectivity support from libp2p! :)

pgte on 27 Jan 2019

Implementing this in libp2p would avoid these 2 problems, implementing reconnect logic in a generic way.

Also, I think that the reconnecting behaviour should be the default behaviour and not require special configuration. IMO, opting out of this behaviour should be the exception.

Here is to better offline / intermittent-connectivity support from libp2p! :)

@pgte That's sort of what I've implemented in https://github.com/ipfs/js-ipfs/issues/1837#issuecomment-457771000

Btw, could I get some feedback on that idea before I start implementing it?

mkg20001 on 27 Jan 2019

Also I've got a few questions that I need answered before continuing with this:

I assume that exponential backoff will be used on failures.
- Should the user be able to specify their own backoff function?
- Should the backoff be counted per-listener or globally?
Should there be a maximum amount of retries? Should there be an option for that?
How is reconnection state best communicated to an application that consumes the libp2p swarm? Via the reconnecting event as described above? Different method?
Should there be APIs like the ones described above to allow the application to change the reconnect behavior on the fly (such as forcing a reconnection before the end of the cooldown?)

mkg20001 on 27 Jan 2019

@mkg20001 I'm not sure about the internals or the API (perhaps it's better to ask @jacobheun et al.), but in my opinion, this shouldn't require any new user-land API, reconnects should work out of the box.

One question (also to @jacobheun et al.) is what should be the transport interface that allows libp2p to manage dis/reconnects? I'm guessing the de-facto standard in node.js land would be emitting the events "connect" (once connected), "error", "reconnecting" and "end". I think that "connecting" can also be useful. Not sure how this plays with the current interface provided by a libp2p transport...

Internally, I'd like to see an exponential backoff (in the past I think I've used this package to achieve that), without a permanent failure (backoff ceils at about a few seconds).

IMO, all these backoff settings should be overridable.

pgte on 28 Jan 2019

@pgte

Not sure how this plays with the current interface provided by a libp2p transport...

The equivalent for end would be close and connect would be listening for a libp2p transport

but in my opinion, this shouldn't require any new user-land API, reconnects should work out of the box.

In my proposal it did, but there are also extra APIs to force a reconnect at _any point_ such as for adding a "Reconnect Now" button next to the countdown. IMO adding a few APIs to steer behavior is definitively a good idea.

Other than that I'd agree on the mentioned things.

Btw, @jacobheun could I get some feedback on my initial idea (See above)?

mkg20001 on 28 Jan 2019

I've started an issue in libp2p, https://github.com/libp2p/js-libp2p/issues/312, to track this since it's been discussed in several places for various reasons and will require updates to several modules. I added my thoughts to the issue there.

@mkg20001 the main thing I can see becoming an issue there is with multiple transports and multiple addresses. Consuming reconnect type events from libp2p might make it hard for apps to understand the state. I think keeping the listening state for each listener in the switch and bubbling up the overall state of the switch would be easier to consume. Then apps would only need to care about the listening state of libp2p. We could still expose the ability to check the state of each transport, but I think in the majority of cases apps would only care if we are listening on at least 1 address.