Binderhub: EventStream disconnects are "harmless errors"

Created on 30 Nov 2018  路  7Comments  路  Source: jupyterhub/binderhub

ref: https://github.com/jupyter/repo2docker/issues/412#issuecomment-442663788

A frequent source of "my repo doesn't build" comments comes from when the BinderHub loading page loses connection to the EventStream. This happens sometimes for builds (short and long :-/) as well as when you snooze your laptop. The visual feedback currently is unambiguous: your build failed.

You have to know that this error is one of these errors that you can ignore and fix by refreshing the page.

Can we detect losing connection to the event stream and distinguish it from the stream ending because the build is over? That way we could either reload automatically or present a different message to the user that encourages them to do so.

Most helpful comment

In lieu of any serious fix, is it possible to just append a message to the existing "Failed to connect to event stream" to have the user simply try refreshing the page?

All 7 comments

I really want to implement the proper event-stream api where connecting to an event stream never has any effects, and exclusively retrieves events. Then there's no need to notify the user of lost event-stream connections, just constantly reconnect. As it is now, reconnecting the event stream causes a new request to build and launch.

What do you think of splitting the current /build/<spec> endpoint into two:

  1. PUT /trigger_build/<spec> to kick off a build and
  2. GET /build_logs/<spec> to see a stream of the current logs

The PUT can return a 201 if the repo has never been built before and a 200 (or 204?) otherwise. If the return code is not a 201 there would be no promise that the log endpoint contains anything. It however might contain stuff if a previously triggered build is still in progress.

This would let our "client website" connect to the log endpoint all the time like you suggested and didn't have to tell the user.

One thing I don't know how to do: where do we keep the Queue that is created for the builder to put log messages? We'd need some shared place that both the build trigger and build log handler can access. Would it be an attribute of the tornado app instance?


Do we need to go from API v2 to v3 for this? Maybe we can stay at v2 and keep /build/ as is but stop using it ourselves and persuade thebelab/juniper/nbinteract maintainers to switch over to the new endpoints?

We've sketched this out in #358, but that's basically how it works. The trick is in designing the API such that:

  1. any state lives in the token, so that multiple BinderHub instances can handle each of the the now multiple requests (this works already for build logs, but maybe not launch)
  2. event-stream handler is purely side-effect-free
  3. when is the launch requested?

Here's a sketch:

  1. consumer does POST /api/v2/build/:spec. This triggers the build if needed and selects everything that will be used throughout the process (build id, username, image name)
  2. reply contains a jwt with information about a username and build id, and whether a build has been launched
  3. GET /api/v2/events/:jwt parses the token and hooks up logs of the running build, notifying when build is complete (build may be complete before this request is made)
  4. POST /api/v2/launch/:jwt actually does the launching and replies with the running server url and token (should error appropriately if build is incomplete or failed, which would mean the request was made too early)

In lieu of any serious fix, is it possible to just append a message to the existing "Failed to connect to event stream" to have the user simply try refreshing the page?

I think that would be a good first step. Especially considering this issue has been here since November last year and no one has had time to work on it. Do you want to give extending the error message a go @rheiland ?

Not sure if this is the exact same issue but I get the dreaded failed to connect to event stream for this repo: https://github.com/gaelforget/MeshArrayNotebooks. I tried from a different branch and get the same result. However, mybinder.org does build fine when I use this identical fork: https://github.com/hubgcm/MeshArrayNotebooks

What am I missing? How do I get back to being able to use binder with the original repo?

Please advise.

ps, 2020/01/15. Chrome did not show the same issue as Safari does.

This is an issue that is very hard to debug for us. It seems to depend more on the user than the repo. Evidence of this is that repositories users report as broken with this error message "just work" for me. Sometimes switching browsers or location (other network provider) fixes things for the users as well.

This means we rely on someone who has this problem to debug it or figure out how to make it reproducible for others.

A PR that adds "try reloading the page" to the error message would be welcome and (empirically) would help a large fraction of users who encounter this problem.

Was this page helpful?
0 / 5 - 0 ratings