Galaxy: GIE Hardening

Created on 19 Sep 2017 · 24 comments · Source: galaxyproject/galaxy

After a few attempts to teach workshops (mainly overseas) we've discovered some robustness problems with GIEs on usegalaxy.org. Here's a list of identified issues and potential resolutions, as well as planning for enhancements, in order of priority:

  • [x] Re-fix autoloading launch dataset @natefoo (broken by #4458, fix in #4760)
  • [x] Client alive timeouts don't work for slow connections: the client-side functions (the readiness test and keepalive) use a 500ms timeout, which is easily hit. These functions could also be updated to handle general failures and timeouts differently, and the timeout count should be reset when a connection succeeds @natefoo #4680
  • [x] Increase the in-container timeout (this was expected to be trivially easy, but it is not supported with the containers lib =/) @natefoo #4740
  • [ ] Monitoring: ensure usegalaxy.org GIEs are working, probably w/ Jenkins/Selenium @jmchilton #4694
  • [ ] Reconnecting to an existing running container @martenson

    • [ ] Reconnection UI

    • [ ] Indicator of whether you have a running container

  • [x] More feedback on progress while waiting for launch, remove spinner and display permanent error message on failure @natefoo #4680
  • [ ] Asynchronous dataset get() on launch
  • [x] Missing green "save" button in 17.05.1 Jupyter image

    • [x] Fix and build new image @bgruening bgruening/docker-jupyter-notebook#23

    • [x] Deploy new image on Test/Main @natefoo

  • [ ] Prevent launch on configurably large datasets
  • [ ] uWSGI proxy @natefoo
  • [ ] Keep terminated containers for a configurable period and allow them to be relaunched

    • [ ] Most recent container only

    • [ ] Older containers

  • [ ] Multiple containers per user/session
  • [x] swarmscale bug fixes @natefoo galaxyproject/usegalaxy-playbook@dd9ab853788639b2b2916a215a8d955258d55e98
  • [x] Implement Docker API containers interface @natefoo #5861
  • [ ] Improve the launch UI
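The client timeout item describes three behaviours: count failures, treat timeouts differently from other errors, and reset the count once a request succeeds. A minimal sketch of that bookkeeping follows, assuming a hypothetical `probe` callable that performs one readiness or keepalive request and raises a hypothetical `RequestTimeout` when it is too slow. The thresholds are illustrative, not the values used on usegalaxy.org, and the sketch is in Python for readability even though the real checks run client-side:

```python
from dataclasses import dataclass
from typing import Callable


class RequestTimeout(Exception):
    """Hypothetical: raised by a probe whose request exceeded its timeout."""


@dataclass
class KeepaliveTracker:
    # Hypothetical thresholds: slow responses are tolerated longer than hard errors.
    max_timeouts: int = 5
    max_failures: int = 2
    timeouts: int = 0
    failures: int = 0

    def record(self, probe: Callable[[], None]) -> str:
        """Run one readiness/keepalive probe and return "ok", "retry" or "give_up"."""
        try:
            probe()
        except RequestTimeout:
            # A slow connection: counted separately from outright errors.
            self.timeouts += 1
        except Exception:
            # Any other failure (refused connection, HTTP error, ...).
            self.failures += 1
        else:
            # A successful request resets both counters.
            self.timeouts = 0
            self.failures = 0
            return "ok"
        if self.timeouts >= self.max_timeouts or self.failures >= self.max_failures:
            return "give_up"
        return "retry"
```

A real probe would issue its HTTP request with a timeout comfortably above 500ms, so that a slow connection produces "retry" rather than ending the session on the first slow response.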
Labels: area/GIEs, area/UI-UX, kind/bug, kind/enhancement, status/planning


ping @nekrut

@erasche @bgruening - we need to solve the mystery of the green button. @davebx volunteered to help so can you guide him?

We're happy to help, but this isn't a good week for us. Next week will be better for me at least, I think @bgruening will still be at conferences/meetings then though.

Would be neat to enable checkpointing and restoring, including filesystem changes.

Client/timeout fixes are now on Test/Main.

The container timeout is now 5 minutes on Test/Main, and both now use the 17.09 image, which includes the fixed magic button.

Mockup of an adjusted UI. This would allow a non-uniform approach based on which environment you select (some allow selecting datasets, some whole histories, some nothing, etc.). You could also initialize the dialogue based on what is passed as params and validate it against the already available IE configs.

[Image: ie_launcher, mockup of the adjusted launch UI]

Vector version: ie_launcher.pdf
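One way to read "validate it against the already available IE configs": each IE config declares what it accepts at launch (datasets, a whole history, or nothing), and the incoming params are checked against that declaration. The sketch below assumes a hypothetical `InputKind`/`IEConfig` model and hypothetical parameter names (`environment`, `dataset_ids`, `history_id`); it is not Galaxy's actual IE config format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Mapping


class InputKind(Enum):
    DATASETS = "datasets"  # environment is launched on one or more datasets
    HISTORY = "history"    # environment is launched on a whole history
    NONE = "none"          # environment takes no input at launch


@dataclass(frozen=True)
class IEConfig:
    name: str
    accepts: InputKind


def validate_launch_params(params: Mapping[str, object],
                           configs: Mapping[str, IEConfig]) -> List[str]:
    """Check launch params against the selected IE's config; return error messages."""
    env = params.get("environment")
    if env not in configs:
        return [f"unknown environment: {env!r}"]
    kind = configs[env].accepts
    has_datasets = bool(params.get("dataset_ids"))
    has_history = params.get("history_id") is not None
    errors: List[str] = []
    # Datasets must be present exactly when the environment expects datasets.
    if (kind is InputKind.DATASETS) != has_datasets:
        verb = "requires" if kind is InputKind.DATASETS else "does not accept"
        errors.append(f"{env} {verb} datasets")
    # Likewise for a whole history.
    if (kind is InputKind.HISTORY) != has_history:
        verb = "requires" if kind is InputKind.HISTORY else "does not accept"
        errors.append(f"{env} {verb} a history")
    return errors
```

With this shape, the dialogue could be pre-filled from the params and only show the input widget that the selected environment's config declares.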

@martenson I like this. Any thoughts on how the UI might display any running containers you have and potentially any stopped/relaunchable containers (assuming we decide to offer that capability)?

@natefoo ?
[Screenshot 2017-10-06 14:31:17]

  • I would also add a column for "Datasets Attached" with the first few dataset names hyperlinked (truncated with "..." rather than overflowing). Not so important, but it might give more context as to why that container was launched. This obviously would not capture datasets added after the fact, but even just "Launched on HISAT results from Arabidopsis sequencing" might be helpful.
  • I'd make the Environment + version one thing. The image version is way less important. I haven't seen many "flavours", mostly just the default one that ships with that version of Galaxy.
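For the proposed "Datasets Attached" column, a small helper could show the first few names and truncate the rest instead of overflowing. A sketch, with `summarize_datasets`, `max_names` and `max_chars` as hypothetical names; the hyperlinks would be added by the template:

```python
from typing import Sequence


def summarize_datasets(names: Sequence[str], max_names: int = 3, max_chars: int = 40) -> str:
    """Summarize launch datasets: at most `max_names` names, each at most `max_chars` long."""
    shown = []
    for name in names[:max_names]:
        if len(name) > max_chars:
            # Truncate long names with an ellipsis rather than overflowing the cell.
            name = name[: max_chars - 1] + "…"
        shown.append(name)
    summary = ", ".join(shown)
    remaining = len(names) - len(shown)
    if remaining > 0:
        summary += f", … (+{remaining} more)"
    return summary
```

For example, four datasets named `a` to `d` would be summarized as `a, b, c, … (+1 more)`.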

@erasche

  • in Jupyter, and I assume elsewhere too, you can get() any datasets you want, so the initial set is not that helpful, I believe. Maybe something like total_uptime = 8 hours might help to identify the one you want?
  • if there are not many combinations this makes sense to unify into one step

@martenson

yes, you can get(), but I really believe that knowing which datasets it was initially launched on will be a good reminder. More so than "It was running for 8 hours on Tuesday": I don't remember what I did on Tuesday, but I might remember that I started analysing my (e.g.) htseq data. I know it is extra data to store, but if we're doing this as UX improvements to GIEs, I think it'd be important.

Tags?

I'd make the Environment + version one thing

I don't know - I'd keep the version separate if there is more than one. You can imagine different flavors of any of these. I just wouldn't call it version. I'd imagine different flavors of Jupyter, for instance, with different libraries installed. If there is just one version, then definitely collapse it - that is totally cool by me.

You can imagine different flavours (indeed we did during implementation), but I am yet to see anyone deploy multiple flavours. Even we only have one jupyter flavour.

@jxtx something like this?
[Screenshot 2017-10-06 15:18:21]

I'd consider having more if they weren't so big. They are very big.


I was thinking explicit tags for GIE instances. Seems like a better
reminder than inferring from associated datasets. Could initialize
automatically that way though.
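A sketch of the explicit-tags idea: each GIE instance carries its own editable tags, seeded automatically from the tags of the datasets it was launched on. `Dataset` and `GIEInstance` here are hypothetical stand-ins, not Galaxy model classes:

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Dataset:
    """Hypothetical minimal stand-in for a Galaxy dataset."""
    name: str
    tags: List[str] = field(default_factory=list)


@dataclass
class GIEInstance:
    environment: str
    tags: Set[str] = field(default_factory=set)

    @classmethod
    def launch(cls, environment: str, datasets: Iterable[Dataset]) -> "GIEInstance":
        # Seed the instance's tags from the launch datasets' tags.
        seeded = {tag for ds in datasets for tag in ds.tags}
        return cls(environment=environment, tags=seeded)

    def add_tag(self, tag: str) -> None:
        # Users can tag the instance explicitly after launch.
        self.tags.add(tag)

    def remove_tag(self, tag: str) -> None:
        self.tags.discard(tag)
```

The seeded tags only act as a starting point; once the user edits them, the instance's tags no longer depend on which datasets were attached.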


[Screenshot 2017-10-06 15:48:26]

Hi @natefoo , everyone,

Maybe this is the place to mention that we can't save a Jupyter notebook (.ipynb) into a Galaxy history if the kernel is R... It seems that the "save the current notebook in Galaxy" function (the so-called magic button ;) ) only works if we reconnect to the Python 2 kernel. If you'd like, I can create a dedicated issue.

In the meantime, since the user can still save the .ipynb by switching to the Python 2 kernel, I don't think this is a priority to work on.

When testing loading of Jupyter notebooks on main today from Selenium I encountered a bunch of different transient failures getting the notebook to load:

  • [ ] One time I just landed on a page with a single line of text that said something about proxy target not being set maybe.
  • [ ] One time the notebook loaded but there was no stylesheet applied.
  • [ ] I got just a big gray block instead of the Jupyter notebook a few times - I'm not sure what that is about.
  • [ ] I also encountered a couple of timeouts after several minutes.

I added a new to-do item: there are some ways we can improve the launch experience, all of which generally pertain to the "readiness check" (i.e. spinning on /interactive_environments/ready):

  1. There's a hack in place (#5677) that prevents the GIE template from returning until the container/service's ports are returned by Docker, since this may not actually succeed immediately after container/service creation. We should instead return immediately and allow /interactive_environments/ready to update the proxy map with the ports once they are available.
  2. Right now we perform a readiness check to see if the container is running. Once that's done, we stop checking the container state and immediately attempt to connect to the notebook. However, this may never succeed due to proxy/network/whatever problems, and the container may shut down. The user has no way of knowing that it's shut down and that it will never succeed and instead just stares at the spinner until they get fed up and quit. So instead of having the readiness check stop once it's ready, it would be better to continue checking state concurrently with the connection attempts (and keepalive requests?).
  3. Instead of the toast notifications, it'd be nice to have something like a persistent box centered in the middle pane that has a "row" for each state change and our familiar colors/icons to indicate what's happening.
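Points 1 and 2 could be combined into a single polling loop: fill in the proxy map as soon as the ports become available, and keep checking container state on every iteration instead of stopping once the container first reports running. This sketch interleaves the checks in one loop rather than issuing truly concurrent requests. `get_state`, `get_ports`, `try_connect`, the proxy-map shape and the terminal state names are all assumptions, loosely modeled on Docker Swarm task states:

```python
import time
from typing import Callable, Dict, MutableMapping, Optional

# Assumed terminal states, loosely modeled on Docker Swarm task states.
TERMINAL_STATES = {"COMPLETE", "FAILED", "REJECTED", "SHUTDOWN"}


def wait_until_usable(
    get_state: Callable[[], str],
    get_ports: Callable[[], Optional[Dict[int, int]]],
    try_connect: Callable[[], bool],
    proxy_map: MutableMapping[str, Dict[int, int]],
    key: str,
    poll_interval: float = 1.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    for _ in range(max_polls):
        # Check container state on every iteration, even after RUNNING, so a
        # container that dies is reported instead of spun on forever.
        state = get_state()
        if state in TERMINAL_STATES:
            return f"container stopped ({state})"
        # Ports may only be known some time after creation: fill in the proxy
        # map lazily instead of blocking the template until Docker reports them.
        if key not in proxy_map:
            ports = get_ports()
            if ports:
                proxy_map[key] = ports
        # Only attempt to connect once the proxy knows where to route.
        if key in proxy_map and state == "RUNNING" and try_connect():
            return "connected"
        sleep(poll_interval)
    return "timed out"
```

Returning a distinct "container stopped" result is what would let the UI replace the spinner with a permanent error and a relaunch option.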

For example (mocked up as tables using GitHub emoji, where each table shows a step in the progression, but hopefully you get the idea):

| Launching Jupyter |
| --- |
| :curly_loop: Starting a container in which to run Jupyter |
| :clock5: Connecting to container |

| Launching Jupyter |
| --- |
| :curly_loop: Starting a container in which to run Jupyter |
| :information_source: Container state is PENDING |
| :information_source: Container state is ASSIGNED |
| :clock5: Connecting to container |

Once running, the state progression can collapse:

| Launching Jupyter |
| --- |
| :heavy_check_mark: Starting a container in which to run Jupyter: RUNNING v |
| :curly_loop: Connecting to container |

| Launching Jupyter |
| --- |
| :heavy_check_mark: Starting a container in which to run Jupyter: RUNNING v |
| :x: Connecting to container |
| :exclamation: The container is no longer running, click here to attempt to launch again, or report this error |
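The persistent status box in these mockups can be modeled as an ordered list of rows that collapses once the container is running. A minimal sketch that renders the same Markdown tables follows; the class and method names are hypothetical. Collapsed rows are kept in `states`, so an expand control (the "v" in the mockup) could show them again:

```python
from dataclasses import dataclass
from typing import List

# Emoji shortcodes taken from the mockup tables.
ICONS = {
    "active": ":curly_loop:",
    "waiting": ":clock5:",
    "info": ":information_source:",
    "done": ":heavy_check_mark:",
    "failed": ":x:",
    "alert": ":exclamation:",
}


@dataclass
class Row:
    kind: str  # key into ICONS
    text: str


class LaunchStatus:
    """Persistent launch-status box: one row per state change."""

    def __init__(self, environment: str) -> None:
        self.title = f"Launching {environment}"
        self.start = Row("active", f"Starting a container in which to run {environment}")
        self.states: List[Row] = []  # intermediate container states
        self.connect = Row("waiting", "Connecting to container")
        self.alerts: List[Row] = []
        self.collapsed = False  # True once the container reaches RUNNING

    def container_state(self, state: str) -> None:
        if state == "RUNNING":
            if not self.collapsed:
                # Collapse: mark the start row done; self.states is kept for expanding.
                self.start = Row("done", f"{self.start.text}: RUNNING")
                self.collapsed = True
                self.connect.kind = "active"
        else:
            self.states.append(Row("info", f"Container state is {state}"))

    def connection_failed(self, container_running: bool) -> None:
        self.connect.kind = "failed"
        if not container_running:
            self.alerts.append(Row("alert", "The container is no longer running, click here "
                                   "to attempt to launch again, or report this error"))

    def render(self) -> str:
        """Render the box as a Markdown table, like the mockups."""
        rows = [self.start] + ([] if self.collapsed else self.states) + [self.connect] + self.alerts
        lines = [f"| {self.title} |", "| --- |"]
        lines += [f"| {ICONS[r.kind]} {r.text} |" for r in rows]
        return "\n".join(lines)
```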
