Galaxy: Performance issue in release_18.01

Created on 25 Apr 2018  ·  22 Comments  ·  Source: galaxyproject/galaxy

Hi,

Introduction

In one of our projects with @Mataivic, we easily generate rather huge dataset collections (3 x 10000 files). We had some performance issues with 17.05 during form rendering (to populate the drop-down list, I guess). It could take several minutes to display a tool form that required some FASTA files as input (we had 30000 of them in the history).
The timing of 18.01 was perfect because the project was stuck.

But but but, it's worse with 18.01. I'm sure, or at least I hope, that it's because I made a mistake during the migration.

Symptoms with 18.01

Even with fewer than 6000 files in the history, Galaxy gives up after 59s when displaying the tool form.
A red box eventually appears:

Cannot connect to Galaxy
Galaxy is currently unreachable. Please try again in a few minutes. Please contact a Galaxy administrator if the problem persists.

With the developer tools in Chrome, we get this in the Network tab:

#Name: build?version=2.0.2&__identifer=5f1cbjxk366&tool_version=2.0.2
#Status: (failed)
#Type: xhr
#Initiator: jquery.js:9175
#Size: 0B
#Time: 1.0 min

We also hit this "timeout" when we try to display the job within the admin interface or when we try to open a shared history.

We (myself, @mhoebekesbr and @pbordron) don't get much information from either the uwsgi or the nginx logs. But maybe we aren't looking in the right place.
We tried bypassing nginx and got the same result:

uwsgi:
   http: :8080

Context

Hardware

It's a rather small dev instance hosted in a VM:

  • 6 cores
  • 12 GB RAM

Software / Configuration

galaxy.yml

uwsgi:
  http: :8080
  processes: 4
  threads: 4
  offload-threads: 1
  static-map: /static/style=static/style/blue
  static-map: /static=static
  master: true
  virtualenv: .venv
  pythonpath: lib
  module: galaxy.webapps.galaxy.buildapp:uwsgi_app()
  die-on-term: true
  hook-master-start: unix_signal:2 gracefully_kill_them_all
  hook-master-start: unix_signal:15 gracefully_kill_them_all
  py-call-osafterfork: true
  enable-threads: true
  mule: lib/galaxy/main.py
  farm: job-handlers:1

supervisord

[program:galaxy_uwsgi]
command         = /w/galaxy/galaxydev/galaxy/.venv/bin/uwsgi --yaml /w/galaxy/galaxydev/galaxy/config/galaxy.yml --logto /tmp/uwsgi_logto.log
directory       = /w/galaxy/galaxydev/galaxy/
umask           = 022
autostart       = true
autorestart     = true
startsecs       = 10
user            = galaxydev
environment     = PATH="/w/galaxy/galaxydev/galaxy/.venv/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",PYTHONPATH="/w/galaxy/galaxydev/galaxy/lib",HOME="/w/galaxy/galaxydev/galaxy/",USER="galaxydev",DRMAA_LIBRARY_PATH=/opt/sge/lib/lx24-amd64/libdrmaa.so.1.0
;numprocs        = 1
stopsignal      = INT
log_stdout      = true
log_stderr      = true
loglevel        = blather
logfile         = /tmp/supervisord_galaxy_uwsgi.log

nginx

upstream galaxy {
    server localhost:8080;
}

server {
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name _;

    client_max_body_size 10G; # max upload size that can be handled by POST requests through nginx

    # use a variable for convenience
    set $galaxy_root /w/galaxy/dev/galaxy;

    location / {
        proxy_pass          http://galaxy;
        # for debugging
        proxy_read_timeout 300;
        proxy_send_timeout 300;
        proxy_connect_timeout 300;
        # end of debugging
        proxy_set_header    X-Forwarded-Host $host;
        proxy_set_header    X-Forwarded-For  $proxy_add_x_forwarded_for;
    }

    # serve framework static content
    location /static {
        alias $galaxy_root/static;
        expires 24h;
    }
    location /static/style {
        alias $galaxy_root/static/style/blue;
        expires 24h;
    }
    location /static/scripts {
        alias $galaxy_root/static/scripts;
        expires 24h;
    }
    location /robots.txt {
        alias $galaxy_root/static/robots.txt;
        expires 24h;
    }
    location /favicon.ico {
        alias $galaxy_root/static/favicon.ico;
        expires 24h;
    }

    # serve visualization and interactive environment plugin static content
    location ~ ^/plugins/(?<plug_type>.+?)/(?<vis_name>.+?)/static/(?<static_file>.*?)$ {
        alias $galaxy_root/config/plugins/$plug_type/$vis_name/static/$static_file;
        expires 24;
    }

    location /_x_accel_redirect/ {
        internal;
        alias /;
    }
}

Questions:

  • Since we couldn't get any informative messages, do you have any advice on how to get some?
  • Do you see any obvious mistake in our configuration?

We can provide extra information if the above isn't enough.

Many thanks in advance.

All 22 comments

You weren't using uwsgi prior to this right - with 17.05?

Do you have histories with a large number of items visible or are they mostly hidden under collections in the history panel?

I spent a long time optimizing the precursor to the build endpoint, but that was at a scale orders of magnitude below this 😅. I should try again with this newer endpoint and with larger histories.

I'm finding some exciting low hanging fruit that might really help - I did a bunch of other database optimization stuff recently so I'm maybe better at this than I was in the past. If I give y'all a patch against 18.01 any chance you can test it for me?

I surely can help testing a patch, thanks!

@nsoranzo Out of curiosity, does your use case have a mix of lists and nested lists (list:list or list:paired) or is the problematic history just flat lists?

Just lists, for a total of around 50,000 datasets among 6 lists.

@nsoranzo https://github.com/galaxyproject/galaxy/pull/5977 should help a great deal for this usage pattern ... I'm not sure we have tests for the kinds of things that might break with it, though. I'd love some feedback on this if you are willing - these results are often counter-intuitive.

Some notes I've been taking based on profiling that API endpoint are here: https://gist.github.com/jmchilton/d68565662f7f4b7ee2640f09fbb92962

In addition to just raw timings of things in that branch - it'd be extra bonus cool to know how each commit affects the timing as well as having a sql_debug log of the queries.
These logs can be generated by applying https://github.com/galaxyproject/galaxy/pull/5539 to your instance, hitting the same build endpoint the browser requests for the tool form with sql_debug=1 added to the query parameters, and then excavating them from your Galaxy web logs. Obviously collecting all that different data is a large project - even just before and after timings on the open PR would be super helpful and anything on top of that is just bonus.
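
As a rough illustration of the before/after timing described above, here is a minimal Python sketch that adds sql_debug=1 to a build URL and times a single request. It assumes you copy the real build URL from the browser's Network tab; the host, tool id, and API path in the example are hypothetical placeholders, the function names are made up for this sketch, and authentication (session cookie or API key) is left to the caller via the headers argument.

```python
import time
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_sql_debug(build_url):
    """Return build_url with sql_debug=1 added to its query string."""
    parts = urlsplit(build_url)
    # Keep the existing parameters (version, tool_version, ...) and
    # drop any previous sql_debug so the flag is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "sql_debug"
    ]
    query.append(("sql_debug", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def time_request(url, headers=None, timeout=600):
    """Fetch url once and return (elapsed_seconds, http_status)."""
    request = urllib.request.Request(url, headers=headers or {})
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()  # include the transfer time in the measurement
        status = response.status
    return time.perf_counter() - start, status


if __name__ == "__main__":
    # Hypothetical URL: replace it with the one the browser actually requests.
    example = (
        "http://localhost:8080/api/tools/example_tool/build"
        "?version=2.0.2&tool_version=2.0.2"
    )
    print(with_sql_debug(example))
```

Running time_request on the same URL before and after switching branches gives the raw timings; the query logs themselves still have to be dug out of the Galaxy web logs as described above.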

You weren't using uwsgi prior to this right - with 17.05?

We were already using uwsgi with 17.05.
I didn't really catch the real difference between the way Galaxy deals with uwsgi in 17.05 and in 18.01. I have very little knowledge of admin stuff.

Do you have histories with a large number of items visible or are they mostly hidden under collections in the history panel?

Mostly hidden under collections in the history panel

I'm finding some exciting low hanging fruit that might really help - I did a bunch of other database optimization stuff recently so I'm maybe better at this than I was in the past. If I give y'all a patch against 18.01 any chance you can test it for me?

Whenever you want! This instance is currently dedicated to this project and can crash for a good cause 😁

I really don't know if these are useful clues, but:

  • My colleague, in his session, manages to display the tool form after a few long seconds.
  • My tests were done using a copy of the same history, shared with me
  • I tested my session on his computer, in the same browser, and got the same issue
  • Curiously, in my session, the number of datasets isn't displayed in grey. I just see "a list" instead of "a list of 810 items"
  • Maybe it's just an impression or a coincidence, but another colleague observed that the display performance was better after a cleanup of his saved histories (delete permanently)

I will test your patch tomorrow. Do I just have to switch to the branch jmchilton:1801_db_opt?

Anyway many thanks for your interest and your quick response.

It is interesting and concerning that different users see different performance. Is it possible that one of you is in admin_users and one of you is not? If there are security checks and such skipped for an admin that might explain things?

New branch I'm thinking will be a bit better is jmchilton:1801_tool_state_opt_2 - for the new PR https://github.com/galaxyproject/galaxy/pull/5983.

I am an admin and my colleagues are not.
I can easily check this hypothesis with a couple of other rats :)

So I spent a long time today working on https://github.com/galaxyproject/galaxy/issues/5987 - my initial testing shows it can speed things up for tool forms that don't use dynamic filters (so most tool forms should see the improvement, I think). I'll try to polish it up and open yet another new pull request in the coming days.

And here it is - https://github.com/galaxyproject/galaxy/pull/5997. It is much more performant in my tests - let me know though.

✨ Magic @jmchilton

The forms now display almost immediately even with 45,000 datasets

You once again save my life project 🍻

Can not agree more with @lecorguille!

Hum ... @jmchilton can you do the same magic on the tool
Collection Operations > Filter failed datasets from a list

(The other Collection Operations tools don't feel any better)

Many many thanks

@lecorguille Oh crap, good catch - I've pushed a bug fix into https://github.com/galaxyproject/galaxy/pull/5997 that fixes the performance for Filter Failed and other tools without any data input parameters.

@jmchilton Sorry, but I can't see any improvements

After 59s

Cannot connect to Galaxy
Galaxy is currently unreachable. Please try again in a few minutes. Please contact a Galaxy administrator if the problem persists.
Uncaught error.

Ugh - the problem with filter failed is conditional on the order of large vs. small collections in your history 😑. https://github.com/galaxyproject/galaxy/pull/6046 should however "fix" it, any chance I can get you to test it @lecorguille?

I will be happy to test that but on Friday (@ home today)
Many thanks for your celerity

It's a Bird...It's a Plane...It's Superman... It's @jmchilton

@lecorguille can this be closed?

We still get timeouts sometimes, but we need to investigate a little on our side.
The different PRs definitely improved the UI :+1:
I can reopen this issue later if needed.
