Galaxy: Workflows not being scheduled when workflow handlers set to db-skip-locked

Created on 20 Jun 2019 · 19 Comments · Source: galaxyproject/galaxy

Running web (calling uwsgi directly) and job handlers (using scripts/galaxy-main) separately and having the following config/workflow_schedulers.xml (or not having that file at all), leads to workflow invocations never being scheduled (they remain in new state):

<?xml version="1.0"?>
<workflow_schedulers default="core">
    <core id="core" />
    <handlers assign_with="db-skip-locked" />
</workflow_schedulers>

Changing the handlers assignment method as follows triggers job scheduling.

   <handlers assign_with="db-self" />

Poring over the logs with @natefoo, everything looks like it should, including the database values in the workflow_invocation table (the handler is _default_), so the missing link is somewhere deeper.

area/workflows kind/bug

All 19 comments

Ok, so, same issue I saw @natefoo. Cool, glad it's a bug and not just our weird setup.

@afgane for an interim solution I just have the following bash script running which makes things work well enough.

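#!/bin/bash
# Interim workaround: once a second, hand every new invocation that is still
# unassigned ('_default_') to a randomly chosen handler_main_N handler.
# Note: (random() * 10)::integer rounds, so N ranges over 0..10 inclusive;
# adjust the expression to match your own handler IDs.
while true; do
        psql -c "update workflow_invocation set handler = 'handler_main_' || (random() * 10)::integer where state = 'new' and handler = '_default_';" | grep -v 'UPDATE 0'
        sleep 1;
done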

I've added this to gxadmin

Bumping this issue again. Either this is a severe bug, or we need to pull this option from the documentation.

Is it safe to use db-self for workflows while using db-skip-locked for normal job handlers in a multi-master webless setup with dynamic handlers? I came across this issue on the current tip of release_19.05. Thanks

Apparently the same happens when using db-transaction-isolation instead of db-skip-locked :-(.

Ok, I'm using the gxadmin call. However, handler prefixes are host dependent, so it would be called from multiple hosts at the same time in order to balance workflows across handlers on different hosts. Is this transactionally safe from the database point of view? Thanks!

I thought I did a better job documenting the deal with workflow schedulers and assignment methods but the only thing I see is what's in the sample config.

db-skip-locked and db-transaction-isolation are both supposed to work but are discouraged because they can't guarantee serial workflow execution in a single history. For that reason, either mules or db-preassign with statically configured <handlers> is preferred. If you can run a single static workflow scheduler with --server-name=whatever and <handlers><handler id="whatever"/></handlers> in your workflow schedulers config, that should solve the issue.
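
As a rough sketch, the single static scheduler setup described above might look like the following workflow_schedulers.xml. It reuses the layout of the configs elsewhere in this thread; `whatever` is only the placeholder id from the comment and is assumed to match the --server-name passed to the handler process.

```xml
<?xml version="1.0"?>
<workflow_schedulers default="core">
    <core id="core" />
    <!-- One statically defined workflow scheduler. The id is a placeholder:
         start the matching handler process with --server-name=whatever
         so that the server name and the handler id agree. -->
    <handlers>
        <handler id="whatever"/>
    </handlers>
</workflow_schedulers>
```

With only one scheduler process picking up invocations, the serial-execution concern mentioned above should not arise, at the cost of having no redundancy.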

That said, this bug ought to be addressed, and I'll try to find the time this week to look at it.

I walked into this trap when doing the update to 20.01 and following the documentation ☹️

The preferred method depends on your deployment strategy:

    uWSGI + Mules - uWSGI Mule Messaging is preferred.
    uWSGI + Webless - Either Database SKIP LOCKED or Database Transaction Isolation is preferred.
    uWSGI + Hybrid - Either Database SKIP LOCKED or Database Transaction Isolation is preferred. If your mule and webless handlers are in non-overlapping pools (i.e. tags, or untagged), you can alternatively use both uWSGI Mule Messaging followed by either Database SKIP LOCKED or Database Transaction Isolation. If pools overlap, using uWSGI Mule Messaging would prevent any non-mule handlers in that pool from being assigned jobs.

@natefoo

So then with a job conf like this:

        <handlers assign_with="db-skip-locked" max_grab="8">
                <handler id="handler_main_0"/>
                <handler id="handler_main_1"/>
                <handler id="handler_main_2"/>
                <handler id="handler_main_3"/>
                <handler id="handler_main_4"/>
                <handler id="handler_main_5"/>
                <handler id="handler_main_6"/>
                <handler id="handler_main_7"/>
        </handlers>

is this wrong? Should there only be a single workflow scheduler, and then it works? Or like this:

<?xml version="1.0"?>
<workflow_schedulers default="core">
    <core id="core" />
    <handlers default="schedulers">
        <handler id="workflow_scheduler_main_0" tags="schedulers"/>
        <handler id="workflow_scheduler_main_1" tags="schedulers"/>
    </handlers>
</workflow_schedulers>

Still an issue for EU.

@natefoo do you have any ideas here?

Is the following a valid and recommended config?

<?xml version="1.0"?>

<workflow_schedulers default="core">
    <core id="core" />
    <handlers assign_with="db-self" default="schedulers">
        <handler id="workflow_scheduler_main_0" tags="schedulers"/>
        <handler id="workflow_scheduler_main_1" tags="schedulers"/>
    </handlers>
</workflow_schedulers>

Use assign_with="db-preassign" rather than db-self. You can use multiple workflow schedulers (.org does).
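
Applied to the config in the question above, that suggestion would look roughly like this. It is a sketch only: the handler ids and the `schedulers` tag are copied from the question, and each id is assumed to match the --server-name of a running workflow scheduler process.

```xml
<?xml version="1.0"?>
<workflow_schedulers default="core">
    <core id="core" />
    <!-- Same handlers as in the question; only assign_with changes
         from db-self to db-preassign. -->
    <handlers assign_with="db-preassign" default="schedulers">
        <handler id="workflow_scheduler_main_0" tags="schedulers"/>
        <handler id="workflow_scheduler_main_1" tags="schedulers"/>
    </handlers>
</workflow_schedulers>
```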

@hexylena we figured out in Barcelona what the issue was but I am not sure if we recorded that revelation - do you recall? Is the issue that a db-skip-locked job conf without a workflow scheduler conf is broken?

Here is .org's workflow scheduler conf, job conf handlers section (individual handlers are only defined here for plugin loading restrictions), and the workflow scheduler and handler supervisor configs.
