AWX: Reducing dynamic inventory update locking

Created on 23 Sep 2019 · 14 Comments · Source: ansible/awx

ISSUE TYPE
  • Feature Idea
SUMMARY


We're running into a situation where we have an inventory of a few thousand hosts. This inventory is dynamic: there is a source script for the inventory inside of a project. The inventory is configured to overwrite variables and hosts, as our infrastructure can change a lot during the day and all the variables are delivered through this dynamic source. We, as an organization / team, often need to run orchestration across the entire fleet of thousands of hosts. These jobs can often last many hours, ensuring fleet health as they progress.

What we notice is that while such a job is executing, AWX will refuse to update that dynamic inventory. The inventory update job will remain pending while the original task continues.

This can be a pretty big problem for us, as our inventory can change throughout the day. New hosts can show up many times throughout the day, or be removed. Data about those hosts can change too, so we do want to have a fairly updated picture of inventory each time we launch a job.

What we'd like to see is the ability to execute the inventory update job while a job that USES the inventory is running. This will allow us to configure our inventory with "update on launch" and a reasonable cache. It will allow new hosts to reliably use provisioning callbacks to get jobs.
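
For concreteness, here is a minimal sketch of the kind of inventory source configuration we have in mind, set through the REST API. The URL, token, and inventory source ID are placeholders, not real values:

```python
import requests

AWX_URL = "https://awx.example.com"      # placeholder host
TOKEN = "<oauth2-token>"                 # placeholder token
INVENTORY_SOURCE_ID = 42                 # placeholder ID

resp = requests.patch(
    f"{AWX_URL}/api/v2/inventory_sources/{INVENTORY_SOURCE_ID}/",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "update_on_launch": True,      # refresh the inventory when a job that uses it launches
        "update_cache_timeout": 600,   # reuse results newer than 10 minutes instead of re-syncing
        "overwrite": True,             # drop hosts no longer returned by the source
        "overwrite_vars": True,        # replace variables with what the source returns
    },
)
resp.raise_for_status()
```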

Thanks!

Labels: api, medium, bug

All 14 comments

As a follow up, I can see that AWX also refuses to update a _project_ when there is a job running that uses a template from that project. That's another source of locking that I'd like to see removed.

cc @AlanCoding

I can agree on both of those issues. In our case we have multiple jobs running from 8:00 am until noon, and in that window there's no chance of getting a free slot where no dependency is blocking the job.

The main job runs on 1200 hosts. Sometimes one host starts hanging in a state that no Ansible timeout seems to handle (missing routes), and then a whole bunch of pending jobs immediately stacks up until AWX throws only 500 errors.

Some fake numbers, consistent with the 2 reports we have here:

  • inventory update: 5 minutes
  • job run: 4 hours

We have 2 animals here: inventory updates and job runs. Ordered combinations of 2 items give 4 distinct tuples.

Inventory update submitted while job in progress

I see no reason to block this. The job generates a static version of the inventory before the Ansible subprocess starts. The inventory update may delete some hosts by the time the playbook hits its on_stats event, in which case the event will have a missing host. That doesn't bother me.

We may still need to apply the inventory advisory lock during the step where we generate that static inventory.
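
A rough sketch of what that could look like, using the `advisory_lock` context manager from `django_pglocks`; the snapshot function, the callable it wraps, and the lock name format are illustrative, not actual AWX code:

```python
from django_pglocks import advisory_lock


def snapshot_inventory_for_job(inventory_id, build_static_inventory):
    """Generate a job's static inventory copy under a short-lived lock.

    `build_static_inventory` is a hypothetical callable standing in for
    whatever AWX does today to write the static inventory for a job; the
    lock name format is likewise illustrative.
    """
    # Hold the per-inventory advisory lock only for the snapshot itself,
    # so a later inventory update is free to run during the long playbook.
    with advisory_lock(f"inventory_{inventory_id}_update", wait=True):
        return build_static_inventory(inventory_id)
```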

Job submitted while inventory update in progress

If the job were allowed to start, behavior would depend on how the transaction for the inventory update is managed. With your timings, though, this wouldn't buy much anyway (a 5 minute delay).

Inventory update submitted while another inventory update in progress

We had many deadlock issues with this case. These now block each other with an inventory-wide database lock, a recent change from #3529.

Job submitted while another job in progress

This does not present much of a technical problem on the server side.

However, it is possibly very undesirable to the user, because the state (state as in the objective state according to Ansible modules) of each server becomes inconsistent.

Now, if the limit could be established to be mutually exclusive between two jobs, we could loosen this restriction. But right now we have no mechanism by which we could do this.

Is this not the toggle for Enable Concurrent Jobs? Decide on a template-by-template basis whether or not other templates should be allowed to run at the same time?
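
For reference, that toggle corresponds to the `allow_simultaneous` field on the job template. A minimal sketch of flipping it through the REST API (URL, token, and template ID are placeholders):

```python
import requests

AWX_URL = "https://awx.example.com"   # placeholder host
TOKEN = "<oauth2-token>"              # placeholder token
JOB_TEMPLATE_ID = 7                   # placeholder ID

resp = requests.patch(
    f"{AWX_URL}/api/v2/job_templates/{JOB_TEMPLATE_ID}/",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"allow_simultaneous": True},  # the API field behind "Enable Concurrent Jobs"
)
resp.raise_for_status()
```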

That does make sense. That still leaves open the question of multiple JTs targeting the same inventory. I am liable to be wrong about any of these beliefs about existing behavior.

@omgjlk any chance you're interested in trying out this PR: https://github.com/ansible/awx/pull/5328

Yeah I should be able to set something up today or tomorrow.

@fosterseth is this ready to be moved to needs_test?

cc @omgjlk we've merged this ("this" being https://github.com/ansible/awx/pull/5519)

Thanks. I plan on upgrading next week and will be able to validate.

With 9.1.1 I've finally had a moment to test this out. I have confirmed that, while a long job configured to use an inventory source that updates on launch is running, I am able to trigger additional inventory updates.

I would consider this closed at this point. Thanks all!

Thanks for helping us test this out, @omgjlk cc @elyezer @fosterseth.

We'll do some final verification on our end and then close this out.

I was able to verify this one by doing the following:

1) Created a job template that ran a playbook that took many minutes to complete.
2) While the previous job was running, the related inventory update was triggered and started running immediately.

With that said I think we can close this one.
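
For anyone repeating this, here is a rough sketch of those two steps against the REST API. The URL, token, and IDs are placeholders; prior to the fix, the second call would have sat in pending until the job finished:

```python
import requests

AWX_URL = "https://awx.example.com"                    # placeholder host
HEADERS = {"Authorization": "Bearer <oauth2-token>"}   # placeholder token
JOB_TEMPLATE_ID = 7                                    # placeholder ID
INVENTORY_SOURCE_ID = 42                               # placeholder ID

# 1) Launch the long-running job template.
job = requests.post(
    f"{AWX_URL}/api/v2/job_templates/{JOB_TEMPLATE_ID}/launch/",
    headers=HEADERS,
).json()

# 2) While that job is still running, trigger the related inventory update.
update = requests.post(
    f"{AWX_URL}/api/v2/inventory_sources/{INVENTORY_SOURCE_ID}/update/",
    headers=HEADERS,
).json()

print("job status:", job.get("status"),
      "| inventory update status:", update.get("status"))
```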
