Jobs running on container instance groups never start; they stay stuck in the pending state. I think this is related to the recent PR #8333.
This appears to be the line that the scheduler is erroring on:
https://github.com/ansible/awx/blob/d550487bc8cbdfd333e5881b9fc6192b63252d3a/awx/main/scheduler/task_manager.py#L287
That PR added a required instances argument to the function called on that line:
https://github.com/ansible/awx/blob/d550487bc8cbdfd333e5881b9fc6192b63252d3a/awx/main/models/ha.py#L265
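To illustrate why a signature change like this breaks the scheduler, here is a minimal, hypothetical reproduction. It is not AWX code: the class name, the selection logic, and the dict shapes for task and instances are assumptions. It only shows that a call site still passing just the task raises the same TypeError seen in the log below.

```python
class HypotheticalInstanceGroup:
    """Hypothetical stand-in for AWX's InstanceGroup model (NOT the real code)."""

    def fit_task_to_most_remaining_capacity_instance(self, task, instances):
        # Assumed behavior: among the given instances, return the one with the
        # most remaining capacity that can still fit the task, else None.
        fits = [i for i in instances if i["remaining_capacity"] >= task["impact"]]
        return max(fits, key=lambda i: i["remaining_capacity"], default=None)


group = HypotheticalInstanceGroup()
task = {"impact": 10}  # assumed dict shape, for illustration only

# A stale call site that still passes only the task, like line 287 in the traceback.
try:
    group.fit_task_to_most_remaining_capacity_instance(task)
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

On newer Python versions the message is prefixed with the class name, but the "missing 1 required positional argument: 'instances'" part is the same.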
Expected behavior: the container is created and the job completes.
Actual behavior: the job remains in the pending state indefinitely.
The following is written to the awx_task log every few seconds:
2020-10-26T10:20:05.725390912Z 2020-10-26 10:20:05,722 DEBUG awx.main.dispatch task c105fdd9-7077-4594-99da-1703ebc58db7 starting awx.main.scheduler.tasks.run_task_manager(*[])
2020-10-26T10:20:05.725420112Z 2020-10-26 10:20:05,724 DEBUG awx.main.scheduler Running Tower task manager.
2020-10-26T10:20:05.732995992Z 2020-10-26 10:20:05,729 DEBUG awx.main.scheduler Starting Scheduler
2020-10-26T10:20:05.831512837Z 2020-10-26 10:20:05,830 ERROR awx.main.dispatch Worker failed to run task awx.main.scheduler.tasks.run_task_manager(*[], **{}
2020-10-26T10:20:05.831538437Z Traceback (most recent call last):
2020-10-26T10:20:05.831543637Z File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/task.py", line 86, in perform_work
2020-10-26T10:20:05.831547937Z result = self.run_callable(body)
2020-10-26T10:20:05.831551637Z File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/task.py", line 62, in run_callable
2020-10-26T10:20:05.831555537Z return _call(*args, **kwargs)
2020-10-26T10:20:05.831559637Z File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/scheduler/tasks.py", line 16, in run_task_manager
2020-10-26T10:20:05.831563537Z TaskManager().schedule()
2020-10-26T10:20:05.831587738Z File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 644, in schedule
2020-10-26T10:20:05.831601338Z self._schedule()
2020-10-26T10:20:05.831604738Z File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 632, in _schedule
2020-10-26T10:20:05.831608038Z self.process_tasks(all_sorted_tasks)
2020-10-26T10:20:05.831611038Z File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 598, in process_tasks
2020-10-26T10:20:05.831614238Z self.process_pending_tasks(pending_tasks)
2020-10-26T10:20:05.831635138Z File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 511, in process_pending_tasks
2020-10-26T10:20:05.831639038Z self.start_task(task, rampart_group, task.get_jobs_fail_chain(), None)
2020-10-26T10:20:05.831642338Z File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 287, in start_task
2020-10-26T10:20:05.831645738Z match = group.fit_task_to_most_remaining_capacity_instance(task)
2020-10-26T10:20:05.831649238Z TypeError: fit_task_to_most_remaining_capacity_instance() missing 1 required positional argument: 'instances'
Hey @paulstaffs,
This was an oversight on our part from some recent optimizations to the task manager. It should be addressed in the next release (this PR contains the fix):
https://github.com/ansible/awx/pull/8457
https://github.com/ansible/awx/pull/8457/files#diff-a37220424979b0075fa6e25bf8c309d671f30641e64830c27cd046272c73a703R288
Going to close this, as our downstream tests are now passing and the fix will go out in the next release.
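The actual change is in the PR linked above. As a rough, hypothetical sketch of the general shape of such a fix (not AWX's implementation), the stale call site is updated to supply the new required argument. The class, the start_task_sketch helper, and the instance dicts are all illustrative assumptions.

```python
class HypotheticalInstanceGroup:
    """Hypothetical stand-in for AWX's InstanceGroup model (NOT the real code)."""

    def fit_task_to_most_remaining_capacity_instance(self, task, instances):
        # Assumed behavior: pick the fitting instance with the most spare capacity.
        fits = [i for i in instances if i["remaining_capacity"] >= task["impact"]]
        return max(fits, key=lambda i: i["remaining_capacity"], default=None)


def start_task_sketch(group, task, instances):
    # Fixed call site (hypothetical): the caller now passes the candidate
    # instances as the new required argument instead of omitting it.
    return group.fit_task_to_most_remaining_capacity_instance(task, instances)


instances = [
    {"hostname": "node-a", "remaining_capacity": 5},
    {"hostname": "node-b", "remaining_capacity": 40},
    {"hostname": "node-c", "remaining_capacity": 25},
]
match = start_task_sketch(HypotheticalInstanceGroup(), {"impact": 10}, instances)
print(match["hostname"] if match else None)  # prints node-b
```

If no instance has enough remaining capacity, the sketch returns None, which a caller would treat as "no match yet, leave the task pending".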
I'm still seeing this issue on the latest pull from the devel branch. I'm getting the error below:
2020-12-08 17:05:28,678 DEBUG awx.main.dispatch task f800c829-7790-4052-b434-08b535ca89c4 starting awx.main.scheduler.tasks.run_task_manager(*[])
2020-12-08 17:05:28,680 DEBUG awx.main.scheduler Running Tower task manager.
2020-12-08 17:05:28,685 DEBUG awx.main.scheduler Starting Scheduler
2020-12-08 17:05:28,774 ERROR awx.main.dispatch Worker failed to run task awx.main.scheduler.tasks.run_task_manager(*[], **{}
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/task.py", line 86, in perform_work
result = self.run_callable(body)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/worker/task.py", line 62, in run_callable
return _call(*args, **kwargs)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/scheduler/tasks.py", line 16, in run_task_manager
TaskManager().schedule()
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 644, in schedule
self._schedule()
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 632, in _schedule
self.process_tasks(all_sorted_tasks)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 598, in process_tasks
self.process_pending_tasks(pending_tasks)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 511, in process_pending_tasks
self.start_task(task, rampart_group, task.get_jobs_fail_chain(), None)
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 287, in start_task
match = group.fit_task_to_most_remaining_capacity_instance(task)
TypeError: fit_task_to_most_remaining_capacity_instance() missing 1 required positional argument: 'instances'
Please advise on what I need to do to resolve this. Thank you!
@emanuelferguson, you may want to make sure you're actually deploying the latest AWX release. The line number in your traceback (287) doesn't match the corresponding source line in devel:
https://github.com/ansible/awx/blame/devel/awx/main/scheduler/task_manager.py#L288
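If it's unclear which code is actually running, one quick check (a suggestion, not an official AWX procedure) is to print the source at the traceback's line numbers from inside the awx_task container, using Python's standard linecache module. The function name deployed_source_line is hypothetical; the path comes from the traceback above.

```python
import linecache


def deployed_source_line(path, lineno):
    """Return the stripped source text at path:lineno, or '' if unreadable."""
    # Discard any cached copy so an edited or upgraded file is re-read.
    linecache.checkcache(path)
    return linecache.getline(path, lineno).strip()


if __name__ == "__main__":
    # Path taken from the traceback; run this inside the awx_task container.
    path = ("/var/lib/awx/venv/awx/lib/python3.6/site-packages/"
            "awx/main/scheduler/task_manager.py")
    for lineno in (287, 288):
        print(lineno, deployed_source_line(path, lineno))
```

If line 287 still shows the one-argument call to fit_task_to_most_remaining_capacity_instance, the container is running code from before the fix.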