Airflow: Celery autoscaling overrides normal worker_concurrency setting

Created on 21 Apr 2020  路  18Comments  路  Source: apache/airflow

Airflow 1.10.9
Celery 4.3.0

Setting Celery 'worker_concurrency' is overriden by the autoscaling no matter what you configure.

Problematic code is here:

https://github.com/apache/airflow/blob/6db66eaf927a50f2dcd85d0ae6033f7ea3699099/airflow/cli/commands/celery_command.py#L121

(this is head but it's the same bug in the 1.10.x version of this file)

You always get the default autoscaling setting with head if you don't set autoscaling at all but do set worker concurrency (which is 12 workers).

Not adding "autoscale" to the dictionary when autoscaling is not set is enough to solve this (at least, I tested the worker_concurrency half of things, and the asked for concurrency is back with that change).

bug

Most helpful comment

Autoscaling isn't supposed to be supported in Celery 4.x, so I wouldn't be surprised to see it behaving erratically anyways. I just was surprised to see it always defaulting to 12 workers even with "-c 1" :thinking:

All 18 comments

Thanks for opening your first issue here! Be sure to follow the issue template!

I just ran into this too. Airflow 1.10.9 / Celery 4.4.0.

This may be unrelated, but I also noticed that the worker wasn't actually autoscaling: worker_autoscale defaults to 16,12 (max,min), but concurrency stayed put at 12, even when there were tasks waiting with all dependencies met.

Autoscaling isn't supposed to be supported in Celery 4.x, so I wouldn't be surprised to see it behaving erratically anyways. I just was surprised to see it always defaulting to 12 workers even with "-c 1" :thinking:

Duplicate of https://github.com/apache/airflow/issues/8044

@jsmodic you are right, autoscaling is weird in Celery in 4.x and there are issues reporting this problem in Celery itself. If you want to perform production-grade autoscaling and you are using k8s cluster this may be interesting to you: https://www.astronomer.io/blog/the-keda-autoscaler/

Indeed -c 1 spawns 12 workers in 1.10.9 ... I tested it on master and it seems to be fixed (-c n spawns main celery process and n child processes). Same on 1.10.10:

root       412 48.3  2.0 660108 123128 pts/0   Sl+  14:07   0:02 [celeryd: celery@3715dff2587f:MainProcess] -active- (worker -c 2)
root       422 50.5  1.7 374028 107824 pts/0   S+   14:07   0:02 /usr/local/bin/python /usr/local/bin/airflow serve_logs
root       426  0.0  1.5 383524 92124 pts/0    S+   14:07   0:00 [celeryd: celery@3715dff2587f:ForkPoolWorker-1]
root       427  0.0  1.5 383528 92124 pts/0    S+   14:07   0:00 [celeryd: celery@3715dff2587f:ForkPoolWorker-2]

@jsmodic if you think we can close the issue - please do :)

@dimberman @ashb just an idea: maybe we should remove / deprecated the "celery autoscaling" option?

Isn't there no worker_autoscale option set by default:

# The maximum and minimum concurrency that will be used when starting workers with the
# ``airflow celery worker`` command (always keep minimum processes, but grow
# to maximum if necessary). Note the value should be max_concurrency,min_concurrency
# Pick these numbers based on resources on worker box and the nature of the task.
# If autoscale option is available, worker_concurrency will be ignored.
# http://docs.celeryproject.org/en/latest/reference/celery.bin.worker.html#cmdoption-celery-worker-autoscale
# Example: worker_autoscale = 16,12
# worker_autoscale =

So where is this default coming from?

So where is this default coming from?

I will take a closer look, but as reported on 1.10.9 changing concurrency of worker from cli level doesn't work so it's possible that the autoscaling option is also corrupted.

@turbaszek what celery version are you using? On 1.10.9 it starts up like this for me on Celery 4.3.0 with no autoscaling configured anywhere.

From Celery's stdout on startup:

[config]
app: airflow.executors.celery_executor:0x7fe57b350be0
transport: {my_transport}
results: {my_db}
concurrency: {min=12, max=16} (prefork)
task events: OFF (enable -E to monitor tasks in this worker)

On the machine:

ps aux | grep celery
[celeryd: celery@ip-my-ip:MainProcess] -active- (worker -q queue -c 4)
[celeryd: celery@ip-my-ip:ForkPoolWorker-1]
[celeryd: celery@ip-my-ip:ForkPoolWorker-2]
[celeryd: celery@ip-my-ip:ForkPoolWorker-3]
[celeryd: celery@ip-my-ip:ForkPoolWorker-4]
[celeryd: celery@ip-my-ip:ForkPoolWorker-5]
[celeryd: celery@ip-my-ip:ForkPoolWorker-6]
[celeryd: celery@ip-my-ip:ForkPoolWorker-7]
[celeryd: celery@ip-my-ip:ForkPoolWorker-8]
[celeryd: celery@ip-my-ip:ForkPoolWorker-9]
[celeryd: celery@ip-my-ip:ForkPoolWorker-10]
[celeryd: celery@ip-my-ip:ForkPoolWorker-11]
[celeryd: celery@ip-my-ip:ForkPoolWorker-12]

There's only 4 cores on that machine, that's what triggered me digging into how there could possibly be 12 workers.

@turbaszek what celery version are you using? On 1.10.9 it starts up like this for me on Celery 4.3.0 with no autoscaling configured anywhere.

I am testing this in breeze, and I've got celery==4.4.2.

There's only 4 cores on that machine, that's what triggered me digging into how there could possibly be 12 workers.

Of course, you can have more processes than cores. But it doesn't mean that everything will be executed at the same time :)

So where is this default coming from?

Well there's this...
https://github.com/apache/airflow/blob/6db66eaf927a50f2dcd85d0ae6033f7ea3699099/airflow/config_templates/config.yml#L1114-L1125

...but from what I can tell that file is only used for documentation, right?

I can't find anything about it in the Celery source either 馃

...but from what I can tell that file is only used for documentation, right?

Not exactly, this file is used to generate default_airflow.cfg :)

My airflow.cfg (and celery_config.py) doesn't reference autoscaling at all (since I haven't updated them really since 1.10.2 or so, before this feature landed in airflow). It's interesting you can't replicate the behavior though. When I test it with my config with and without 'autoscale' in the dictionary, it will show both the bad and then expected behavior.

(I'm not doing a very scientific test since I can't easily setup a clean environment, I'm editing the dist-package directly on a development machine)

It's interesting you can't replicate the behavior though.

@jsmodic I was able to verify that -w flag doesn't work on 1.10.9. I will take a deeper look at autoscaling over the weekend 馃憣

The autoscale config is commented out since 1.10.10 which is a change from 1.10.9 where it isn't and I think this accounts for the difference in the behavior that people are seeing.
https://github.com/apache/airflow/blob/1.10.9/airflow/config_templates/default_airflow.cfg#L480
https://github.com/apache/airflow/blob/1.10.10/airflow/config_templates/default_airflow.cfg#L516

Based on the comment above the setting, it makes sense that you wouldn't be able to set worker concurrency in 1.10.9:
If autoscale option is available, worker_concurrency will be ignored

As far as I tested it should be enough to remove 'autoscale' from the dictionary to be unpacked into Celery if autoscaling isn't configured in the airflow.cfg. But I haven't confirmed that works on other versions of Celery or how it behaves if you actually desire autoscaling.

@jsmodic from what I'm understanding you're editing this in the default configuration in airflow. From my understanding of it, I think without actually editing the package itself you are going to have trouble avoiding autoscale in 1.10.9 because it is set as a default. Given it is commented out in the package default config default_airflow.cfg as of 1.10.10, it is now up to the user whether they want to include it via airflow.cfg or by other means.

Following up on this, it's not ideal, but if in 1.10.9 you want to set a specific worker process count to, for example, 8 workers then just include:
worker_autoscale = 8,8
in your airflow.cfg.

As @eeshugerman pointed out, auto-scaling is not working so even if you set the upper bound to be something higher you would still sit at 8 workers. A caveat to this, I dug into auto-scaling not working and I think autoscaling is only broken in celery with the combination of settings: worker_prefetch_multipler=1 and task_acks_late=True
which are the settings in airflow. I submitted a patch to Celery that should fix this.

@turbaszek I think this can be closed, correct?

Was this page helpful?
0 / 5 - 0 ratings