Airflow: Negsignal.SIGKILL error on macOS

Created on 20 Aug 2020 · 16 Comments · Source: apache/airflow

Apache Airflow version:
1.10.9

Environment:
macOS Catalina, 10.15.6

What happened:
When I run my workflow in Apache Airflow using Docker (especially with the Local Executor), I sometimes get this error: INFO - Task exited with return code Negsignal.SIGKILL
This usually appears when the task consumes a lot of CPU resources.
I have tried setting CPU and memory limits in the docker-compose YAML file, but it did not help.
The same process works fine on Windows/Linux machines of my colleagues.
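
For context, the `Negsignal.SIGKILL` label suggests the task process was terminated by signal 9 (SIGKILL) rather than exiting on its own. On POSIX systems, Python reports this as a negative return code. The following is a minimal sketch using only the standard library; it does not involve Airflow, and the sleeping child process is a hypothetical stand-in for a task.

```python
import signal
import subprocess
import sys


def run_and_kill():
    """Start a child process, SIGKILL it, and return its exit code."""
    # Hypothetical stand-in for a long-running task.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.kill()  # on POSIX this sends SIGKILL
    child.wait()
    return child.returncode


def describe(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # POSIX: a negative value -N means "terminated by signal N".
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status " + str(returncode)


if __name__ == "__main__":
    print(describe(run_and_kill()))
```

SIGKILL cannot be caught or handled by the process itself, so the task code gets no chance to log anything; the cause has to be found outside the task (for example, in kernel or container logs).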

What you expected to happen:
I would like to understand why it happens only on macOS and how I can handle it.

How to reproduce it:
I can share the code. It always fails with the above error when I use the Local Executor, but it works when I switch to the Sequential Executor.

Anything else we need to know:

core bug pending-response

All 16 comments

Thanks for opening your first issue here! Be sure to follow the issue template!

I am running into a similar issue with Airflow 1.10.10 using Docker on macOS.

One update:
My process does not always work, even with the Sequential Executor. There are 1-2 tasks that usually fail with return code Negsignal.SIGKILL. Please help, as I can't run this process on my local machine because of this error.

I'm running into the same issue using astronomer's local dev environment, which uses Airflow 1.10.7, on macOS Catalina 10.15.6.
I'm going to test other versions of airflow supported by astronomer.

Updating Airflow to 1.10.10 seems to have resolved the issue.

I have upgraded to 1.10.12 and still the same issue :(

Update:
This process also does not work on Windows using Docker. So it seems that it only works on Linux.

I encountered this issue - again - on another pipeline, using Airflow 1.10.10. Increasing docker resources resolved it.
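
On macOS and Windows, Docker Desktop runs containers inside a Linux VM whose memory allotment is set in the Docker Desktop settings, so a container cannot use more memory than that VM has, whatever the compose file says. To see what limit a container is actually running under, the cgroup files can be read from inside it. This is a rough sketch, assuming a Linux container with the standard cgroup v1 or v2 mount points; the function name is made up for illustration.

```python
from pathlib import Path

# Standard cgroup locations for the memory limit inside a Linux container.
CGROUP_LIMIT_FILES = (
    "/sys/fs/cgroup/memory.max",                    # cgroup v2
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",  # cgroup v1
)


def container_memory_limit(paths=CGROUP_LIMIT_FILES):
    """Return the memory limit in bytes, or None if unlimited or unknown."""
    for path in paths:
        try:
            raw = Path(path).read_text().strip()
        except OSError:
            continue  # file missing or unreadable; try the next location
        if raw == "max":  # cgroup v2 spelling of "no limit"
            return None
        # Note: cgroup v1 reports "no limit" as a very large integer.
        return int(raw)
    return None


if __name__ == "__main__":
    limit = container_memory_limit()
    if limit is None:
        print("No cgroup memory limit found")
    else:
        print(f"Memory limit: {limit / 2**20:.0f} MiB")
```

Running this inside the Airflow container before and after changing the Docker resource settings shows whether the change actually reached the container.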

We see the same behavior.
Our environment: k8s, Airflow 1.10.12 (the error also occurred in 1.10.9).
We sometimes see: Task exited with return code Negsignal.SIGKILL

I have the same issue. My Airflow (1.10.12) is running on AWS ECS (Fargate).
INFO - Task exited with return code Negsignal.SIGKILL

Experiencing the same on 1.10.12 running on AWS ECS(EC2).

@mariuszliksza-kinesso, Can you please share the procedure to reproduce this issue? Including the code.

This is also happening to me on AWS EC2 instances with Airflow 1.10.12, Ubuntu 18.04.5. It's not consistent, but I've just started seeing it with some newly written tasks that use a high percentage of CPU and GPU for sustained periods. I will look for something that can reproduce this as I go on.

Are you also using a lot of memory? On Linux check dmesg or syslog and see if the OOMKiller is killing processes.
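
One way to run that check is to scan the kernel log for the "Killed process" lines the OOM killer writes. The sketch below is a rough illustration, assuming a Linux host where `dmesg` is available (reading it may require root); the helper names are made up for this example.

```python
import re
import subprocess

# Matches kernel log lines such as:
#   "Out of memory: Killed process 1234 (python) total-vm:..."
#   "Killed process 1234 (python) total-vm:..."
OOM_PATTERN = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return (pid, process_name) for every OOM kill found in the log text."""
    return [(int(pid), name) for pid, name in OOM_PATTERN.findall(log_text)]


def read_kernel_log():
    """Read the kernel ring buffer via dmesg (may require root)."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    for pid, name in find_oom_kills(read_kernel_log()):
        print(f"OOM killer terminated PID {pid} ({name})")
```

The same `find_oom_kills` function can be pointed at the contents of a syslog file instead of `dmesg` output. If the timestamps of the matches line up with the failed tasks, memory pressure is the likely cause.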

That's likely - indeed, switching to an EC2 instance with more RAM and running fewer concurrent tasks seems to have stopped it so far. Unfortunately I can't confirm since the logs are now gone, but I bet you're right. TIL about overcommit! https://www.etalabs.net/overcommit.html

I'm going to close this then. If anyone has any more details, we can look into this: comment (and probably mention me directly) and I'll re-open this ticket.
