I sometimes get (when quitting supervisor with CTRL+C) the following trace:
^C2018-06-13 22:32:21,219 WARN received SIGINT indicating exit request
2018-06-13 22:32:21,220 INFO waiting for web to die
2018-06-13 22:32:21,270 INFO stopped: web (exit status 0)
2018-06-13 22:32:21,270 INFO Closing socket tcp://0.0.0.0:8000
Exception ignored in: <bound method tail_f_producer.__del__ of <supervisor.http.tail_f_producer object at 0x7f599b85e2e8>>
Traceback (most recent call last):
File "/home/florian/.local/lib/python3.6/site-packages/supervisor/http.py", line 649, in __del__
self._close()
File "/home/florian/.local/lib/python3.6/site-packages/supervisor/http.py", line 675, in _close
self.file.close()
OSError: [Errno 9] Bad file descriptor
my supervisor config file:
[supervisord]
logfile = /tmp/supervisord.log
[inet_http_server]
port = 127.0.0.1:9001
[supervisorctl]
serverurl = http://127.0.0.1:9001
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[fcgi-program:web]
# Patched gunicorn supporting fcgi-program: https://github.com/benoitc/gunicorn/pull/1809
command = python -m gunicorn.app.wsgiapp --reuse-port --workers 4 --bind fd://0 test:application
directory = %(here)s
socket = tcp://0.0.0.0:8000
I cannot yet reliably reproduce :/
Duplicate of #856
It looks like this error comes from a different source than #856. Reopening.
I cannot yet reliably reproduce :/
From the traceback "bound method tail_f_producer.__del__", it looks like you were using supervisorctl tail -f before or during the crash. Perhaps try using that to reproduce.
Yes, I played with tail/maintail (and -f in both cases too). Seems to be timing sensitive too?
I managed to get more traces:
Exception ignored in: <bound method tail_f_producer.__del__ of <supervisor.http.tail_f_producer object at 0x7f7c7785cba8>>
Traceback (most recent call last):
File "/home/florian/.local/lib/python3.6/site-packages/supervisor/http.py", line 649, in __del__
self._close()
File "/home/florian/.local/lib/python3.6/site-packages/supervisor/http.py", line 675, in _close
self.file.close()
OSError: [Errno 9] Bad file descriptor
2018-06-13 22:54:52,298 INFO spawned: 'web' with pid 31532
Traceback (most recent call last):
File "/home/florian/.local/lib/python3.6/site-packages/supervisor/loggers.py", line 108, in emit
self.flush()
File "/home/florian/.local/lib/python3.6/site-packages/supervisor/loggers.py", line 67, in flush
self.stream.flush()
OSError: [Errno 9] Bad file descriptor
2018-06-13 22:54:52,518 CRIT uncaptured python exception, closing channel <POutputDispatcher at 140172558026008 for <Subprocess at 140172557991776 with name web in state STARTING> (stderr)> (<class 'OSError'>:[Errno 9] Bad file descriptor [/home/florian/.local/lib/python3.6/site-packages/supervisor/supervisord.py|runforever|218] [/home/florian/.local/lib/python3.6/site-packages/supervisor/dispatchers.py|handle_read_event|270] [/home/florian/.local/lib/python3.6/site-packages/supervisor/dispatchers.py|record_output|204] [/home/florian/.local/lib/python3.6/site-packages/supervisor/dispatchers.py|_log|173] [/home/florian/.local/lib/python3.6/site-packages/supervisor/loggers.py|info|327] [/home/florian/.local/lib/python3.6/site-packages/supervisor/loggers.py|log|345] [/home/florian/.local/lib/python3.6/site-packages/supervisor/loggers.py|emit|226] [/home/florian/.local/lib/python3.6/site-packages/supervisor/loggers.py|doRollover|263])
Traceback (most recent call last):
File "/home/florian/.local/lib/python3.6/site-packages/supervisor/loggers.py", line 108, in emit
self.flush()
File "/home/florian/.local/lib/python3.6/site-packages/supervisor/loggers.py", line 67, in flush
self.stream.flush()
OSError: [Errno 9] Bad file descriptor
I managed that by running:
supervisord in one shellsupervisorctl maintail -f in another (restart it when it ends, see next one)supervisorctl reload till the error occursWas this issue fixed or not yet?
https://github.com/Supervisor/supervisor/blob/master/supervisor/loggers.py#L73-L86
https://github.com/Supervisor/supervisor/blob/master/supervisor/loggers.py#L73-L86
It looks like those lines were added in 2014, but this issue was opened in 2018. No release version has run under Python 3 yet, so it would seem that it's still an issue unless @apollo13 was running a years-old git checkout recently.
I was running this with the latest master at the time of reporting.
Thanks for the clarification.
Note: This message was edited several times to fix some of my mistakes.
I managed to reproduce this while performing the same maintail and reload operations as @apollo13 described.
The exception happens and is ignored in both Python 3 and Python 2 (without trying to ^C supervisord).
Exception IOError: (9, 'Bad file descriptor') in <bound method tail_f_producer.__del__ of <supervisor.http.tail_f_producer instance at 0x7f8b2c69b0e0>> ignored
This traceback isn't a real one that's stopping the program. Python 3's output of ignored exceptions includes the traceback, and Python 2 does not.
I suggest fixing the bad file descriptor issue, which is unrelated to the python version. This can either be done by investigating the root of the error, or for now, just catching OSError while attempting the close the file in _close().
@salty-horse - if I read your comments correctly, it appears that this is NOT a python3 issue, but simply an issue where Python3 provides visibility to an existing "bug" -- in quotes as, it's not clear that it is a bug per-se --
If that is accurate, then this should not hold up a release that supports Python3, IMO.
As far as I know, this is the only remaining issue that is holding up such a release
I think what's happening is that two processes both try to close the same file descriptor. (Perhaps the main one and the one doing the tail. I didn't look into it.) This exception can safely be caught and ignored, as what it's trying to accomplish is already done.
I think I understand what's happening. When supervisor does reload it calls cleanup_fds which closes all file descriptors. Now depending on how Python does garbage collection, __del__ of tail_f_producer might be to late and therefore throw "Bad file descriptor".
So one option would be to just try/catch the error and ignore it. Another option would probably be do del Serveroptions at the end of the while 1 in https://github.com/Supervisor/supervisor/blob/a0ee8f1026c929ae4d9fc84741924414e8008f49/supervisor/supervisord.py#L345-L363
and then tell python to do garbage collection.
I sadly can no longer reproduce, but if you wanted @salty-horse , you could add this:
del options
import gc
gc.collect()
and see if the error is gone.
It can be trivially shown that my idea might be true, it is easy to trigger a similar error in python:
>>> import os
>>> test = open('/tmp/test_file', 'w')
>>> os.close(test.fileno())
>>> test.close()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 9] Bad file descriptor
I do not agree with @salty-horse that this error can be safely caught and ignored, in the worst case (if gc happens too late), supervisor did already open a new file with that file descriptor and then the tail producer would close the wrong file!
Edit, reproducer that shows what could happen if we ignore this error:
>>> test = open('/tmp/test_file', 'w')
>>> print(test.fileno())
3
>>> os.close(3)
>>> test2 = open('/tmp/test_file2', 'w')
>>> print(test.fileno())
3
>>> test.close()
>>>
As you can see test.close() now closes test2
Another option would be to just drop cleanup_fds during supervisor reloads -- if the supervisor instance does indeed leak fds, this is something that should be fixed anways :D maybe @mnaberez has more insights into this.
Also I think that
https://github.com/Supervisor/supervisor/blob/a0ee8f1026c929ae4d9fc84741924414e8008f49/supervisor/process.py#L284-L285
could be disabled on Python3 -- Python 3 does not set file descriptors as inheritable by default anymore. This means that the new process should not see those parent fds from http servers etc…
What about a model where only one of the processes is allowed to close FDs and the other processes merely mark FDs to be closed?
@cclauss This is all inside a single process...
Another option would be to just drop cleanup_fds during supervisor reloads -- if the supervisor instance does indeed leak fds, this is something that should be fixed anways :D maybe @mnaberez has more insights into this.
I don't, unfortunately. I can say that cleanup_fds existed since the earliest versions. I'm not aware of supervisord leaking FDs under normal conditions (or I'd try to fix it). There may be some conditions when it could leak an FD, e.g. an exception occurs before a close is reached, but is caught in the main loop and execution continues. It is probably good insurance to leave cleanup_fds around so a reload ensures that supervisord closes any FD it may have opened.
Most helpful comment
Note: This message was edited several times to fix some of my mistakes.
I managed to reproduce this while performing the same
maintailandreloadoperations as @apollo13 described.The exception happens and is ignored in both Python 3 and Python 2 (without trying to
^Csupervisord).This traceback isn't a real one that's stopping the program. Python 3's output of ignored exceptions includes the traceback, and Python 2 does not.
I suggest fixing the bad file descriptor issue, which is unrelated to the python version. This can either be done by investigating the root of the error, or for now, just catching
OSErrorwhile attempting the close the file in_close().