Core: Random Docker Container Restarts.

Created on 7 Feb 2019  ·  45 Comments  ·  Source: home-assistant/core

Home Assistant release with the issue:
0.87.0 (and previous version from at least 0.85.X)

Last working Home Assistant release (if known):
0.84.X

Operating environment (Hass.io/Docker/Windows/etc.):
Docker (Synology DiskStation)

Component/platform:
unknown component

Description of problem:
My Docker install of Home Assistant is crashing at random, and no single component or platform seems to be at fault. It started a few versions ago and has gradually got worse, to the point where it restarts multiple times a day. I thought it was related to the way Docker logging works (causing memory leaks), based on other issues raised on GitHub, but the issue still happens even with logging disabled. Recently I have been seeing the following traceback (which I can't find referenced anywhere else in the HA issues) in the container logs just prior to the container restarting.

Problem-relevant configuration.yaml entries and (fill out even if it seems unimportant):
n/a

Traceback (if applicable):

python: src/unix/core.c:898: uv__io_stop: Assertion `loop->watchers[w->fd] == w' failed.

Additional information:
I am also seeing the following in my logs, which may or may not be related.

Traceback (most recent call last):
15:40:49      File "/usr/local/lib/python3.6/socket.py", line 713, in create_connection
15:40:49        sock.connect(sa)
15:40:49    OSError: [Errno 9] Bad file descriptor
15:40:49    During handling of the above exception, another exception occurred:
15:40:49    Traceback (most recent call last):
15:40:49      File "/usr/local/lib/python3.6/urllib/request.py", line 1318, in do_open
15:40:49        encode_chunked=req.has_header('Transfer-encoding'))
15:40:49      File "/usr/local/lib/python3.6/http/client.py", line 1239, in request
15:40:49        self._send_request(method, url, body, headers, encode_chunked)
15:40:49      File "/usr/local/lib/python3.6/http/client.py", line 1285, in _send_request
15:40:49        self.endheaders(body, encode_chunked=encode_chunked)
15:40:49      File "/usr/local/lib/python3.6/http/client.py", line 1234, in endheaders
15:40:49        self._send_output(message_body, encode_chunked=encode_chunked)
15:40:49      File "/usr/local/lib/python3.6/http/client.py", line 1026, in _send_output
15:40:49        self.send(msg)
15:40:49      File "/usr/local/lib/python3.6/http/client.py", line 964, in send
15:40:49        self.connect()
15:40:49      File "/usr/local/lib/python3.6/http/client.py", line 1392, in connect
15:40:49        super().connect()
15:40:49      File "/usr/local/lib/python3.6/http/client.py", line 936, in connect
15:40:49        (self.host,self.port), self.timeout, self.source_address)
15:40:49      File "/usr/local/lib/python3.6/socket.py", line 721, in create_connection
15:40:49        sock.close()
15:40:49      File "/usr/local/lib/python3.6/socket.py", line 417, in close
15:40:49        self._real_close()
15:40:49      File "/usr/local/lib/python3.6/socket.py", line 411, in _real_close
15:40:49        _ss.close(self)
15:40:49    OSError: [Errno 9] Bad file descriptor
15:40:49    During handling of the above exception, another exception occurred:
15:40:49    Traceback (most recent call last):
15:40:49      File "/usr/local/lib/python3.6/site-packages/smart_home/__init__.py", line 30, in postRequest
15:40:49        resp = urllib.request.urlopen(req, params, timeout=timeout) if params else urllib.request.urlopen(req, timeout=timeout)
15:40:49      File "/usr/local/lib/python3.6/urllib/request.py", line 223, in urlopen
15:40:49        return opener.open(url, data, timeout)
15:40:49      File "/usr/local/lib/python3.6/urllib/request.py", line 526, in open
15:40:49        response = self._open(req, data)
15:40:49      File "/usr/local/lib/python3.6/urllib/request.py", line 544, in _open
15:40:49        '_open', req)
15:40:49      File "/usr/local/lib/python3.6/urllib/request.py", line 504, in _call_chain
15:40:49        result = func(*args)
15:40:49      File "/usr/local/lib/python3.6/urllib/request.py", line 1361, in https_open
15:40:49        context=self._context, check_hostname=self._check_hostname)
15:40:49      File "/usr/local/lib/python3.6/urllib/request.py", line 1320, in do_open
15:40:49        raise URLError(err)
15:40:49    urllib.error.URLError: <urlopen error [Errno 9] Bad file descriptor>
core docker stale

All 45 comments

The CPU spikes seen here seem to correspond to the restarts:
[Screenshot: 2019-02-07 at 17:54:41]

There is a similar issue reported in MagicStack/uvloop#125

It is supposed to be fixed in uvloop 0.11.1.

Could you try running pip freeze | grep uvloop to check your uvloop version?
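If the shell pipeline is awkward to run inside the container, the installed version can also be read from Python. This is only a sketch: `installed_version` is a hypothetical helper name, and `importlib.metadata` needs Python 3.8 or newer, which is newer than the Python 3.6/3.7 images discussed in this thread.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical usage: report which uvloop (if any) this interpreter can see.
print("uvloop:", installed_version("uvloop") or "not installed")
```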

As another workaround, you can try uninstalling uvloop with pip uninstall uvloop. HA will fall back to the default asyncio event loop implementation.
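The fallback described above can be sketched roughly as follows. This is not Home Assistant's actual startup code; `pick_loop_factory` and `ping` are hypothetical names used only for illustration. It prefers uvloop when the import succeeds and otherwise uses the standard asyncio loop.

```python
import asyncio


def pick_loop_factory():
    """Prefer uvloop's loop when it is importable, else asyncio's default loop."""
    try:
        import uvloop  # optional dependency; may be absent
    except ImportError:
        return asyncio.new_event_loop
    return uvloop.new_event_loop


async def ping():
    # Trivial coroutine so the chosen loop has something to run.
    await asyncio.sleep(0)
    return "pong"


def run_ping():
    loop = pick_loop_factory()()
    try:
        return loop.run_until_complete(ping())
    finally:
        loop.close()


print(run_ping())
```

Either loop runs the same coroutines, so application code does not need to change when uvloop is removed.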

Ok so it wouldn’t let me check the version with the command given but when I attempted to uninstall it said uvloop 0.12.0.

Then maybe you can report back to uvloop, the issue has not been fixed 😄

I’ll remove and test. If my restarts are fixed I’ll 100% know.

Will the fallback cause me any issues/speed problems?

asyncio is fast, uvloop is super fast.

Out of interest was the second trace back related to the first?

Very likely. Errno 9 means something tried to operate on a file or socket that was already closed.
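For anyone unfamiliar with that error, here is a minimal reproduction, assuming a POSIX system. It is unrelated to uvloop itself and only shows that using a descriptor after closing it raises `OSError` with `errno.EBADF` ("Bad file descriptor"). The helper name `read_after_close` is made up for this example.

```python
import errno
import os


def read_after_close():
    """Close both ends of a pipe, then try to read from the closed read end."""
    read_fd, write_fd = os.pipe()
    os.close(write_fd)
    os.close(read_fd)
    try:
        os.read(read_fd, 1)  # the descriptor number is no longer valid
    except OSError as exc:
        return exc.errno
    return None


code = read_after_close()
print(code, errno.errorcode.get(code))
```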

I think I can safely say that uvloop was the issue, as since uninstalling it I haven’t had a single crash/restart.

How would you suggest I get this investigated with the uvloop team? Is it something that the home assistant developer community can put some ‘weight’ behind?

cc @pvizeli

Hass.io is one part of the HA universe that depends on uvloop.

My install is still rock solid since removing uvloop. Really hoping someone can take a look at this. Cc: @pvizeli

:) I never use the latest uvloop on Hass.io because new releases are always unstable. @balloob should know that.

@pvizeli thanks for the comment. @balloob
Could this be removed or rolled back to a stable build for future versions of HA?

Yes, we should track whatever Hass.io does. PR welcome. Make sure to add a comment to the code to skip .0 releases.

@awarecan is that something you can do or assist me with?

I'm sorry to say that since updating to 0.88.1 my random restarts are happening again, and it's still pointing to an issue with uvloop:

python: src/unix/core.c:898: uv__io_stop: Assertion `loop->watchers[w->fd] == w' failed.

I will let my docker container run as is today with the intention of testing then will remove uvloop as before and compare.

Could this be re-opened @balloob @pvizeli ?

If the problem remains after a downgrade of uvloop, it might be related to the Docker container switching to Python 3.7.2

Wait a second. The original issue was on Python 3.6 and uvloop 0.12.0.

In 0.88 we run on Python 3.7.2 and uvloop 0.11.3, and the issue still persists? That is weird, as that means that the same issue would be introduced by either upgrading Python or uvloop?

I can hear the cogs whirling from here 🤗

BTW I did remove uvloop again and all stable since.

On 88, do you see the same stacktraces?

So it looks like smart_home is the package causing the trouble. That seems to be imported by Netatmo. Can you disable Netatmo and see if it persists on 88?

Also, what is the host you run this on? Any upgrades to SSL recently?

Traceback (most recent call last):
15:40:49      File "/usr/local/lib/python3.6/socket.py", line 713, in create_connection
15:40:49        sock.connect(sa)
15:40:49    OSError: [Errno 9] Bad file descriptor

I am starting to think that this has to do with your host machine.

Oops wrong button

I’m running HA via Docker on a Synology DS918+, no SSL in place at the moment. As I’ve removed uvloop from 0.88.1, is it best to wait for a new beta/release and test with Netatmo disabled, or re-install uvloop (?) and test?

Ok just noticed 0.88.2 is out so disabled Netatmo in config and upgraded. Will run today and report.

Ok so just had my first restart not 1 hr into testing with the above changes 🤔

FYI I have multiple docker containers running with no issues, not had a single one restart unexpectedly apart from Home Assistant.

Due to all the random restarts my database is locked/corrupted (again), which then has a knock-on effect on the recorder, history and logger components at restart. The only way to resolve it is to delete the DB and start again 🤦🏼‍♂️

I'm starting to think more and more that it's related to your system. In 3 weeks, there have been no other reports. I don't know what it is about your system that is breaking it, and it would be good to find out. However, I have no further leads to follow.

I understand. I will keep investigating. I’m thinking of automating the removal process of uvloop via home assistant itself after every update 😂

I have also met this issue on 0.91.0 and DSM docker.

I don’t seem to be experiencing the issue anymore (fingers crossed) since 0.91.x... has anything changed regarding uvloop in these releases?

I seem to also be experiencing this issue with 0.93.2, docker image sha256:fef041df9d48b6fc193d420dd1483ccd4d05cbe427aea43c43a8944c2d83411d

OSError: [Errno 9] Bad file descriptor

That means there is too much load for the kernel settings. You can change the descriptor settings with sysctl.

I also experience constant crashes on 0.93.2

Happens within 12 hours runtime in docker on QNAP NAS.

Last entry in the logs is:
python: src/unix/core.c:898: uv__io_stop: Assertion `loop->watchers[w->fd] == w' failed.

Yeah, you need to update the ulimits for descriptors.
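Before raising limits, it may help to check how close the process actually is to them. The sketch below is a rough check, assuming a Linux container with `/proc` mounted; `descriptor_usage` is a hypothetical helper. Note that running out of descriptors normally surfaces as errno 24 (EMFILE, "Too many open files") rather than errno 9, so treat this as one diagnostic among several.

```python
import os
import resource


def descriptor_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process.

    open_fds is None when /proc/self/fd is unavailable (non-Linux hosts).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # Each entry is one open descriptor (the listing itself uses one briefly).
        open_fds = len(os.listdir("/proc/self/fd"))
    except OSError:
        open_fds = None
    return open_fds, soft, hard


open_fds, soft, hard = descriptor_usage()
print(f"open descriptors: {open_fds}, soft limit: {soft}, hard limit: {hard}")
```

To be meaningful this has to run inside the Home Assistant process or container, since the limits are per process.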

@pvizeli Is that something that hassio should manage by adding --ulimit args on the homeassistant docker start instead of asking end-users to do it?

@joerocklin that is something the host needs to handle, in most cases HassOS. Maybe we can update the install script to set this on the host? The parameter doesn't allow using more than the host supports, so should the default be everything that is possible? You can try it out. If it helps, we can add a hack, but in the end I'm not sure it will work if the kernel settings have not been adjusted as well.

0.93.2 here with uvloop 0.12.2 on python 3.7.3

Trying to uninstall uvloop from the container to see if it helps avoid the crashes.

Have 14 other containers running without any issues whatsoever.

Have not had these issues since migrating to 0.93.2 and setting up Lovelace cards.

UPDATE: And now crashed in less than 6 hours :-(

@msj33 have you definitely confirmed that uvloop is uninstalled? I have to do this on every release and it stops the crashes for me.

@aptonline: I entered the container console and uninstalled uvloop via pip, which was confirmed as successful. I exited the container, and 6 hours later it crashed with this last entry in the logs:
python: src/unix/core.c:898: uv__io_stop: Assertion `loop->watchers[w->fd] == w' failed.

Starting the container spawned uvloop again (of course). I have just now removed uvloop again, and it no longer appears in 'pip list installed'.

The uptime of the container itself seems to vary a bit. Is there anything specific I could monitor on the side with cAdvisor or similar (RAM, CPU, etc.)?

If you uninstall uvloop via pip while Home Assistant is still running, uvloop is still loaded in the running process's memory. So it was never truly removed, and that's why you saw the error.
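One way to see the difference between "installed on disk" and "loaded in this process" is to inspect the interpreter itself. The sketch below uses a hypothetical helper, `loop_report`, and only tells you something useful when executed inside the Home Assistant process; run as a separate script, it describes that script's interpreter instead.

```python
import asyncio
import sys


def loop_report():
    """Report whether uvloop is imported and which module provides new event loops."""
    loop = asyncio.new_event_loop()
    try:
        implementation = type(loop).__module__
    finally:
        loop.close()
    return {
        "uvloop_imported": "uvloop" in sys.modules,  # already loaded into memory?
        "loop_module": implementation,  # e.g. an asyncio.* module or a uvloop one
    }


print(loop_report())
```

A module that was imported before `pip uninstall` stays in `sys.modules` until the process restarts, which matches the behaviour described above.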

Yep, that seems to be the case, so the remove-uvloop "workaround" is not really applicable on Docker.

Is there anything I can do from here to isolate the issue further? Please advise, thanks.

Just adding in here.
Docker on unRAID, HA 0.94.0 dev0
uvloop 0.12.2
python 3.7.3
crash with
python: src/unix/core.c:898: uv__io_stop: Assertion `loop->watchers[w->fd] == w' failed.

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue now has been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.
