Core: Monitoring Ubiquiti network devices with UniFi integration unstable since 0.110.x

Created on 23 May 2020 · 15 Comments · Source: home-assistant/core

The problem

Since upgrading to Home Assistant 0.110, the UniFi device tracker integration reports Ubiquiti network devices going away regularly.

Environment

Home Assistant Core release with the issue: 0.110.1
Last working Home Assistant Core release (if known): 0.109.6
Operating environment (Home Assistant/Supervised/Docker/venv): Docker
Integration causing this issue: UniFi
Link to integration documentation on our website: https://www.home-assistant.io/integrations/unifi/

Problem-relevant `configuration.yaml`

none

Traceback/Error logs

no errors in the log

Additional information

Here is an illustration of what's been happening the last few days. The green vertical line indicates (by approximation) when I upgraded HA:

[Image: Screenshot 2020-05-23 at 08.07.38]

The 6 Ubiquiti network devices go away and return home frequently, irregularly and independently, while 2 Raspberry Pis, 2 smartphones and a tablet show no such issues.

I believe the issue is with the integration itself, and not the controller or the devices:

The UniFi controller itself does not report any disconnects on any of the 6 network devices (no alerts or events).
The wired and wireless devices connected to the 6 network devices do not experience any actual connection issues (nor does HA report them as going away when they are home).

Good to know:

I'm running my UniFi controller as a Docker container using this image on the same host as my HA container.
The UniFi controller was updated to the latest version (5.12.72) when I updated HA from 0.109.6 to 0.110.1.
Updating the firmware of all network devices (USG, 4 switches and AP) did not resolve the issue.

unifi

fanaticDavid

👍3

All 15 comments

Hey there @Kane610, mind taking a look at this issue, as it's been labeled with an integration (unifi) you are listed as a codeowner for? Thanks!
(message by CodeOwnersMention)

probot-home-assistant[bot] on 23 May 2020

Enable debugging and share logs please

Kane610 on 23 May 2020

Working on that now with the following config:

logger:
  default: critical
  logs:
    aiounifi: debug
    homeassistant.components.unifi: debug
    homeassistant.components.device_tracker.unifi: debug
    homeassistant.components.switch.unifi: debug

Any suggestions on how to share the logs? This is generating A LOT of lines. Or am I looking for something specific?

UPDATE: I collected logs (14k+ lines) for about an hour and 15 minutes, during which the following events occurred:

01:26: device_tracker.router went away, and came home 4 seconds later
01:40: device_tracker.router went away, and came home 9 seconds later
01:45: device_tracker.router went away, and came home 10 seconds later
01:58: device_tracker.router went away, and came home 5 seconds later
02:16: device_tracker.router went away, and came home 7 seconds later
02:22: device_tracker.switch_livingroom went away, and came home 6 seconds later
02:24: device_tracker.router went away, and came home 4 seconds later

At 02:24, I had a ping going from my NUC (my Docker host for HA, UniFi and a bunch of other services) to my router. Not a single packet was lost:

158 packets transmitted, 158 received, 0% packet loss, time 749ms
rtt min/avg/max/mdev = 0.268/0.406/1.227/0.134 ms

My UniFi controller also has no alerts or events for any of these "disconnections". In short: HA sees my Ubiquiti network devices disconnect frequently while no actual network disruptions take place, and the UniFi controller is not reporting any problems. Any device connected to my Ubiquiti network and tracked by HA is not affected.

fanaticDavid on 24 May 2020

👍2

I have the exact same issues! Some things of note for me which are maybe slightly different.

It is _only_ the APs that are flapping home/not_home. I have 4. It started as soon as I upgraded to 0.110.1 (from 0.109) and it still happens on 0.110.2.

I was able to 'resolve' it the first time, by restarting the controller on my server. I have a FreeBSD server that runs the controller software (same network, etc. All very close) which is independent of HASS. The server is perfectly fine, no issue with it. Everything runs as expected. When I check the UI, they are permanently available.

After another restart of HASS, only 2 APs were doing it.... I then upgraded to 110.2, so another restart and now only one AP is doing it, so it is very much all over the place.

edit: just a note that it seems FreeBSD12 only has the Unifi Controller version 5.12.66.0 currently available.

jurgenweber on 25 May 2020

Same issue here, running 0.110.2 Supervised on Debian. The controller is 5.12.72 (Build: atag_5.12.72_13103) and runs on a Cloud Key Gen2 Plus (which shows no signs of problems).
5 switches and 2 access points are flapping.

Only happens to Ubiquiti equipment, all clients reporting normal
Last known working was 0.109.6

gribber on 25 May 2020

👍1

The home/away logic for UniFi devices works like this: each message from the UniFi controller states when to expect the next message. The integration then adds 10 seconds on top of that to allow for possible system load, but that appears not to be enough.

Could you enable debug logging and verify from the time strings in the log that this is what's happening? Maybe I should change it to 30 seconds or something.

https://github.com/home-assistant/core/blob/59fe5458d0466c4d6de8ea7f94e6a668690c8f8f/homeassistant/components/unifi/device_tracker.py#L295-L305
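For readers following along, here is a minimal, standalone sketch of the deadline logic described above. It is not the code in the linked `device_tracker.py`; the names `HeartbeatTracker`, `count_flaps` and `GRACE_SECONDS` are hypothetical, and the arrival times in the usage note are made-up illustration values, not data from anyone's logs.

```python
from dataclasses import dataclass
from typing import List, Optional

# Margin added on top of the interval announced by the controller.
# 10 s is the value mentioned in this thread; the name is hypothetical.
GRACE_SECONDS = 10.0


@dataclass
class HeartbeatTracker:
    """Tracks home/away state for one network device (illustrative only)."""

    grace: float = GRACE_SECONDS
    deadline: Optional[float] = None  # after this time the device counts as away

    def on_message(self, now: float, next_interval: float) -> None:
        """Record a message that announces the next one in `next_interval` seconds."""
        self.deadline = now + next_interval + self.grace

    def is_home(self, now: float) -> bool:
        """The device is home until the deadline passes without a new message."""
        return self.deadline is not None and now <= self.deadline


def count_flaps(arrivals: List[float], next_interval: float, grace: float) -> int:
    """Count how often a device would have been marked away between messages."""
    tracker = HeartbeatTracker(grace=grace)
    flaps = 0
    for t in arrivals:
        # A message arriving after the deadline means the device was marked
        # away in the meantime and only now flips back to home.
        if tracker.deadline is not None and not tracker.is_home(t):
            flaps += 1
        tracker.on_message(t, next_interval)
    return flaps
```

With hypothetical arrival times `[0, 30, 75, 105, 170]` and an announced interval of 30 seconds, `count_flaps(..., grace=10)` counts two flaps and `count_flaps(..., grace=30)` counts one. Under these assumptions a larger margin reduces flapping, but a sufficiently late message still trips the deadline.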

Kane610 on 25 May 2020

The debug log (with logger settings from fanaticDavid above) when this happened:

May 25, 2020
11:25:16 AM switch_closet is at home
11:25:11 AM switch_closet is away

debug2.log

gribber on 25 May 2020

> The home/away logic for UniFi devices works like this: each message from the UniFi controller states when to expect the next message. The integration then adds 10 seconds on top of that to allow for possible system load, but that appears not to be enough.
>
> Could you enable debug logging and verify from the time strings in the log that this is what's happening? Maybe I should change it to 30 seconds or something.
>
> https://github.com/home-assistant/core/blob/59fe5458d0466c4d6de8ea7f94e6a668690c8f8f/homeassistant/components/unifi/device_tracker.py#L295-L305

Can you make it configurable? I wish I had seen this 10 minutes earlier; I would have updated the file for you and tested. :0 Maybe tomorrow. Thanks

jurgenweber on 25 May 2020

I have the same issue with the UniFi Controller add-on.

federom on 25 May 2020

👍2

I changed line 304 to add 30 seconds instead of 10, and then restarted my Docker container. In an hour or so I should be able to tell the difference, if any.

UPDATE: This is what happened during about 2 hours after making the change:

01:53: device_tracker.switch_office went away, and came back 9 seconds later
01:53: device_tracker.switch_utilityroom went away, and came back 3 seconds later
02:55: device_tracker.switch_utilityroom went away and came back within the same second

So it doesn't get rid of the problem entirely, even though I expected it to. However, it is still a significant improvement over the old situation. The entity for my USG, device_tracker.router, hasn't flapped even once in 2+ hours, and that was by far the worst-affected network device before making the change.

fanaticDavid on 26 May 2020

👍1

It resolved it for me, but I wonder about the implications.

jurgenweber on 27 May 2020

Upping it to 30 (which I did about 25 hours ago) has definitely improved the situation for me, but it has not resolved it. All of my network devices still flap away/home at least a few times a day. Before HA 0.110.x, it was rock solid.

fanaticDavid on 27 May 2020

👍1

Yeah, I hear you. It doesn't feel like a fix but a workaround.

jurgenweber on 27 May 2020

👍1

It's strange, but in my case, since yesterday at around 9pm CST it has become stable with the 2 items I am monitoring...

federom on 27 May 2020

In the same boat here on 0.110.3, with 5 switches, 2 APs and a USG all bouncing between home and not_home. I'm seeing at least one device drop and return every minute or so.
All are assigned fixed IPs outside the DHCP range, and no events or interruptions show in the controller logs.
Currently trying the code modification and a controller restart.
Edit: it looks like the code change has slightly decreased the frequency of bouncing, but it still occurs every 3 to 4 minutes.