core 🚀 - Tradfri lights stop working after a couple of hours

I've been having the same issue.

I started looking into it and found it to be an issue with state observation: the lights can still be controlled via the light.* services, but the actual state of the light is never updated in HA.

DanNixon on 12 May 2018

👍5

I've managed to work around it by putting the container in Host network mode, so I suspect it is port related?

winterscar on 12 May 2018

Odd, I would have expected that to be the solution to it not working at all, rather than a somewhat intermittent issue.

Plus I already have my container in host network mode.

DanNixon on 12 May 2018

another strange thing:
Using the Tradfri groups (created in the app) only works in one direction: switching the group on/off switches all the lights it contains, but switching the lights in the group doesn't flip the group's state (Hass.io 0.68.1).

creating the groups in hassio manually in the config files, containing the same lights, works just fine, both ways.

Mariusthvdb on 12 May 2018

Mine gets slow to react after a while, but if I then toggle one light (and it takes up to 10 sec to react) it's fast again after that.

sveip on 12 May 2018

👍1

So my issues also seemed to be network related.

I originally had my gateway on a powerline Ethernet adapter positioned sort of central on the property, I've recently moved it so it is connected directly to the same switch as the server running HA. Due to some legacy mains wiring powerline adapters are pretty hit and miss in this house so I'm assuming the frequent network dropouts were likely the issue previously.

Right now all my lights have been working fine for a couple of days without having to restart either HA or the Tradfri gateway.

DanNixon on 14 May 2018

I'm not sure we're experiencing the same issue, as my network configuration is (Hass server) --> ethernet --> (switch) --> ethernet --> (switch) --> tradfri hub. So I've got a pretty much direct connection between the server and the hub.

Is there a good way to monitor what a docker container is doing network wise? Like wireshark or something?

winterscar on 14 May 2018

I agree, it looks to be something different. It was just the observed effect that was very similar.

I think Wireshark may be the easiest option, I'm not sure Docker has anything built in for network monitoring or not.

DanNixon on 14 May 2018

Can confirm I'm also having this issue. Not using Docker. Looking to help if possible

ngdio on 30 May 2018

I provided my debug log for this here:
https://github.com/home-assistant/home-assistant/issues/14577#issuecomment-392935265

IVI053 on 1 Jun 2018

I am experiencing the same since 0.55 and have described it in #9822. I have a very hacky workaround for this issue, which has been working reliably for me: https://github.com/home-assistant/home-assistant/issues/9822#issuecomment-357539835

max-te on 29 Jul 2018

👍2

Have not been experiencing this issue since I switched from Arch without venv -> openSUSE with venv. Might be a dependency issue, that would also be a bug though, so I'll keep this open

ngdio on 29 Jul 2018

Seeing this issue as well on macOS, no docker container. It typically takes 10+ hours after a hass restart before I see it. My workaround is even cruder than @max-te's: I restart hass daily with a script...

morberg on 31 Jul 2018

This issue still persists. Did someone happen to find a workaround (besides restarting HA frequently)?

dirkam on 30 Aug 2018

I gave up, switched to Deconz, which is better on stability.

sveip on 30 Aug 2018

@sveip Can you please elaborate on this? How did you make it work?

dirkam on 30 Aug 2018

I'm not using the IKEA gateway anymore. I bought the https://www.dresden-elektronik.de/conbee/ USB zigbee stick, and installed Deconz (sw). There is an add-on for Deconz, so it should be easy to install. I run it stand-alone. You then pair all the IKEA lights and switches to Deconz instead of the IKEA app.

sveip on 30 Aug 2018

I see, thanks. I hope that this issue can be fixed with the IKEA GW, too. Seems to be a common problem that everyone has.

dirkam on 30 Aug 2018

How did you install Home Assistant? The issue disappeared for me when I switched from the Arch AUR package to a virtualenv installation.

ngdio on 30 Aug 2018

Tried hassio and hassbian.

dirkam on 30 Aug 2018

@ngdio I'm running a virtualenv setup and still experiencing this problem :-(

IVI053 on 9 Sep 2018

It might also be related to the Linux distribution (and its packages) you're running Home Assistant on. I switched from Arch ARMv7 to openSUSE aarch64 and the problems were gone.

ngdio on 13 Sep 2018

For reference, I was running Ubuntu Server 18.04.1 LTS and seeing no problems.

winterscar on 13 Sep 2018

I'm running Debian Stretch with this problem.

IVI053 on 21 Sep 2018

@sveip How have the states been with the Deconz approach for you? I have the same issue and am contemplating doing the same.

I'm not using the IKEA gateway anymore. I bought the https://www.dresden-elektronik.de/conbee/ USB zigbee stick, and installed Deconz (sw). There is an add-on for Deconz, so it should be easy to install. I run it stand-alone. You then pair all the IKEA lights and switches to Deconz instead of the IKEA app.

TaroAM on 21 Sep 2018

I ended up using the workaround from @max-te described in #9822. Works fine, though it requires a reboot every several days, so I added an automation rule, which reboots the host if memory usage is above 80 percent.

dirkam on 21 Sep 2018

Deconz works well for lights and switches. The dimmer I've not had great success with.

Peter

sveip on 21 Sep 2018

I'm running Ubuntu Server 18.04.1 LTS as my host and then using the official docker container to run Home Assistant.

I get the same problem, typically after 30-60 minutes. Tried @max-te's fix and it worked although I have had a few stability issues in terms of RAM/CPU usage since I added it, so I'm still testing.

alexhardwicke on 5 Nov 2018

Also having this problem, docker on ubuntu, 0.82.1. Have to restart in order to get tradfri lights responding/updating again.

comatose-tortoise on 25 Nov 2018

Same issue here. The feedback from the Trådfri hub stops working. Setting lights via scripts from Home Assistant keeps working; the issue is just that the UI prevents it, since it's stuck in the wrong state (it can't turn on a light it believes is on).

I have tried to set duration=3600 (and 600) in light/tradfri.py and switch/tradfri.py. It did not help me. I didn't manage to get the Tradfri module to start with the full patch from @alex3305. But if I didn't miss anything, it was just three values changed from 0 to 3600, besides using a define for it.

Example of my updated light/tradfri.py
$ diff -u /srv/homeassistant/lib/python3.5/site-packages/homeassistant/components/light/tradfri.py custom_components/light/tradfri.py
--- /srv/homeassistant/lib/python3.5/site-packages/homeassistant/components/light/tradfri.py 2018-11-17 10:51:27.324283838 +0100
+++ custom_components/light/tradfri.py 2018-11-24 19:57:53.246888134 +0100
@@ -126,7 +126,7 @@
         try:
             cmd = self._group.observe(callback=self._observe_update,
                                       err_callback=self._async_start_observe,
-                                      duration=0)
+                                      duration=600)
             self.hass.async_create_task(self._api(cmd))
         except PytradfriError as err:
             _LOGGER.warning("Observation failed, trying again", exc_info=err)
@@ -345,7 +345,7 @@
         try:
             cmd = self._light.observe(callback=self._observe_update,
                                       err_callback=self._async_start_observe,
-                                      duration=0)
+                                      duration=600)
             self.hass.async_create_task(self._api(cmd))
         except PytradfriError as err:
             _LOGGER.warning("Observation failed, trying again", exc_info=err)

magma1447 on 25 Nov 2018

@magma1447 According to @max-te you will also have to manually set a repeat trigger. I am still unsure if that is the case. But it seems according to your experience that is an issue.

I am currently wondering if it is an issue regarding the recently added switches to either Home Assistant or pytradfri.

Edit: I also created a dirty workaround:

- id: restart_home_assistant
  alias: Home Assistant restart
  trigger:
    platform: state
    entity_id: script.sleeping
    to: 'off'
  condition:
    condition: or
    conditions:
      - condition: state
        entity_id: switch.tradfri_1
        state: 'on'
      - condition: state
        entity_id: light.tradfri_2
        state: 'on'
  action:
    service: homeassistant.restart

When sleeping is set to off, it checks whether some of the lights/switches are still on. If that is the case, then the Tradfri observations are most likely stuck and the automation restarts Home Assistant, an action that I would otherwise do manually.

alex3305 on 25 Nov 2018

Could you please provide all of your system details (operating system, docker/venv/os package)? This issue most likely occurs only in certain environments but right now it's not really clear when exactly this is the case.

ngdio on 25 Nov 2018

@ngdio Debian Stretch x64 (KVM instance), Python virtual environment. I am happy to provide more information in case someone knows what to ask for. While I have been working with Linux for almost 20 years, I am not too familiar with Python and its virtual-env.

Home Assistant 0.82.1, if I recall correctly I was running 0.78 before this, with the same issue.

Adding some more information after reading @alex3305 post (after this one)
My Trådfri hub is quite new (summer 2018). I have 12 spotlights, 2 normal bulbs, 4 panels, no switches.
I always have the UI open on my workstation. I tend to turn on/off 12 spotlights at the same time. I don't know if it breaks when doing that, but it seems like @alex3305 suspects that.

@alex3305 Thanks, I will try to implement it the way @max-te did. I will get back with the result when I know if it worked, if for nothing else than as information for those who have the issue as well.

magma1447 on 25 Nov 2018

@ngdio HASS.io manual installation here on 0.82.1 (edit: on RasPi). I have bought the Tradfri hub in the last month and I have both lights and sockets connected to the hub.

Clues I could find regarding this issue so far are:

When this occurs, the log only shows decrypt_verify(): found 8 bytes cleartext. This message is from the underlying aiocoap library, but it seems no actual data is transferred anymore
Forcing a state through scripting, automation or service call still works. But the change isn't reflected in the UI
It mostly occurs when I change multiple lights/switches at once. For instance when using an automation
It seems (but I am not sure about this) that the bug does not trigger when I do not have the UI open.

Things that I could think of that can go wrong:

The number of file descriptors in Linux is reached, thus killing connections?
General bug with observations?
Bug with using both switches and lights?
The error callback is not called when something goes wrong?
No error is given, thus the error callback is never triggered?
A bug in aiocoap? But there wasn't any release for over a year
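
The file descriptor hypothesis is one of the easier ones to check. The following is a minimal sketch, assuming a Linux host with /proc mounted; `fd_usage` is a hypothetical helper name, and passing the Home Assistant process ID instead of `"self"` is left to the reader.

```python
import os
import resource


def fd_usage(pid: str = "self"):
    """Return (open descriptor count, soft limit) on Linux.

    The count is read from /proc/<pid>/fd. The limit is the soft
    RLIMIT_NOFILE of the *current* process, which is an assumption:
    another process may run with a different limit.
    """
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    return open_fds, soft_limit


count, limit = fd_usage()
print(f"{count} descriptors open, soft limit {limit}")
```

If the count keeps growing toward the limit while the integration runs, that would support the descriptor theory; a flat count would make it unlikely.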

Anyway, quite hard to figure this one out. But glad to help!

alex3305 on 25 Nov 2018

👍1

Could this be related to the amount of updates sent to the lights? I see this problem when using the Flux component (which updates the lights quite often), but when I turn off Flux I can’t reproduce the issue.

morberg on 25 Nov 2018

@morberg I don't have any technical insight. But besides using 12 of my spotlights at the same time, from time to time, we seldom turn on or off our lights at all.

Right now I am testing the patch by @max-te. But after that I could very well pull the power from my 12 spots to see if that helps.

magma1447 on 25 Nov 2018

@ngdio The issue has persisted for me across several (x86) machines running Docker, most recently under Arch Linux. It also persisted through the replacement of multiple pieces of network equipment to the point where I'm confident to say that the only constants were my Tradfri hub and the fact that I'm using Docker on an x86 system.

max-te on 25 Nov 2018

@magma1447 It's only my observation that it (mostly) breaks when operating multiple lights at once. Of course I could be wrong.

For troubleshooting you can always try putting some additional debug logging or Python print() calls inside the _observe_update and/or _async_start_observe functions. But I suspect those will no longer be called when this issue occurs.
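
A minimal sketch of that kind of debug logging, assuming a decorator is acceptable in a local copy of the component. `log_calls`, `FakeLight` and the logger name are hypothetical and only stand in for the real entity class:

```python
import functools
import logging

logging.basicConfig(level=logging.DEBUG)
_LOGGER = logging.getLogger("tradfri_debug")  # hypothetical logger name


def log_calls(func):
    """Log every call to `func` and keep a running call count."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        # args[0] is `self`, so it is left out of the log line.
        _LOGGER.debug("%s call #%d args=%r", func.__name__, wrapper.calls, args[1:])
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


class FakeLight:
    """Stand-in for the real light entity, for illustration only."""

    @log_calls
    def _observe_update(self, tradfri_device):
        self.last = tradfri_device


light = FakeLight()
light._observe_update({"id": 65537, "on": True})
```

If the log lines stop appearing while the lights are still being operated, the observation callback is no longer being invoked.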

@morberg Sure. Because of how the current observation model works. Currently for every update a controller (ie. Home Assistant) sends, the observe callback will be triggered. But after some testing and debugging, I found that a single operation could easily trigger three observation events. Those events are then also passed through to the operator.

That's why I've submitted PR ggravlingen/pytradfri#208. This will at least check whether the current state changes. If the state has changed, the operation will be passed back to the operator. But when the state hasn't changed, the event will be silently dropped. Which would eliminate multiple, equal updates.
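
The idea of dropping equal updates can be sketched roughly as follows. This is not the code from the PR; `drop_unchanged`, the device ID and the dictionary-shaped state are assumptions made for illustration:

```python
from typing import Any, Callable, Dict, Hashable


def drop_unchanged(
    callback: Callable[[Dict[str, Any]], None],
) -> Callable[[Hashable, Dict[str, Any]], bool]:
    """Wrap `callback` so it only fires when a device's state actually changed."""
    last_seen: Dict[Hashable, Dict[str, Any]] = {}

    def on_update(device_id: Hashable, state: Dict[str, Any]) -> bool:
        if last_seen.get(device_id) == state:
            return False  # duplicate event: silently dropped
        last_seen[device_id] = dict(state)
        callback(state)
        return True

    return on_update


received = []
handler = drop_unchanged(received.append)
# One operation that triggers three identical observation events.
for _ in range(3):
    handler(65537, {"on": True, "brightness": 125})
# A real change is still passed through.
handler(65537, {"on": True, "brightness": 200})
print(len(received))  # 2
```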

alex3305 on 25 Nov 2018

👍1

Small update for all the watchers: I've issued a new pull request, which adds async locks in the places where multiple threads could possibly manipulate the state of the Tradfri objects. I've tested this change for more than an hour with multiple browser sessions, while operating almost all the lights in my home at the same time.

Since the change I could not reproduce my issues anymore. So I would like to ask if some of you may want to test these changes? This would be quite easy as you can just add the Python files in the custom_components directory.
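
For readers unfamiliar with async locks, here is a minimal sketch of the general technique using `asyncio.Lock`. It is not the code from the pull request; `LockedLight` and its dictionary state are hypothetical:

```python
import asyncio


class LockedLight:
    """Hypothetical stand-in for an entity whose state is updated by several coroutines."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self.state = {"on": False, "brightness": 0}
        self.updates = 0

    async def apply_update(self, new_state: dict) -> None:
        # Only one coroutine at a time may read-modify-write the state.
        async with self._lock:
            merged = dict(self.state)
            await asyncio.sleep(0)  # an await point inside the critical section
            merged.update(new_state)
            self.state = merged
            self.updates += 1


async def main() -> LockedLight:
    light = LockedLight()
    await asyncio.gather(
        light.apply_update({"on": True}),
        light.apply_update({"brightness": 125}),
    )
    return light


light = asyncio.run(main())
print(light.state)
```

Without the lock, the second update would copy the state before the first one had written it back, and one of the two changes would be lost.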

alex3305 on 25 Nov 2018

👍2

just as a small but wary side note: I'm not experiencing any of these issues at all, and Tradfri has been rock solid in my setup like forever, both in the old setup, and now with the integration. Using many lights, of all available types, using the outlet switches and 3 types of remotes, and the motion sensors.

Not sure what you're fixing here, but sure hope you're not fixing anything that isn't broken....
If I could assist by checking anything in my setup please let me know, be glad to.

Mariusthvdb on 25 Nov 2018

@Mariusthvdb It would be great if you can run the above modifications on your own installation, just for testing.

I have been running my modifications for about 12 hours now. It is working great so far and I have not had this issue at all anymore, even when extensively operating lights and switches. Last night when I went to bed and operated all my switches and lights at the same time from an automation, one light got stuck, but that resolved itself after about half an hour or so. Also, my unavailable lights now show up correctly (unavailable) instead of always being 'turned on'. So it seems that even another issue can be resolved with my change.

cc. @magma1447 @winterscar could you also test this change?

alex3305 on 26 Nov 2018

Well, tbh, before testing any modifications, I'd need to have symptoms of some sort... My Tradfri Hub is rock solid and never hangs, and has been for about a year now, lately with many bulbs, sensors, and switches.
So testing this wouldn't improve anything, would it? I'm afraid I would not have any way to notice. That's why I asked not to change code that is actually working just fine.
It might well be something else in your configuration?

About the unavailable lights: Would indeed be nice, if Tradfri would indicate the lights to be unavailable automatically. I now have a very simple automation that takes care of that for me, but a native action would be preferable.

Mariusthvdb on 26 Nov 2018

@alex3305 I will definitely test it. Currently running the other patch and it has worked fine over night. I will replace that patch with yours later today. I just want to run the other one a bit more to be (more) sure of my result.

From what you have written with the duplicate packages coming, it sounds reasonable to add locks. I have high hopes that you managed to figure it out.

Maybe it's less/more likely to happen depending on how high the latency for the Tradfri network is. But if so, I believe it should happen to everyone, sooner or later.

magma1447 on 26 Nov 2018

I have the same issue: after some hours, lights that have been turned off do not turn on again unless the brightness is changed. A reboot helps.

Running latest home assistant on docker and trådfri gw.
Raspberry Pi 3 with Raspbian Stretch Lite.

krito on 26 Nov 2018

I've been having the same problems (typically after 30-60m). Plus, as I previously mentioned, when using @max-te's fix I get massive CPU spikes fairly frequently, and every few hours home-assistant consumes 100% CPU until the process eventually ends (I've tested without the fix and the spikes disappear again). I'm honestly not sure I wouldn't rather just auto-restart every 30m than have the CPU fan come on at full speed and have HASS freeze up for about 5-10 minutes before restarting.

I've been running @alex3305's fix for about 16 hours now and everything is working flawlessly, and the CPU spiking is completely gone. It's fairly obvious from the attached picture when I moved from @max-te's to @alex3305's fix. Seems pretty flawless for now.

alexhardwicke on 26 Nov 2018

@Mariusthvdb You can also test without any symptoms if you would like to. That way we can verify that the PR doesn't have any side effects... But if you don't want to, I completely understand.

@alexhardwicke Great to hear!

alex3305 on 26 Nov 2018

@alexhardwicke The fix from @max-te was quite aggressive with its 2 minute timer. While I didn't notice any CPU spikes while it was running, shutting Home Assistant down after almost 24 hours took quite some time. It seemed to clean up a lot of lost objects (or similar). It even took 2-3 minutes on my virtual machine on a dual Xeon server.

I have been running @alex3305 latest patch for an hour now. This far no issues. And the concept of his patch really makes sense to me, if the assumptions around it are true (I am definitely no expert). The patch from @max-te was more of a fast hackish workaround I would say.

magma1447 on 26 Nov 2018

@magma1447 Yeah. I've been running in a VM too, on my gaming PC rather than a traditional server, so I've had a fairly beefy CPU available (although I only assigned it one physical core).

I did try with 1 hour instead of 2 minutes and still had frequent spiking. Very strange. I suppose it doesn't matter why now that there seems to be a more "correct" fix.

alexhardwicke on 26 Nov 2018

In my instance @alex3305 patch does not prevent the problem, I just had it happen again.

max-te on 26 Nov 2018

@max-te Are you sure you are using the correct version? Yesterday I saw the same behaviour, but it sorted itself out after about half an hour. So if you can just wait it out, that would be great.

alex3305 on 26 Nov 2018

Sadly @alex3305 I have the same to report. After about 19-20 hours it's stopped responding. :(

(I won't restart and will see if it resolves itself)

alexhardwicke on 26 Nov 2018

I'm using the version from #18708. I restarted the system now but I'll try waiting it out next time.

max-te on 26 Nov 2018

@alexhardwicke @max-te Just for troubleshooting:

Is your Tradfri logging only outputting: decrypt_verify(): found 8 bytes cleartext?
Do service calls, automations or scripts keep working? I.e. can you still 'force' some state upon a Tradfri device?
What happens when you force refresh your browser(app)? Does the ability to control come back?
Does this work for you:
1. Make a light unavailable (physically unplug)
2. Switch the light on/off a couple of times in Home Assistant
3. Force refresh the page
4. Does it report unavailable?
Does the state in Home Assistant update after operating the light in the official Tradfri app?

I suspect that the _refresh() method in the module is still bugging out, since I didn't implement any object locking there.

alex3305 on 26 Nov 2018

Although the stability improved for me __a lot__, after about 24 hours the fun was over... And the Tradfri integration had crashed again 😢 ... $#@!

I have found another clue though. When the gateway integration crashed, the aiocoap library in pytradfri only outputs decrypt_verify(): found 8 bytes cleartext, which suggests that only a CHANGED response has been received from aiocoap.

I want to try out a way to reset the APIFactory (and thus the observations) completely when only this response is coming through. That will at least reset the connection when something fails. Maybe @lwis can chime in on this?

Thanks everybody for trying and giving feedback on this issue! I really hope we can resolve this together!

alex3305 on 27 Nov 2018

There is no Tradfri output in my home-assistant.log. Service calls keep working, in the state panel and in templates the light keeps its old state. Refreshing the page does not help. Unplugging the light has no effect at all. Changes made in the app aren't reflected either.

I just realized I'm still on 0.82.0 but none of the changes in 0.82.1 should affect this.

max-te on 27 Nov 2018

I have found another clue though. When the gateway integration crashed, the aiocoap library in pytradfri only outputs decrypt_verify(): found 8 bytes cleartext, which suggests that only a CHANGED response has been received from aiocoap.

I want to try out a way to reset the APIFactory (and thus the observations) completely when only this response is coming through. That will at least reset the connection when something fails. Maybe @lwis can chime in on this?

I have seen a problem when running api_factory.shutdown() in pytradfri. Can this be related?

morberg on 27 Nov 2018

@morberg That's why I've opened ggravlingen/pytradfri#208. But good catch!

alex3305 on 27 Nov 2018

@alex3305 Regarding decrypt_verify(): found 8 bytes cleartext: it would be handy if aiocoap (and/or DTLSSocket) could use this to work out whether the connection has ended.

@max-te the non-observation requests are not made over the same connection as observations, so they will continue to work.

lwis on 27 Nov 2018

@lwis I'm aware. I just wanted to cover all of @alex3305 questions.
Where are you guys seeing the decrypt_verify(): found 8 bytes cleartext output? I have nothing like that in my home assistant log.

max-te on 27 Nov 2018

@lwis I am currently testing something based on top of my PR. I can already detect whether or not a connection is ended and I am currently working on a test to restart a connection when that has happened.

@max-te It is only showing up in the Docker logging, but not in the UI.

alex3305 on 27 Nov 2018

@alex3305 I have now been running https://raw.githubusercontent.com/home-assistant/home-assistant/479da89a7dab8ac88e9696e3b06142d9be6864fe/homeassistant/components/light/tradfri.py for a bit over 24 hours, with no issues so far. But I haven't used the lights more than normal; that is, I haven't been trying to provoke issues. Still, I don't think it ever survived this long for me without your patch.

UPDATE: 17 hours later, not working anymore.

magma1447 on 27 Nov 2018

@magma1447 Unfortunately the PR was closed before it could get merged. IMHO the locks should improve stability, but maybe I am just wrong on that. Still a Python noob ;) But glad I can help.

I (again) submitted a PR, as @lwis suggested. It tries to check whether or not a response is received; if there is no response, the observation is reset. I've tried it out for a couple of hours and it seems to be working great so far, but then again, that testing period seems a bit too short anyway...

Ref. ggravlingen/pytradfri#212

Edit: I will not let this one go before I got a resolution 😄

alex3305 on 27 Nov 2018

❤2 👍2

For those who are interested. I have been working and testing out various fixes for this issue the last few days, and I now have a version that seems to be holding up. It's still rudimentary and can be improved upon, but it is something. The Tradfri component will restart observations after 5 missing responses, which seems to be a good magic number.

On my Pi the Tradfri connections have been stable for more than a day now (36h at least) and they have reset three times till now. But they always come back. CPU usage on my Pi is around 1-2%, so no odd spikes there.

__Example of an observation restart__

2018-11-30 22:15:40 WARNING (MainThread) [custom_components.pytradfri.api.aiocoap_api] Observation failed... Failed: 2 of 5
2018-11-30 22:15:41 WARNING (MainThread) [custom_components.pytradfri.api.aiocoap_api] Observation failed... Failed: 3 of 5
2018-11-30 22:15:43 INFO (MainThread) [homeassistant.components.http.view] Serving /api/history/period/2018-11-29T21:15:43.138Z to 127.0.0.1 (auth: True)
2018-11-30 22:15:44 WARNING (MainThread) [custom_components.pytradfri.api.aiocoap_api] Observation failed... Failed: 4 of 5
2018-11-30 22:15:45 WARNING (MainThread) [custom_components.pytradfri.api.aiocoap_api] Observation failed... Failed: 5 of 5
2018-11-30 22:15:45 ERROR (MainThread) [custom_components.pytradfri.api.aiocoap_api] Resetting observations...
2018-11-30 22:15:45 WARNING (MainThread) [homeassistant.components.sensor.tradfri] Observation failed for TRADFRI remote control
NoneType: None
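
The counting behaviour visible in this log can be sketched as below. This is an illustration, not the code from the linked repository; `ObservationWatchdog` is a hypothetical name, and clearing the counter on a good response is an assumption:

```python
from typing import Callable


class ObservationWatchdog:
    """Count missing responses and trigger a reset at a threshold."""

    def __init__(self, reset: Callable[[], None], max_failures: int = 5) -> None:
        self._reset = reset
        self._max_failures = max_failures
        self.failures = 0

    def response_received(self) -> None:
        # Assumption: a good response clears the failure count.
        self.failures = 0

    def response_missing(self) -> None:
        self.failures += 1
        print(f"Observation failed... Failed: {self.failures} of {self._max_failures}")
        if self.failures >= self._max_failures:
            print("Resetting observations...")
            self.failures = 0
            self._reset()


resets = []
watchdog = ObservationWatchdog(reset=lambda: resets.append("reset"))
for _ in range(4):
    watchdog.response_missing()
watchdog.response_received()  # clears the count before the threshold is hit
for _ in range(5):
    watchdog.response_missing()
print(resets)
```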

__CPU Usage__

_the spikes are from a manual Home Assistant restart_

I've uploaded my current (modified) version to: https://github.com/alex3305/hass-components-debug. Copying all these files to the custom_components directory and restarting Home Assistant should do it. There is still some debug logging in there, but this version seems very stable for me, so please try and test it out!

alex3305 on 30 Nov 2018

👍2 👎1

I had this issue too, exactly as various people have described it here. Reading through the threads, I started to see a theme around networking.
My old setup which worked flawlessly was HASS on venv atop of an Ubuntu VM. I switched out to HASS.io running on docker on a CentOS 7 VM and this is when I started to have issues.

Seeing the reports of connection drops, I started to look at what was in the path aside from Docker. I knew I had firewalld on my CentOS box, so I disabled it, and so far it’s been stable for 12 hours! I’ll keep monitoring, but so far it’s been much more stable than the 1-2 hours I had before.

mitsumaui on 1 Dec 2018

After working on and testing some new changes for over a week now, I've come to the conclusion that using the response of a request was not stable enough to detect when an observation has died. I've since moved on to testing with timeouts, and that is showing much promise so far.

What I'm currently doing is storing the last updated time from an observation. When Home Assistant fires a request, I will check whether the last updated time was more than 15 seconds ago. If it is, all observations will be reset. 15 seconds is quite liberal, but should be enough for testing.
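
A minimal sketch of this timeout check, assuming a monotonic clock and a single reset callback. `StaleObservationCheck` and its method names are hypothetical and not taken from the repository:

```python
import time
from typing import Callable


class StaleObservationCheck:
    """Reset observations if the last update is older than `max_age` seconds."""

    def __init__(
        self,
        reset: Callable[[], None],
        max_age: float = 15.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._reset = reset
        self._max_age = max_age
        self._clock = clock
        self._last_update = clock()

    def observation_updated(self) -> None:
        """Call this from the observation callback."""
        self._last_update = self._clock()

    def before_request(self) -> bool:
        """Call this when a request is fired; returns True if a reset happened."""
        if self._clock() - self._last_update > self._max_age:
            self._reset()
            self._last_update = self._clock()
            return True
        return False


# Usage with a fake clock so the example runs instantly.
now = [0.0]
resets = []
check = StaleObservationCheck(reset=lambda: resets.append(now[0]), clock=lambda: now[0])
now[0] = 10.0
first = check.before_request()   # 10 s since the last update: no reset
now[0] = 20.0
second = check.before_request()  # 20 s since the last update: reset
print(first, second, resets)
```

Injecting the clock keeps the check testable without waiting 15 real seconds.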

It's looking promising for now (again at the 36 hours mark!) and I am hoping anyone also wants to test it. Testing is still the same as in my last post (so copying the custom_components from https://github.com/alex3305/hass-components-debug). Again there is still some debug logging in there, but it is annotated with # FIXME comments, if you want to delete those.

Regarding @mitsumaui I'm not sure about that. I don't have any firewall running on my Pi. Also Docker is neatly forwarded in iptables and sustained connections aren't blocked. Furthermore I think that the Tradfri lib should return gracefully when something dies on the other end.

@morberg I've also fixed an issue where the stack piles up all observations, which causes all the error messages you reported earlier. Now it only throws a single error with some warnings. This logging seems to be a bug in the underlying aiocoap lib and cannot be solved easily in Pytradfri.

__edit:__ One small issue persists, that I will look into later (tomorrow probably). When forcefully setting a value upon a device (ie. an automation or service call) the observations will probably reset. This is not a really big deal, but is still a small bug.

cc. @magma1447

alex3305 on 5 Dec 2018

👍1

It's looking promising for now (again at the 36 hours mark!) and I am hoping anyone also wants to test it. Testing is still the same as in my last post (so copying the custom_components from https://github.com/alex3305/hass-components-debug). Again there is still some debug logging in there, but it is annotated with # FIXME comments, if you want to delete those.

I've tested this now for about 48 hours, running on docker centos with net host. Usually tradfri would stop working after roughly 12 hours, so far it's stable. Will let you know if anything changes.

renoutg on 6 Dec 2018

In my repo (https://github.com/alex3305/hass-components-debug) I've just removed the possibility of sending duplicate commands to the Tradfri backend. This happens when Home Assistant makes a service call (i.e. from an automation) and the device is already in that state, for instance when you turn on a light that is already on. Since the light will already report its own state (through the callback), this should make the implementation more stable, and it also allows fully checking whether or not a callback has been missed.
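
One way such a duplicate check could look is sketched here, under the assumption that state and request are compared attribute by attribute. `pending_changes` is a hypothetical helper, not a function from the repository:

```python
from typing import Any, Dict


def pending_changes(current: Dict[str, Any], requested: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the requested attributes whose values differ from the current state."""
    return {key: value for key, value in requested.items() if current.get(key) != value}


current = {"state": "on", "brightness": 125}

# Turning on a light that is already on: nothing left to send.
print(pending_changes(current, {"state": "on"}))

# Same on/off state but a new brightness: the brightness is still sent.
print(pending_changes(current, {"state": "on", "brightness": 200}))
```

An empty result means the command can be skipped; anything else is still sent to the gateway.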

This is the same issue I was talking about yesterday in my other reply, and I will try to run this setup through the weekend. If it survives until then, I will put in a PR at pytradfri. If there are any issues, you can open an issue in that repo or reply here.

@renoutg Thanks for testing, greatly appreciated!

alex3305 on 6 Dec 2018

HI again!

Please allow me this question, since I fear you are trying to remove a functionality which is much needed, at least as far as I understand your remarks above:

This happens when Home Assistant will do a service call (ie. from an automation) and the device is already in that state. For instance when you turn a light on, which is already on. Since the light will already report it's own state (through callback)

we need this functionality to be able to change lights while being on, by means of a scene call for example, with or without automation.

Example: imagine the lights being on.
We now need to change the lights to a different scene setting in RGB, brightness or what have you, and call:

- name: Home theater on
  entities:
    light.kist:
      state: on
      transition: 4
      brightness: 125
      hs_color: [14,83]
    light.home_theater:
      state: on
      transition: 4
      brightness_pct: 40
      rgb_color: [255,188,85]
    light.office_outside: on
    light.gym_audio:
      state: on
      transition: 4
      brightness_pct: 40
    light.lounge_chair_long:
      state: on
      transition: 4
      brightness_pct: 40
      hs_color: [21,65]

In short, we need to be able to turn_on a light to change its settings, whether it is off or on.
If I read your wording in the post above correctly, this would no longer be possible?

Also let me again emphasize you are maybe experiencing issues, not related to pytradfri, because no matter what I do, the Tradfri is as stable as a rock here, and believe me, I have had serious lighting issues before. They were related to Hue lights showing unavailable all the time. Tradfri has always been rock solid. If that is not the case in issues mentioned here, have you tried looking elsewhere in the setup?
Don't change a perfectly fine working component before being 100% certain it is the component causing your issues.

thanks for considering

Mariusthvdb on 6 Dec 2018

❤1 👍1

@alex3305 - I did eventually run into the same issue again, but it was much more prolonged than before - like 5 days later. I'm going to try your custom component to see how that goes.

@Mariusthvdb - Agree, that is a very good point. Hopefully @alex3305 can consider / address that. It should not matter repeating commands, it never has in the past, the Tradfri hub should be able to address that and return the correct state.

mitsumaui on 6 Dec 2018

❤1

@Mariusthvdb Thanks for your concerns. That's why I've been communicating and asking others to test my input. Instead of just going off on my experiences 😄 .

Regarding the issue about sending the same commands again, on/off was probably a poor example on my part. I was trying to explain it in layman's terms instead of cutting to the chase. As English is not my primary language, and I currently don't use it very often in writing, I will try to explain what I meant.

Currently in the Tradfri lib when you send a service call (or automation or script), for instance:

light.lounge_chair_long:
  state: on
  brightness: 128

It will actually send two commands in a single request. It will send (pseudo code):

command = {'state': on, 'brightness': 128}
pytradfri.request('put', command)

What my change does, is filter out the duplicates. So when current brightness is like 10 and the light is on, the 128 will still come through, but the state will not. So what this does is:

command = {'state': on, 'brightness': 128}
command = filter_duplicates(command)
# command is now: {'brightness': 128} and state is filtered out
pytradfri.request('put', command)

So should the current state (in pytradfri / HASS) be incorrect, the full command will still be sent, which should not affect any current behaviour.
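
As a rough illustration of the filtering described above, here is a minimal, self-contained sketch. It is not the code from the custom component: the function signature, the `known_state` dictionary and the string value `"on"` are assumptions made only for this example.

```python
from typing import Any, Dict

# Sentinel meaning "this key is not tracked in the known state".
_MISSING = object()


def filter_duplicates(command: Dict[str, Any], known_state: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the entries of `command` that differ from `known_state`.

    Keys that are absent from `known_state` are always kept, because there
    is nothing to compare them against.
    """
    return {
        key: value
        for key, value in command.items()
        if known_state.get(key, _MISSING) != value
    }


# The light is already on at brightness 10, so only the brightness is sent.
known_state = {"state": "on", "brightness": 10}
command = {"state": "on", "brightness": 128}
print(filter_duplicates(command, known_state))  # {'brightness': 128}
```

If the known state is wrong (for example it says `"off"` while the light is on), the comparison fails for that key and the value is sent anyway, which matches the behaviour described above.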

The reason I made this change is that repeated commands do not trigger an observation callback. So when you send out exactly the same command twice, the second time the Tradfri hub will not respond with a state change. Because it does not respond, detecting whether or not the observation has timed out is impossible.

As for the thought that these issues aren't related to pytradfri: they in fact are. Currently there is no logic in there to detect timed out connections, and IMHO that should be a feature of the library. Or it should at least throw an error when that happens. Home Assistant in turn will retry the connection when an error is thrown. That's why I think this is a valid use case to implement. Also, like I said earlier, I really want to keep testing this change before doing a proper PR.
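
The timeout detection argued for above could look roughly like the sketch below. It only shows the idea (every state-changing command should be followed by an observation callback within some window, otherwise an error is raised); the class and method names are hypothetical and not part of pytradfri.

```python
import time
from typing import Callable, Optional


class ObservationTimeout(Exception):
    """Raised when an expected observation callback never arrived."""


class ObservationWatchdog:
    """Tracks whether a state-changing command was answered by a callback."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout
        self._clock = clock
        self._pending_since: Optional[float] = None

    def command_sent(self) -> None:
        """Call after sending a command that is known to change state."""
        if self._pending_since is None:
            self._pending_since = self._clock()

    def callback_received(self) -> None:
        """Call from the observation callback."""
        self._pending_since = None

    def check(self) -> None:
        """Raise if a command has gone unanswered for longer than the timeout."""
        if self._pending_since is None:
            return
        if self._clock() - self._pending_since > self._timeout:
            raise ObservationTimeout(
                "no observation callback within %.1f s" % self._timeout
            )
```

This only works if duplicate commands are filtered out first: a command that does not change anything never produces a callback, so it would otherwise look like a timeout.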

Hopefully that eases your mind a little 😄

@mitsumaui Thanks for testing! Hopefully you can come up with some proper results.

alex3305 on 6 Dec 2018

I've updated my custom component. I accidentally broke transitions in the pytradfri lib. Within the duplicates check I only handled already existing state values, such as on/off or current brightness. But transitions are an additional value on top of the current state. This version also passes through all values that are currently unknown in the state, such as transitions.

cc. @mitsumaui

alex3305 on 7 Dec 2018

I tried this yesterday and saw some transition problems I never got around to reporting. Running it now it seems a bit better, but I still get some error. I reported it here to avoid spamming this thread.

Btw, I really appreciate the effort you are putting in. I've had so many problems with tradfri - in particular in a docker container - that I've stopped using hass altogether. If this fixes the problems I will be happy to start using hass again.

morberg on 7 Dec 2018

Yes, thank you for the time you are spending.
I had the same problems running Raspbian, but when moving to hass.io the problems were gone. What is the difference? Is it the distro?

Regards Kristofer

On Fri, 7 Dec 2018, 20:28 Niklas Morberg wrote:

I tried this yesterday and saw some transition problems I never got around
to reporting. Running it now it seems a bit better, but I still get some
error. I reported it here
https://github.com/alex3305/hass-components-debug/issues/2 to avoid
spamming this thread.

Btw, I really appreciate the effort you are putting in. I've had so much
problems with tradfri - in particular in a docker container - that I've
stopped using hass altogether. If this fixes the problems I will be happy
to start using hass again.


krito on 7 Dec 2018

👍1

Just loaded this for testing! Will let you know.

Agree with @krito - Appreciate all the hard work on this!

mitsumaui on 10 Dec 2018

👍1

Hey @alex3305 - I'm sorry to say the experience now is much worse than previously.
I installed the custom component, and now the state update is inconsistent.

So the 10:15 event is when the lights were turned off from HASS last night, then I turned on a bulb this morning from HASS at 05:33. Both of these events updated the state fine. Since then the bulbs were toggled on / off multiple times using Tradfri remotes. None of these state changes have been recorded in HASS. Then when I tried toggling from HASS at 20:08 it hung for a while and eventually pulled new states. This is not how it used to work. I used to be able to toggle either inside or outside HASS and the states would follow instantly, and never time out. Since migrating to HASS.IO I've started to lose states. It used to take days before that happened, but since using the custom_component it's been a matter of hours.

Happy to help provide any further data to troubleshoot this!

Here's my logs:

Tradfri response_time -0.054440
8:08 PM custom_components/pytradfri/api/aiocoap_api.py (WARNING)
Updating time...
8:08 PM custom_components/pytradfri/api/aiocoap_api.py (WARNING)
Updating time...
8:08 PM custom_components/pytradfri/api/aiocoap_api.py (WARNING)
Updating time...
8:08 PM custom_components/pytradfri/api/aiocoap_api.py (WARNING)
Already checking for observations...
8:08 PM custom_components/pytradfri/api/aiocoap_api.py (WARNING)
Already checking for observations...
8:08 PM custom_components/pytradfri/api/aiocoap_api.py (WARNING)
Already checking for observations...
8:08 PM custom_components/pytradfri/api/aiocoap_api.py (WARNING)
Updating time...
8:08 PM custom_components/pytradfri/api/aiocoap_api.py (WARNING)
Updating time...
8:08 PM custom_components/pytradfri/api/aiocoap_api.py (WARNING)
Resetting Tradfri observations...
8:08 PM custom_components/pytradfri/api/aiocoap_api.py (WARNING)
Tradfri response_time 52117.468968
8:08 PM custom_components/pytradfri/api/aiocoap_api.py (WARNING)
Error doing job: Fatal error on transport TCPTransport (error status in uv_stream_t.read callback)
6:48 PM /usr/local/lib/python3.6/site-packages/homeassistant/core.py (ERROR)
Invalid service data for group.set: extra keys not allowed @ data['entity_id']. Got 'group.lr_ceiling' extra keys not allowed @ data['state']. Got 'on' required key not provided @ data['object_id']. Got None
3:56 PM core.py (ERROR)
Error doing job: Fatal error on transport TCPTransport (error status in uv_stream_t.read callback)
6:00 AM /usr/local/lib/python3.6/site-packages/homeassistant/core.py (ERROR)
Tradfri response_time 0.000306
5:39 AM custom_components/pytradfri/api/aiocoap_api.py (WARNING)
Updating time...
5:39 AM custom_components/pytradfri/api/aiocoap_api.py (WARNING)
Tradfri response_time -0.016521
5:33 AM custom_components/pytradfri/api/aiocoap_api.py (WARNING)
Updating time...
5:33 AM custom_components/pytradfri/api/aiocoap_api.py (WARNING)
Error doing job: Fatal error on transport TCPTransport (error status in uv_stream_t.read callback)
December 10, 2018, 10:20 PM /usr/local/lib/python3.6/site-packages/homeassistant/core.py (ERROR)
Updating time...
December 10, 2018, 10:15 PM custom_components/pytradfri/api/aiocoap_api.py (WARNING)

mitsumaui on 11 Dec 2018

@mitsumaui If you can also post this to: https://github.com/alex3305/hass-components-debug I can further assist you with this, without unnecessarily spamming this thread. Thanks. Most messages you have posted are my (personal) debug messages and only have the Warning level associated, because I'm too lazy to change my configuration.yaml every time 😄

Edit: Additional question: do you have Tradfri groups enabled?

I'm currently experiencing a rock solid setup for more than 4 days now. But I only use Home Assistant for state updates. The observation reset only hit this afternoon, instead of every 12 hours. But I want to try and iron out all these issues before opening a PR. Thanks for all the feedback!

alex3305 on 11 Dec 2018

❤1

@alex3305 thanks for your work!
I have an rpi3+ with Raspbian + Home Assistant in a virtualenv. Tradfri hangs every few hours. I've tested your changes in repo https://github.com/alex3305/hass-components-debug
And Hass is now resetting observations once or twice per minute. I hope it'll work rock solid.

KrzysztofHajdamowicz on 5 Jan 2019

@alex3305 I hope you haven't lost track and/or hope. I did for a while. But earlier this week I came back to look into if anything had happened. Well, not much had happened the last weeks. I then downloaded the components from https://github.com/alex3305/hass-components-debug to try it out.

I have been using it for 3.5 days now. Trådfri is still working. Which is much better than the usual "a few hours". I'll keep my fingers crossed.

magma1447 on 19 Jan 2019

👍1

@alex3305 thanks for your work. Without your patch Tradfri is nearly useless within Home Assistant. With your patch, I've been able to use it for more than a week. Status updates stopped working after an hour, but with a "Trigger" Tradfri got reset and was working again...

raptor2101 on 23 Jan 2019

👍1

I'm using my version of the pytradfri library since about mid december and it's working great so far without any changes to the code. My HASS install is currently at the latest 0.86.3 and not having any issues. About once a day the observations are being reset, but I'm not really noticing any of it at all.

alex3305 on 2 Feb 2019

@alex3305 I've been trying your fix for the last few days. It is definitely better, but in my case the lights sometimes get stuck in the "ON" state. E.g. if I have lights ON, and just happen to turn off some, then sometimes one of these gets stuck and still shows that the light is ON. All controls/automations work fine.
After some time the issue is resolved and in the logs it shows WARNING (MainThread) [custom_components.pytradfri.api.aiocoap_api] Resetting Tradfri observations...
Also, just while I was typing this I noticed that it also shows wrong states sometimes. E.g. at this very moment I have a light that is ON, but the UI shows that it is OFF.
Not sure, but I believe it went something like this:

Light1 and Light2 are ON, UI shows ON for both
Light1 turned OFF, Light2 still ON, UI shows ON for both
After some time Light1 gets updated on the UI and shows OFF
Probably along with this, or sometime later (might be related to the observations being reset) Light2 on the UI also shows OFF, while it is actually still ON - this issue doesn't seem to resolve itself

I think I only had issues with lights that are ON for a long time. E.g. I have an automation for a light with a motion sensor, which automatically turns the light ON and OFF (usually this means that the light is ON only for a few minutes on average) and I never noticed this issue with it; the UI always shows the correct state. However, with lights that are usually on for hours I see this issue more often than not.

PS: This is different from the initial issue, as with that the issue happened for me after a few minutes and the states were all wrong. E.g. if I turned ON some of the lights the UI would show the ON state and after a second or so and would just switch back to OFF.

dirkam on 5 Feb 2019

I'll also add my $0.03:
This fix works, but Tradfri still sucks. My ELK stack receives "Receiving Tradfri observations" every minute: Discover 2019-02-04T23_00_00.000Z - 2019-02-05T22_59_59.999Z.csv.zip

Also, bridge crashes every 1-5 days and needs a hard reboot because even Tradfri app can't connect to it.
I'm considering moving bulbs to a Philips Hue (which I don't want because Philips can pull the plug in any FW update) or powering Tradfri bridge through smart plug to write hass automation to reset it every day at 4AM.

KrzysztofHajdamowicz on 5 Feb 2019

I would recommend checking out Deconz or zigbee2mqtt if you want local,
stable alternatives to the Ikea hub.

On Tue, Feb 5, 2019, 11:02 KrzysztofHajdamowicz wrote:

I'll also add my $0.03:
This fix works, but Tradfri still sucks. My ELK stack receives "Receiving
Tradfri observations" every minute: Discover 2019-02-04T23_00_00.000Z -
2019-02-05T22_59_59.999Z.csv.zip
https://github.com/home-assistant/home-assistant/files/2831598/Discover.2019-02-04T23_00_00.000Z.-.2019-02-05T22_59_59.999Z.csv.zip

Also, bridge crashes every 1-5 days and needs a hard reboot because even
Tradfri app can't connect to it.
I'm considering moving bulbs to a Philips Hue (which I don't want because
Philips can pull the plug in any FW update) or powering Tradfri bridge
through smart plug to write hass automation to reset it every day at 4AM.


sveip on 5 Feb 2019

👍1

@KrzysztofHajdamowicz you don't need a smart plug for this. This one-liner will reboot the gateway:
coap-client -m post -u "your_client_id" -k "your_key" "coaps://your_ip_or_fqdn:5684/15011/9030"
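
For anyone who wants to call that one-liner from a script (for example a nightly job), a thin Python wrapper could look like the sketch below. It only assembles the same arguments as the command above and hands them to `coap-client`, which must already be installed; the function names and the default timeout are assumptions made for this example.

```python
import subprocess
from typing import List


def build_reboot_command(client_id: str, key: str, host: str, port: int = 5684) -> List[str]:
    """Build the argument list for the coap-client reboot call quoted above."""
    return [
        "coap-client",
        "-m", "post",
        "-u", client_id,
        "-k", key,
        f"coaps://{host}:{port}/15011/9030",
    ]


def reboot_gateway(client_id: str, key: str, host: str, timeout: float = 30.0) -> int:
    """Run coap-client and return its exit code (coap-client must be on PATH)."""
    result = subprocess.run(build_reboot_command(client_id, key, host), timeout=timeout)
    return result.returncode
```

Passing the arguments as a list avoids shell quoting problems if the key contains special characters.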

dirkam on 5 Feb 2019

👍3

@KrzysztofHajdamowicz I don't know about your network topology, but I've noticed that the Tradfri gateway is way more stable when using a static IP address using DHCP reservations. Maybe that's something that you can try?

alex3305 on 6 Feb 2019

Hello @alex3305,
Just to tell you that I cloned your hass-components-debug custom components and my Ikea Tradfri setup is finally a lot more reliable!

Thank you for the work you have done and please ask if you need any info, test, logs. I will do my best to provide you feedback.

Regards,

Laurent

lbrichet on 11 Feb 2019

👍1

To provide some insight into the issues, I have found that in my case, after that time period, the connections between Tradfri and Home Assistant are being blocked with an ICMP 403 error and iptables is logging a "FINAL_REJECT" for the packets.

ioangogo on 17 Feb 2019

@ioangogo blocked by whom?

lwis on 17 Feb 2019

@lwis On my system it seems to be an iptables rule set up by firewalld, but I can't find any more information or why it takes so long for this issue to manifest itself.

Edit: It's not iptables, it's netfilter's UDP timeout setting

ioangogo on 17 Feb 2019

I'm going to try adding the coaps port to the firewall

Edit: So even adding coap to the firewall it seems to still close the listening port prematurely

ioangogo on 17 Feb 2019

Feb 19 08:25:03 TVPC kernel: FINAL_REJECT: IN=enp8s0 OUT= SRC=192.168.1.58 DST=192.168.1.11 LEN= TOS= PREC=0x00 TTL= ID= DF PROTO=UDP SPT=5684 DPT= LEN=

So this is in the kernel logs from my system.

Quick question, people with and without the issue could you run

sudo sysctl net/netfilter/nf_conntrack_udp_timeout and sudo sysctl net/netfilter/nf_conntrack_udp_timeout_stream, and also say what distro you are on.

Also to reduce spam, please only share if it is not the same as this:

$ sudo sysctl net/netfilter/nf_conntrack_udp_timeout   
net.netfilter.nf_conntrack_udp_timeout = 30
$ sudo sysctl net/netfilter/nf_conntrack_udp_timeout_stream
net.netfilter.nf_conntrack_udp_timeout_stream = 180

If the output you get is the same, :+1: this comment (so we can keep count of who has the same values). If it is the same but you are not having this issue, react with :eyes:.
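
To make the comparison less error-prone, the sysctl output can be checked with a few lines of Python. This is only a convenience sketch: it parses text in the `key = value` form shown above and reports settings that differ from the two values quoted in this comment. The function names and the `REFERENCE` constant are made up for the example.

```python
from typing import Dict

# Values reported in the comment above; used only as the comparison baseline.
REFERENCE = {
    "net.netfilter.nf_conntrack_udp_timeout": 30,
    "net.netfilter.nf_conntrack_udp_timeout_stream": 180,
}


def parse_sysctl(output: str) -> Dict[str, int]:
    """Parse `key = value` lines as printed by sysctl into a dict of ints."""
    values: Dict[str, int] = {}
    for line in output.splitlines():
        if "=" not in line:
            continue  # skip prompts, blank lines and anything else
        key, _, value = line.partition("=")
        values[key.strip()] = int(value.strip())
    return values


def differs_from_reference(output: str) -> Dict[str, int]:
    """Return only the settings whose value differs from REFERENCE."""
    return {
        key: value
        for key, value in parse_sysctl(output).items()
        if REFERENCE.get(key) != value
    }
```

An empty result means the system has the same two values as listed above.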

ioangogo on 20 Feb 2019

👍4

@ioangogo Thanks for your input and additional investigation. I don't have any firewalls running on my system (neither firewalld nor ufw). iptables reports that everything is set to forwarding, even in the Docker subnet, so that should be correct too. I'm not sure those netfilter settings are part of this issue, since these timeouts are set in seconds, while my connection drops after a few hours / days.

I'm not sure if there is any way to check the elapsed time on an udp connection with something like netstat. Googling for a bit returned no relevant results to me. Maybe it's something to explore?

alex3305 on 24 Feb 2019

I found that doing sudo cat /proc/net/nf_conntrack | grep "<gateway IP>" works
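
The same lookup can be done in Python if you want the remaining timeout as a number. The sketch below assumes the usual /proc/net/nf_conntrack layout, where the fifth whitespace-separated field is the number of seconds left before the entry expires; please verify that against your own kernel's output. The function name is hypothetical.

```python
from typing import List, Tuple


def udp_entries_for(conntrack_text: str, gateway_ip: str) -> List[Tuple[int, str]]:
    """Return (seconds_remaining, raw_line) for UDP entries involving gateway_ip."""
    entries: List[Tuple[int, str]] = []
    for line in conntrack_text.splitlines():
        fields = line.split()
        # Assumed layout: l3proto l3num l4proto l4num timeout key=value ...
        if len(fields) < 5 or fields[2] != "udp":
            continue
        # Match whole fields so 192.168.1.5 does not match 192.168.1.58.
        if f"src={gateway_ip}" not in fields and f"dst={gateway_ip}" not in fields:
            continue
        entries.append((int(fields[4]), line))
    return entries
```

Typical use would be to read /proc/net/nf_conntrack as root and pass its contents together with the gateway's address, then watch how the first value of each tuple counts down between commands.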

ioangogo on 27 Feb 2019

👍1

Have I missed something, or why is this closed? Is this resolved or is there another ticket about it somewhere else? I haven't seen anything in Home Assistant's changelog on this topic.

We are still plenty of users with a Trådfri implementation that doesn't work at all. It's a lot better with Alex's patch though.

A comment of why the issue was closed would have been handy.

magma1447 on 22 Mar 2019

👍5

I guess they are closing the issues, as they have no clue. In my case this is not a firewall issue; everything runs internally and always has. It started after 0.88 for me.

ktpx on 22 Mar 2019

@ktpx you're not far off, none of us can reproduce the issue. There are issues on pytradfri which track the issue.

As it's unlikely to be something caused by pytradfri (or downstream libraries) this issue can probably be closed.

lwis on 22 Mar 2019

We should keep it open until we know the cause or it's solved. Otherwise there will just be more issues opened here. Better to keep one open.

For the record I'm also experiencing this.

MartinHjelmare on 22 Mar 2019

@ktpx you're not far off, none of us can reproduce the issue. There are issues on pytradfri which track the issue.

As it's unlikely to be something caused by pytradfri (or downstream libraries) this issue can probably be closed.

Well, it's not IKEA; that works flawlessly. It must be hass related.

ktpx on 23 Mar 2019

I'll also add my $0.03:
This fix works, but Tradfri still sucks. My ELK stack receives "Receiving Tradfri observations" every minute: Discover 2019-02-04T23_00_00.000Z - 2019-02-05T22_59_59.999Z.csv.zip

Also, bridge crashes every 1-5 days and needs a hard reboot because even Tradfri app can't connect to it.
I'm considering moving bulbs to a Philips Hue (which I don't want because Philips can pull the plug in any FW update) or powering Tradfri bridge through smart plug to write hass automation to reset it every day at 4AM.

I have no issues with the IKEA GW, the app works fine, and i rarely need to reset it.

ktpx on 24 Mar 2019

As an alternative, I recently ditched my Hue and Tradfri hubs in favor of directly connecting to the bulbs via the ZHA component. It works really well and I'm happy to report I no longer have any other hubs in the mix, everything is directly connected to HA.

robbiet480 on 25 Mar 2019

As an alternative, I recently ditched my Hue and Tradfri hubs in favor of directly connecting to the bulbs via the ZHA component. It works really well and I'm happy to report I no longer have any other hubs in the mix, everything is directly connected to HA.

What zigbee stick do you use for this? Haven't been able to find one..

comatose-tortoise on 25 Mar 2019

Hi,
I'm having the same symptoms: lack of connectivity to the GW hub after a few hours.
At the moment, every time I reboot HA the bulbs are not available, and I then need to restart the HA service again for them to become available.
It just started to happen after I started using the component instead of the configuration.yaml setup.
Any thoughts?

migromao on 26 Mar 2019

@comatose-tortoise HUSBZB-1

robbiet480 on 28 Mar 2019

Tradfri is basically useless in Home Assistant now. If you have a Xiaomi GW, it should support Tradfri bulbs now, but I haven't tried, and really do not want to give the Chinese government more than I have to ;)

ktpx on 28 Mar 2019

On the flip side - after I rebuilt my HASS docker environment back to Ubuntu, my TRÅDFRI setup has been rock solid.
Also moved the hub onto a different VLAN from HASS with no ill effects, so it's certainly network / socket based to an extent.

mitsumaui on 28 Mar 2019

I automated a restart of HA every night, TRÅDFRI has been working since then. Not optimal, but at least functionality is back.

comatose-tortoise on 28 Mar 2019

👍1

Latest failure was this morning, so definitely still happening on my installation (HomeAssistant 0.90.1, on HassOS 2.10, Raspberry Pi 3B+). I'm not sure how to figure out what version of pytradfri that involves, but if someone can explain how to pull that out from the docker container, I'll send that as well.

michthom on 31 Mar 2019

I had a feeling that 0.90.2 was the cause of my problems with Tradfri, so I downgraded to 0.90.1 again, but the problem persists. I did not have the issue when 0.90.1 was the current version. This is the log from my Home Assistant.
home-assistant.log
Restarting HA and Tradfri hub fixes the issue as always.

EDIT:
So far so good with 0.91. will have to wait and see how it behaves now.

erazor666 on 3 Apr 2019

Moved the tradfri GW to the switch downstairs which is same as hass server, and stuff seems better. No idea why that would help as these are unmanaged dumb switches, or if just unrelated.

ktpx on 5 Apr 2019

Still seeing this issue on 0.91.3 - after about a day or two the Tradfri integration just stops working. Not sure if it's related, but after a few days I am also experiencing lag (about 3-10s) with all MQTT-based devices.

From the logs I can see the following exception when Tradfri stops working:

Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pytradfri/api/aiocoap_api.py", line 95, in _get_response
    r = await pr.response
  File "/usr/local/lib/python3.7/site-packages/aiocoap/protocol.py", line 816, in _run_outer
    yield from cls._run(app_request, response, weak_observation, protocol, log, exchange_monitor_factory)
  File "/usr/local/lib/python3.7/site-packages/aiocoap/protocol.py", line 865, in _run
    blockresponse = yield from blockrequest.response
aiocoap.error.RequestTimedOut

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pytradfri/api/aiocoap_api.py", line 155, in request
    result = await self._execute(api_commands)
  File "/usr/local/lib/python3.7/site-packages/pytradfri/api/aiocoap_api.py", line 114, in _execute
    await self._observe(api_command)
  File "/usr/local/lib/python3.7/site-packages/pytradfri/api/aiocoap_api.py", line 172, in _observe
    pr, r = await self._get_response(msg)
  File "/usr/local/lib/python3.7/site-packages/pytradfri/api/aiocoap_api.py", line 100, in _get_response
    await self._reset_protocol(e)
  File "/usr/local/lib/python3.7/site-packages/pytradfri/api/aiocoap_api.py", line 78, in _reset_protocol
    await protocol.shutdown()
  File "/usr/local/lib/python3.7/site-packages/aiocoap/protocol.py", line 133, in shutdown
    for exchange_monitor, cancellable in self._active_exchanges.values():
AttributeError: 'NoneType' object has no attribute 'values'
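
Reading the traceback, the second exception appears to come from calling shutdown() on a protocol object whose internal state has already been cleared, which then hides the original RequestTimedOut. One defensive pattern for this kind of situation is sketched below. It is not what pytradfri or aiocoap do: `FakeProtocol` is a stand-in written only for this example, and `safe_shutdown` is a hypothetical helper.

```python
import asyncio


class FakeProtocol:
    """Stand-in for a protocol object whose internals are already torn down."""

    _active_exchanges = None

    async def shutdown(self) -> None:
        # Mirrors the failing line in the traceback: iterating over None.
        for _ in self._active_exchanges.values():
            pass


async def safe_shutdown(protocol) -> bool:
    """Try to shut the protocol down; report failure instead of raising."""
    try:
        await protocol.shutdown()
        return True
    except AttributeError:
        # The protocol was already shut down (or half torn down); the caller
        # can go on to create a fresh one instead of failing here.
        return False


print(asyncio.run(safe_shutdown(FakeProtocol())))  # False
```

With a guard like this the reset path can continue, and the original timeout stays the error that gets reported.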

slvwolf on 22 Apr 2019

👍1

Hi,

I've tried several potential fixes and the only one that is really working is this one:

https://github.com/home-assistant/home-assistant/issues/9822#issuecomment-357539835

In my opinion this should be merged into the official tradfri component.

Regards
Richard

darootler on 22 Apr 2019

@darootler that fix works. But it causes a huge memory leak within Home Assistant. Merging it wouldn't be the smartest thing to do.

I've made a different version which does about the same thing. The main difference is that Tradfri only times out when it cannot be reached. This will also cause a memory leak, but will be way smaller, since my timeout is larger. This is also a 'workaround' and not a permanent solution. A permanent fix should be implemented in the Python CoAP library though.

Also after looking into the Tradfri Android app it seems that the connection is reset every time the app is re-opened. That doesn't seem like a good solution for Home Assistant, because of delays and automations.

I'm also looking into a Zigbee USB stick because of several issues with both Tradfri and Xiaomi devices.

alex3305 on 23 Apr 2019

@alex3305

I've already tried your version and the problem is that the observations will only be reset if you switch a light on or off. So the states reported in Home Assistant aren't correct and differ from the states in the IKEA app. This will result in false historical data, and it is very confusing if the light is on and Home Assistant is reporting that the light is off.

Regards
Richard

darootler on 23 Apr 2019

@alex3305

I'm also looking into a Zigbee USB stick because of several issues with both Tradfri and Xiaomi devices.

How do you pair a Floalt Panel mounted on the ceiling with Home Assistant?

Regards

darootler on 23 Apr 2019

You bring one of them to the other and pair them? I did that, used a USB
powerbank to power the Rpi that has the Conbee attached.

On Tue, 23 Apr 2019 at 16:58, darootler notifications@github.com wrote:

@alex3305 https://github.com/alex3305

I'm also looking into a Zigbee USB stick because of several issues with
both Tradfri and Xiaomi devices.

How do you pair a Floalt Panel mounted on the ceiling with Home Assistant?

Regards


sveip on 24 Apr 2019

You bring one of them to the other and pair them? I did that, used a USB powerbank to power the Rpi that has the Conbee attached. On Tue, 23 Apr 2019 at 16:58, darootler notifications@github.com wrote:
…

After pairing with Home Assistant, are you able to control the Floalt panel with the remote controller and Home Assistant?

Regards
Richard

darootler on 24 Apr 2019

Yes, you pair the remote and panel separately first, then link them with
touchlink afterwards.

On Wed, 24 Apr 2019 at 21:06, darootler notifications@github.com wrote:

You bring one of them to the other and pair them? I did that, used a USB
powerbank to power the Rpi that has the Conbee attached. On Tue, 23 Apr 2019
at 16:58, darootler wrote:
…

After pairing with Home Assistant, are you able to control the Floalt
panel with the remote controller and Home Assistant?

Regards
Richard


sveip on 24 Apr 2019

Yes, you pair the remote and panel separately first, then link them with
touchlink afterwards.

On Wed, 24 Apr 2019 at 21:06, darootler notifications@github.com wrote:
…

Could you please explain that a little bit more? I wasn't able to pair the Floalt panel with HA.

1.) Pair the remote with the Floalt Panel --> Done
2.) Pair the Floalt panel with Home Assistant --> How does this work?

Am i missing anything here?

Regards
Richard

darootler on 25 Apr 2019

Did you reset it? 6 times on and off. Refer to the manual.

On Thu, Apr 25, 2019, 20:13 darootler notifications@github.com wrote:

Yes, you pair the remote and panel separately first, then link them with
touchlink afterwards.

On Wed, 24 Apr 2019 at 21:06, darootler wrote:
…

Could you please explain that a Little bit more? I wasn't able to pair the
Floalt Panel with HA.

1.) Pair the remote with the Floalt Panel --> Done
2.) Pair the Floalt panel with Home Assistant --> How does this work?

Am i missing anything here?

Regards
Richard


sveip on 25 Apr 2019

Did you reset it? 6 times on and off. Refer to the manual.
…

Thank you, got it now.

Regards

darootler on 25 Apr 2019

I can confirm that this issue still occurs in 0.92.1, running on the official docker image.

I can reproduce the error, get the same error messages as @slvwolf in my logs and can confirm that the updating of the states stops working exactly when this error occurs for the first time.

After some time, the states start updating again until the error occurs (still have to verify the duration).

However, any state that was not correctly updated while the state updates were not working only will be reported correctly after you toggle the light for which the state was wrong manually once.

morremeyer on 2 May 2019

In my experience, with 0.92 and the subsequent releases this problem has improved a lot, to the point that I can't remember when I last had to restart HA. I see nothing in the changelog that indicates there have been any fixes for it.

erazor666 on 3 May 2019

@erazor666 I can confirm the same. Not having issues for days.

migromao on 3 May 2019

👍1

It's the strangest thing… somewhere around 0.82 I stopped having these issues with Tradfri. I actually forgot about the problem for some months. That is, until I upgraded to 0.92. Now, I'm having these issues again.

There's also mention of these issues on the pytradfri project at https://github.com/ggravlingen/pytradfri/issues/205. Sadly, the project owner says:

… there's not much we can do here …
-- https://github.com/ggravlingen/pytradfri/issues/205#issuecomment-481795458

MikeChristianson on 4 May 2019

This is incredibly weird. Mine was working fine until the last Tradfri update. Now it requires a restart 10 times a day.

comatose-tortoise on 4 May 2019

I've completely switched to a Conbee stick with Deconz because of these (and Xiaomi gateway) issues. Deconz is working really, really well for me. My battery powered devices have far better range because of the complete mesh network with both Tradfri and Xiaomi. The battery powered devices have a better polling interval and the Tradfri lights now properly show when they are unavailable. I can also automate all my Tradfri remotes, which is lovely.

I think this is simply not feasible with the current Tradfri implementation. Something like polling should be implemented.

alex3305 on 4 May 2019

I really want to get away from the Tradfri gateway too. But there seems to be no standalone Zigbee USB stick to use with the HA ZHA. Conbee requires another layer of software which I'd rather not add to my HA setup. But if it's the only thing working I guess I'll have to switch to Conbee.

comatose-tortoise on 4 May 2019

Weird - mine now seems stable - HA 0.92.1 running in Docker. I was definitely experiencing this until I updated HA a few days ago.

xlfe on 6 May 2019

I am using HA 0.92.2 in Hassio and tradfri stops working minutes after a reboot. The lights stop updating the correct status, but you are still able to switch them on or off if you use the toggle twice.

However, this integration is not working correctly.

Regards
Richard

darootler on 6 May 2019

And the ugly beast reared its head again

2019-05-09 00:12:57 ERROR (SyncWorker_25) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/aiocoap/transports/tinydtls.py", line 160, in _run
    message = yield from self._queue.get()
  File "/usr/local/lib/python3.7/site-packages/aiocoap/util/asyncio.py", line 40, in get
    priority, first = yield from self._inner.get()
  File "/usr/local/lib/python3.7/asyncio/queues.py", line 159, in get
    await getter
concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/aiocoap/transports/tinydtls.py", line 169, in _run
    self.coaptransport.new_error_callback(-1, self)
  File "/usr/local/lib/python3.7/site-packages/aiocoap/protocol.py", line 203, in _dispatch_error
    for key, (monitor, cancellable_timeout) in self._active_exchanges.items():
AttributeError: 'NoneType' object has no attribute 'items'

erazor666 on 9 May 2019

IMO let's just add an option to the tradfri component that lets the user specify how often observations should be reset: never, on-timeout, x{s,m,h,d,w} :)

KrzysztofHajdamowicz on 9 May 2019
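
A rough sketch of how the setting suggested above could be parsed. The function name and the accepted values are hypothetical and taken only from the suggestion in the comment; nothing like this exists in the Home Assistant tradfri component as far as this thread shows.

```python
import re
from datetime import timedelta
from typing import Union

# Hypothetical option values, copied from the suggestion in the thread.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}
_INTERVAL_RE = re.compile(r"^(\d+)([smhdw])$")


def parse_observation_reset(value: str) -> Union[str, timedelta]:
    """Parse 'never', 'on-timeout', or '<number><unit>' such as '30m'."""
    value = value.strip().lower()
    if value in ("never", "on-timeout"):
        return value
    match = _INTERVAL_RE.match(value)
    if match is None:
        raise ValueError(f"invalid observation reset setting: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    if amount == 0:
        raise ValueError("interval must be greater than zero")
    # Map the one-letter unit onto the matching timedelta keyword.
    return timedelta(**{_UNITS[unit]: amount})
```

A caller would then either skip resets ("never"), reset only after a timeout ("on-timeout"), or schedule a periodic reset using the returned `timedelta`.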

@darootler that fix works. But it causes a huge memory leak within Home Assistant. Merging it wouldn't be the smartest thing to do.

I've made a different version which does about the same thing. The main difference is that Tradfri only times out when it cannot be reached. This will also cause a memory leak, but will be way smaller, since my timeout is larger. This is also a 'workaround' and not a permanent solution. A permanent fix should be implemented in the Python CoAP library though.

Also after looking into the Tradfri Android app it seems that the connection is reset every time the app is re-opened. That doesn't seem like a good solution for Home Assistant, because of delays and automations.

I'm also looking into a Zigbee USB stick because of several issues with both Tradfri and Xiaomi devices.

I've started monitoring my HA instance and I cannot find any memory peaks with the patch applied. Did you observe memory leaks while the patch was applied?

Switching to conbee/zha has the drawback of not having the option to update the firmware of the tradfri products.

In my opinion this problem should be fixed.

Regards
Richard

darootler on 9 May 2019

I can confirm the same problem here: 0.92.2, x86 Ubuntu Server 18.04, HA in a Docker container, no status updates after a few hours.
Regards
Niksa

catcam on 12 May 2019

Here is my log (maybe this helps):

2019-05-11 23:19:00 ERROR (MainThread) [coap] Exception CancelledError() can not be represented as errno, setting -1.
2019-05-11 23:19:30 ERROR (MainThread) [coap] Exception CancelledError() can not be represented as errno, setting -1.
2019-05-11 23:19:30 ERROR (MainThread) [coap] Exception CancelledError() can not be represented as errno, setting -1.
2019-05-11 23:19:46 ERROR (SyncWorker_20) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/aiocoap/transports/tinydtls.py", line 157, in _run
    yield from self._connecting
concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/aiocoap/transports/tinydtls.py", line 169, in _run
    self.coaptransport.new_error_callback(-1, self)
  File "/usr/local/lib/python3.7/site-packages/aiocoap/protocol.py", line 192, in _dispatch_error
    request.response.set_exception(OSError(errno, os.strerror(errno)))
asyncio.base_futures.InvalidStateError: invalid state
2019-05-11 23:30:05 ERROR (SyncWorker_0) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/aiocoap/transports/tinydtls.py", line 160, in _run
    message = yield from self._queue.get()
  File "/usr/local/lib/python3.7/site-packages/aiocoap/util/asyncio.py", line 40, in get
    priority, first = yield from self._inner.get()
  File "/usr/local/lib/python3.7/asyncio/queues.py", line 159, in get
    await getter
concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/aiocoap/transports/tinydtls.py", line 169, in _run
    self.coaptransport.new_error_callback(-1, self)
  File "/usr/local/lib/python3.7/site-packages/aiocoap/protocol.py", line 203, in _dispatch_error
    for key, (monitor, cancellable_timeout) in self._active_exchanges.items():
AttributeError: 'NoneType' object has no attribute 'items'
2019-05-11 23:30:05 ERROR (SyncWorker_0) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/aiocoap/transports/tinydtls.py", line 160, in _run
    message = yield from self._queue.get()
  File "/usr/local/lib/python3.7/site-packages/aiocoap/util/asyncio.py", line 40, in get
    priority, first = yield from self._inner.get()
  File "/usr/local/lib/python3.7/asyncio/queues.py", line 159, in get
    await getter
concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/aiocoap/transports/tinydtls.py", line 169, in _run
    self.coaptransport.new_error_callback(-1, self)
  File "/usr/local/lib/python3.7/site-packages/aiocoap/protocol.py", line 203, in _dispatch_error
    for key, (monitor, cancellable_timeout) in self._active_exchanges.items():
AttributeError: 'NoneType' object has no attribute 'items'

catcam on 12 May 2019

I am using HA 0.92.2 in Hassio and tradfri stops working minutes after a reboot. The lights stop updating the correct status, but you are still able to switch them on or off if you use the toggle twice.

However, this integration is not working correctly.

Regards
Richard

Same here ...

catcam on 12 May 2019

Especially if I have toggled lights on and off intensively (during testing, and I have 70+ lights), I have problems controlling IKEA Trådfri from Home Assistant. A reboot of HA restores functionality.

Here are two examples of errors:

(1)

2019-05-12 16:58:24 WARNING (MainThread) [homeassistant.components.tradfri.switch] Observation failed for Vardagsrum Läslampor
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pytradfri/api/aiocoap_api.py", line 95, in _get_response
    r = await pr.response
  File "/usr/local/lib/python3.7/site-packages/aiocoap/protocol.py", line 816, in _run_outer
    yield from cls._run(app_request, response, weak_observation, protocol, log, exchange_monitor_factory)
  File "/usr/local/lib/python3.7/site-packages/aiocoap/protocol.py", line 865, in _run
    blockresponse = yield from blockrequest.response
aiocoap.error.RequestTimedOut

(2)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pytradfri/api/aiocoap_api.py", line 155, in request
    result = await self._execute(api_commands)
  File "/usr/local/lib/python3.7/site-packages/pytradfri/api/aiocoap_api.py", line 114, in _execute
    await self._observe(api_command)
  File "/usr/local/lib/python3.7/site-packages/pytradfri/api/aiocoap_api.py", line 172, in _observe
    pr, r = await self._get_response(msg)
  File "/usr/local/lib/python3.7/site-packages/pytradfri/api/aiocoap_api.py", line 100, in _get_response
    await self._reset_protocol(e)
  File "/usr/local/lib/python3.7/site-packages/pytradfri/api/aiocoap_api.py", line 78, in _reset_protocol
    await protocol.shutdown()
  File "/usr/local/lib/python3.7/site-packages/aiocoap/protocol.py", line 133, in shutdown
    for exchange_monitor, cancellable in self._active_exchanges.values():
AttributeError: 'NoneType' object has no attribute 'values'
2019-05-12 17:57:59 ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pytradfri/api/aiocoap_api.py", line 95, in _get_response
    r = await pr.response
  File "/usr/local/lib/python3.7/site-packages/aiocoap/protocol.py", line 816, in _run_outer
    yield from cls._run(app_request, response, weak_observation, protocol, log, exchange_monitor_factory)
  File "/usr/local/lib/python3.7/site-packages/aiocoap/protocol.py", line 865, in _run
    blockresponse = yield from blockrequest.response
aiocoap.error.RequestTimedOut

ahd71 on 12 May 2019

I have been playing with Tradfri and Home Assistant a few days, and had generally a lot of issues with the state (not) being correctly updated in Home Assistant. It would work after the restart of HA, and then at some point (hours usually) it would stop updating.

Long story short: Although I am running a patch by @alex3305, which could have improved things a bit, I got a significant improvement by explicitly opening the firewall on my HA box (I am using FreeBSD). I was initially blocking incoming connections through pf. pf was opening an IPv4 UDP connection to the Tradfri Gateway based on outgoing requests from HA. However, as the UDP connection is essentially stateless, if there are no packets coming from HA, pf would simply close the connection after a (short) timeout. Only once I explicitly allowed traffic from any source on port 5684 (coaps) did I get Tradfri working without a hitch. pf rule for the interested:

pass in on $ext_if proto udp from any port 5684 to any flags any

where $ext_if represents my external interface.

Hence - while there may be other problems, you may want to ensure there is no firewall in between that would block notifications from Tradfri Gateway. Beware of stateful firewalls on UDP connections :)

kszys on 2 Jun 2019

👍2

~~I have made significant progress in the past 48 hours. I'll try drilling down on the progress soon, but I'm too happy with my stability so far. Would like to see how long I can keep it up.~~

I do run @alex3305 patch, but lately it hasn't been going very far to help with the timeouts. I just feel like perf is better than stock integration between crashes.

Changes:

I bought a new gateway (but still had unavailable timeouts).
I switched to the stock 2A USB power supply and cables, with a UPS/conditioner, to eliminate poor power quality or freak interference (still unavailable timeouts).
I took HA and the GW off of a dumb switch and placed them directly on the router (still unavailable timeouts).
I reset my Asus AC87U router to factory settings (still unavailable timeouts).
In one go, I enabled IGMP Snooping and IGMP Proxy and disabled NAT Acceleration on my router (none of these made much sense, but in the past they have helped my Apple devices so I gave it a shot. It's been stable for two days now, the longest in literal months).

I decided since the Pytradfri Devs were stumped and couldn't recreate the timeouts it must be something hardware related. So I replaced all that, when it didn't help I looked at the most likely culprit that has been giving me headaches for years. My ASUS router. I disabled some performance features and things have been beautiful. Not a single tradfri error in my log even after trying my hardest to spam it into oblivion.

I also made a very heavy handed modification to pytradfri/api/aiocoap_api.py that may not be necessary anymore. I removed the reset_protocol from RequestTimeout and copied the observation cancelling into RequestTimeout from reset_protocol. I felt it was very spammy to call the reset for every pending observation. I actually got quite a bit of miles out of that one change before tackling the router but it was very unpredictable.

I'll try figuring out specifically what has helped at all and what hasn't if I can get an uptime of, say, a week. I figure any progress whatsoever is helpful so I'm providing all the info I have. If anyone else has ASUS routers having these timeouts maybe that's a good place to look next?

Edit: Errored out sometime after day 3. Upon restart the stability doesn't seem to be improved after all as I got a timeout within minutes that brought the whole thing down. What an absolute headache. I'll probably order a Conbee and try to get around to upending my entire setup. This integration is looking like a dead end for me.

tylrjspdma on 2 Jun 2019

I have been playing with Tradfri and Home Assistant a few days, and had generally a lot of issues with the state (not) being correctly updated in Home Assistant. It would work after the restart of HA, and then at some point (hours usually) it would stop updating.

Long story short: Although I am running a patch by @alex3305, which could have improved things a bit, I got a significant improvement by explicitly opening the firewall on my HA box (I am using FreeBSD). I was initially blocking incoming connections through pf. pf was opening an IPv4 UDP connection to the Tradfri Gateway based on outgoing requests from HA. However, as the UDP connection is essentially stateless, if there are no packets coming from HA, pf would simply close the connection after a (short) timeout. Only once I explicitly allowed traffic from any source on port 5684 (coaps) did I get Tradfri working without a hitch. pf rule for the interested:

pass in on $ext_if proto udp from any port 5684 to any flags any

where $ext_if represents my external interface.

Hence - while there may be other problems, you may want to ensure there is no firewall in between that would block notifications from Tradfri Gateway. Beware of stateful firewalls on UDP connections :)

Thank you very much @kszys. You pointed me in the right direction. I have now been using the built-in Tradfri integration without any patches for about 24 hours without a problem; normally the integration stops working after minutes.

I searched through my firewall logs and saw some blocks from the IKEA gateway to HA, so you need the following firewall rules to get the Tradfri integration working:

ikea-gateway --> UDP 5684 --> HA
HA --> UDP 5684 --> ikea-gateway

Sometimes the ikea-gateway is initiating the connection; I think this is because of packet loss or a connection reset.

Regards
Richard

darootler on 3 Jun 2019

👍1

This problem isn't a FW issue alone. Mine are on the same subnet with no packet filtering. The hass devs need to get IKEA involved IMO. It's a major integration for hass.

ktpx on 3 Jun 2019

Hass.io runs Home Assistant in a container and presumably handles UDP replies similar to how a stateful firewall would. Could the users who don't have Home Assistant and the Tradfri gateway on separate networks weigh in on whether they are using hass.io or Home Assistant in some other containerised environment?

Personally they are on the same network and I am using hass.io and this issue affects me. I plan to wait for it to happen again (I just rebooted today) and then add a forwarding rule to see if it helps.

hjbotha on 3 Jun 2019

Sometimes the ikea-gateway is initiating the connection; I think this is because of packet loss or a connection reset.

Actually, I think the Tradfri Gateway sends notifications about the state change to registered clients (i.e., HA). This is intended, working-as-designed behavior - see details for instance here:

https://medium.com/@farissyariati/understanding-coap-for-m2m-message-event-communication-fdcb778faccc

It is not related to packet loss or any reset (BTW: there are no resets on UDP connections). It is just the way it is supposed to work :) If it was a TCP connection, the firewall could maintain it (almost) indefinitely by following the state of the connection (part of the TCP protocol). Since it is UDP, the firewall has no way of knowing if the connection should be kept open, so the common strategy is to close it after a timeout since the last related packet. Depending on how often the packets are exchanged (i.e., actions generated on HA, or observations reported from the Tradfri Gateway), and depending on the timeout setting, this may happen earlier or later. Once there is a new action from HA, the connection is re-opened and some observations from the gateway may make it through until the timeout happens again. This is why the behavior is rather unpredictable... A restart of HA resets all internal HA states, re-registers observers and everything works - for a while :)

As explained earlier - one can avoid the issue by ensuring that all packets on UDP port 5684 are passed without questions. At least this is how I understand this works :) BTW: After a couple of days without any restarts of HA, it still works for me fine!

kszys on 3 Jun 2019
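
The timeout behavior described above can be illustrated with a toy model. This is not pf or netfilter code; the class name, the 30-second timeout, the port number, and the assumption that traffic in either direction refreshes the state entry are all illustrative choices.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# (inside host, inside port, outside host, outside port)
Flow = Tuple[str, int, str, int]


@dataclass
class StatefulUdpFilter:
    """Toy model: outbound packets create state, inbound packets need live state."""

    timeout: float
    _last_seen: Dict[Flow, float] = field(default_factory=dict)

    def outbound(self, now: float, flow: Flow) -> None:
        # A packet from HA to the gateway creates or refreshes the state entry.
        self._last_seen[flow] = now

    def inbound(self, now: float, flow: Flow) -> bool:
        # A packet from the gateway passes only while the entry is still alive.
        last = self._last_seen.get(flow)
        if last is None or now - last > self.timeout:
            self._last_seen.pop(flow, None)
            return False  # dropped: the observation never reaches HA
        self._last_seen[flow] = now  # assumption: replies also refresh the entry
        return True


fw = StatefulUdpFilter(timeout=30)
flow = ("ha", 58161, "gateway", 5684)
fw.outbound(0, flow)               # HA registers an observer
early = fw.inbound(10, flow)       # notification shortly afterwards
late = fw.inbound(100, flow)       # notification after a quiet period
fw.outbound(101, flow)             # HA sends a new command
after_command = fw.inbound(102, flow)
```

In this model `early` and `after_command` pass while `late` is dropped, which matches the pattern of updates working for a while, stopping, and resuming after a light is toggled from HA.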

I'm using pip install on raspbian, no containers.

Hass.io runs Home Assistant in a container and presumably handles UDP replies similar to how a stateful firewall would. Could the users who don't have Home Assistant and the Tradfri gateway on separate networks weigh in on whether they are using hass.io or Home Assistant in some other containerised environment?

Personally they are on the same network and I am using hass.io and this issue affects me. I plan to wait for it to happen again (I just rebooted today) and then add a forwarding rule to see if it helps.

ktpx on 4 Jun 2019

I am using Hass.io and since the firewall change everything has been working for 70 hours of uptime. It seems the problem in my case was only the missing firewall rule allowing UDP 5684 packets from the IKEA gateway to HA.

Before the firewall change the integration was working for about an hour after reboot.

Regards
Richard

darootler on 5 Jun 2019

I am using Home Assistant in Docker and behind an nftables firewall. I've set up my firewall to accept and log any incoming UDP connections with source or destination port 5684, and indeed it's logging connections from my gateway with source port 5684 and varying destination ports. So far I've seen ports 58161, 32949 and 44753. I suppose that port is random, so I cannot set it up as a forwarded port in Docker. Just exposing UDP 5684 is not sufficient.

max-te on 5 Jun 2019

I am using Home Assistant in Docker and behind an nftables firewall. I've set up my firewall to accept and log any incoming UDP connections with source or destination port 5684, and indeed it's logging connections from my gateway with source port 5684 and varying destination ports. So far I've seen ports 58161, 32949 and 44753. I suppose that port is random, so I cannot set it up as a forwarded port in Docker. Just exposing UDP 5684 is not sufficient.

You may notice that in my rule I allow all UDP traffic from port 5684 without regard to the destination port. BTW: the destination port (for packets coming from Tradfri Gateway) will depend on the port that HA used to register the observer (which is essentially random high port).

An equivalent rule for nftables would probably look like this (_note: I have not tested it!_):

nft add rule filter input udp sport 5684 accept

kszys on 6 Jun 2019
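
To show why the destination port of the gateway's packets varies, here is a minimal loopback sketch. It uses plain UDP as a stand-in for CoAP over DTLS, a random local port as a stand-in for the gateway's port 5684, and a hypothetical function name; it only demonstrates that notifications are sent back to whatever source port the operating system picked when the client registered.

```python
import socket
from typing import Dict, Union


def observe_roundtrip() -> Dict[str, Union[int, bytes]]:
    """Register from an OS-chosen port, then receive a notification on that port."""
    gateway = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        gateway.bind(("127.0.0.1", 0))  # stand-in for the gateway's fixed port
        gateway.settimeout(2)
        client.settimeout(2)

        # The client never binds explicitly, so the OS assigns a source port.
        client.sendto(b"observe", gateway.getsockname())
        _, client_addr = gateway.recvfrom(1024)

        # The "gateway" later notifies the address it saw during registration.
        gateway.sendto(b"state changed", client_addr)
        payload, _ = client.recvfrom(1024)

        return {
            "client_port": client.getsockname()[1],
            "port_seen_by_gateway": client_addr[1],
            "payload": payload,
        }
    finally:
        gateway.close()
        client.close()


result = observe_roundtrip()
```

The two port values in `result` are equal and change from run to run, which is why a fixed destination-port forward cannot cover this traffic while a rule keyed on source port 5684 can.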

@kszys I know, setting up the nftables rule is not an issue, I have a rule exactly like that (with an additional log option for debugging purposes). The problem lies in setting up Docker: There I can only configure specific destination ports which will be forwarded to the container.

max-te on 6 Jun 2019

This issue (tradfri lights stopping to work after a few hours until a HA restart, but still being accessible via the IKEA app) was plaguing me for months and I was getting quite desperate. At some point I simply plugged my Tradfri gateway directly into the router rather than into the Ethernet port where it used to be, and the problem completely disappeared. I haven't had any issue in 2 weeks now! This proves that at least for some people affected by this issue, flaky network connections can be the cause.

DavidMStraub on 31 Jul 2019

For me the problem disappeared after I added the following firewall rules:

IKEA gateway --> UDP SOURCE PORT 1-65535, DEST PORT 5684 --> raspberry (hassio)
IKEA gateway --> UDP SOURCE PORT 5684, DEST PORT 1-65535 --> raspberry (hassio)
raspberry (hassio) --> UDP SOURCE PORT 1-65535, DEST PORT 5684 --> IKEA gateway
raspberry (hassio) --> UDP SOURCE PORT 5684, DEST PORT 1-65535 --> IKEA gateway

I recommend that everyone search through their firewall logs for blocked/rejected UDP packets on port 5684.

Regards
Richard

darootler on 31 Jul 2019

@kszys, @darootler
If iptables drops packets because they are not matched against any established connection (you explicitly allow UDP packets originating FROM tradfri 5684/udp), then maybe raising the conntrack timeout will help?

net.netfilter.nf_conntrack_udp_timeout = 30

I've ditched tradfri, but maybe removing those iptables rules and raising net.netfilter.nf_conntrack_udp_timeout will solve the issue?

KrzysztofHajdamowicz on 31 Jul 2019
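
Before changing the conntrack timeout it may help to see what the host currently uses. The sketch below assumes a Linux host where the conntrack module exposes its settings under /proc/sys/net/netfilter; the function name is made up for illustration, and missing files are reported as None rather than treated as an error.

```python
from pathlib import Path
from typing import Dict, Optional

SYSCTL_DIR = Path("/proc/sys/net/netfilter")
KEYS = ("nf_conntrack_udp_timeout", "nf_conntrack_udp_timeout_stream")


def read_udp_conntrack_timeouts(base: Path = SYSCTL_DIR) -> Dict[str, Optional[int]]:
    """Return the current values in seconds, or None where a key is unavailable."""
    values: Dict[str, Optional[int]] = {}
    for key in KEYS:
        try:
            values[key] = int((base / key).read_text().strip())
        except (OSError, ValueError):
            # Not Linux, conntrack not loaded, or an unexpected file format.
            values[key] = None
    return values


current = read_udp_conntrack_timeouts()
```

Comparing `current` with how often the gateway actually sends notifications gives a hint whether a longer timeout is worth trying.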

Question, which might be a bit offtopic, but it can be useful for others, too:

Those who moved from Tradfri GW to another Zigbee device, how did you manage to get both the Zigbee device and the Tradfri remote controller to work?
The problem for me is that if I pair a light with a CC2531 then it'll no longer work with the Tradfri remote. If I pair the remote controller then the CC2531 will stop working.
With Tradfri GW I could control the lights with both the Tradfri Zigbee GW (via HASS) and the remote controller (as a wall switch).

dirkam on 31 Jul 2019

At the risk of jinxing myself, this problem, for me, seems to have mostly gone away. I think it may have been due to hass performance issues, or possibly fw updates from IKEA fixing something... just a guess.

ktpx on 31 Jul 2019

Question, which might be a bit offtopic, but it can be useful for others, too:

Those who moved from Tradfri GW to another Zigbee device, how did you manage to get both the Zigbee device and the Tradfri remote controller to work?
The problem for me is that if I pair a light with a CC2531 then it'll no longer work with the Tradfri remote. If I pair the remote controller then the CC2531 will stop working.
With Tradfri GW I could control the lights with both the Tradfri Zigbee GW (via HASS) and the remote controller (as a wall switch).

The problem is that the IKEA remote does not support Zigbee binding. As a workaround you could sniff the network and see which group ID the remote is sending commands to. Then add the bulbs to a group with the same ID.

renoutg on 31 Jul 2019

I am having this problem as well. Running Hass.io 0.96.5 in Docker on a Synology NAS. It just takes a few minutes after an HA restart until states are unable to update properly. The IKEA app and Gateway seem to work fine. Trådfri is basically useless for me in HA now. Bummer since I just bought the Trådfri stuff.

I do not have an active firewall on my Synology NAS but I have no clue if there may be a firewall in the Hass.io Docker image? If so, can somebody please point me in the right direction to be able to open the correct port, 5684? Is there a terminal command to enter?

The Gateway and HA is connected to the same switch.

fireRoads on 4 Aug 2019

For me the problem disappeared after I added the following firewall rules:

IKEA gateway --> UDP SOURCE PORT 1-65535, DEST PORT 5684 --> raspberry (hassio)
IKEA gateway --> UDP SOURCE PORT 5684, DEST PORT 1-65535 --> raspberry (hassio)
raspberry (hassio) --> UDP SOURCE PORT 1-65535, DEST PORT 5684 --> IKEA gateway
raspberry (hassio) --> UDP SOURCE PORT 5684, DEST PORT 1-65535 --> IKEA gateway

I recommend that everyone search through their firewall logs for blocked/rejected UDP packets on port 5684.

Regards
Richard

I have the same problem as all of you, running hass.io.

I opened the ports as described above, but that did not help.

It always starts with Exception CancelledError() can not be represented as errno, setting -1. and then the observation fails for all tradfri devices.

gitmoeritz on 7 Aug 2019
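
To check whether a log follows the pattern described above (a CancelledError first, failed observations afterwards), a small scanner like the following could be used. The two message fragments are copied from the log excerpts posted in this thread; the function name and the returned structure are made up for illustration.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Message fragments copied from the log excerpts posted in this thread.
CANCELLED = "Exception CancelledError() can not be represented as errno"
OBSERVATION_FAILED = "Observation failed for"
_TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ")


def summarize_tradfri_errors(lines: Iterable[str]) -> Tuple[Optional[str], List[str]]:
    """Return the first CancelledError timestamp and the devices that failed after it."""
    first_cancelled: Optional[str] = None
    failed: List[str] = []
    for line in lines:
        match = _TIMESTAMP_RE.match(line)
        if match is None:
            continue  # traceback lines carry no timestamp
        if first_cancelled is None and CANCELLED in line:
            first_cancelled = match.group(1)
        elif first_cancelled is not None and OBSERVATION_FAILED in line:
            failed.append(line.split(OBSERVATION_FAILED, 1)[1].strip())
    return first_cancelled, failed
```

It can be fed the lines of the Home Assistant log file, for example `summarize_tradfri_errors(open("home-assistant.log", encoding="utf-8"))`, where the path depends on the installation.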

I've been hitting this too on my Hassio installation, and I can't figure out what's wrong: both are on the same subnet, the same switch even. There's nothing special in between, but sporadically the connectivity gets lost between hassio and my Tradfri hub. Rebooting the Pi that hassio is running on is the only way to restore connectivity.

jdeluyck on 16 Sep 2019

The problem is present for me as well. It works for a couple of minutes, then it gets stuck. Both Siri and the IKEA app are still able to control the lights. Running them on the same subnet without any firewall whatsoever in between. Running 0.98.1 in Docker. I really want to avoid reconnecting all lights to a Conbee...

Updated to 0.99 and the issue still persists. Setting a scene seems to trigger the issue instantly.

ripxorip on 25 Sep 2019

The problem is probably Docker... do you forward all UDP traffic to Docker
container? As far as I know, the default config is to manually select which
ports are forwarded. The problem with Ikea lights (or COAPS in general) is
that the Home Assistant only registers a listener. Then, it is the Ikea
gateway that sends updates on the status of lights on randomly chosen
(during registration) high UDP port. This traffic probably does not reach
your Home Assistant running in the Docker container...

AFAIK, the only solution is to get rid of Docker, or forward all UDP traffic
to Docker, or (I do not even know, how to do it) to dynamically configure
UDP ports forwarded to Docker container based on the outcome of the
listener registration with the gateway.

Good luck,
Krzysztof.

kszys on 25 Sep 2019

The problem is probably Docker... do you forward all UDP traffic to Docker container? As far as I know, the default config is to manually select which ports are forwarded. The problem with Ikea lights (or COAPS in general) is that the Home Assistant only registers a listener. Then, it is the Ikea gateway that sends updates on the status of lights on randomly chosen (during registration) high UDP port. This traffic probably does not reach your Home Assistant running in the Docker container... AFAIK, the only solution is to get rid of Docker, or forward all UDP traffic to Docker, or (I do not even know, how to do it) to dynamically configure UDP ports forwarded to Docker container based on the outcome of the listener registration with the gateway. Good luck, Krzysztof.

I am running in Docker with --net=host so I would assume that all ports are already exposed? Why is it working for a few requests before hanging? I would have thought that if it was port related it would not have worked at all in the first place? Thnx for your reply.

ripxorip on 25 Sep 2019

I do not claim to be a Docker expert, so the problem may lie elsewhere.

In my case, the problem was that my OS (I use FreeBSD) would open a
bi-directional connection with the Ikea gateway during the registration of
the listener. For a certain time (defined by a timeout) this connection is
open, so the replies from the gateway are properly arriving to the
listener. Since the UDP connection is stateless, the OS cannot tell when
the connection is closed. Hence, it usually uses a pre-defined timeout
after which the connection is considered to be closed, if there were no new
packets exchanged. Once that pre-defined time passes, the packets from the
gateway would be blocked.

I presume some low-level investigation of how the UDP packets are treated
by the host OS and the Docker container could provide some more insight.
Sorry, but I cannot think of offering any more advice.

On Wed, 25 Sep 2019 at 11:20, ripxorip notifications@github.com wrote:

The problem is probably Docker... do you forward all UDP traffic to Docker
container? As far as I know, the default config is to manually select which
ports are forwarded. The problem with Ikea lights (or COAPS in general) is
that the Home Assistant only registers a listener. Then, it is the Ikea
gateway that sends updates on the status of lights on randomly chosen
(during registration) high UDP port. This traffic probably does not reach
your Home Assistant running in the Docker container... AFIK, the only
solution is to get rid of Docker, or forward all UDP traffic to Docker, or
(I do not even know, how to do it) to dynamically configure UDP ports
forwarded to Docker container based on the outcome of the listener
registration with the gateway. Good luck, Krzysztof.
… <#m_2605215267096344576_>
On Wed, 25 Sep 2019 at 07:26, ripxorip @.*> wrote: The problem is
present for me as well. Works for a couple of minutes then it gets stuck.
Both siri and Ikea app is still able to control the lights. Running them on
the same subnet without any firewall what so ever in between. Running
0.98.1 in docker. Really want to avoid reconnecting all lights to a
conbee... — You are receiving this because you were mentioned. Reply to
this email directly, view it on GitHub <#14386
https://github.com/home-assistant/home-assistant/issues/14386?email_source=notifications&email_token=ABV27XQIQD3ZCANZX3R33H3QLLY6VA5CNFSM4E7ODHRKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7QURYY#issuecomment-534857955>,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABV27XR73IKNCP6JNR56W7DQLLY6VANCNFSM4E7ODHRA
.

I am running in Docker with --net=host so I would assume that all ports
are already exposed? Why is it working for a few requests before hanging? I
would have thought that if it was port related it would not have worked at
all in the first place? Thnx for your reply.


kszys on 25 Sep 2019

I see, in my case I can see it happen more frequently the harder I push the gateway. Will try to run pytradfri without hassio tonight and see if I can make it desync. Would it be possible to add a slight delay before sending the next command in HA in order to reduce load to the gateway when commanding lots of events such as scenes?
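
To make the "slight delay between commands" idea concrete, here is a minimal sketch of what such throttling could look like. It is not how Home Assistant or pytradfri actually sends commands; `send_throttled`, the `delay` default, and the idea of passing commands as zero-argument coroutine functions are all assumptions for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def send_throttled(
    commands: Iterable[Callable[[], Awaitable[None]]],
    delay: float = 0.2,  # hypothetical pause between commands, in seconds
) -> int:
    """Run the commands one at a time, pausing `delay` seconds between them.

    Each command is a zero-argument coroutine function (for example a
    wrapper around "turn light X on"). Returns how many commands were sent.
    """
    sent = 0
    for command in commands:
        if sent:
            # No pause before the first command, only between commands.
            await asyncio.sleep(delay)
        await command()
        sent += 1
    return sent
```

A scene with many lights would then be sent as a list of such commands instead of firing them all at once, which trades a little latency for less load on the gateway.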

ripxorip on 25 Sep 2019

I gave up, re-paired all my lights with conbee and now everything works flawlessly (with a lot less latency than the gateway).

ripxorip on 27 Sep 2019

I gave up, re-paired all my lights with conbee and now everything works flawlessly (with a lot less latency than the gateway).

With the Conbee setup, do your IKEA lights sometimes report as OFF in the HA GUI despite actually being ON? This happens for me all the time. ON is never false though, and I never lose contact with them.

comatose-tortoise on 27 Sep 2019

I'm using latest Hyper-V images and the issue still persists.

KlausJokisuo on 1 Oct 2019

Who has MDNS enabled for their network? Since I turned multicast DNS off I have not had the timeout problem. I am running hassio 99.2 and had the exact same problem that many of you described.

cenx1 on 3 Oct 2019

Does anyone have a fix for this? I don't use MDNS.

TynThordin on 6 Oct 2019

Not working for me either in 0.99.3

quackpack on 7 Oct 2019

Just dropping my experience with this issue here. This might help people who have the same setup. (Spoiler: firewall issue)

I have hassio set up in a Docker container running on CentOS.

Originally I had bridged network with port forwarding for 5684/udp, this doesn't help with tradfri notifications because home assistant randomizes the source port for coap.

Then I switched to host networking and it still wasn't reliable. Using tshark to packet capture, I see that after a while, notifications were received by the host but no acks were sent. Note that if the tradfri gateway doesn't receive acks after sending a certain number of notifications, it will no longer send notifications to home-assistant.

Figuring that it could be the host firewall blocking it, I added a firewall rule for SOURCE port 5684/udp, to always allow messages from the tradfri gateway.

Final setup is:

firewall-cmd --permanent --new-service=home-assistant
firewall-cmd --permanent --service=home-assistant --add-source-port=5684/udp
firewall-cmd --permanent --zone=public --add-service=home-assistant
firewall-cmd --reload

Docker container set to host network.

yiding on 8 Oct 2019

👍1

I gave up, re-paired all my lights with conbee and now everything works flawlessly (with a lot less latency than the gateway).

With the Conbee setup, do your IKEA lights sometimes report as OFF in the HA GUI despite actually being ON? This happens for me all the time. ON is never false though, and I never lose contact with them.

Yeah, it has happened a few times for some of my lights. Although the same happens in the Deconz GUI, so it's not an issue regarding HA.

ripxorip on 8 Oct 2019

What software version of the Trådfri hub/bridge/gateway are you running?

cenx1 on 8 Oct 2019

I've diagnosed this as a conntrack problem where the UDP traffic from the gateway to HA is dropped and then there seems to be an "out of sync" problem.

I run home-assistant within Docker.

When HA starts it connects to the Tradfri gateway using COAP via UDP on port 5684.

Using conntrack -L I can see an entry relating to this COAP traffic. Everything works as expected.
Then, after approx. 3 minutes (assuming HA doesn't do anything to the lights), this conntrack entry times out.

Now, one of two things can happen from here.

If HA triggers a change to my lights a new conntrack entry is created and everything continues to work as expected (this is because there is traffic from HA to the Tradfri gateway)
If the status of the lights change (eg they are switched on/off using the Tradfri app) then I see a UDP packet destined for HA but no ACK response from HA. If I then try to change any tradfri components from HA, it no longer works. I'm not sure why being out of sync with the gateway is a problem, but that seems to be when I need to restart HA (connection times out and the status of a Tradfri device changes but not triggered by HA).

There's a couple of ways to fix this problem.

I have bumped up the conntrack timeout on my docker host as follows :-

sysctl net.netfilter.nf_conntrack_udp_timeout_stream=2678400

or add net.netfilter.nf_conntrack_udp_timeout_stream=2678400 to /etc/sysctl.conf to persist across reboots.

Other options would be for HA to send a keep-alive request to the gateway every 20 seconds.

Or https://github.com/ggravlingen/pytradfri/issues/205 could have some logic added to reset the connection to the gateway if the client gets out of sync.
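
As a rough illustration of the keep-alive option, the loop below calls a `ping` coroutine every `interval` seconds so that outbound traffic keeps refreshing the conntrack entry. This is only a sketch: `keep_alive` and `ping` are hypothetical names, and what `ping` actually sends to the gateway (any cheap request would do) is left to the caller; it is not an existing pytradfri or Home Assistant API.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def keep_alive(
    ping: Callable[[], Awaitable[None]],
    interval: float = 20.0,  # seconds; must stay below the conntrack timeout
    stop: Optional[asyncio.Event] = None,
) -> int:
    """Call `ping` every `interval` seconds until `stop` is set.

    Returns the number of pings that completed without raising.
    """
    stop = stop or asyncio.Event()
    sent = 0
    while not stop.is_set():
        try:
            await ping()
            sent += 1
        except Exception as err:
            # One failed ping should not end the keep-alive task.
            print(f"keep-alive ping failed: {err}")
        try:
            # Sleep for `interval`, but wake up early if `stop` gets set.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
    return sent
```

With the default conntrack timeout of roughly 3 minutes described above, an interval of 20 seconds leaves plenty of margin.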

I think this has been so difficult to debug because the way UDP packets get from the Tradfri gateway to your HA instance can also be affected by changes to IP addresses (eg gateway changes IP so this also stops the conntrack entry from working).

Interested if this helps others resolve this problem too @alex3305

xlfe on 10 Oct 2019

Thanks, I will try this! I am running Docker on a Synology NAS. Does anyone know if I need to change the /etc/sysctl.conf or the /etc.defaults/sysctl.conf?

fireRoads on 14 Oct 2019

Facing the same issue with latest versions of hass.io (v 0.100.2) and IKEA gateway (v 9.27). All light states in the UI are stuck on "on", but service calls do still work! Service dev tool calls and automations that use light.turn_on and light.turn_off work just fine, but light.toggle only works when a light is already on, because HA always thinks the light is on and thus in turn always calls light.turn_off. I'm running hass.io on a rpi 3b, which is directly connected to my router, just like the IKEA gateway. Fixed IP for both devices. Restarting HA makes it work again for a while.

rutgerkra on 16 Oct 2019

👍1

Facing the same issue with latest versions of hass.io (v 0.100.2) and IKEA gateway (v 9.27). All light states in the UI are stuck on "on", but service calls do still work! Service dev tool calls and automations that use light.turn_on and light.turn_off work just fine, but light.toggle only works when a light is already on, because HA always thinks the light is on and thus in turn always calls light.turn_off. I'm running hass.io on a rpi 3b, which is directly connected to my router, just like the IKEA gateway. Fixed IP for both devices. Restarting HA makes it work again for a while.

I'm not familiar with hass.io sorry but I think the hassio devs would have to build the sysctl change in, or you might be able to ssh into the raspberry pi and make the change manually.

You can diagnose if it is the same problem by installing conntrack and running conntrack -L | grep 5684

Watch for the UDP conntrack entry that relates to your Tradfri gateway. If everything works until the conntrack entry is removed you've got the same problem (but the fix is just the sysctl changes I outlined above)
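
For anyone who wants to check the current value without the sysctl tool, here is a small sketch that reads it from /proc/sys, where dotted sysctl keys are exposed as file paths on Linux. The helper names `sysctl_path` and `read_sysctl` are made up for this example. For reference, 2678400 seconds is 31 days.

```python
from pathlib import Path
from typing import Optional

SYSCTL_KEY = "net.netfilter.nf_conntrack_udp_timeout_stream"


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key to its file under /proc/sys.

    Simplification: assumes no path component itself contains a dot.
    """
    return Path(root).joinpath(*key.split("."))


def read_sysctl(key: str, root: str = "/proc/sys") -> Optional[int]:
    """Return the integer value of a sysctl, or None if it cannot be read
    (for example when the nf_conntrack module is not loaded)."""
    try:
        return int(sysctl_path(key, root).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_sysctl(SYSCTL_KEY)
    if value is None:
        print(f"{SYSCTL_KEY} is not available on this system")
    else:
        print(f"{SYSCTL_KEY} = {value} seconds ({value / 86400:.1f} days)")
```

Run it on the Docker host, since that is where the conntrack table lives.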

xlfe on 16 Oct 2019

7 days now and counting that my Tradfri integration has been rock solid. Most I ever got before was one or two days.

xlfe on 16 Oct 2019

xlfe, I'll give this a try, it's been driving me nuts. I'm running HA in a Docker container; should I change the timeout in the Docker container AND the host system, or would the Docker container be enough? I also can't seem to make the setting persist across reboots in the Docker container, any ideas? I tried loading the nf_conntrack kernel module from /etc/modules so it loads before sysctl.conf is applied. That had no effect in the Docker container, but the Ubuntu host does keep the setting now.

jtonk on 20 Oct 2019

xlfe, I'll give this a try, it's been driving me nuts. I'm running HA in a Docker container; should I change the timeout in the Docker container AND the host system, or would the Docker container be enough? I also can't seem to make the setting persist across reboots in the Docker container, any ideas? I tried loading the nf_conntrack kernel module from /etc/modules so it loads before sysctl.conf is applied. That had no effect in the Docker container, but the Ubuntu host does keep the setting now.

@jtonk - Sorry for the confusion - you only need to make the change on the docker host, not in the docker container.

xlfe on 20 Oct 2019

@jtonk - Sorry for the confusion - you only need to make the change on the docker host, not in the docker container.

@xlfe Thanks, then I'm good to go. I'll report back in a couple of days

jtonk on 20 Oct 2019

👍1

I also tried the workaround provided by @xlfe on the Hass.io host system by entering sysctl net.netfilter.nf_conntrack_udp_timeout_stream=2678400

For me this ended up in no controllable tradfri lights through HA. Automations, scripts and GUI commands not working.

Just for your information:

net.netfilter.nf_conntrack_udp_timeout_stream is set to 180 on the host system and on the HA container.

Regards
Richard

darootler on 20 Oct 2019

I've diagnosed this as a conntrack problem where the UDP traffic from the gateway to HA is dropped and then there seems to be an "out of sync" problem.

I run home-assistant within Docker.

When HA starts it connects to the Tradfri gateway using COAP via UDP on port 5684.

Using conntrack -L I can see an entry relating to this COAP traffic. Everything works as expected.
Then, after approx. 3 minutes (assuming HA doesn't do anything to the lights), this conntrack entry times out.

Now, one of two things can happen from here.

If HA triggers a change to my lights a new conntrack entry is created and everything continues to work as expected (this is because there is traffic from HA to the Tradfri gateway)

If the status of the lights change (eg they are switched on/off using the Tradfri app) then I see a UDP packet destined for HA but no ACK response from HA. If I then try to change any tradfri components from HA, it no longer works. I'm not sure why being out of sync with the gateway is a problem, but that seems to be when I need to restart HA (connection times out and the status of a Tradfri device changes but not triggered by HA).

There's a couple of ways to fix this problem.

I have bumped up the conntrack timeout on my docker host as follows :-
sysctl net.netfilter.nf_conntrack_udp_timeout_stream=2678400
or add net.netfilter.nf_conntrack_udp_timeout_stream=2678400 to /etc/sysctl.conf to persist across reboots.

Other options would be for HA to send a keep-alive request to the gateway every 20 seconds.

Or ggravlingen/pytradfri#205 could have some logic added to reset the connection to the gateway if the client gets out of sync.

I think this has been so difficult to debug because the way UDP packets get from the Tradfri gateway to your HA instance can also be affected by changes to IP addresses (eg gateway changes IP so this also stops the conntrack entry from working).

Interested if this helps others resolve this problem too @alex3305

I did some investigation on my firewall. Seems like I am not having the exact same issue as you. If there is no UDP connection up and I switch a light on/off within the tradfri app, I see a UDP connection from the tradfri gateway to HA and everything is working fine.

Maybe I will have some time to investigate this problem further.

Regards
Richard

darootler on 20 Oct 2019

Today the status of the tradfri lights was not in sync, so I started investigating again. Seems like HA is still opening UDP sessions but the ikea gateway doesn't answer anymore. But if I push a tradfri remote switch, the ikea gateway isn't opening up a new UDP session.

So this is similar behavior to what @xlfe described above. I am not able to bump the UDP conntrack timeouts, but it looks like there is a way to keep the UDP connection up. I created two automations: one turns a light on if it's on and the other one turns the light off if it's off. Sounds stupid, but whenever an automation triggers (every minute) I see the packet count increase within the established connection.

Edit:

Here is a screenshot from my firewall. Every time a new packet is sent (automation) the timeout resets to 180 again and the connection stays alive:

I'll keep you up to date on whether that helps.

Regards
Richard

darootler on 21 Oct 2019

66 hours uptime and everything regarding the tradfri component is working fine:

Seems like the initial UDP connection was closed at least one time; the current connection has been up for 39 hours (default timeout is 180 seconds):

So here is my working "workaround" (so far):

Update all turned-on lights every 150 seconds with the Circadian Lighting component (Flux is built in and does something similar):

circadian_lighting:
  interval: 150
  max_colortemp: 4000
  min_colortemp: 2200

switch:
  - platform: circadian_lighting
    lights_ct:
      - light.abstellraum_licht
      - light.buero_licht
      - light.gang_bad_licht
      - light.gang_buero_licht
      - light.gang_eingang_licht
      - light.kinderzimmer_licht
      - light.schlafzimmer_licht
      - light.wohnkueche_doppelfenster_licht
      - light.wohnkueche_einzelfenster_licht
      - light.wohnkueche_kochinsel_licht
    name: Steuerung alle Lichter
  - platform: circadian_lighting
    lights_ct:
      - light.wc_licht
    name: Steuerung WC Licht
    sleep_brightness: 1
    sleep_colortemp: 2200
    sleep_entity: binary_sensor.nacht
    sleep_state: "on"

Turn off any light every 150 seconds if its off with an automation. This updates the light even though the status is not changed, so the UDP connection keeps up:

- id: '1571678776374'
  alias: Abschalten von Buero Licht wenn abgeschaltet alle 150 Sekunden
  description: ''
  trigger:
    platform: time_pattern
    seconds: /150
  condition:
  - condition: device
    device_id: 29cd8946d1d8441f8ac1d389290983af
    domain: light
    entity_id: light.buero_licht
    type: is_off
  action:
  - device_id: 29cd8946d1d8441f8ac1d389290983af
    domain: light
    entity_id: light.buero_licht
    type: turn_off

Seems like maintaining the UDP connection as long as possible is a reliable workaround. @xlfe already explained how to bump up the timeout for the UDP connection. If you cannot set the timeout for the UDP connection (for whichever reason), just update the lights at intervals shorter than 180 seconds.

As far as I can see there is a problem with HA getting "out of sync" with the tradfri gateway (@xlfe already mentioned that); however, this problem does not occur between the tradfri app and the tradfri gateway. So in my opinion this has to be fixed within the tradfri component.

If anyone needs details regarding the network traffic/settings please let me know.

Regards
Richard

darootler on 25 Oct 2019

🎉1 👍1

Nice workaround!

Perhaps if pytradfri or ha can add a keep alive message to the gateway that will solve the problem for most

xlfe on 25 Oct 2019

Is this discussion using the expertise of @ggravlingen already? Can’t seem to find him and he’s codeowner of the Tradfri Ha code?

Sorry if answer is yes, my bad then...

Mariusthvdb on 26 Oct 2019

Bad news, i just saw the "out of sync" problem again after 5 hours uptime...

@xlfe Is your environment still stable?

Regards
Richard

darootler on 26 Oct 2019

I haven't seen this thread before but am aware of people facing stability issues with the gateway, especially when running a lot of traffic against it. @darootler's solution was an interesting one though. Are you hitting the gateway with a lot of traffic, aside from continuously turning the devices on/off?

ggravlingen on 26 Oct 2019

I haven't seen this thread before but am aware of people facing stability issues with the gateway, especially when running a lot of traffic against it. @darootler's solution was an interesting one though. Are you hitting the gateway with a lot of traffic, aside from continuously turning the devices on/off?

I am using 22 devices with the ikea gateway (11 lights and 11 remotes). I am using the Circadian Lighting component, which is updating the lights every 5 minutes (default). So I don't think that this is much traffic.

I also noticed that the ikea app always starts a new UDP connection every time I restart the ikea app. Home Assistant (pytradfri I guess) uses the same UDP connection until the timeout is reached. Maybe that has something to do with the "in sync" problem.

Let me know if i can help you with more information or testing.

Regards
Richard

darootler on 26 Oct 2019

@darootler when HA starts it fires up an internal task (I assume this translates to a thread) per device, which observes what's going on with that device. I've started a draft PR where I've added some error handling. If you have time to test it, it would be appreciated.
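
To illustrate the general idea of error handling around a per-device observation task, here is a minimal sketch. It is not the code from the draft PR and does not use the real pytradfri API: `observe_with_retry`, `start_observation`, and the retry parameters are all hypothetical names chosen for this example.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def observe_with_retry(
    start_observation: Callable[[], Awaitable[None]],
    retry_delay: float = 10.0,          # hypothetical pause before re-observing
    max_retries: Optional[int] = None,  # None means retry forever
) -> int:
    """Run one device's observation and restart it whenever it fails.

    `start_observation` stands in for whatever call registers the observer
    with the gateway; it is expected to raise when the observation breaks.
    Returns the number of failures seen before it finished or gave up.
    """
    failures = 0
    while max_retries is None or failures <= max_retries:
        try:
            await start_observation()
            return failures  # observation ended without an error
        except Exception as err:
            failures += 1
            print(f"observation failed ({err}), retrying in {retry_delay}s")
            await asyncio.sleep(retry_delay)
    return failures
```

The point of the pattern is that a single timeout no longer leaves the device permanently unobserved, which is the "stuck state" symptom described throughout this thread.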

At the moment, I'm turning a light on every second to try to reproduce spamming the gateway. How have you configured your sensor for uptime?

ggravlingen on 26 Oct 2019

@darootler when HA starts it fires up an internal task (I assume this translates to a thread) per device, which observes what's going on with that device. I've started a draft PR where I've added some error handling. If you have time to test it, it would be appreciated.

At the moment, I'm turning a light on every second to try to reproduce spamming the gateway. How have you configured your sensor for uptime?

I will test it, where do i find your draft?

What do you mean with the sensor for uptime? Currently i am just manually checking if the status of the lights within the ikea app and HA is „in sync“. I have no automated monitoring for this problem.

Regards
Richard

darootler on 26 Oct 2019

Do you mean the uptime sensor in the screenshot above? —> https://www.home-assistant.io/integrations/uptime/

Regards
Richard

darootler on 26 Oct 2019

It’s here:
https://github.com/home-assistant/home-assistant/pull/28241

Are you running the docker install or stand alone?

ggravlingen on 26 Oct 2019

I am running hass.io, just copied your tradfri version to /config/custom_components/tradfri and rebooted HA:

2019-10-26 23:13:25 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for tradfri which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you do experience issues with Home Assistant.

Should i pay attention to something special?

Regards
Richard

darootler on 26 Oct 2019

Thank you!

I'm curious of what you can report in regards of stability. Any difference from before? I'm able to reproduce the states not updating but give it about one minute and it starts working again. States are updated in the GUI and I'm able to control the devices.

ggravlingen on 26 Oct 2019

@darootler one more question on your setup, are you importing IKEA groups from the gateway?

ggravlingen on 26 Oct 2019

I‘ll report back if there are any changes regarding stability. I am using the default setting for IKEA groups, so no i am not importing the groups.

Regards
Richard

darootler on 26 Oct 2019

👍1

It just happened again, HA reported a light turned on but it was not:

[Screenshots: ha_light, ikea_light]

If i turn off the light with the ikea app or the tradfri remote dimmer the state does not change in HA, if i turn off the light with HA the light turns off but the state in HA toggles to on again.

Edit:

Normally if i switch a light on or off with the ikea app there is a UDP connection opened (if there is no active one) from the ikea gateway to HA, but at this point the gateway doesn't even send any data to HA.

Regards
Richard

darootler on 27 Oct 2019

Thanks for the feedback! I flooded the gateway with traffic yesterday and was able to reproduce.

I’ll do some more work on my branch and post back here when more testing can be done.

ggravlingen on 27 Oct 2019

❤1 👍1

Just chiming in. I started experiencing the same, when I moved my HomeAssistant (running in docker) to another server. My home and the server running HA are connected via VPN (using ZeroTier). It works great apart from the IKEA problems.

TynThordin on 27 Oct 2019

I just had the out of sync problem but I think it was because the gateway
just updated itself and restarted

https://www.reddit.com/r/tradfri/comments/d33mfr/gateway_1927_patch_notes_are_up/

Otherwise my Tradfri integration has been rock solid...

On Sun, 27 Oct 2019, 05:06 darootler, notifications@github.com wrote:

Bad news, i just saw the "out of sync" problem again after 5 hours
uptime...

@xlfe https://github.com/xlfe Is your environment still stable?

Regards
Richard


xlfe on 31 Oct 2019

I just had the out of sync problem but I think it was because the gateway just updated itself and restarted https://www.reddit.com/r/tradfri/comments/d33mfr/gateway_1927_patch_notes_are_up/ Otherwise my Tradfri integration has been rock solid...

Thx for the update.

I managed to set the timeout for the existing UDP connection from HA to the ikea gateway:

conntrack -U -s 192.168.XXX.XXX -d 192.168.XXX.XXX -p UDP -t 2678400

I'll report back if that helps.

Edit:

I just noticed that every new dataflow resets the timeout back to 180 :-(

But i managed to set the timeout via sysctl and reverted back again to 180:

udp 17 2678317 src=192.168.XXX.XXX dst=192.168.XXX.XXX sport=39915 dport=5684 packets=8 bytes=724 src=192.168.XXX.XXX dst=192.168.XXX.XXX sport=5684 dport=39915 packets=8 bytes=1652 [ASSURED] mark=0 delta-time=84 use=2

2678317 is the timeout value, but it seems like the connection got closed though.
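
For watching these entries over time, a line like the one above can be picked apart with a few lines of Python. This is only a sketch based on the sample line in this comment: the third field is taken as the remaining timeout in seconds, and the first src/dst/sport/dport group as the original direction. `parse_conntrack_udp` is a made-up helper name.

```python
from typing import Dict, Optional

# Sample entry copied from the comment above (addresses were already masked).
SAMPLE = (
    "udp 17 2678317 src=192.168.XXX.XXX dst=192.168.XXX.XXX sport=39915 "
    "dport=5684 packets=8 bytes=724 src=192.168.XXX.XXX dst=192.168.XXX.XXX "
    "sport=5684 dport=39915 packets=8 bytes=1652 [ASSURED] mark=0 "
    "delta-time=84 use=2"
)


def parse_conntrack_udp(line: str) -> Optional[Dict[str, object]]:
    """Parse one UDP line of `conntrack -L` output into a dict.

    Returns None for lines that are not UDP entries.
    """
    fields = line.split()
    if len(fields) < 3 or fields[0] != "udp" or not fields[2].isdigit():
        return None
    entry: Dict[str, object] = {
        "timeout": int(fields[2]),          # seconds until the entry expires
        "assured": "[ASSURED]" in fields,
    }
    for field in fields[3:]:
        key, sep, value = field.partition("=")
        # The reply direction repeats src/dst/sport/dport; keep the first
        # occurrence, which describes the original direction (HA -> gateway).
        if sep and key not in entry:
            entry[key] = value
    return entry


if __name__ == "__main__":
    entry = parse_conntrack_udp(SAMPLE)
    print(entry["timeout"], entry["sport"], entry["dport"])
```

Feeding it the output of conntrack -L | grep 5684 at regular intervals would show whether the timeout keeps being refreshed or counts down to zero.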

Regards
Richard

darootler on 31 Oct 2019

Hi all!

I recently moved my database from mariadb running on the PI3+ to my QNAP NAS and from there on everything is working extremely smooth. The clicks from the ZHA integrated wireless switches are recognized instantly, there is nearly no delay if a motion sensor triggers a light, the historical states are loading very fast now and so on.

Besides that i configured the Z-Wave logs from default (7 i think) to 5 to reduce SD card writes.

The tradfri integration has also been working without any issues for 89 hours now; this is my personal record.

Maybe these "out of sync" problems are "just" performance issues? Maybe someone is able to test if offloading the database would help others too. A second option is to completely turn off the recorder.

If someone needs help let me know.

Regards
Richard

darootler on 7 Nov 2019

An update on this issue that I was experiencing while using hass.io: I have edited /etc/sysctl.conf and added "net.netfilter.nf_conntrack_udp_timeout_stream=2678400" as mentioned by @xlfe and everything has been working completely stable since then.

rutgerkra on 21 Nov 2019

An update on this issue that I was experiencing while using hass.io: I have edited /etc/sysctl.conf and added "net.netfilter.nf_conntrack_udp_timeout_stream=2678400" as mentioned by @xlfe and everything has been working completely stable since then.

Could you please describe where you edited /etc/sysctl.conf? Did your change survive a host reboot?

Regards
Richard

darootler on 21 Nov 2019

An update on this issue that I was experiencing while using hass.io: I have edited /etc/sysctl.conf and added "net.netfilter.nf_conntrack_udp_timeout_stream=2678400" as mentioned by @xlfe and everything has been working completely stable since then.

Could you please describe where you edited /etc/sysctl.conf? Did your change survive a host reboot?

Regards
Richard

I edited the file /etc/sysctl.conf that is on the device on which hass.io is installed, which in my case is a rpi 3b. To do so I used the HASS Configurator (https://www.home-assistant.io/addons/configurator/), but you could have just as well done it through an ssh connection or whatever.

The change survives a complete device reboot.

rutgerkra on 21 Nov 2019

An update on this issue that I was experiencing while using hass.io: I have edited /etc/sysctl.conf and added "net.netfilter.nf_conntrack_udp_timeout_stream=2678400" as mentioned by @xlfe and everything has been working completely stable since then.

Could you please describe where you edited /etc/sysctl.conf? Did your change survive a host reboot?
Regards
Richard

I edited the file /etc/sysctl.conf that is on the device on which hass.io is installed, which in my case is a rpi 3b. To do so I used the HASS Configurator (https://www.home-assistant.io/addons/configurator/), but you could have just as well done it through an ssh connection or whatever.

The change survives a complete device reboot.

What I mean is that there is more than one location for sysctl.conf: on the host system (ssh port 22222) or on the container system (ssh port 22). Seems like you edited sysctl.conf on the container system. Well, that's not an option for me because my firewall is sitting between HA and the IKEA gateway; I assume your HA sits on the same subnet as your IKEA gateway?

I am not a network specialist but I think maintaining a UDP stream "forever" isn't a good way to "fix" this problem. In addition, this setting will increase the timeout for all UDP streams from HA.

Regards
Richard

darootler on 21 Nov 2019

An update on this issue that I was experiencing while using hass.io: I have edited /etc/sysctl.conf and added "net.netfilter.nf_conntrack_udp_timeout_stream=2678400" as mentioned by @xlfe and everything has been working completely stable since then.

Could you please describe where you edited /etc/sysctl.conf? Did your change survive a host reboot?
Regards
Richard

I edited the file /etc/sysctl.conf that is on the device on which hass.io is installed, which in my case is a rpi 3b. To do so I used the HASS Configurator (https://www.home-assistant.io/addons/configurator/), but you could have just as well done it through an ssh connection or whatever.
The change survives a complete device reboot.

What I mean is that there is more than one location for sysctl.conf: on the host system (ssh port 22222) or on the container system (ssh port 22). Seems like you edited sysctl.conf on the container system. Well, that's not an option for me because my firewall is sitting between HA and the IKEA gateway; I assume your HA sits on the same subnet as your IKEA gateway?

I am not a network specialist but I think maintaining a UDP stream "forever" isn't a good way to "fix" this problem. In addition, this setting will increase the timeout for all UDP streams from HA.

Regards
Richard

I did not know there were two files, but yes HA and the IKEA gateway are on the same subnet. Perhaps this indeed is more of a "hack" to get it working than a nice permanent solution. If there are nicer solutions I'm willing to try those out.

rutgerkra on 21 Nov 2019

An update on this issue that I was experiencing while using hass.io: I have edited /etc/sysctl.conf and added "net.netfilter.nf_conntrack_udp_timeout_stream=2678400" as mentioned by @xlfe and everything has been working completely stable since then.

Could you please describe where you edited /etc/sysctl.conf? Did your change survive a host reboot?
Regards
Richard

I edited the file /etc/sysctl.conf that is on the device on which hass.io is installed, which in my case is a rpi 3b. To do so I used the HASS Configurator (https://www.home-assistant.io/addons/configurator/), but you could have just as well done it through an ssh connection or whatever.
The change survives a complete device reboot.

What I mean is that there is more than one location for sysctl.conf: on the host system (ssh port 22222) or on the container system (ssh port 22). Seems like you edited sysctl.conf on the container system. Well, that's not an option for me because my firewall is sitting between HA and the IKEA gateway; I assume your HA sits on the same subnet as your IKEA gateway?
I am not a network specialist but I think maintaining a UDP stream "forever" isn't a good way to "fix" this problem. In addition, this setting will increase the timeout for all UDP streams from HA.
Regards
Richard

I did not know there were two files, but yes HA and the IKEA gateway are on the same subnet. Perhaps this indeed is more of a "hack" to get it working than a nice permanent solution. If there are nicer solutions I'm willing to try those out.

Do you use HA's internal database for the recorder? Do you have a chance to move the database to another host like a NAS?

I think the problem is performance related (I am using a Pi3B+ with Hass.io); the Raspberry responds very slowly if there is much load on the disk and I think that's the main (not the only) problem here.

Regards
Richard

darootler on 21 Nov 2019

Hi,

I don't think it is performance related, I also have the problem and I'm
running HA in a docker on a chromebox intel i5, SSD drive and 12 gb RAM.

regards,

Jasper

On Thu, Nov 21, 2019 at 10:24 AM darootler notifications@github.com wrote:

An update on this issue that I was experiencing while using hass.io: I
have edited /etc/sysctl.conf and added
"net.netfilter.nf_conntrack_udp_timeout_stream=2678400" as mentioned by
@xlfe https://github.com/xlfe and everything has been working
completely stable since then.

Could you please describe where you edited /etc/sysctl.conf? Did your
change survive a host reboot?
Regards
Richard

I edited the file /etc/sysctl.conf that is on the device on which hass.io
is installed, which in my case is a rpi 3b. To do so I used the HASS
Configurator (https://www.home-assistant.io/addons/configurator/), but
you could have just as well done it through an ssh connection or whatever.
The change survives a complete device reboot.

What I mean is that there is more than one location for sysctl.conf: on
the host system (ssh port 22222) or on the container system (ssh port 22).
Seems like you edited sysctl.conf on the container system. Well, that's not an
option for me because my firewall is sitting between HA and the IKEA
gateway; I assume your HA sits on the same subnet as your IKEA gateway?
I am not a network specialist but I think maintaining a UDP stream
"forever" isn't a good way to "fix" this problem. In addition, this setting
will increase the timeout for all UDP streams from HA.
Regards
Richard

I did not know there were two files, but yes HA and the IKEA gateway are
on the same subnet. Perhaps this indeed is more of a "hack" to get it
working than a nice permanent solution. If there are nicer solutions I'm
willing to try those out.

Do you use HA's internal database for the recorder? Do you have a chance
to move the database to another host like a NAS?

I think the problem is performance related (I am using a Pi3B+ with
Hass.io); the Raspberry responds very slowly if there is much load on the
disk and I think that's the main (not the only) problem here.

Regards
Richard

jtonk on 21 Nov 2019

As I said, I think it's the main reason, not the only reason. Since I migrated the database off the Pi 3, I have problems with the Tradfri integration about once a week. Before the migration, the Tradfri integration stopped working within minutes or hours of a reboot.

How often are you facing this problem or how long does the tradfri integration work in your environment?

Regards
Richard

darootler on 21 Nov 2019

Any progress on this one? I am having the same issue with hassio on a Raspberry Pi 3 and the IKEA gateway (1.9.27). However, editing /etc/sysctl.conf and adding "net.netfilter.nf_conntrack_udp_timeout_stream=2678400" seems to help.

ronkeli on 3 Dec 2019
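
Editor's sketch of the sysctl workaround mentioned in several comments above. The key and the value (2678400 seconds, i.e. 31 days) are taken from the thread; the helper name `persist_setting` is hypothetical, and whether the setting belongs on the host or inside the container depends on where the NAT happens in your setup, as discussed above. It only shows one way to add the line without duplicating it on repeated runs:

```shell
#!/bin/sh
# Sketch only: idempotently persist the conntrack UDP stream timeout.
# KEY and VALUE are the ones quoted in this thread; the helper name is made up.
KEY="net.netfilter.nf_conntrack_udp_timeout_stream"
VALUE="2678400"   # 31 days * 86400 seconds

# persist_setting FILE: append KEY=VALUE to FILE unless KEY is already set there.
persist_setting() {
  file="$1"
  if grep -q "^${KEY}[[:space:]]*=" "$file" 2>/dev/null; then
    echo "already set in $file"
  else
    printf '%s=%s\n' "$KEY" "$VALUE" >> "$file"
    echo "added to $file"
  fi
}

# Typical use, as root on the machine whose conntrack table is involved:
#   persist_setting /etc/sysctl.conf && sysctl -p
```

As noted in the comments above, this raises the timeout for every tracked UDP stream on that machine, not only the one to the gateway, so it is closer to a workaround than a fix.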

I'm also experiencing this, making my Trådfri setup quite useless in HASS. I've added net.netfilter.nf_conntrack_udp_timeout_stream=2678400 to my /etc/sysctl.conf and I'm using an external database but it usually only takes 10 - 30 minutes after a restart before the states stop updating.

Is there any solution to this on the horizon?

psvanstrom on 14 Dec 2019

I started using ZHA instead.

comatose-tortoise on 15 Dec 2019

In my case the problem is related to performance. Is your system under heavy load? Debug logging, many updates from sensors, anything that stresses the disk? What hardware are you using? Is there a firewall between HA and the IKEA gateway?

Regards
Richard

darootler on 15 Dec 2019

Is there a firewall between HA and the IKEA gateway?

Yes, there was 😄. Right after I wrote my comment above I tried disabling iptables and I haven't had a problem since, so I'm guessing the firewall is the culprit here, closing down the udp connection after a while?

psvanstrom on 15 Dec 2019

Yes, there is a default timeout for UDP streams, set to 180 seconds. But I think that's okay and shouldn't be a problem.

I have the following firewall rule set up:

Where "coaps destination" and "coaps source" are the following objects:

I saw the IKEA gateway initiating a connection as well as Home Assistant, so I think you need to allow both directions on your firewall.

Regards
Richard

darootler on 15 Dec 2019

👍1

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue now has been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] on 14 Mar 2020

FWIW, this is still an issue for me.

What complicates things in my case is that I'm running homeassistant on kubernetes, so UDP is a little trickier. Is there a way to fix the ports used for sending data to the tradfri hub so that the response data always comes back to the same port? That way I can make sure that UDP traffic is routed through to HA.

growse on 14 Mar 2020

Since last week I am experiencing this issue as well. Within less than an hour after restarting Home-assistant the communication with Tradfri stops working. I can still manage the lights with Homekit, but the automations defined in Home-Assistant don't work anymore.
Last week both HASS and Tradfri were updated, so it's not clear to me what has caused the issue.
Running HASS version 0.107.7 and Ikea gateway version 1.10.30

I see errors like this in the log:
2020-04-06 07:04:15 WARNING (MainThread) [homeassistant.components.tradfri.base_class] Observation failed for buitenlamp bij voordeur
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pytradfri/api/aiocoap_api.py", line 95, in _get_response
    r = await pr.response
  File "/usr/local/lib/python3.7/site-packages/aiocoap/protocol.py", line 816, in _run_outer
    yield from cls._run(app_request, response, weak_observation, protocol, log, exchange_monitor_factory)
  File "/usr/local/lib/python3.7/site-packages/aiocoap/protocol.py", line 865, in _run
    blockresponse = yield from blockrequest.response
aiocoap.error.RequestTimedOut

ekooter on 6 Apr 2020

Same here. I thought it was my UniFi switch, but they're on the same L2/L3 network and I don't see any issues in the logs.

LukasQ on 8 Apr 2020

I got a set of TRÅDFRI bulbs, a remote and the gateway around Christmas. I was experiencing problems controlling my lights from homeassistant and changed to a ConBee II. After struggling a bit with re-pairing things, everything has worked seamlessly for a couple of months. I am running homeassistant on my windows computer from a fairly new version pulled from github (and perhaps zigpy-deconz, not sure about my Python setup). My conclusion was that the gateway is bad, but maybe there is a firmware update for it?

Aldineyer on 8 Apr 2020

Are you running the Conbee II in ZHA or with the deconz integration?

comatose-tortoise on 8 Apr 2020

ZHA. I recall having problems with the deconz integration.

Aldineyer on 8 Apr 2020

Does controlling many lights with one command work well for you? I have a script that turns off all lights, but it doesn't work for the Tradfri lights connected to the Conbee II using ZHA. The first time I activate it, it turns some of them off, another activation turns some more off, and after maybe 4-5 tries, all lights are finally off.

Something you have experienced as well?

comatose-tortoise on 8 Apr 2020

I think I've experienced not all three lamps turning on or off when
controlling them with one command. I need to reproduce that, and at some
point I will make a virtual lamp that controls all three bulbs in my
living-room ceiling lamp as one.

Aldineyer on 8 Apr 2020

I noticed that if I try to turn a light on or off several times it will eventually work, but it takes some time and a lot of taps or clicks. When checking with tcpdump I noticed that Home Assistant is sending packets to the Tradfri gateway but there is no response and no action. After some tries Home Assistant will start using a different source port. It looks like the process restarts or initiates a new session with Tradfri.

ekooter on 10 Apr 2020

Could the people who still have problems drop me a line describing their setup? I suspect an infrastructure issue.
Mine: tradfri gateway (v1.10.30) -> UniFi Switch PRO 48 POE (v4.0.80.10875) -> Hyper-V 2019 Host (w. 10GbE Mellanox) -> official vhdx image

LukasQ on 20 Apr 2020

Tradfri gateway (1.10.30) -> Unifi Switch 8 PoE -> Unifi Switch 24 PoE -> 4-node Kubernetes cluster -> homeassistant/home-assistant docker image (v0.108.0).

The dynamic port jiggery-pokery won't play nicely with k8s, and I'm pretty sure that's the problem.

growse on 20 Apr 2020

Tradfri Gateway →Synology RT2600AC → old laptop running Proxmox →Home Assistant.
Previously I had centos 7 with firewall and native installation.

AFAICT, the problem is related to the firewall and connection tracking of UDP traffic in the Linux kernel. Maybe the Tradfri integration should send some dummy traffic to the Tradfri gateway to keep the connection in the conntrack table with default TTLs?

KrzysztofHajdamowicz on 20 Apr 2020

👍2

Agree.

tynor88 on 20 Apr 2020

Trådfri Gateway (v1.10.30)-> Unifi Switch 16P -> HPE microserver (Fedora server 31) -> Docker (19.03.8)-> Home-Assistant (0.108.6). Trådfri and Home Assistant use the same vlan.
I have been running tcpdump and have seen icmp rejects from server to Trådfri so I agree with krzysztof that it might be related to the linux firewall, although the reject could also come from Home Assistant. Was there some kind of keep alive / heartbeat before or has the timing changed?

ekooter on 20 Apr 2020

Running the following script every minute via cron on the host solved the connection problems I had with Home Assistant inside Docker:

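#!/bin/bash -ex
# Refresh the conntrack timeout of every connected UDP socket opened inside
# the Home Assistant container, so the NAT entries do not expire.

docker exec homeassistant netstat -un | egrep ^udp | while read -r LINE
do
  # netstat columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
  LINE_ARR=($LINE)

  # Local address, split from "ip:port" into (ip port)
  SOURCE=${LINE_ARR[3]}
  SOURCE_ARR=(${SOURCE//:/ })

  # Remote address, split from "ip:port" into (ip port)
  DEST=${LINE_ARR[4]}
  DEST_ARR=(${DEST//:/ })

  # Reset the timeout of the matching conntrack entry to 300 seconds
  /usr/sbin/conntrack -U -p UDP -s "${SOURCE_ARR[0]}" --orig-port-src "${SOURCE_ARR[1]}" -d "${DEST_ARR[0]}" --orig-port-dst "${DEST_ARR[1]}" -t 300
done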

rusitschka on 22 Apr 2020

👍1

What exactly does it do? And is there any tool other than conntrack I could try?

tynor88 on 22 Apr 2020

It pulls the "connected" UDP client sockets from the container running Home Assistant and uses conntrack to extend the connection timeouts to 5 minutes. Run by a cron job every minute, that gives an "infinite" timeout for these connections. I don't know if it can be done without the conntrack command.

rusitschka on 23 Apr 2020
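
Editor's sketch of the parsing step described above, separated from docker and conntrack so it can be tried on any machine. It assumes the usual `netstat -un` column layout (protocol, two queue counters, local address, foreign address); the function name `build_conntrack_args` and the sample addresses are hypothetical. It prints the arguments the script would pass to `conntrack` for one line, rather than running anything:

```shell
#!/bin/sh
# Sketch only: turn one `netstat -un` line into the conntrack arguments used by
# the cron script. Assumes IPv4 "ip:port" endpoints in columns 4 and 5.

# build_conntrack_args LINE [TIMEOUT]: print the argument list, TIMEOUT defaults to 300.
build_conntrack_args() {
  cta_timeout="${2:-300}"
  # Split the line into its whitespace-separated columns.
  read -r cta_proto cta_recvq cta_sendq cta_src cta_dst cta_rest <<EOF
$1
EOF
  # ${var%:*} drops the trailing ":port"; ${var##*:} keeps only the port.
  printf '%s -p UDP -s %s --orig-port-src %s -d %s --orig-port-dst %s -t %s\n' \
    "-U" "${cta_src%:*}" "${cta_src##*:}" "${cta_dst%:*}" "${cta_dst##*:}" "$cta_timeout"
}

# Hypothetical sample line: a container address talking to a gateway on UDP port 5684.
build_conntrack_args "udp 0 0 172.17.0.2:48291 192.168.0.10:5684 ESTABLISHED"
```

Since the timeout is reset to 300 seconds and the cron job runs every 60 seconds, each entry is refreshed several times before it could expire, which is what makes the timeout effectively "infinite" while the socket exists.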

Interesting. I would really like to try it out. I don't have conntrack on my host system (and no way to install it), so I tried pulling a conntrack Docker image (https://github.com/golights/conntrack) and running the command. No luck, however (I get "invalid parameters"), probably because the container doesn't see the UDP connection.

tynor88 on 23 Apr 2020

I experience the same issue.
Tradfri GW -> Wired eth -> HP uServer -> Ubuntu 18.04 -> Docker 19.03.6 -> HA 0.109.0

--- edit
I have just added the following automation:

- id: 'tradfri_keep_alive'
  trigger:
  - minutes: /1
    platform: time_pattern
  action:
  - service_template: light.turn_{{ states('light.tradfri_panel') }}
    entity_id: light.tradfri_panel

Only time will tell if it helps. But so far so good: the connections don't get dropped from the conntrack table.

grogi on 3 May 2020

👍1

Hi Grogi,
So this automation requests the state of a light every minute and sets it to the current state?
I will try this. Whether it keeps working depends on how Home Assistant processes automations. If automations can be processed simultaneously, it can have surprising results. My system should turn lights on/off depending on sun and time, so multiple times per day those automations could coincide with this one.
Another solution could be changing the Docker network setting from host to macvlan, but I have not tried that yet.

ekooter on 5 May 2020

I've also noticed a problem with the automation. At first, it worked great! But after a few days, the state of the lights doesn't make it back into HA.

Controlling the lights still works: if I set the light to "on", it turns on. But the new state never makes it back to HA, so within 1 minute, HA automatically turns the light off again.

growse on 13 May 2020

As some of you have a UniFi setup: I changed the timeouts below. Let's see what happens over the coming days...
[image: grafik]

LukasQ on 14 May 2020

The conntrack timeout happens at the Docker NAT. The traffic does not leave
your local network and doesn't go through the UniFi firewall.

grogi on 14 May 2020

crap.

LukasQ on 14 May 2020

I can confirm that this script works. It's just a pity that Docker users have to create these "hacks" to make it work properly...

tynor88 on 14 May 2020

I have since attached my home-assistant container to a macvlan network, as @ekooter suggested. No complaints at all.

grogi on 14 May 2020

Could you share the docker run command you use for this?

tynor88 on 14 May 2020

First you create a Docker network:

docker network create -d macvlan --subnet=192.168.0.0/24 --gateway=192.168.0.1 --ip-range=192.168.0.240/28 -o parent=eno1 my-macvlan

Your settings will of course be different here. Containers attached to the my-macvlan network will not get an IP from DHCP but will get one assigned by Docker internally, so it is a good idea to reconfigure your DHCP server and limit its address range so there is no overlap. In the example above Docker will assign IPs in the range 192.168.0.240-192.168.0.254.

Then you run with --network option:

docker run --network=my-macvlan ......

Alternatively, if you deploy with docker-compose, it would be:

services:
  home-assistant:
    container_name: home-assistant
    restart: unless-stopped
    image: homeassistant/home-assistant
    volumes:
      - ${CONFIG_ROOT}/hass/config:/config
      - /etc/localtime:/etc/localtime:ro
      - ${CONFIG_ROOT}/hass/media:/media
    networks:
      - macvlan
      - default 
      # Comment the above out if you don't run other services inside your stack and don't need ha to talk to those containers inside the stack network

networks:
  macvlan:
    external: true
    name: my-macvlan

With macvlan you don't need to provide any port mappings (no -p options).

grogi on 14 May 2020
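
Editor's sketch of the arithmetic behind the `--ip-range` value above, for anyone adapting it to their own subnet. The helper name `cidr_range` is hypothetical; it only computes the first and last address covered by an IPv4 CIDR block, using plain shell arithmetic:

```shell
#!/bin/sh
# Sketch only: print the first and last address of an IPv4 CIDR block,
# e.g. to check which addresses a macvlan --ip-range covers.

# cidr_range A.B.C.D/N: print "first last".
cidr_range() {
  cr_ip="${1%/*}"      # address part, e.g. 192.168.0.240
  cr_bits="${1#*/}"    # prefix length, e.g. 28
  IFS=. read -r cr_o1 cr_o2 cr_o3 cr_o4 <<EOF
$cr_ip
EOF
  # Pack the four octets into one 32-bit number and build the netmask.
  cr_addr=$(( (cr_o1 << 24) | (cr_o2 << 16) | (cr_o3 << 8) | cr_o4 ))
  cr_mask=$(( (4294967295 << (32 - cr_bits)) & 4294967295 ))
  cr_first=$(( cr_addr & cr_mask ))
  cr_last=$(( cr_first | (4294967295 - cr_mask) ))
  printf '%d.%d.%d.%d %d.%d.%d.%d\n' \
    $(( (cr_first >> 24) & 255 )) $(( (cr_first >> 16) & 255 )) \
    $(( (cr_first >> 8) & 255 ))  $(( cr_first & 255 )) \
    $(( (cr_last >> 24) & 255 ))  $(( (cr_last >> 16) & 255 )) \
    $(( (cr_last >> 8) & 255 ))   $(( cr_last & 255 ))
}

cidr_range 192.168.0.240/28
```

For 192.168.0.240/28 the block runs from 192.168.0.240 to 192.168.0.255. The last of these is also the broadcast address of the surrounding 192.168.0.0/24 subnet, which is consistent with the usable range of .240 to .254 quoted above.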

@rusitschka I'm having what seems like an identical issue to this. When I run your script via cron each minute, is this the expected output in the logs/bash?

+ read -r LINE
+ docker exec Home-Assistant-Core netstat -un
+ egrep '^udp'

I'm running this via user scripts in Unraid if that makes a difference. My docker is called "Home-Assistant-Core" hence that changing.

Nelinski on 22 Aug 2020

I had the same experience. It seems to have been some kind of side effect of Avahi mDNS on my pfSense. So now I have a working Trådfri gateway and a non-working Sonos VLAN mDNS bridge. Oh well....

/T

tpihl on 5 Oct 2020

For those who are running Hass.io on a Synology NAS I also figured out a workaround.
Unfortunately the Linux modified by Synology comes without the conntrack module, so keeping those connections alive by that method does not work. But you can frequently restart the Tradfri-Integration to keep the states updated. How?

1. Find the IKEA TRÅDFRI integration on your integrations page.
2. Open your browser developer tools by pressing F12 and navigate to the "Network" pane.
3. Click the three-dots menu on the IKEA TRÅDFRI integration and select "Reload".
4. A request called "reload" with a status 200 shows up in the developer tools under the network pane. Right-click it and copy the requested URL (in Chrome: right-click -> Copy -> Copy link address).
5. Edit your configuration.yaml and add the following lines, pasting the request URL from step 4 into the url field:

   rest_command:
     tradfri_restart:
       url: [Paste the request URL from step 4 here]
       method: post
       content_type: "application/json"
       headers:
         Authorization: !secret tradfri_workaround_token

6. Create a long-lived access token by going to your profile page, scrolling down and clicking "Create token" under "Long-Lived Access Tokens". Enter a proper name and copy the token.
7. Create a line in your secrets.yaml file and paste the token after the word "Bearer" (there must be a space between "Bearer" and your token):

   tradfri_workaround_token: Bearer [your token goes here]

8. Test your newly created rest command by opening the Developer Tools in HomeAssistant, browsing to the services page, entering rest_command.tradfri_restart and clicking "Call service". Wait 10 seconds, browse to Configuration -> Logs and make sure no error shows up. This is important, as you risk being locked out if you use wrong credentials when authenticating against the HomeAssistant API.
9. Save your files and restart HomeAssistant.
10. Create an automation which calls rest_command.tradfri_restart every minute.

mp68 on 11 Oct 2020

👀1 🚀1

That's a very clever workaround! IMO that should be included in the integration itself so this issue gets solved.

KrzysztofHajdamowicz on 12 Oct 2020

The automation solution provided earlier is much simpler and works well.

tynor88 on 12 Oct 2020

👍1

I agree that the other solution is much simpler. However, it does not work under all conditions. So consider my solution the solution of last resort ;-)

mp68 on 12 Oct 2020

Got a problem with:

Find the IKEA TRÅDFRI integration on your integrations page
Open your browser developer tools by pressing F12 and navigate to the "Network" pane.
Click the three-dots menu on the IKEA TRÅDFRI integration and select "Reload"
A request called "reload" with a status 200 shows up in the developer tools under the network pane. Right-click it and copy the requested URL (in Chrome: right-click -> Copy ->Copy link address)

I don't have a "Reload" option. I have "Rename" and the three dots. Or am I looking at the wrong page?

snhnic on 14 Nov 2020
