Core: Envisalink dropping connection - Fatal error on transport TCPTransport

Created on 10 Jul 2018  Â·  21Comments  Â·  Source: home-assistant/core

Home Assistant release (hass --version):
0.73.1

Python release (python3 --version):
3.6.6

Component/platform:
Envisalink

Description of problem:
Envisalink component drops connection with fatal error. Dropped connections are new with 0.73. Prior only errors but no dropped connections. I tested with a clean install with only Z wave and Envisalink installed, errors out.

Had to go back to 0.69.1 to stop fatal error (tried all releases between 0.73.1 and 0.69.1)

0.69.1 has warnings but no fatal errors. Maybe because python is 3.6.5?

Expected:
No error

Problem-relevant configuration.yaml entries and steps to reproduce:

homeassistant:
  name: Ranger Home
  latitude: !secret home_latitude
  longitude: !secret home_longitude
  elevation: !secret home_elevation
  unit_system: imperial
  time_zone: America/Los_Angeles


frontend:
  javascript_version: latest

config:

zwave:
  usb_path: /dev/ttyACM0
  polling_interval: 60000
  network_key: !secret zwkey

http: 
  api_password: !secret http_password
  base_url: !secret baseurl

logger:
  default: debug


conversation:

map:
sun:
tts:
  - platform: google
cloud:

#envisalink
envisalink:
  host: 192.168.1.132
  panel_type: HONEYWELL
  user_name: !secret envisausername
  password: !secret envisapass
  code: !secret envisacode
  port: 4025
  evl_version: 3
  keepalive_interval: 60
  zonedump_interval: 30
  panic_type: Police
  zones:
    1:
      name: 'Hall Smoke Alarm'
      type: 'smoke'
    2:
      name: 'Living Room Motion'
      type: 'motion'
    3:
      name: 'Hall CO Sensor'
      type: 'smoke'
    4:
      name: 'Patio Door'
      type: 'door'
    5:
      name: 'Front Door'
      type: 'door'
#    6:
#      name: 'Kitchen Window'
#      type: 'window'
    7:
      name: 'Guest BR Window'
      type: 'window'
    8:
      name: 'Master Bath Window'
      type: 'window'
    9:
      name: 'Boys BR Motion'
      type: 'motion'
    10:
      name: 'Playroom Motion'
      type: 'motion'
    11:
      name: 'Master BR Motion'
      type: 'motion'
    17:
      name: 'Hall Bath Window'
      type: 'window'
#    18:
#      name: 'Garage Entry Door'
#      type: 'door'
    19:
      name: 'Garage Rollup Door'
      type: 'door'
    20:
      name: 'Garage Side Door'
      type: 'door'
    21:
      name: 'Hall Motion'
      type: 'motion'
    22:
      name: 'Dining Room Window'
      type: 'window'
    23:
      name: 'Play Room Window'
      type: 'window'
    24:
      name: 'Master BR Large Window'
      type: 'window'
    25:
      name: 'Office Motion'
      type: 'motion'
    26:
      name: 'Kitchen Window'
      type: 'window'
    27:
      name: 'Garage Entry Door'
      type: 'door'

  partitions:
    1:
      name: 'Home Alarm'


Attached Logs and config file.
home-assistant (6).log

Line 42853

Most helpful comment

Okay i have pushed an update on my local branch: https://github.com/Cinntax/home-assistant/tree/fix_envisalink_reconnect

Can someone on the thread here also give it a test? Things look good from here- i was able to force this situation by pulling the network cable, and plugging it back in- i was able to recover from that without the loop.

All 21 comments

@Cinntax Possible for you to look at this?

Thanks for flagging me down- sure I’ll take a look- it may be tomorrow at this point though!

Issue still exists in 0.74.0

I'm running into what I think is the same problem. Envisalink connection works for a while (usually about half a day) before going into the same disconnect / reconnect loop. I think the loop is caused by the Envisalink thinking HA is still connected but HA not realizing that it is.

At the very least, there's a bug where code is used before being defined (https://github.com/Cinntax/pyenvisalink/blob/master/pyenvisalink/honeywell_client.py#L82), triggering an exception. The real question is what's the data that's Envisalink's sending back that resulted in code not being set. I just added some debug logging on my installation to see if I can catch it. After that, should be pretty straightforward to make sure that unexpected data doesn't throw an exception causing HA to try and reconnect.

Will report back.

Okay- I took a look, i think i know what's going on here. A couple of things:
1) I do agree that the pyenvisalink package is not properly "ignoring" messages that are invalid, and the debug message is trying to print out the result anyway, resulting in an exception- so I can go ahead and fix that- however I put some logic in my test environment to force that scenario, and it doesn't cause anything unrecoverable. So I don't think this is the root cause- even though the messages are not desirable.

2) I think the real issue here is that there is some logic in the envisalink hass component that is called upon connection made, login success, etc. and that logic currently is not handling a connection failure/reconnect scenario. It assumes that the only time we'll go through a connection sequence with the envisalink is upon initial start up- so I just need to put logic in to be able to handle that properly, and we should be good.

Now, what i cannot tell is- why you're losing connection to the envisalink in the first place. I haven't been able to recreate that. However by fixing these 2 things I think we can at least make the system more robust when it occurs.

Great news. Thanks

Okay i have pushed an update on my local branch: https://github.com/Cinntax/home-assistant/tree/fix_envisalink_reconnect

Can someone on the thread here also give it a test? Things look good from here- i was able to force this situation by pulling the network cable, and plugging it back in- i was able to recover from that without the loop.

I'll give it a shot tonight. It usually takes a day or two for the error to happen for me but I'll report any issues. Thanks for putting together a fix!

Actually, had some time now so running both pyenvisalink 2.3 and your homeassistant fork.

In case it's helpful, I had the error happen again (on the old, unpatched code) last night. There wasn't anything in the log that explained the server disconnect. I did notice that when I shutdown Homeassistant, it had a ton of event loops for envisalink:

2018-07-30 17:21:45 INFO (MainThread) [homeassistant.components.envisalink] Shutting down Envisalink 2018-07-30 17:21:45 INFO (MainThread) [pyenvisalink.alarm_panel] Disconnecting from the envisalink... 2018-07-30 17:21:45 INFO (MainThread) [pyenvisalink.envisalink_base_client] An event loop was given to us- we will shutdown when that event loop shuts down. 2018-07-30 17:21:45 INFO (MainThread) [homeassistant.components.envisalink] Shutting down Envisalink 2018-07-30 17:21:45 INFO (MainThread) [pyenvisalink.alarm_panel] Disconnecting from the envisalink... 2018-07-30 17:21:45 INFO (MainThread) [pyenvisalink.envisalink_base_client] An event loop was given to us- we will shutdown when that event loop shuts down. 2018-07-30 17:21:45 INFO (MainThread) [homeassistant.components.envisalink] Shutting down Envisalink 2018-07-30 17:21:45 INFO (MainThread) [pyenvisalink.alarm_panel] Disconnecting from the envisalink... 2018-07-30 17:21:45 INFO (MainThread) [pyenvisalink.envisalink_base_client] An event loop was given to us- we will shutdown when that event loop shuts down. 2018-07-30 17:21:45 INFO (MainThread) [homeassistant.components.envisalink] Shutting down Envisalink 2018-07-30 17:21:45 INFO (MainThread) [pyenvisalink.alarm_panel] Disconnecting from the envisalink... 2018-07-30 17:21:45 INFO (MainThread) [pyenvisalink.envisalink_base_client] An event loop was given to us- we will shutdown when that event loop shuts down. ...

That repeated dozens and dozens of times. Maybe when HA got a disconnect, it wasn't cleaning up the old connection? It's also possible that the original connection with the Envisalink wasn't actually lost (or at least the Envisalink didn't realize it was lost). Anyway, thought I'd pass along in case helpful. Will let you know if it works or not :)

Giving it a try on 0.74.2 will advise.

Well after about an hour I’m about 99% sure this took care of it. I couldn’t even make it 5 mins without disconnect. It still disconnects occasionally but cleanly reconnects. Will advise again tomorrow.

Okay great to hear! Yeah not really sure why some people (you're not the only one), have had network connectivity issues with their envisalink device... but it's definitely important that we at least handle it gracefully.

Cameronr- i guess i haven't seen quite so many of those shutdown messages in a row, but i don't think it's related necessarily. The main bug here is that pyenvisalink was properly performing reconnection logic, but not properly informing HA about the various steps- so HA never really knew about the disconnect- it would just get a random "hey i'm connected" event when it wasn't expecting it. I'll take a look to see if there is a possibility of us re-creating new connections (that would be bad), but I don't think so- there is a framework for network connectivity that I'm using, and it pretty much takes care of things.

Although I had multiple disconnects over the night last night, they all recovered without issue. Looks like this fix took care of the problem for me. Will report if anything changes.

Also had a disconnect last night and it reconnected without any problem. Seems like it's fixed! Thanks again for jumping on it!

Looks like this change may have caused unintended consequences. Please see this:
https://community.home-assistant.io/t/envisalink-seems-spotty-after-latest-upgrades/62325/3

@bdf0506 I don't think the fixes for this issue have been merged into Home Assistant master yet. For example, pyenvisalink 2.2 is still referenced even in 0.75.0:

https://github.com/home-assistant/home-assistant/blob/0.75.0/homeassistant/components/envisalink.py

It might even be the case that these changes might help (if the issue is related to disconnects).

yes that's correct that this change has yet to make it into the mainline. not sure i would have made the cutoff. I do agree this needs to get in ASAP though, since if we're going to be losing connections, we need to be able to recover. I'm still a bit unclear as to how we're losing connection though, as i haven't been able to reproduce :(

I'm having this, too. Let me know if I can help.
77.3

2018-09-06 20:26:25 ERROR (MainThread) [pyenvisalink.envisalink_base_client] The server closed the connection. Reconnecting... 2018-09-07 00:27:29 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting 2018-09-07 02:00:23 ERROR (MainThread) [homeassistant.components.camera.generic] Timeout getting camera image 2018-09-07 02:00:33 ERROR (MainThread) [homeassistant.components.camera.generic] Error getting new camera image: Cannot connect to host 192.168.1.23:80 ssl:None [Connection refused] 2018-09-07 02:00:43 ERROR (MainThread) [homeassistant.components.camera.generic] Error getting new camera image: Cannot connect to host 192.168.1.23:80 ssl:None [Connection refused] 2018-09-07 02:19:29 WARNING (Thread-3) [pychromecast.socket_client] Heartbeat timeout, resetting connection 2018-09-07 02:19:45 ERROR (Thread-3) [pychromecast.socket_client] Failed to connect, retrying in 5.0s 2018-09-07 04:17:49 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting 2018-09-07 04:25:41 ERROR (MainThread) [pyenvisalink.envisalink_base_client] The server closed the connection. Reconnecting...

Hi @bmorgan99 - i know there is one more change in flight that's being tested right now that's related to this, where I need to be more robust within the pyenvisalink library in how I handle garbage data right around the time a network connectivity issue occurs. It's more of a symptom though, we haven't been able to pinpoint an issue that actually causes network to drop.

In your case, does everything sufficiently reconnect again after the network comes back up?

Yes, it comes back fairly quickly in working order. Another issue I have
that may be related is a node-red flow I have that sends an email when a
zone is triggered with the alarm set. I frequently get 2 or 3 emails from
one event.
Thanks for your efforts.

On Tue, Sep 11, 2018, 8:07 PM Dan Cinnamon notifications@github.com wrote:

Hi @bmorgan99 https://github.com/bmorgan99 - i know there is one more
change in flight that's being tested right now that's related to this,
where I need to be more robust within the pyenvisalink library in how I
handle garbage data right around the time a network connectivity issue
occurs. It's more of a symptom though, we haven't been able to pinpoint an
issue that actually causes network to drop.

In your case, does everything sufficiently reconnect again after the
network comes back up?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/home-assistant/home-assistant/issues/15407#issuecomment-420476981,
or mute the thread
https://github.com/notifications/unsubscribe-auth/Acl9GNiLgBGQ6ndPfz_061KIiBVlfAQPks5uaF5KgaJpZM4VJh5-
.

I'm having the same disconnect issue, it's actually been happening for a lot of releases but just haven't had a lot of time to dig into it and was just hoping upgrading would eventually fix it. Here's my log:

2018-12-01 15:21:38 INFO (MainThread) [homeassistant.loader] Loaded alarm_control_panel.envisalink from homeassistant.components.alarm_control_panel.envisalink
2018-12-01 15:21:39 INFO (MainThread) [homeassistant.components.alarm_control_panel] Setting up alarm_control_panel.envisalink
...
2018-12-01 16:39:38 INFO (MainThread) [homeassistant.components.envisalink] Envisalink sent a zone update event. Updating zones...
2018-12-01 16:39:55 INFO (MainThread) [homeassistant.components.envisalink] Envisalink sent a zone update event. Updating zones...
2018-12-01 16:39:55 INFO (MainThread) [homeassistant.components.envisalink] The envisalink sent a partition update event
2018-12-01 16:40:01 INFO (MainThread) [homeassistant.components.envisalink] Envisalink sent a zone update event. Updating zones...
2018-12-01 16:40:01 INFO (MainThread) [homeassistant.components.envisalink] The envisalink sent a partition update event
...
2018-12-01 18:52:38 INFO (MainThread) [homeassistant.components.envisalink] Envisalink sent a zone update event. Updating zones...
2018-12-01 18:53:08 INFO (MainThread) [homeassistant.components.envisalink] Envisalink sent a zone update event. Updating zones...
2018-12-01 18:53:37 ERROR (MainThread) [pyenvisalink.envisalink_base_client] The server closed the connection. Reconnecting...
2018-12-01 18:53:42 INFO (MainThread) [pyenvisalink.envisalink_base_client] Started to connect to Envisalink... at 192.168.1.10:4025
2018-12-01 18:53:42 INFO (MainThread) [pyenvisalink.envisalink_base_client] Connection Successful!

Nothing appears after it reconnects and I don't think it actually is connected.

Was this page helpful?
0 / 5 - 0 ratings