Home Assistant release with the issue:
0.79.0b1
Last working Home Assistant release (if known):
Operating environment (Hass.io/Docker/Windows/etc.):
Hassio
Component/platform:
MyQ
Description of problem:
Usually I get two errors at the same time: this one and the one posted in the traceback. Not sure if the two errors are related.
```
MyQ - API Error[get_devices] 500 Server Error: Internal Server Error for url: https://myqexternal.myqdevice.com/api/v4/UserDeviceDetails/Get
```
Problem-relevant configuration.yaml entries (fill out even if it seems unimportant):
```yaml
cover:
  - platform: myq
    username: myemail
    password: mypassword
    type: chamberlain
```
Traceback (if applicable):
```
Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/homeassistant/helpers/entity.py", line 234, in async_update_ha_state
    state = self.state
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/cover/__init__.py", line 231, in state
    if self.is_opening:
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/cover/myq.py", line 109, in is_opening
    return MYQ_TO_HASS[self._status] == STATE_OPENING
KeyError: False
```
Additional information:
CC @geekofweek
We should probably check the return value of `self.myq.get_status(self.device_id)` in `update`, and if it is False, set the entity to unavailable and not update `self._status`.
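That check could look roughly like this. This is a minimal sketch, not the actual component: the `MyqCoverSketch` class name, the constructor shape, and the `available` property handling are assumptions layered on the attribute names visible in the traceback.

```python
class MyqCoverSketch:
    """Minimal stand-in for the MyQ cover entity's update logic."""

    def __init__(self, myq, device_id):
        self.myq = myq
        self.device_id = device_id
        self._status = None
        self._available = True

    def update(self):
        """Poll the API; on failure (False), keep the last known status
        and mark the entity unavailable instead."""
        status = self.myq.get_status(self.device_id)
        if status is False:
            self._available = False  # 500 from MyQ: don't touch _status
            return
        self._available = True
        self._status = status

    @property
    def available(self):
        """Home Assistant greys out entities that report False here."""
        return self._available
```

With this shape, a failed poll never feeds a `False` key into the state lookup, which is exactly the `KeyError: False` seen above.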
The side effect of this is that with the current stable release it gives false positives, i.e. the door reported open and closed when nothing is actually happening with the door.
Yeah, this looks like the MyQ servers returning a 500 internal server error, which yields a value of False that the component doesn't expect.
@vajonam are you talking about the current beta version or the current public release, 0.78.3? The beta has a fix that addressed not handling the OPEN state properly, which was causing some issues; this looks more like an error-handling problem.
@geekofweek, I was still on 0.77.3 since the fix didn't make it into 0.79b3. I did a quick smoke test with dev, but there were other issues with Node-RED that made me go back to a stable environment, which was 0.77.3. I am now testing with 0.79b3, which has the OPEN state handling code, to see if this issue comes back.
@geekofweek I can confirm that this doesn't cause the behaviour of resetting the cover state in 0.77b3. I still see this on a 500 returned from the server, so we still need some error handling, I believe. I have been seeing more of these errors from MyQ lately.
```
2018-09-28 10:01:20 ERROR (SyncWorker_15) [pymyq] MyQ - API Error[get_devices] 500 Server Error: Internal Server Error for url: https://myqexternal.myqdevice.com/api/v4/UserDeviceDetails/Get
2018-09-28 10:01:20 ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/src/app/homeassistant/helpers/entity.py", line 234, in async_update_ha_state
    state = self.state
  File "/usr/src/app/homeassistant/components/cover/__init__.py", line 231, in state
    if self.is_opening:
  File "/usr/src/app/homeassistant/components/cover/myq.py", line 109, in is_opening
    return MYQ_TO_HASS[self._status] == STATE_OPENING
KeyError: False
```
The pymyq library is returning False since the API request to Chamberlain failed to return data. Should the update simply check for not-False and only update self._status in that case? Or should we add a False key to MYQ_TO_HASS with a state of UNKNOWN? I'm starting to believe that Chamberlain may have implemented some type of rate limiting on their API, and the default update interval of 15 seconds may be causing some users to hit up against it. I very rarely see the API error, but I've been using an interval of 300 seconds.
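The second option amounts to a one-line change to the lookup. A sketch, assuming a dict-based mapping; the `MYQ_TO_HASS` entries below are an illustrative subset, not the real table:

```python
STATE_OPEN = 'open'
STATE_CLOSED = 'closed'
STATE_UNKNOWN = 'unknown'

# Illustrative subset of the real MYQ_TO_HASS mapping
MYQ_TO_HASS = {
    'open': STATE_OPEN,
    'closed': STATE_CLOSED,
}

def hass_state(myq_status):
    """Map a MyQ status to a HASS state, falling back to 'unknown'
    for False (failed API request) or any unrecognized code."""
    return MYQ_TO_HASS.get(myq_status, STATE_UNKNOWN)
```

Using `.get` with a default also insulates the component from any new state codes the MyQ API starts returning, at the cost of the entity visibly flipping to unknown during server errors.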
I've noticed seeing those errors as well leading me to agree with @arraylabs that Chamberlain is doing some sort of rate limiting, or their servers are just failing (also believable).
Interesting, though: if we go the route of returning UNKNOWN, that could surface the problem for a lot of people, with the device going into that state on and off. Or we just ignore it.
I like the returning-unknown route. Agreed that automations based on those states will have to change, but I think it's a necessary change.
I'll do some local testing and see how it works. I also updated my scan_interval to 30 seconds to see if I get less errors back.
I have an automation that announces garage door changes on a Google Home. I had to disable it since it kept announcing garage door changes randomly throughout the day and was driving my wife nuts. It seems the 500 errors happen somewhat sporadically and recover quickly (again, could be rate limiting), so it may be easier for automations if it simply doesn't update status when MyQ returns False, since you wouldn't have to know to check for an UNKNOWN status. Just my $0.02.
FYI, I'm seeing the same error. Looking over the comments on how to handle this, I agree with both. If unable to get the status, it should not keep reporting the old status; the old status might for example say that the garage is closed when it is actually open.
Then again, I also see that regularly changing the status from unknown to a known status would be a usability problem as well.
Would it not be possible instead to only set the status to unknown IF it has been unable to get the status x times in a row?
For example, with the interval of 15 seconds and a threshold of 4, setting the status to unknown would mean HASS was unable to get the correct status 4 times in a row, i.e. for 1 minute. Once it is able to get the status again, the counter resets.
This would prevent the random announcing of the garage door throughout the day that recovers quickly, as mentioned by mmotley999, yet at the same time wouldn't leave the user believing that the door is open or closed when it isn't because HASS has been unable to retrieve the correct status for a longer time.
My $0.02. :-)
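The threshold idea above can be sketched as a small counter. The `DebouncedStatus` name and `max_failures` parameter are made up for illustration; only the flip-after-N-failures behaviour comes from the comment:

```python
STATE_UNKNOWN = 'unknown'

class DebouncedStatus:
    """Track door status, flipping to 'unknown' only after several
    consecutive failed polls."""

    def __init__(self, max_failures=4):
        self.max_failures = max_failures
        self.failures = 0
        self.status = STATE_UNKNOWN

    def record(self, api_result):
        """api_result is a state string, or False when the poll failed."""
        if api_result is False:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.status = STATE_UNKNOWN
            return
        self.failures = 0  # any success resets the counter
        self.status = api_result
```

With a 15-second scan interval and `max_failures=4`, a single sporadic 500 never surfaces, but an hour-long outage shows up as unknown within a minute.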
I think there are two issues getting crossed here. One was a previous issue with similar behavior where the component didn't understand the OPEN state; that caused all kinds of issues. I had a pull request merged to fix that one, and it should show up in 0.79.
This is a similar type of error, except the component doesn't know how to handle a failed state when MyQ sends back a 500 error. We could address it by setting the state to unknown, or by ignoring the status update if the request failed. If the issue is MyQ rate limiting, then with the default interval of 15 seconds the entity could flip in and out of an UNKNOWN status regularly if we go with setting the state to UNKNOWN.
I would be afraid to trust the status from HASS if the status is left as-is while HASS has been unable to get the actual status for a while. This means that if, for whatever reason, getting the status has been failing for hours, HASS might still report that the door is closed when it is now open. If it says unknown, one at least knows to check some other way (i.e. walking into the garage and checking).
But if the status is set to unknown only after failing to get the actual status for x amount of time, then at least one can see that HASS has been unable to get it, yet there is no constant flipping between unknown and known. :-)
@ehendrix23 see my pull request, run it as a custom component, that should fix your issue.
Since the status 500 is being discussed: I also just had a 500 occur when trying to open the garage door. Not sure if something can be done so that if that status is received when trying to open/close, it waits a little bit and tries again?
I can see the point about setting it to UNKNOWN for extended MyQ server outages, and can easily bake that into my automation. I will absolutely defer to the experts on this! :)
@geekofweek, the issue I'm talking about is not the "Open" issue mentioned in that pull request but rather the 500 Server Error issue with a traceback showing "KeyError: False" that sometimes happens, as discussed here.
Following is what I am seeing:
```
2018-09-28 09:59:04 ERROR (SyncWorker_7) [pymyq] MyQ - API Error[get_devices] 500 Server Error: Internal Server Error for url: https://myqexternal.myqdevice.com/api/v4/UserDeviceDetails/Get
2018-09-28 09:59:04 ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/Users/Media/homeassistant/lib/python3.7/site-packages/homeassistant/helpers/entity.py", line 235, in async_update_ha_state
    state = self.state
  File "/Users/Media/homeassistant/lib/python3.7/site-packages/homeassistant/components/cover/__init__.py", line 220, in state
    if self.is_opening:
  File "/Users/Media/homeassistant/lib/python3.7/site-packages/homeassistant/components/cover/myq.py", line 109, in is_opening
    return MYQ_TO_HASS[self._status] == STATE_OPENING
KeyError: False
```
I've tested the below on my end. I'm typically not real fond of using sleep, but maybe this helps in the case @ehendrix23 mentioned:
```python
from time import sleep

def close_cover(self, **kwargs):
    """Issue close command to cover."""
    self.myq.close_device(self.device_id)
    close_iterations = 0
    while True:
        self._status = self.myq.get_status(self.device_id)
        if self._status == STATE_CLOSED:
            break
        if close_iterations > 5:
            _LOGGER.error(
                "Failed to confirm closed status for %s "
                "after 60 seconds", self._name)
            break
        close_iterations += 1
        sleep(10)

def open_cover(self, **kwargs):
    """Issue open command to cover."""
    self.myq.open_device(self.device_id)
    open_iterations = 0
    while True:
        self._status = self.myq.get_status(self.device_id)
        if self._status == STATE_OPEN:
            break
        if open_iterations > 5:
            _LOGGER.error(
                "Failed to confirm open status for %s "
                "after 60 seconds", self._name)
            break
        open_iterations += 1
        sleep(10)
```
I wrote this before the discrete states were added.
@arraylabs
It looks like open_device and close_device return True if the status code is 200, and otherwise return False if the command to MyQ failed to be initiated.
Thus, instead of waiting to determine whether the garage is open or closed, I think the following would retry if the command failed, doing so every 10 seconds up to 6 times. After that it returns False (otherwise True is returned).
```python
def close_cover(self, **kwargs):
    """Issue close command to cover."""
    iterations = 0
    while True:
        if self.myq.close_device(self.device_id):
            break
        if iterations > 5:
            _LOGGER.error(
                "Failed to close for %s "
                "after 60 seconds", self._name)
            break
        iterations += 1
        sleep(10)
    return iterations <= 5

def open_cover(self, **kwargs):
    """Issue open command to cover."""
    iterations = 0
    while True:
        if self.myq.open_device(self.device_id):
            break
        if iterations > 5:
            _LOGGER.error(
                "Failed to open for %s "
                "after 60 seconds", self._name)
            break
        iterations += 1
        sleep(10)
    return iterations <= 5
```
A similar check is needed for update() as well, since the 500 error when updating status results in the garage door changing state.
Also, I'm quite sure MyQ is now sending a new state. I needed to add the following code to pymyq/__init__.py to handle it; that is why we get the KeyError: 0.
```python
STATE_OPEN = 'open'
STATE_CLOSED = 'closed'
STATE_STOPPED = 'stopped'
STATE_OPENING = 'opening'
STATE_CLOSING = 'closing'
STATE_UNKNOWN = 'unknown'
STATE_TRANSITION = 'transition'
STATE_MOVING = 'moving'

LOGIN_ENDPOINT = "api/v4/User/Validate"
DEVICE_LIST_ENDPOINT = "api/v4/UserDeviceDetails/Get"
DEVICE_SET_ENDPOINT = "api/v4/DeviceAttribute/PutDeviceAttribute"

USERAGENT = "Chamberlain/3773 (iPhone; iOS 11.0.3; Scale/2.00)"
REQUEST_TIMEOUT = 3.0

DOOR_STATE = {
    '0': STATE_MOVING,
    '1': STATE_OPEN,
    '2': STATE_CLOSED,
    '3': STATE_STOPPED,
    '4': STATE_OPENING,
    '5': STATE_CLOSING,
    '6': STATE_UNKNOWN,
    '7': STATE_UNKNOWN,
    '8': STATE_TRANSITION,
    '9': STATE_OPEN,
}
```
@arraylabs can you test this out? I am quite sure there is now a new 0 state being returned by the API. Maybe we need to check for a key that doesn't exist and return an unknown instead; this should insulate us from other changes.
Along with the following myq.py code changes, it seems to work okay. I am using 0.79.0 as the base.
```python
import time

def update(self):
    """Update status of cover."""
    update_iterations = 0
    while True:
        self._status = self.myq.get_status(self.device_id)
        if self._status == STATE_OPEN or self._status == STATE_CLOSED:
            break
        if update_iterations > 5:
            _LOGGER.error(
                "Failed to confirm status for %s "
                "after 60 seconds", self._name)
            break
        update_iterations += 1
        time.sleep(10)

def close_cover(self, **kwargs):
    """Issue close command to cover."""
    self.myq.close_device(self.device_id)
    close_iterations = 0
    while True:
        self._status = self.myq.get_status(self.device_id)
        if self._status == STATE_CLOSED:
            break
        if close_iterations > 5:
            _LOGGER.error(
                "Failed to confirm closed status for %s "
                "after 60 seconds", self._name)
            break
        close_iterations += 1
        time.sleep(10)

def open_cover(self, **kwargs):
    """Issue open command to cover."""
    self.myq.open_device(self.device_id)
    open_iterations = 0
    while True:
        self._status = self.myq.get_status(self.device_id)
        if self._status == STATE_OPEN:
            break
        if open_iterations > 5:
            _LOGGER.error(
                "Failed to confirm open status for %s "
                "after 60 seconds", self._name)
            break
        open_iterations += 1
        time.sleep(10)
```
@vajonam,
Only thing with the open/close like that is when the actual sending of the open/close command fails. This is what I had seen on mine; in that case it will always just wait for 60 seconds and then give an error instead of trying to send the open/close command again.
@ehendrix23 agreed, but the URL that seems to get the 500 and hit the rate limit is the get_devices call for the status update, so we can roll back the code in the open and close methods and only leave it in the update method. I have never seen this happen on the open/close API call, unless there is a larger network issue.
@vajonam,
I've had it happen, and there was no network issue:
```
2018-09-28 22:30:15 ERROR (SyncWorker_4) [pymyq] MyQ - API Error[set_state] 500 Server Error: Internal Server Error for url: https://myqexternal.myqdevice.com/api/v4/DeviceAttribute/PutDeviceAttribute
2018-09-28 22:30:15 ERROR (SyncWorker_6) [pymyq] MyQ - API Error[set_state] 500 Server Error: Internal Server Error for url: https://myqexternal.myqdevice.com/api/v4/DeviceAttribute/PutDeviceAttribute
2018-09-28 22:30:16 ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/Users/Media/homeassistant/lib/python3.7/site-packages/homeassistant/helpers/entity.py", line 235, in async_update_ha_state
    state = self.state
  File "/Users/Media/homeassistant/lib/python3.7/site-packages/homeassistant/components/cover/__init__.py", line 220, in state
    if self.is_opening:
  File "/Users/Media/homeassistant/lib/python3.7/site-packages/homeassistant/components/cover/myq.py", line 109, in is_opening
    return MYQ_TO_HASS[self._status] == STATE_OPENING
KeyError: False
2018-09-28 22:30:16 ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/Users/Media/homeassistant/lib/python3.7/site-packages/homeassistant/helpers/entity.py", line 235, in async_update_ha_state
    state = self.state
  File "/Users/Media/homeassistant/lib/python3.7/site-packages/homeassistant/components/cover/__init__.py", line 220, in state
    if self.is_opening:
  File "/Users/Media/homeassistant/lib/python3.7/site-packages/homeassistant/components/cover/myq.py", line 109, in is_opening
    return MYQ_TO_HASS[self._status] == STATE_OPENING
KeyError: False
```
And looking at myq.py, it will return False if it failed to send the command; otherwise it will return True. I can see why it mainly fails with the Get, as that is run most often, and hence the chance of a 500 error is a lot bigger there compared to Open/Close.
@vajonam,
Just some feedback. I implemented your two changes to myq.py and pymyq.py. As I mentioned before, I have an automation that announces garage door changes on my Google Home, and was getting false announcements throughout the day each time Chamberlain's server would return a 500 response.
Been running them since yesterday, and even though I have a slew of '500' responses, no false announcements anymore!
```
Oct 3 10:03:27 hass1 hass[26067]: 2018-10-03 10:03:27 ERROR (SyncWorker_0) [pymyq] MyQ - API Error[get_devices] 500 Server Error: Internal Server Error for url: https://myqexternal.myqdevice.com/api/v4/UserDeviceDetails/Get
Oct 3 10:06:45 hass1 hass[26067]: 2018-10-03 10:06:45 ERROR (SyncWorker_15) [pymyq] MyQ - API Error[get_devices] 500 Server Error: Internal Server Error for url: https://myqexternal.myqdevice.com/api/v4/UserDeviceDetails/Get
Oct 3 10:07:18 hass1 hass[26067]: 2018-10-03 10:07:18 ERROR (SyncWorker_5) [pymyq] MyQ - API Error[get_devices] 500 Server Error: Internal Server Error for url: https://myqexternal.myqdevice.com/api/v4/UserDeviceDetails/Get
Oct 3 10:08:24 hass1 hass[26067]: 2018-10-03 10:08:24 ERROR (SyncWorker_14) [pymyq] MyQ - API Error[get_devices] 500 Server Error: Internal Server Error for url: https://myqexternal.myqdevice.com/api/v4/UserDeviceDetails/Get
Oct 3 10:17:44 hass1 hass[26067]: 2018-10-03 10:17:44 ERROR (SyncWorker_14) [pymyq] MyQ - API Error[get_devices] 500 Server Error: Internal Server Error for url: https://myqexternal.myqdevice.com/api/v4/UserDeviceDetails/Get
Oct 3 10:19:34 hass1 hass[26067]: 2018-10-03 10:19:34 ERROR (SyncWorker_13) [pymyq] MyQ - API Error[get_devices] 500 Server Error: Internal Server Error for url: https://myqexternal.myqdevice.com/api/v4/UserDeviceDetails/Get
```
@geekofweek Do we know if this truly is rate limiting? If so, then in addition to the robust fault-handling design decisions, should we also consider making the query rate configurable?
For example, I know that the Honeywell climate platform in Home Assistant has a configurable scan_interval. There doesn't appear to be such a configuration item for cover; perhaps there should be. I don't know what the default query interval is or how many queries are being rejected or rate limited. In my opinion, querying more often than once per minute seems excessive; every 2-5 minutes seems reasonable. But the user should be allowed to configure it as desired.
```yaml
climate:
  - platform: honeywell
    username: YOUR_USERNAME
    password: YOUR_PASSWORD
    scan_interval: 600
```
NOTE: scan_interval is expressed in seconds. Omitting scan_interval may result in too-frequent polling and cause you to be rate-limited by Honeywell.
I will help where I can to contribute to closing this issue.
@talljohnson,
If we allow increasing the interval, then I think the code to update the status after doing an open or close would need to be in there; otherwise the status would still show, for example, "closed" when the garage was opened.
Besides that, it would make sense to me to reduce how often it checks. Right now it is 15 seconds. Maybe have that configuration option with a default of 1 minute?
@schmittx this also works well and seems like a cleaner way. However, it doesn't help if the service is unavailable for open/close, but it does prevent flip-flopping the state on a 500 error.
I think PR 17207 was added to 0.80.0b3, but I still get errors after installing b3.
This PR doesn't fix the 500 errors, but rather the issue of the cover changing state because of the 500 error.
Shouldn't this issue remain open? Or should I create a new issue?
I would create a new one, but actually this would be done in the pymyq dependency that @arraylabs seems to maintain, AFAIK.
Sorry, been away on vacation out of the country for the last 1.5 weeks.
@vajonam @ehendrix23 @talljohnson would you have a chance to test out some changes?
Changes:
Let me know if you run into any issues or if something looks wrong, or there is just a better way to accomplish something.
https://gist.github.com/arraylabs/bd1a58fb44761aa968a8cb0569ac8710
@arraylabs Had to send my hub back and just got another one in, but I still need to install it. Will try to get it installed tomorrow and then I will definitely test out this code. :-)
Testing overnight. Just got the code installed. So far so good.
@arraylabs tried this out; still a lot of flip-flopping between unknown and closed, and hence a massive amount of false notifications. I am guessing I need to handle this in the automation.
@vajonam @ehendrix23 I updated the gist above to reference the pymyq library beta_9 instead of beta_4. For beta_9 I added three retries (at 5-second intervals) to the status update request (get_status) before it gives up and returns False, resulting in an Unknown state. Hoping this will at least reduce the 500 errors. I've only had it running for a few hours, so I'm not sure how effective it is yet.
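The retry behaviour described here can be sketched generically. This is not the actual pymyq beta_9 code; `get_status_with_retries`, its `fetch` callable, and the parameter names are hypothetical stand-ins for the three-retries-at-5-second-intervals scheme:

```python
import time

def get_status_with_retries(fetch, retries=3, delay=5):
    """Call fetch() up to `retries` times, sleeping `delay` seconds
    between attempts; return False only if every attempt failed."""
    for attempt in range(retries):
        result = fetch()
        if result is not False:
            return result
        if attempt < retries - 1:
            time.sleep(delay)
    return False
```

The caller then sees False, and thus an Unknown state, only when the server failed three times in a row, which filters out the sporadic single 500s.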
@arraylabs cool, I am testing it now.
@arraylabs left it running overnight; got one false unknown-to-closed switch. Much improved.
I'm going to test the latest revision as well now that I have a day or two home. Will let you know how it works out.
@geekofweek thanks, interested to see how its working for you!
@vajonam glad it made things a bit better
On my end I haven't had a single Unknown flip since the changes from 2 days ago. I'm currently using a 2-minute scan interval but will probably switch back to 5 soon.
@arraylabs I had two flips from unknown to closed, one at 3am and one at 5pm. Since I have been testing I have seen 24 500 errors in my logs, so it's certainly an improvement.
@vajonam Unfortunately there isn't much we can do about 500 (server-side) errors, as it's completely on the Chamberlain side. I can increase from 3 retries to maybe 4 or 5 and/or increase the delay between each retry, but at some point it's kind of spamming the API server with requests. Anyone have any thoughts or ideas to make it more robust?
Well, I think the question, which we'll probably never know the answer to, is whether Chamberlain is rate limiting requests or whether the server running their infrastructure is sitting in a broom closet somewhere (I would believe it) and just failing to keep up. If they're rate limiting, we can't really do much but pull back on the requests and see if it levels out; if the server is failing, pulling back on requests might help, but the best you can do is try to compensate for the failures.
> @vajonam Unfortunately there isn't much we can do about 500 (server-side) errors, as it's completely on the Chamberlain side.
@arraylabs, I personally don't think we need to worry too much about the 500 error itself. As you say, it is on the server end and there's not much we can do. What matters is how it is being handled in HASS.
I think that right now by doing retries when opening/closing or when getting status update after x amount of seconds is the right way to go here.
Just finished installing my new MyQ. I have it configured in HomeKit directly but also in HASS. Got debug enabled, and added some _LOGGER.debug statements so that any issues are logged when set to debug.
Will check tomorrow to see if at any point I got 500 error and if I did what the result was then from a HASS perspective.
@arraylabs test update: looks good, I haven't had any flip-flops in the last 2 days. But I also haven't seen a 500 error in a while, so maybe they have upgraded the server in the broom closet :-)
@arraylabs my tests are looking solid as well so far, haven't seen an unknown state in 24 hours.
@arraylabs I think this solution is good for a PR back into the main branch.
agreed
Awesome job, hopefully the PR can get squeezed into the next HA beta.
@towerhand @geekofweek @vajonam @ehendrix23 I have submitted https://github.com/home-assistant/home-assistant/pull/17535 for this with the documentation update as well for scan_interval.
@arraylabs, excellent. I did not see any issues anymore after the updates to the code. :-)