Esp-homekit-devices: General stability

Created on 6 Mar 2018 · 17 Comments · Source: RavenSystem/esp-homekit-devices

I'm opening this as more of a time-based issue/discussion.

I have a few Sonoff Basics. I put v1.1 on them a while back and noticed that I'd only get 3-5 days before I had to unplug and re-plug them for them to become responsive again. As I'm not an expert on HomeKit, this repo, or its dependencies, I've decided to install v2.1 on one device to see if the stability is any better.

I hope to run this experiment for the next week as a small ongoing test of stability - I know that sometimes as developers we are constantly flashing and re-flashing devices with upgrades/enhancements and miss having a long-running device to prove stability.

I looked over the commits at a high level and didn't see anything specifically related to this. That's not to say one of the dependencies didn't address it.

All 17 comments

Many thanks for your report.

I have connected a Sonoff after an erase and fresh install with DEBUG enabled, and I'm going to test it for several days (at least a week). If you observe anything wrong in your tests, please comment here.

I'm also testing the latest release with DEBUG=1 after a complete flash erase and an unpair/re-pair of the accessory in the HomeApp. So far it is stable; I will report here as well.

I have made some changes to free up more memory by adjusting the task stack sizes.

My Sonoff 4ch stopped connecting to HomeKit after about 2-3 weeks of working fine. The device itself was still operating and I could operate the switch using the hardware buttons on the device; only the HomeKit part was disconnected. A reboot fixed it. I can try to flash the latest build this weekend with DEBUG enabled to see if I can discover the cause of the disconnect.

As I can see in logs, HomeKit server task sometimes causes a fault. I have increased its task stack to 2560.

I took a week-long log and saw that the heap size was decreasing day by day. In my case it crashed with a Fatal Exception, but I deleted the log by mistake. I'm running the test again; we may have a memory leak somewhere.

Also, it's not clear to me what the Client number means. I see it increasing from 4 to 14 clients after a couple of days, and I don't have that many HomeKit clients at home (max 5 if the Apple Watch counts). I guess the number is not being reset when a client is removed; I'm not sure if that is the source of the memory leak.

The client number is the file descriptor created with the socket. But it is not normal to have it increasing from 4 to 14. I'm talking with the author of esp-homekit framework to try to solve the issue (And I think that Apple Watch doesn't count).

Confirmed: we have a memory leak that triggers a reset in the best case, or completely hangs the Sonoff, causing the non-responding symptom.

Attached log: it took almost 1 week in my case, but it seems to depend on how much you use the device.

The heap decreased in size from 32324 after a boot to only 4268 just before the Fatal Exception (29). In this case the ESP was able to reboot and continue working, but 1 week ago I had a similar crash after which the ESP was unable to reboot.

Some interactions before the reset, when the heap size was 7016, I got this message:

[2018-03-25 09:23:41] >>> mdns_reply could not alloc 1460

100440-250318_log_reset.txt

Many thanks for the log! I'm going to pass it to Max (the esp-homekit framework developer). The "mdns_reply could not alloc 1460" error is not the cause; it is a consequence of not having enough free memory to handle mDNS queries (1460 bytes is the size of the mDNS buffer).

Looking at the log, I think it's possible that the memory leak is in the pair_verify functions.

I think I nailed down the leak. With your latest build, I discovered that 48 bytes are leaked each time I open the HomeApp. I've attached a short log showing the behavior.

How to reproduce:

1) Reset the Sonoff, wait for the Wi-Fi connection, and pair
2) Open HomeApp
3) Turn On Sonoff switch from the app, note the heap
4) Turn Off, note the heap
5) Close HomeApp
6) Repeat from step 2, and note how the heap reduces its size by 48-52 Bytes

103844-250318_leak48bytes.txt
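As a rough cross-check, the per-session leak can be compared with the heap figures reported earlier in this thread (32324 bytes after boot, 4268 bytes just before the crash). The sketch below is only back-of-the-envelope arithmetic: it assumes this leak is the only drain on the heap and that it is constant per session, neither of which the logs establish. The function name is hypothetical.

```c
/* Hypothetical helper: how many HomeApp sessions would it take for the free
 * heap to fall from start_heap to floor_heap if each session leaks
 * leak_per_session bytes? Assumes a constant leak and no other heap usage. */
long sessions_until_floor(long start_heap, long floor_heap, long leak_per_session) {
    if (leak_per_session <= 0 || start_heap <= floor_heap) {
        return 0; /* nothing meaningful to estimate */
    }
    return (start_heap - floor_heap) / leak_per_session;
}
```

With the figures above, a 48-byte leak gives about 584 sessions and a 52-byte leak about 539, which is at least plausible for a week of regular use.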

I have sent it to esp-homekit framework developer too.

I just opened a bug in the esp-homekit framework repository to track the issue properly. I also added a new log in which I narrowed the issue down, apparently, to the homekit_server_on_get_characteristics function.

This involves issue #35, "Memory Leak when connecting with HomeApp", in the esp-homekit framework.

From what I can see, it's not a memory leak. There is an accumulation of connections no longer used by clients. Until they expire, the server keeps the resources assigned to them occupied and, even worse, begins caching their notifications. Here is the patch that solved the problem for me:
```
diff --git a/src/server.c b/src/server.c
index b3d5d48..d09ac8c 100644
--- a/src/server.c
+++ b/src/server.c
@@ -2900,6 +2900,24 @@ client_context_t *homekit_server_accept_client(homekit_server_t *server) {
     const struct timeval rcvtimeout = { 10, 0 }; /* 10 second timeout */
     setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &rcvtimeout, sizeof(rcvtimeout));
 
+    /* The SO_KEEPALIVE option causes a packet to be sent to the remote system if a long time passes with no other data being sent or received.
+       This packet is designed to provoke an ACK response from the peer.
+       This enables detection of a peer which has become unreachable. */
+    int yes = 1;
+    setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &yes, sizeof(int));
+
+    // TCP_KEEPIDLE The time (in seconds) the connection needs to remain idle before TCP starts sending keepalive probes, if the socket option SO_KEEPALIVE has been set on this socket.
+    int idle = 1;
+    setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(int));
+
+    // TCP_KEEPINTVL The time (in seconds) between individual keepalive probes.
+    int interval = 1;
+    setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(int));
+
+    // TCP_KEEPCNT The maximum number of keepalive probes TCP should send before dropping the connection.
+    int maxpkt = 10;
+    setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, &maxpkt, sizeof(int));
+
     client_context_t *context = client_context_new();
     context->server = server;
     context->socket = s;
```

I do not know whether this breaks the HomeKit specification, but I can safely say that it causes some increase in energy consumption on both the clients and the server.

Many thanks!

I'm testing it with these parameters to be less aggressive:

    uint8_t yes = 1; /* Enable keepalive */
    setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &yes, sizeof(uint8_t));

    uint8_t idle = 50; /* Idle timeout to begin sending keepalive packets, in seconds */
    setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(uint8_t));

    uint8_t interval = 2; /* Interval between keepalive packets in seconds */
    setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(uint8_t));

    uint8_t maxpkt = 5; /* Max keepalive packet count to send */
    setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, &maxpkt, sizeof(uint8_t));
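For comparison, here is a minimal, self-contained sketch of the same keepalive setup written against a POSIX/Linux-style sockets API. It differs from the snippets in this thread in two ways that are assumptions on my part, not something the thread confirms for the ESP stack: it passes int-sized option values (many socket stacks expect an int for these options and may reject a shorter length), and it checks every setsockopt() return value so that a rejected option does not go unnoticed. The helper names are hypothetical and not part of esp-homekit.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable TCP keepalive on socket s.
 * Returns 0 on success, -1 if any option is rejected. */
int enable_keepalive(int s, int idle_s, int interval_s, int max_probes) {
    const int yes = 1; /* int-sized flag, as commonly expected for SO_KEEPALIVE */
    if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &yes, sizeof(yes)) != 0)
        return -1;
    /* Idle time before the first keepalive probe, in seconds */
    if (setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) != 0)
        return -1;
    /* Interval between probes, in seconds */
    if (setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) != 0)
        return -1;
    /* Unanswered probes before the connection is dropped */
    if (setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, &max_probes, sizeof(max_probes)) != 0)
        return -1;
    return 0;
}

/* Rough time, in seconds, before a silent peer is declared dead:
 * the idle period plus one interval per unanswered probe. */
int keepalive_detect_seconds(int idle_s, int interval_s, int max_probes) {
    return idle_s + interval_s * max_probes;
}
```

By this estimate, the original patch (idle 1, interval 1, count 10) drops a dead client after roughly 11 seconds, while the less aggressive values above (idle 50, interval 2, count 5) take roughly 60 seconds, trading slower cleanup for less keepalive traffic.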

@overbog that doesn't solve the memory leak in homekit_server_on_get_characteristics(), but that leak is now fixed.

Now the id variable is freed correctly in the homekit_server_on_get_characteristics() function.
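The actual fix is not shown in this thread. As a general illustration of this kind of bug only, the sketch below shows a common pattern: a request parameter is copied to the heap so it can be split in place, and the copy must be freed on every path once parsing is done. The function name and the comma-separated ID format are assumptions for the example, not taken from the esp-homekit source.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: count comma-separated IDs in a query value such as
 * "1.10,1.11". Returns the number of IDs, or -1 if allocation fails. */
int count_ids(const char *param) {
    size_t len = strlen(param);
    char *id = malloc(len + 1);   /* private, writable copy of the parameter */
    if (id == NULL) {
        return -1;
    }
    memcpy(id, param, len + 1);

    int count = 0;
    char *cursor = id;            /* walk with a second pointer so id stays valid */
    while (cursor != NULL && *cursor != '\0') {
        char *comma = strchr(cursor, ',');
        if (comma != NULL) {
            *comma = '\0';        /* terminate the current token in place */
        }
        if (*cursor != '\0') {
            count++;
        }
        cursor = (comma != NULL) ? comma + 1 : NULL;
    }

    free(id);                     /* omitting this leaks len + 1 bytes per call */
    return count;
}
```

A copy that is never freed costs only a few bytes per request, which matches the slow, usage-dependent heap decline described earlier in the thread.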
