Esp-idf: [TW#26293] lwip_close() socket leak

Created on 12 Sep 2018 · 12 Comments · Source: espressif/esp-idf

Environment

  • IDF version c5265b1

Problem Description

lwip_close() does not free the socket's resources, so the next lwip_socket() call fails with ENFILE because alloc_socket() fails.

It seems to be related to the ASIO work. Enabling free_socket() at line 890 resolves the problem for me.

Steps to reproduce

std::list<int> sl;

int s;

while((s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) >= 0) sl.push_back(s);

s = sl.back();
sl.pop_back();

close(s);

assert(socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) >= 0); // will fail here
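The steps above can be wrapped into a self-contained probe. This is only a sketch: the function name `probe_socket_reuse` and the `max_sockets` cap are hypothetical additions (the cap makes the loop terminate on hosts with a large descriptor limit), and the headers assume a POSIX-style socket layer such as the one ESP-IDF exposes.

```cpp
#include <list>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical probe built from the reproduction steps.
// Returns  1 if a socket can be opened right after closing one,
//          0 if that attempt fails (the leak described in this issue),
//         -1 if no socket could be opened at all.
int probe_socket_reuse(int max_sockets) {
    std::list<int> open_sockets;
    int s;

    // Open sockets until the table is exhausted or the cap is reached.
    while (static_cast<int>(open_sockets.size()) < max_sockets &&
           (s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) >= 0) {
        open_sockets.push_back(s);
    }
    if (open_sockets.empty()) return -1;

    // Close the most recently opened socket, then ask for a new one.
    close(open_sockets.back());
    open_sockets.pop_back();
    int reopened = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    int result = (reopened >= 0) ? 1 : 0;

    // Release everything so the probe itself does not leak descriptors.
    if (reopened >= 0) close(reopened);
    for (int fd : open_sockets) close(fd);
    return result;
}
```

Calling `probe_socket_reuse(64)` on a target whose socket table holds fewer than 64 entries exercises the exhausted-table case from the report; a return value of 0 would indicate the leak.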

All 12 comments

Hi Andrew,

I tested your snippet with the default config and the IDF version you supplied, and it worked as expected. I only added a couple of couts:

while((s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) >= 0) sl.push_back(s);

for (auto i: sl) std::cout << i << ", "; std::cout << std::endl;

s = sl.back();
sl.pop_back();

for (auto i: sl) std::cout << i << ", "; std::cout << std::endl;

close(s);
std::cout << "Recently closed socket: " << s << std::endl;

int next_s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
std::cout << "Recently opened socket: " << next_s << std::endl;

This outputs:

54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
54, 55, 56, 57, 58, 59, 60, 61, 62,
Recently closed socket: 63
Recently opened socket: 63

I was able to recreate the behaviour you're describing when directly accessing lwip_ prefixed functions: lwip_socket, lwip_close.

In the IDF, the lwip stack is compiled with ESP_THREAD_SAFE enabled, which means that one should use the _r-suffixed (reentrant) functions when accessing lwip internals directly (for close, there is also a closesocket utility alias).
Of course, using the standard system calls is the preferred and recommended way of using sockets, although it might be a little slower.

With that said, changing the close invocation to either of the following should resolve your problem.

lwip_close_r(s);

or

closesocket(s);

Please let me know if that was the case or if you need anything else.
Thanks & Regards,
David

Hi David,

thanks for looking into that.

What you say makes sense: the _r API calls use reference counting and really should free socket resources.

My code was using the socket()/close() pair, and I'm sure these were calling the non-_r variants. I eventually switched to closesocket() before submitting the bug report, and it was also calling lwip_close().

Perhaps close()/closesocket() is redefined to lwip_close() somewhere. I'll check on my side, try to narrow the case down, and get back with an update.

Thanks,
Andrew

Here is an update.

1) In my case I was always trying to close the socket using lwip_close_r() (which in turn calls lwip_close()). I missed that lwip_close() was always called from lwip_close_r() and had to trace that with a debugger, sorry.
2) lwip_close_r() is leaking (or delaying socket release) because the socket 'error' is EWOULDBLOCK. In this case lwip_close_r() resets the socket state to LWIP_SOCK_OPEN, and LWIP_API_UNLOCK() does not call free_socket(). Will it ever free this socket, perhaps after TCP signaling?

IMO it is not correct if lwip_close()/lwip_close_r() does not do what it is expected to do (free resources for next immediate lwip_socket() call).

Non-modified lwip_close() does exactly what it should - unconditionally calls free_socket().
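To make the difference between the two close paths concrete, here is a toy model of a fixed-size socket table. It is not the lwIP or esp-lwip source: `ToySocketTable`, `Slot`, `kSlots`, `close_leaky` and `close_unconditional` are hypothetical names, and the model only mirrors the behaviour described in this comment (a pending EWOULDBLOCK keeps the slot allocated).

```cpp
#include <array>
#include <cassert>
#include <cerrno>

constexpr int kSlots = 4;  // hypothetical table size

struct Slot {
    bool used = false;
    int err = 0;  // last error stored on the socket
};

// Toy stand-in for the socket table; not the real lwIP data structure.
struct ToySocketTable {
    std::array<Slot, kSlots> slots{};

    // Returns a free slot index, or -1 when the table is full
    // (the situation in which lwip_socket() reports ENFILE).
    int alloc() {
        for (int i = 0; i < kSlots; ++i) {
            if (!slots[i].used) {
                slots[i].used = true;
                slots[i].err = 0;
                return i;
            }
        }
        return -1;
    }

    // Models the reported behaviour: a pending EWOULDBLOCK leaves the slot open.
    void close_leaky(int i) {
        assert(i >= 0 && i < kSlots);
        if (slots[i].err == EWOULDBLOCK) return;  // slot is never freed
        slots[i].used = false;
    }

    // Models an unconditional free, as described for unmodified lwip_close().
    void close_unconditional(int i) {
        assert(i >= 0 && i < kSlots);
        slots[i].used = false;
    }
};
```

In this model, filling the table, marking the last slot with EWOULDBLOCK and calling `close_leaky` leaves `alloc()` returning -1, whereas `close_unconditional` makes the slot available again immediately.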

Hi,

In my application, there are 22 calls to close() to close sockets.
The latest esp-idf breaks the application with errno 23 (websockets are frequently closed and reopened).
I had to revert
$ git reflog
518942ec6 (HEAD -> master, origin/master, origin/HEAD) HEAD@{0}: pull: Fast-forward
7abed5fc9 HEAD@{1}: pull: Fast-forward
020ade652 HEAD@{2}: pull: Fast-forward
30545f4cc HEAD@{3}: pull: Fast-forward

to 7abed5fc9 in order to deliver a working application.
I tried to find the commit where the problem appeared, but without success.
Do I have to change all close calls to closesocket, or will the problem be fixed soon?
Thanks.

Perhaps it will be useful if I describe my use case:
1) it is a resource-hungry app which tries to spare every byte of RAM
2) it is a single-threaded, single-core app
3) it uses sockets in non-blocking mode
4) it downloads a lot of files over TCP, one by one
5) if the current download is stuck (no data arrives for some time and the socket is not closed), the app calls closesocket() and proceeds to the next file (URL)

Hi guys,

Indeed, this is a bug in esp-lwip which needs to be fixed. I wrongly assumed that you were calling lwip_close directly (partly because of the issue title, and because I wasn't able to reproduce it with the code provided).
Now I see that a socket whose error is EWOULDBLOCK is not released when it is closed.

@karawin No, changing close to closesocket will not help.

Thanks @Andrew for providing additional application info.

Yes, line 3462 is what is causing the problem, at least in my case. IMHO it should be safe to change it to 'if(0)'.

In fact, it may be possible to work around the issue with getsockopt(s, SOL_SOCKET, SO_ERROR, ...);
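A minimal sketch of that workaround idea follows. The helper names `read_pending_error` and `close_after_reading_error` are hypothetical, and the sketch assumes that reading SO_ERROR clears the error stored on the socket, so that the close path no longer sees EWOULDBLOCK. Whether this actually avoids the leak on a given esp-lwip version is not confirmed in this thread.

```cpp
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: reads SO_ERROR for socket s.
// Returns the pending error (0 if none), or -1 if getsockopt() itself failed.
int read_pending_error(int s) {
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(s, SOL_SOCKET, SO_ERROR, &err, &len) != 0) {
        return -1;
    }
    return err;
}

// Hypothetical helper: read (and, by assumption, clear) the pending error,
// then close the socket. Returns the result of close().
int close_after_reading_error(int s) {
    (void)read_pending_error(s);  // value ignored; only the side effect matters here
    return close(s);
}
```

An application would call `close_after_reading_error(s)` wherever it currently calls `close(s)` or `closesocket(s)` on a stuck non-blocking socket.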

Yes, line 3462 of sockets.c is the issue; however, changing it to if (0) would cause other issues.

It should be safe to check the lwip_close return code. The esp-lwip team has already prepared a bug fix (a bit more complex), but here is the gist of it:

if (__ret != 0  && EWOULDBLOCK == __sock->err)

This works on my end (tested by closing a non-blocking TCP socket while a transaction is active): my __sock->err is EWOULDBLOCK but the return code is 0.

Can you also check whether that solves your issue, or whether your close returns 0?
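The condition quoted above can be read as a small predicate. The sketch below is an interpretation, not the esp-lwip patch: `keep_socket_open` and `check_examples` are hypothetical names, and it assumes that a true result corresponds to the branch that keeps the socket in the open state instead of freeing it.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical helper mirroring the quoted condition.
// ret:      return code of the underlying lwip_close() call
// sock_err: error currently stored on the socket
// Returns true when the socket would be kept open rather than freed.
bool keep_socket_open(int ret, int sock_err) {
    return ret != 0 && sock_err == EWOULDBLOCK;
}

// Walks through the cases discussed in the thread.
void check_examples() {
    // Close succeeded although EWOULDBLOCK is pending: the socket is freed.
    assert(!keep_socket_open(0, EWOULDBLOCK));
    // Close failed and EWOULDBLOCK is pending: the socket stays open.
    assert(keep_socket_open(-1, EWOULDBLOCK));
    // Close failed for another reason: this condition does not apply.
    assert(!keep_socket_open(-1, 0));
}
```

The first case is the one reported here: the stored error is EWOULDBLOCK but the return code is 0, so with the extra return-code check the socket is released.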

Confirming - works for me. Thanks!

@andrewvoznytsa Thanks for reporting the issue. If our reply helps resolve the issue, would you help close the issue? Thanks.

yes, ok
