Esp32-snippets: WebSocket.send() and Socket.send() have no return value

Created on 5 Jan 2018  ยท  77Comments  ยท  Source: nkolban/esp32-snippets

Currently I am working on my application which uses WebSocket communication. Here I am sending several hundred WebSocket telegrams to the browser in very short time. This works well when logging is enabled but the Socket connection gets reset when I switch logging off. I assume that I am pushing data too fast into the Socket so that it crashes.
In order to overcome this situation I would like to check if the Socket is free for new data or to check if the previous send was successfull. For this I could check the return value of WebSocket.send() or Socket.send(). But these functions are currently defined as void.

bug waiting

Most helpful comment

Thank you for the comprehensive information.
Based on this I started some analysis and it seems as if the WebSocketTask gets started twice. The second request from the browser goes to the port of the first connection and the int length = peerSocket.receive((uint8_t*)&frame, sizeof(frame), true); waits endless for this request.
Currently I am not sure if this problem is in my code or in the HttpServer classes. I will investigate and let you know.
Here is the log:

D (22425) WebSocketReader: WebSocketReader Task started, socket: fd: 4101
D (22426) WebSocketReader: Waiting on socket data for socket fd: 4101
D (31716) WebSocketReader: Received datta from web socket.  Length: 2
D (31716) WebSocketReader: Web socket payload, length=14:
Rb size 1024 free 1007 rptr 0 freeptr 0 wptr 16
I (31717) WebSocketTask: data:getallparainfo
I (31717) WebSocketTask: WSPointer:1073683332
I (31717) WebSocketTask: ------
D (31718) WebSocketReader: Waiting on socket data for socket fd: 4101
D (31778) WebSocketReader: Received data from web socket.  Length: 2
D (31832) WebSocketReader: Web socket payload, length=14:
Rb size 1024 free 991 rptr 16 freeptr 0 wptr 32
D (31833) WebSocketReader: Waiting on socket data for socket fd: 4101
I (35523) WebSocketTask: ------ msg deleted
I (35523) WebSocketTask: ------ item returned from Ringbuffer
I (35524) WebSocketTask: data:
I (35524) WebSocketTask: WSPointer:1073683332
I (35524) WebSocketTask: ------
I (35524) WebSocketTask: ------ msg deleted
I (355525) WebSocketTask: ------ item returned from Ringbuffer
D (114741) paraWriteHandler: Request-Path: /api/v1/pw
D (114742) paraWriteHandler: count=4;P=990; DS=0; BA=1; VAL=0000
D (1114745) paraWriteHandler: Write : 0:
D (122806) WebSocketReader: WebSocketReader Task started, socket: fd: 4099
D (122807) WebSocketReader: Waiting on socket data for socket fd: 4099
D (129898) WebSocketReader: Received data from web socket.  Length: 2
D (129899) WebSocketReader: Web socket payload, length=14:
Rb size 1024 free 1007 rptr 32 freeptr 32 wptr 48
I (129899) WebSocketTask: data:getallpparainfo
I (129899) WebSocketTask: WSPointer:1073621352
I (129900) WebSocketTask: ------
D (129901) WebSocketReader: Waiting on socket data for socket fd: 4099

At timestamp 22425 the WebSocketReader Task gets started the first time and has the fd:4101. At time instant 122806 the WebSocketReader Task gets started the second time; this time with fd:4099.
I think this is not a good constellation...

All 77 comments

Can you try make menuconfig->Compiler options->Optimization Level->Release ?

With Optimization Level Release it seems to work. But also in this case I see on the logging console sometimes an error message:

E (73536) Socket: send: socket=4099, Connection reset by peer

Do I need to bother about that message?

Edit: after experimenting a little bit I see that the connection is not so stable. Sometimes I get the error message:

E (181231) Socket: accept(): Too many open files in system, m_sock=4096
E (181236) HttpServerTask: Caught an exception waiting for new client!

after that the Webserver does not respond any more.

I had a look at the source ... when we send data down a socket, we call Socket::send() which appears to trap errors and (at a minimum) logs them to the trace. See:

https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/Socket.cpp#L432

For a WebSocket send, not sure what to do if a send() fails. We get a callback notification when the partner socket is closed or otherwise detects an error and I can't yet imagine a scenario where we would attempt to perform a send() and not get a partner error indication if the send failed.

The ESP32 can only maintain a small number of concurrently open TCP/IP connections. I think 4 is the default. Can we check to see if we are creating new incoming WebSocket connections and not seeing the corresponding completions of them?

actively (in my code) I am initiating a WebSocket connection only once. This one I am using for the communication. I am not shure if some security layer in the lower level TCP/IP stack could initiate additional WebSocket connections; e.g. if there is something wrong with the ongoing communication. Is it possible to track this using wireshark?

E (181231) Socket: accept(): Too many open files in system, m_sock=4096

This message you can see if you are trying to connect from the same client too quick, before old request got processed and socket closed.

yes, before opening the WebSocket connection I initiate several (20?) requests via REST within short time (simply a for loop in the browser). Can this be the reason for the error message? How can I prevent this? Is there a possibility in Javascript to detect if the (ESP32-)server has problems with too many requests within short time?

@squonk11 Like mr @nkolban said, by default we have set only 4 open connections allowed. You can try to change this value in menuconfig:

CONFIG_LWIP_MAX_SOCKETS:                                                                                               
  โ”‚                                                                                                                      
  โ”‚ Sockets take up a certain amount of memory, and allowing fewer                                                       
  โ”‚ sockets to be open at the same time conserves memory. Specify                                                          
  โ”‚ the maximum amount of sockets here. The valid value is from 1                                                          
  โ”‚ to 16.                                                                                                               
  โ”‚                                                                                                                      
  โ”‚ Symbol: LWIP_MAX_SOCKETS [=4]                                                                                        
  โ”‚ Type  : integer                                                                                                      
  โ”‚ Range : [1 32]                                                                                                       
  โ”‚ Prompt: Max number of open sockets                                                                                   
  โ”‚   Location:                                                                                                          
  โ”‚     -> Component config                                                                                              
  โ”‚       -> LWIP                                                                                                        
  โ”‚   Defined at /home/esp32/esp/esp-idf/components/lwip/Kconfig:19 

I will check this tomorrow.

I now increased the number of allowed connections from 4 to 8 and I reduced the number of consecutive requests. By this the error message

E (181231) Socket: accept(): Too many open files in system, m_sock=4096

seems to be dissapeared. But the error messages:

E (73536) Socket: send: socket=4099, Connection reset by peer

still persist and I am still having stability problems with the communication. Am I sending data too fast? Actually I am sending chunks of data of approx. size of 500Bytes every 40ms via WebSocket. This is a datarate of 12,5kByte/s only...

My knowledge about sockets and wifi communication is rather basicaly, but it seems that for some reason client app is disconnecting or reseting connection. You may have issue with wifi Rx buffer. Check options in menuconfig regarding dynamic and static Rx buffer in wifi submenu. Its best lead i can tell you.

o.k. thanks. I will check that.
I now reduced the stringsize down to 100 --> this works. If I increase to 150 --> does not work.
When it does not work, Microsoft Edge directly detects a terminated WebSocket connection and triggers the websocket.onclose function and no further Websocket telegrams are recorded in wireshark. Firefox does not detect this condition, all telegrams are transmitted but wireshark marks them as TCP and not as WebSocket.

The way I would suggest we approach this puzzle is to write a detailed story that describes what is going on ... including topology diagrams and more. We can then shop this around as needed.

Lets see if I can get a handle on what I think you have and we'll see how close that might be. You have an ESP32 that is running the HttpServer classes (thank you for that and we'll try and give you the best support we can on using those). This then makes your ESP32 an HTTP server. Your ESP32 is presumably connecting to your home access point and as such, it has an IP address. Given that your ESP32 is being an HTTP Server, we now have a unique endpoint ... the IP address of the ESP32 and the port on which the HTTP server is listening.

Now comes some unknowns. You mention a browser ... that appears to be the client of the ESP32 hosted HTTP server. However, where is the browser running? Is it on a PC or a phone. In either case, please describe a little about the environment. If a PC, is it windows, mac or Linux? What flavor of browser is being used?

I am now suspecting that your browser reaches out to the ESP32 hosted HTTP server and forms a connection and asks for a WebSocket. At this point there is now a "pipe" between the ESP32 and the browser. Can I assume that the pipe is ESP32 to browser exclusive?

I now hear that you are sending messages of about 500bytes each 40ms. Now we get into the realm of some tests. If you reduce the frequency of ESP32 transmissions ... say to 1 a second ... can you maintain stability? If yes and you then increase the rate to 2 a second ... do we still have stability? How fast can we go before we reach trouble?

When we reach trouble ... now we have a test to work against. Is the problem consistent based on different browser types?

If we run wireshark ... can see any "interesting" traffic when a connection fails?

@squonk11 This is consistent with my previous post. In menuconfig you can read that can happen so wifi Rx buffers are too large and application buffer may be too small to procced all packets in time. Its long and hard way to make it right but at the end you will be happy when it works.

In menuconfig in the wifi configuration I only see parameters for the number of buffers - not for the size of them. Do I need to increase the nuber of Rx buffers? Why Rx buffers - I think I might have the problem on the Tx buffers because the connection gets lost when I am sending?

@squonk11 Ok then, i thought its issue with incomming messages. So you need to change Tx buffer.

Set the number of WiFi dynamic TX buffers. The size of each dynamic TX buffer is not fixed,                          โ”‚  
  โ”‚ it depends on the size of each transmitted data frame.                                                               โ”‚  
  โ”‚                                                                                                                      โ”‚  
  โ”‚ For each transmitted frame from the higher layer TCP/IP stack, the WiFi driver makes a copy                          โ”‚  
  โ”‚ of it in a TX buffer. For some applications, especially UDP applications, the upper layer                            โ”‚  
  โ”‚ can deliver frames faster than WiFi layer can transmit. In these cases, we may run out of TX  buffers.

Or you can try to change Tx buffer to static since you have found that data size is causing issues. Static buffer is approx. 1.6kB size so you need to decide if its enough for one packet. Then you can fiddle with number of buffers(up to 64):

Set the number of WiFi static TX buffers. Each buffer takes approximately 1.6KB of RAM.                              โ”‚  
  โ”‚ The static RX buffers are allocated when esp_wifi_init() is called, they are not released                            โ”‚  
  โ”‚ until esp_wifi_deinit() is called.                                                                                   โ”‚  
  โ”‚                                                                                                                      โ”‚  
  โ”‚ For each transmitted data frame from the higher layer TCP/IP stack, the WiFi driver makes a                          โ”‚  
  โ”‚ copy of it in a TX buffer.  For some applications especially UDP applications, the upper                             โ”‚  
  โ”‚ layer can deliver frames faster than WiFi layer can transmit. In these cases, we may run out                         โ”‚  
  โ”‚ of TX buffers.  

After switching to static Tx buffers with more buffers than default the WebServer does not work at all. Maybe I have also an issue with stack size; CPU crashes in "new_op.cc:54". I will do some changes here, observe the results and post them here if necessary.

Do you know if there is a limit concerning the transmission of Strings via WebSocket protocol using the SendType: WebSocket::SEND_TYPE_TEXT; e.g. to max 127 char?

New finding:
A transmission of strings with a length of 127 characters [0...126] work perfectly. A transmission of strings with a length of 128 (or more) does not work. Strange!
I am transmitting a buffer defined as char data[500]; using the command ws.send(data, WebSocket::SEND_TYPE_TEXT);. The first parameter to send is of type std:string; might there be a problem in the implicit conversion from char to std::string?

New finding:
If I transmit a std::string of length 126: it works
If I transmit a std:string of length 127: it does not work
So it is not related to the fact if it is char array or std::string passed to ws.send

New finding:
I now reduced the frequency of telegram transmissions from 40ms in the beginning down to 1second. The result is the same: 126 characters work 127 characters don't work.
Any suggestions what else I should test?

I think I found the problem:
the byte-order for the length information in the WebSocket header in case of >126bytes seems to be wrong. I modified in WebSocket::send the code as follows:

if (data.length() < 126) {
frame.len = data.length();
m_socket.send((uint8_t *)&frame, sizeof(frame));
} else {
frame.len = 126;
m_socket.send((uint8_t *)&frame, sizeof(frame));
uint16_t len = (uint16_t)data.length();
len = ((len & 0xff) << 8) | (len >> 8);
//m_socket.send((uint16_t)data.length());
m_socket.send(len);

So, basically I swapped the bytes of the 16bit length information (maybe in an archaic way...)
With this modification it works perfectly.

After doing some more tests it seems as if this was the problem. I think I deserve a beer!

maybe even 2. nice work

No kidding mr @squonk11 ... you most assuredly rock. That was fantastic detective work. What it means though is that not a soul before you has ever tried to send a packet of web socket data larger than 126 bytes using this library. That means that we must consider this library "unproven".

Because I like tinkering, this library was written from the ground up rather than attempting to leverage any other pre-existing applications. That's normally a "bad" thing because it means that we "recreate" the bugs that others may have solved.

The web socket protocol is documented in this specification:

https://tools.ietf.org/html/rfc6455

If we look at section 5.2 we find the wire level layout:

https://tools.ietf.org/html/rfc6455#section-5.2

Basically, if the length of data that can be sent is < 126 then the data is 7 bits of the length (because < 126 will fit in 7 bits). If the data length is > 126 but less than < 65536 the 7 bits of length are set to be 126 (as a flag) followed by 16bits of length followed by data. There is the ability to send data of length 64 bits .... but I haven't gotten around to implementing that as I haven't seen any example anywhere of sending a message > 64K.

Where I dismally failed was to convert the length from "host" bit order (little endian on an ESP32) to network bit order (big endian).

The code base has been changed to resolve that now just as you did in your code. However, I made one small change. I am using the "htons" function which will byte flip but ONLY if needed. If there was ever an ESP33 that used big endian format (doubt it ... but you never know) then the code would break.

Again ... let me echo what mr @chegewara said ... VERY nice work ... and beer and kudos to yourself.

acutally that is what I did: studied the framing of the WebSocket telegrams from the rfc6455 and compared the examples in section 5.7 (especially the example with 256 bytes binary message in a single unmasked frame) and compared to the wireshark trace.

Thats is example of good debuging with use of tools like wireshark. That proves skills.

Unfortunately I am now having another problem:
After sending all the above mentioned data from the ESP to the browser, I try to initiate this request again by hitting the same button in the website again. I see on wireshark that the request gets sent to the ESP32 and gets acknowledged ([ACK]-telegram) by the lwip stack. BUT: the onMessage handler of the WebSocketHandler class does not get triggered. It seems as if the telegram is being ignored inside the ESP32 SW somehow - there is absolutely no logging on the serial console.
Firefox after some time gives in the console an error message that the websocket connection was interrupted during the loading of the page. Microsoft Edge just closes the WebSocket connection after several minutes.
To be honest: I have no idea where I should start to search.

Do we want to consider this a new puzzle and hence a new issue? We seem to have moved away from send() having no return value.

I might be wrong, but i think we need to fix latest bugfix here:

The payload length
is the length of the "Extension data" + the length of the
"Application data". The length of the "Extension data" may be
zero, in which case the payload length is the length of the
"Application data".

https://tools.ietf.org/html/rfc6455#section-5.2

Since we are considering now option when Extension data is 2 bytes long, then dont we need to add 2 to data.length() in this line?
https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/WebSocket.cpp#L342
and here:
https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/WebSocket.cpp#L340

I do not think that we have extension data here:

The "Extension data" is 0 bytes unless an extension has been
negotiated. Any extension MUST specify the length of the
"Extension data", or how that length may be calculated, and how
the extension use MUST be negotiated during the opening handshake.
If present, the "Extension data" is included in the total payload
length.

Or do we negotiate the presence of extension data during the opening phase?
Just for a test I increased the value for the data length by 2; the result is that the transmission of the first telegram >126bytes fails.

I might read something wrong and mixed 'Extended data lenght' with 'extension data'. Sorry.

I am also observing that sometimes during the transmission of my data to the browser and sometimes after the transmissin when I click the second time the button to transmit the data a second time I get a strange behaviour. The log shows:

D (20081) HttpParser: << parse
D (20081) HttpRequest: Method: , URL: "", Version:
D (20081) HttpRequest: Body: ""
D (20081) HttpServerTask: >> processRequest: Method: , Path:
D (20082) HttpServerTask: No Path handler found
D (20084) HttpServer: Path /sdcard/http is a directory
D (20084) HttpResponse: >> sendData
D (20084) Socket: send: Binary of length: 36
D (20084) Socket: send: Raw binary of length: 36
D (20086) Socket: send: Binary of length: 12
D (20086) Socket: send: Raw binary of length: 12
D (20087) HttpResponse: << sendData
D (20087) HttpResponse: >> sendData
D (20087) Socket: send: Binary of length: 17
D (20088) Socket: send: Raw binary of length: 17
D (20088) HttpResponse: << sendData
D (20089) HttpResponse: >> sendData
D (20089) Socket: send: Binary of length: 13
D (20090) Socket: send: Raw binary of length: 13
E (20090) Socket: send: socket=4102, Connection reset by peer
D (20091) HttpResponse: << sendData
D (20092) HttpResponse: >> sendData
D (20092) Socket: send: Binary of length: 21
D (20093) Socket: send: Raw binary of length: 21
E (20093) Socket: send: socket=4102, Connection reset by peer

It looks like the ESP is receiving an "empty" telegram and interprets this as a request to list the directory. After that the WebSocket communication is not working any more. So, the problem might be somewhere else in the protocol stack?

@squonk11 Can you provide a larger amount of logs? It would be nice to see whole communication between esp32 and browser, or maybe wireshark data file? Because i would like to help, really, but i feel like im listening someones talking on phone. I hear just one part of conversation (i see code) but i dont hear counterpart (what browser client is talking about).

Unfortunately I am not able to upload files directly here so I pasted the ESP logdatat at pastebin: https://pastebin.com/Gh9m4L67. But I don't know how to paste files in order to show you the wireshark log. Can I send this by eMail?

Pastebin is excelent choice.
Lets start with this error message. Its right after sending last packet and before this empty request:
E (31563) Socket: receive: No more processes
This leads us to this line:
https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/Socket.cpp#L366
or here:
https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/Socket.cpp#L385

its my mail: [email protected], here you can send wireshark data, thanks for sharing data

I now sent the wireshark trace and the corresponding ESP32 log. If you need more info or tests, please let me know.

I am now trying to send you two more files but I get the error message

5.7.1 host ns1.diskstation.eu [78.10.98.102] said: github@esp32.eu.org: Relay access denied (H-BADRCPT).

from your eMail server...?

@squonk11 You are smart guy, can you inject logging in this loop to see amountToReadvalue:

https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/Socket.cpp#L377-L394

I am now trying to send you two more files but I get the error message
5.7.1 host ns1.diskstation.eu [78.10.98.102] said: [email protected]: Relay access denied (H-BADRCPT).
from your eMail server...?

should have work, tested with gmail and i can send and receive message

I now added logging in the loop as requested. The output of the log is:

I (136115) sendAllParainfo: read: F97000000000000000300000001E0400000000010.06;;;
I (136120) sendAllParainfo: read: F98000000000000000300000001E0400000000010.07;;;
I (136125) sendAllParainfo: read: F99000000000000000100000000E4C00000063D10.08;;;
D (136130) WebSocket: >> send: Length: 363
D (136130) Socket: send: Raw binary of length: 367
D (136131) WebSocket: << send
I (136131) WebSocketTask: ------ msg deleted
I (136131) WebSocketTask: ------ item returned from Ringbuffer
D (1137592) HttpParser: >> parseRequestLine: "" [0]
D (137592) HttpParser: << parseRequestLine: method: , url: , version:
E (142592) Socket: receive: No more processes
D (142592) HttpParser: << parse
D (142592) HttpRequest: Method: , URL: "", Version:
D (142592) HttpRequest: Body: ""
D (142592) HttpServerTask: >> processRequest: Method: , Path:
D (142593) HttpServerTask: No Path handler found
D (142595) HttpServer: Path /sdcard/http is a directory
D (142595) HttpResponse: >> sendData
D (142595) Socket: send: Binary of length: 36
D (142595) Socket: send: Raw binary of length: 36
D (142597) Socket: send: Binary of length: 12
D (142597) Socket: send: Raw binary of length: 12
D (142598) HttpResponse: << sendData
D (142598) HttpResponse: >> sendData
D (142599) Socket: send: Binary of length: 177
D (142599) Socket: send: Raw binary of length: 17
D (142600) HttpResponse: << sendData
D (142600) HttpResponse: >> sendData
D (142600) Socket: send: Binary of length: 13
D (142601) Socket: send: Raw binary of length: 13
E (142602) Socket: send: socket=4101, Connection reset by peer
D (142602) HttpResponse: << sendData
D (142603) HttpResponse: >> sendData
D (142603) Socket: send: Binary of length: 21
D (142604) Socket: send: Raw binary of length: 21

I don't see any output of this loop... , strange

Edit: I put the ESP_LOGD on line 378 (directly under the while statement)

I now put an ESP_LOGD in line 367 of Socket.cpp. This one gets logged. So, the result of rc = ::lwip_recv_r(m_sock, data, length, 0); is -1.

It is -1 as expected. Now we have to trace back and see why. We need to know all parameters value that are in this call:

  • data
  • length
  • and find out where or why Socket::receive is called with parameter exact=false

In the corresponding wireshark trace I can not see any telegram coming from the browser. So I suppose the invoaction of Socket::receive() is done somewhere internally?

This is what ive found. Only option where receive is called with exact=flase is here:
https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/WebSocket.cpp#L413-L421

because exact is set to be false if is ommited here:
https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/Socket.h#L80

Now I logged the data you wanted. Result:

I (50083) WebSocketTask: ------ msg deleted
I (50083) WebSocketTask: ------ item returned from Ringbuffer
D (54265) Socket: length: 1; data: โ•
E (54265) Socket: receive: No more processes
D (54265) HttpParser: << parse

So, data length 1.

Again, im in rush so im making mistakes.
Here is one more place where its called with exact=false:
https://github.com/nkolban/esp32-snippets/blob/59f667e5614c611ad68c8d1766b731e911de43bf/cpp_utils/Socket.cpp#L319

Next step up is here(which is more consistent with logs):
https://github.com/nkolban/esp32-snippets/blob/2adbc1cb90e3dd567a3a0c80f7fed617400c065c/cpp_utils/HttpParser.cpp#L191-L196

@squonk11 How looks like yours WebSocketTask?

@squonk11 Have you seend this? Its another empty request:

D (26562) HttpParser: >> parseRequestLine: "" [0]
D (26563) HttpParser: << parseRequestLine: method: , url: , version:
I (26565) sendAllParainfo: read: 234000000000000000100000000E4C00000001810.05;;;
I (26570) sendAllParainfo: read: 2350000000000000000000000000600080000000300.20.32
;;V;

this is my WebSocket task:

class WebSocketTask : public Task {
public:
   WebSocketTask(std::string taskName="Task", uint16_t stackSize=10000, uint8_t priority=5, BaseType_t core_id=tskNO_AFFINITY): Task(taskName, stackSize, priority, core_id) {};
   void run(void *data) {
       size_t size;
       WebSocketMessage* pmsg;
//     ESP_LOGI("WebSocketTask","------ init");
       while(1) {
           pmsg = (WebSocketMessage*) xRingbufferReceiveFromISR(WebSocket_rx_queue, &size);
           //ESP_LOGI("WebSocketTask","------ Ringbuffer read");
           //ESP_LOGI("WebSocketTask","size:%d; ", size );
           if(pmsg != NULL) {
               ESP_LOGI("WebSocketTask","data:%s", (*pmsg).getData()->c_str());
               ESP_LOGI("WebSocketTask","WSPointer:%d", (int)(*pmsg).getWebSocket());
               ESP_LOGI("WebSocketTask","------");
//temp             (*pmsg).getWebSocket()->send(*((*pmsg).getData())+"_WS", WebSocket::SEND_TYPE_TEXT);
               //vTaskDelay(500);
//             WebSocketMessage* pmsg_save = pmsg;
//             delete(pmsg);
//             ESP_LOGI("WebSocketTask","------ WSmsg deleted");
               if(*((*pmsg).getData()) == "getallparainfo")
                   sendAllParainfo(*(*pmsg).getWebSocket());
               delete((*pmsg).getData());
               ESP_LOGI("WebSocketTask","------ msg deleted");
               vRingbufferReturnItemFromISR(WebSocket_rx_queue, pmsg, 0);
               ESP_LOGI("WebSocketTask","------ item returned from Ringbuffer");
           }
           vTaskDelay(1);
       }
   }
};

yes, I have seen that there is an empty request line. But I don't know where it comes from.

The sendAllParainfo() function is this:

void sendAllParainfo(WebSocket ws) {
char data[1000];
char response[100];
int res;
int num = 0;
int totalLength = 0;
uint16_t length = 0;
bool lastPI = false;

VABus fu;
fu.SetUART(&g_uart2);
res = fu.Write('1', "990", '0', (char*)"0000");
do {
    for(int i = 0; i < 10; i++) {
        data[totalLength++] = 'P';
        data[totalLength++] = 'I';
        data[totalLength++] = ':';
        res = fu.Read('1', "991", '0', data + totalLength, &length );
        if(res == 0) {  // read o.k.
            if(strncmp(data + totalLength, "999", 3) == 0) {
                lastPI = true;
                break;
            }
            ESP_LOGI("sendAllParainfo","read: %s", data + totalLength);
            totalLength += length;
        }
    }
    ws.send(data, WebSocket::SEND_TYPE_TEXT);
    totalLength = 0;
    //vTaskDelay(1);
} while(!lastPI);

};

I am not calling Socket::receive(). I have no idea where this problem comes from.

Now I sent you the wireshark trace via gmx. I hope it works this time.

@squonk11 I have question about code. With data from logging this is never true:

if(strncmp(data + totalLength, "999", 3) == 0) {
                lastPI = true;
                break;
            }

Is that right?

yes, it becomes true. Otherwise the do-while loop would never end. But it ends correctly. I now added a line ESP_LOGI("sendAllParainfo","ended!"); behind while. Now the log output is:

I (26642) sendAllParainfo: read: F99000000000000000100000000E4C00000063D10.08;;;

D (26648) WebSockett: >> send: Length: 363
D (26648) Socket: send: Raw binary of length: 367
D (26649) WebSocket: << send
I (26649) sendAllParainfo: ended!
I (26649) WebSocketTask: ------ msg deleted
I (26650) WebSocketTask: ------ item returned from Ringbuffer
D (27875) Socket: length: 1; data: โ•
E (27875) Socket: receive: No more processes
D (27875) HttpParser: << parse
D (27875) HHttpRequest: Method: , URL: "", Version:
D (27875) HttpRequest: Body: ""
D (27876) HttpServerTask: >> processRequest: Method: , Path:
D (27876) HttpServerTask: No Path handler found
D (27878) HttpServer: Path /sdcard/http is a directory
D (27878) HttpResponse: >> sendData
D (27879) Socket: send: Binary of length: 36
D (27879) Socket: send: Raw binary of length: 36

at time 26649 you see the output.

You right, just log doesnt show it because this line is after break:
ESP_LOGI("sendAllParainfo","read: %s", data + totalLength);

Im reading wireshark data now.

I must confess: this do-while loop is not so easy to understand but it is the style of programming I did during the last 30 years on quite simple microcontrollers (30MHz). Here highly optimized code (even assembly code) was very important. Unfortunately, high level C++ is much more resource hungry. That is why I am not so happy that the HttpServer is written in C++; it makes it more easy to read and understand but unfortunately the performance suffers significantly. I checked the source code of Mongoose.c: it is highly optimized C code and they are not using the standard API of lwip but some low level routines thereof. I guess that is why Mongoose performes roughly about a factor of 10 better.

From wireshark i see you sent 2 more requests "getallinfo", right (frames: 443 and 447)? But, if i understand code right in this line:
delete((*pmsg).getData()); you deleted all data so there is nothing to send, am i right?

@squonk11 Im on irc if you have time and are willing to join

yes, I clicked these two more times on the upload button in order to read the data once more. This request is transmitted from the browser to the ESP correctly and is ACKed by lwip in the following frames (444 and 448). But these request telegrams do not initiate any activity on the ESP - there is absolutely no ESP-log output when I click the upload button after the data has been uploaded before.
The delete((*pmsg).getData()); just deletes the previous upload request. By klicking the upload button again there should be another 'getallparainfo' queued in the ringbuffer. But this does not happen - there is even no reaction in the lower level HttpServer routines.

I am trying to debug the situation a little bit. But I do not know the deeper logic behind the HttpServer and WebSocket code. Can you explain in a few sentences?

  • Where is the basic interface to the lwip stack? Which functions have access to it?
  • When and how is the WebSocket connection being detected?
  • If a WebSocket connection is existing how do you distinguish between WebServer and WebSocket packets/frames?

In this matter mr @nkolban can give you best response. I can only show when and how is detected if connection is websocket:
https://github.com/nkolban/esp32-snippets/blob/6057871db7af7270d2dffa4c95764c23349f1fea/cpp_utils/HttpRequest.cpp#L113

This I found already and I also started to study the HttpServer sources. But until now without success.
I found one more strange thing in my logs:

D (9531) Socket: close: m_sock=4099, ssl: 0
D (9531) Socket: Calling lwip_close on 4099
D (9532) Socket: close: m_sock=-1, ssl: 0
D (9533) HttpServerTask: Waiting for new peer client

Here you can see that for one HttpRequest the stocket seems to be closed twice. At 9531 it is closed regularly and at 9532 it seems as if there is the attempt to close it once again although it is already closed (m_sock = -1). Strange...

There are some design notes from the WebSocket component from last year found here:

https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/DesignNotes/WebSockets.md

However, lets see if I can't start addressing your specific questions. This will be more an iterative interaction as opposed to one and done. We may want to chat live via IRC if needed or else just post back here.

Imagine we have an HTTP Server ... it creates an inbound listening port and passively waits for an incoming connection request. For example, lets assume it is listening on local port 80. A remote browser will then connect to the ESP32 by its IP address at port 80. This is when the fun starts.

In TCP/IP, when we listen on a socket port we use the sockets API called accept().

For example:

int newSocket = accept(serverSocket);

This is a blocking call that will listen on port 80 and, when a connection arrives, will give us a NEW socket which is used just for that interaction. The ESP32 will now start to read the data that is incoming on the request. It expects it to be HTTP protocol. It starts to parse and interpret the data. Every HTTP request starts with an HTTP header. For a normal browser request, this will be instructions on what data the request wants and other control information. Now look at RFC 6455. This is the low-level protocol specification of "WebSockets". I don't want you (unless you really want to) have to study it. That's what the library is for ... it implements this protocol.

At the highest level, when the HTTP request arrives at the ESP32 from the browser, it is either an "ordinary" HTTP request OR it contains the header "Upgrade: websocket". This is the indication that the browser wants to start talking websockets. Now the magic happens. The connection that was already received (the original HTTP connection) is "re-purposed" to now flow web socket protocol. No new connection is formed. The connection between the ESP32 and the browser happens over that already existing connection. The connection is maintained for as long as either side wishes to send or receive data.

If you wanted to see the magic in code, this is a good place to look:

https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/HttpServer.cpp#L84

The function (processRequest) is invoked for each new incomming HTTP request received by the ESP32. We look for a handler and then get ready to invoke it. Before we invoke the handler we ask:

"Is THIS current request for a WebSocket?"

If it is, then we spawn a new task whose job is to do nothing but watch this new socket for new incoming data:

https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/WebSocket.cpp#L364

Thank you for the comprehensive information.
Based on this I started some analysis and it seems as if the WebSocketTask gets started twice. The second request from the browser goes to the port of the first connection and the int length = peerSocket.receive((uint8_t*)&frame, sizeof(frame), true); waits endless for this request.
Currently I am not sure if this problem is in my code or in the HttpServer classes. I will investigate and let you know.
Here is the log:

D (22425) WebSocketReader: WebSocketReader Task started, socket: fd: 4101
D (22426) WebSocketReader: Waiting on socket data for socket fd: 4101
D (31716) WebSocketReader: Received datta from web socket.  Length: 2
D (31716) WebSocketReader: Web socket payload, length=14:
Rb size 1024 free 1007 rptr 0 freeptr 0 wptr 16
I (31717) WebSocketTask: data:getallparainfo
I (31717) WebSocketTask: WSPointer:1073683332
I (31717) WebSocketTask: ------
D (31718) WebSocketReader: Waiting on socket data for socket fd: 4101
D (31778) WebSocketReader: Received data from web socket.  Length: 2
D (31832) WebSocketReader: Web socket payload, length=14:
Rb size 1024 free 991 rptr 16 freeptr 0 wptr 32
D (31833) WebSocketReader: Waiting on socket data for socket fd: 4101
I (35523) WebSocketTask: ------ msg deleted
I (35523) WebSocketTask: ------ item returned from Ringbuffer
I (35524) WebSocketTask: data:
I (35524) WebSocketTask: WSPointer:1073683332
I (35524) WebSocketTask: ------
I (35524) WebSocketTask: ------ msg deleted
I (355525) WebSocketTask: ------ item returned from Ringbuffer
D (114741) paraWriteHandler: Request-Path: /api/v1/pw
D (114742) paraWriteHandler: count=4;P=990; DS=0; BA=1; VAL=0000
D (1114745) paraWriteHandler: Write : 0:
D (122806) WebSocketReader: WebSocketReader Task started, socket: fd: 4099
D (122807) WebSocketReader: Waiting on socket data for socket fd: 4099
D (129898) WebSocketReader: Received data from web socket.  Length: 2
D (129899) WebSocketReader: Web socket payload, length=14:
Rb size 1024 free 1007 rptr 32 freeptr 32 wptr 48
I (129899) WebSocketTask: data:getallpparainfo
I (129899) WebSocketTask: WSPointer:1073621352
I (129900) WebSocketTask: ------
D (129901) WebSocketReader: Waiting on socket data for socket fd: 4099

At timestamp 22425 the WebSocketReader Task gets started the first time and has the fd:4101. At time instant 122806 the WebSocketReader Task gets started the second time; this time with fd:4099.
I think this is not a good constellation...

Sorry the previous post was too fast. This only happened because I opened my website twice... :-(
But nevertheless the log now looks like this:

D (24349) WebSocketReader: WebSocketReader Task started, socket: fd: 4101
D (24349) WebSocketReader: Waiting on socket data for socket fd: 4101
D (28421) WebSoocketReader: Received data from web socket.  Length: 2
D (28422) WebSocketReader: Web socket payload, length=14:
Rb size 1024 free 1007 rptr 0 freeptr 0 wptr 16
I (28423) WebSocketTask: data:getallparainfo
I (28423) WebSocketTask: WSPointer:1073683660
I (28423) WebSocketTask: ------
D (28424) WebSocketReader: Waiting on socket data for socket fd: 4101
I (32235) WebSocketTask: ------ msg deleted
I (32235) WebSocketTask: ------ item returned from Ringbuffer

Here you can see that the WebSocketReader waits two times for data: once at timestamp 24349 and the second time at timestamp 28424. In Wireshark I can see that the browser sends correct Websocket telegrams to the ESP, lwip correctly ACKs them but they are not listed in the log. So, unfortunately the problem remains the same: after the first correctly received websocket telegram all following websocket telegrams get lost between lwip and HttpServer.

I think I'd like to see all the code in context. I'm imaging you have created an HttpServer ... I'm then imagine you have created a handler for a path (what is the path you are using?).

I'm imagining it would be something like:

static void wsOpenHandler(HttpRequest* pRequest, HttpResponse* pResponse) {
    if (!pRequest->isWebsocket()) {
        return
    }

    WebSocket *pWebSocket = pRequest->getWebSocket();
    pWebSocket->setHandler(new MyWebSocketHandler());
}

So what you are doing is receiving an HTTP request at a path ... asking the question ... are you a WebSocket ...? and if yes, attaching a handler to that websocket.

The handler will have methods such as:

onMessage(WebSocketInputStreambuf* pWebSocketInputStreambuf)

Now what I'd like to see is the implementation of the onMessage function that is invoked when a message is received.

This is my onMessage Handler:

void onMessage(WebSocketInputStreambuf* pWebSocketInputStreambuf, WebSocket *pWebSocket) {
        static char tag[] = "MyWebsocketHandler";
        std::string *pmsg = new std::string("");
        std::ostringstream ss;
        ss << pWebSocketInputStreambuf;
        *pmsg = ss.str();
        ESP_LOGD(tag, "MyWebSocketHandler: Data length: %d; %s", (*pmsg).length(), (*pmsg).c_str());
        WebSocketMessage WsMsg;
        WsMsg.setData(pmsg);
        WsMsg.setWebSocket(pWebSocket);
        ESP_LOGI(tag, "Pointer to Websocket: %d; size: %d",(int)WsMsg.getWebSocket(), sizeof(WsMsg));
        if(pdFALSE == xRingbufferSendFromISR(WebSocket_rx_queue, (void *)(&WsMsg), sizeof(WsMsg), 0))
            ESP_LOGE(tag, "Received data could not be queued");
        xRingbufferPrintInfo(WebSocket_rx_queue);
        pWebSocket->send(*pmsg, WebSocket::SEND_TYPE_TEXT);
    } // onMessage

this is the installation of the PathHandler:

pHttpServer->addPathHandler(HttpRequest::HTTP_METHOD_GET, "/api/v1/ws", wsHandler);

this is my wsHandler function:

static void wsHandler(HttpRequest* pRequest, HttpResponse* pResponse) {
    static char tag[] = "wsHandler";
    ESP_LOGD(tag, "Request-Path: %s", pRequest->getPath().c_str());
    WebSocket *pWebSocket = pRequest->getWebSocket();
    ESP_LOGD(tag, "m_pWebSocket: %x", (unsigned int)pWebSocket);
    WebSocketHandler *pMyWebSocketHandler = new MyWebSocketHandler();
    pWebSocket->setHandler(pMyWebSocketHandler);
    pWebSocket->send("Hello", WebSocket::SEND_TYPE_TEXT);
}

What we have is indeed a good puzzle. I am not yet seeing the underlying nature of the problem but I have some ideas to make progress and while we still have ideas, we aren't stuck. Let me see if I can summarize the problem and you tell me if you agree with the story or see any flaws in my mental model of what is going on.

You are writing an ESP32 WebSocket based application. To that end you are using the HttpServer classes supplied in this repository. The HttpServer classes allow an ESP32 C++ application to start listening on an port on the ESP32 for incoming HTTP requests from a browser. This is a necessary start for being a WebSocket server.

In your logic for the HTTP Server, you are adding a path handler for "/api/v1/ws". What this means is that whenever there is a new HTTP request to the IP address and port of your ESP32 at path "/api/v1/ws", that path handler logic that you provide will be invoked.

Your browser now makes a WebSocket initiation request to "/api/v1/ws" which causes the path handler to be invoked.

In the path handler, you create an instance of the WebSocketHandler you care about and attach that to the newly received WebSocket instance and that is the end of the generic HTTP Path Handler.

At this point, the ESP32 has a "live" web socket connection between it and the browser. By the time the path handler has been called we now know that we have a good Web Socket. Once you have attached your Web Socket handler, it will respond to messages sent by the browser.

In your onMessage handler, you appear to do a few things.

  1. You read the whole of the incoming message into a data buffer
  2. You place a copy of the WebSocket that was just used onto a RingBuffer.
  3. You send a copy of the message back through the web socket as a new outbound message

This is the end of processing the original received message (from the code you have shown).

Your concern is that when your browser sends a message this seems to work the first time but when the browser sends message number 2, it doesn't appear to show up at the ESP32 as evidenced by the lack of a subsequent log message.

Is this a summary of the story?

Here is what I suggest to gain more data for the puzzle.

First, let us reduce the complexity. While I fully realize that your solution as a whole wants to do something with the received (inbound to ESP32) message from the browser, let us temporarily comment that out. Let us test that we can receive a sequence of serial messages from the browser. I think (to me) this means commenting out the ring buffer logic so that the story is JUST that we receive a message at the ESP32, we issue a log message and then end the onMessageHandler. Change the logic of your client so that it merely periodically sends messages one after the other (with variable time between the messages). I would call this the base case. If this doesn't work then we have dramatically narrowed the scope of our puzzle.

Next I would say let us increase the time between the browser sending new messages to the ESP32. For example, does 1 message a second work but 50 messages a second fail?

Again, let me stress that these are diagnostic techniques only ... keep a safe copy of your original code so that when and if needed we can revert back and not lose anything.

As always, we are more than prepared to work with you to resolve the issue and we humbly thank you for working with our libraries. These are open source / personal hobby projects for us and we only get to work on them on our free time. This means that it may take a while to get to conclusion of the puzzle. We would love to dedicate our time to these projects ... but unfortunately, food doesn't grow on trees (well ... wait ... that's a bad analogy because I guess it does ... ).

thank you for your long and comprehensive answer.
Using your advice I got some interesting results:

  1. Using the OnMessage handler only: it works
  2. Using additionally the Ringubffer logic:: it works
  3. Using a lengthy WebSocket.send message in my Task handling the Ringbuffer contents: it works
  4. Using a lenghty Websocket.send (similar length as in 3.) which is the output of my device (read via serial interface): it does not work!

So, the problem arises as soon as I send the data which I retrieve from my device via serial interface. Now the good question is: why? I see the following possiblities:
a) there is some interference of the UART code with the WebSocket code
b) the UART code task is running on another core than the WebSocket code -> issue here?
c) The data I receive from my device is not always TEXT (as it should be)

Do you see an additional possible reason? Which is the most likely problem from your point of view? a, b or c?

Hi, sorry for interfering. Point a) you test by sending random generated or prepared data instead from UART. b) you can test by running whole code just on 1 core (menuconfig settings or xCreateTaskPinnedToCore). c) can be tested similar to test a) (random or prepared data)

Test b) I did already. I now have all tasks on the same core. Now I am able to start the data transmission from the uart via my handler over the websocket to the browser a second time. But: during this second transmission I get an Guru Meditation Error on core 0.
Next I will try to read data via the Uart but I will send other data via WebSocket.

case c) I can now exclude because if I replace the data to be send by regular characters (e.g.: all 'a') it also does not work. So I think the problem is somewhere in the UART code. There is some code written by me; probably the error is there. I will check this now.

Until now I could not find an error in my UART code. It is code which I am using already since several years on different microcontrollers with minor modifications only.
But I now saw that I might have a totally different issue: During the startup phase the log shows a strange error message (in yellow) which I did not notice until now:

0x40080000: _WindowOverflow4 at E:/msys32/opt/esp/esp-idf/components/freertos/xtensa_vectors.S:1685

and

0x400d0018: _flash_cache_start at ??:?

and

0x4008bfa4: bb_init at ??:?

what does that mean?

For the last post ... lets create a new issue on this ... it doesn't feel at all related to the original issue.

o.k. I will do.

Problem solved!!!
It was not related to my UART code but to another function I coded. I am not 100% sure what exactly was the problem but I assume the following: I passed the WebSocket to another function (not a pointer to the WebSocket). Within this function I used the WebSocket for sending data using WebSocket::send(...); what perfectly worked. Now I assume that when this function ends the destructor of the WebSocket class is invoked and thus the WebSocket closed.
Now I changed my code and I am passing a pointer to the WebSocket to my function and all works well.

Shame on me for not finding this problem on my own and that I bothered you for this. I think now I gambled away the beer I deserved last time...

I'm just happy the problem is resolved. When a puzzle is complete, I like to look back at it and ask to see if there was a methodology that could have been applied to get to the solution quicker. I think this one sounds like a "minimization" story. If we had reduced function over and over again until it worked we would have found (at the point we eliminated the UART code) that it magically started working. This "divide and conquer" approach is certainly one way to resolve an issue.

When things get tricky, I commonly ask for "the minimum re-createable problem" ...which is the smallest code set a user can create that demonstrates the problem. Some users (NOT YOU) push back with comments like "Why are YOU making me work ... just solve MY problem?". 50% of the time when a user tries to recreate in the small, they can't ... what makes me sad is the 50% of the time when they can and then Ive made them work to help solve the problems I created in the first place.

Anyway ... I'm just delighted all is working.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

Lakoja picture Lakoja  ยท  8Comments

wegunterjr picture wegunterjr  ยท  7Comments

frankipl picture frankipl  ยท  8Comments

Tavo7995 picture Tavo7995  ยท  7Comments

monteslu picture monteslu  ยท  4Comments