Nodemcu-firmware: HTTP module and SSL don't play nice on dev

Created on 6 Jan 2017  ·  127 Comments  ·  Source: nodemcu/nodemcu-firmware

When I happened to see another (or a new) HTTP & SSL issue the other day I went "Grrr, why me again?". Some of you may remember 😉

TL;DR

Many HTTPS requests from the http module fail while connecting to the same resources with the net/TLS module _usually_ succeeds.

Test code

function test(host, path)
  local url = "https://" .. host .. path;
  http.get(url, nil, function(code, data)
    if (code < 0) then
      print("HTTP request to " .. url .. " failed")
    else
      print("HTTP request to " .. url .. " succeeded")
    end
  end)
  local srv = tls.createConnection(net.TCP, 0)
  srv:on("receive", function(sck, c) print("net/TLS to " .. url .. " succeeded") end)
  srv:on("connection", function(sck, c)
    sck:send("GET " .. path .. " HTTP/1.1\r\nHost: " .. host .. "\r\nConnection: keep-alive\r\nAccept: */*\r\n\r\n")
  end)
  srv:connect(443, host)
end
test("raw.githubusercontent.com", "/espressif/esptool/master/MANIFEST.in")

Test result
net/TLS to https://raw.githubusercontent.com/espressif/esptool/master/MANIFEST.in succeeded
HTTP client: Connection timeout
Then I checked the heap: 25312. That was suspiciously low given that I started with ~44k, so I ran `test("raw.githubusercontent.com", "/espressif/esptool/master/MANIFEST.in")` again and got:

E:M 528
E:M 272
HTTP client: Disconnected with error: 46
HTTP client: Connection timeout
HTTP client: Connection timeout
-> no successful feedback from net/TLS code anymore.

Does "HTTP client: Disconnected with error: 46" indicate that the client was still maintaining the previous (failed) connection which it tried to kill first? I have my doubts because I sometimes also see this when the test runs after a clean reboot.

I tested a few more URLs, each after a clean reboot with both `http` and `net` modules.

| URL        | http           | net/TLS  |
| ------------- |:-------------:| -----:|
| https://raw.githubusercontent.com/espressif/esptool/master/MANIFEST.in      | ❌    | ✅  |
| https://httpbin.org/ip      | ❌   |  ❌ no output at all, not even error|
| https://nodemcu-build.com | ❌       |  ✅  |
| https://clients5.google.com/pagead/drt/dn/ | ❌       |  ✅  |

### NodeMCU version
```
NodeMCU custom build by frightanic.com
    branch: dev
    commit: 5425adefff62f9ea2094e3e4581a79f1424e4433
    SSL: true
    modules: file,gpio,http,net,node,tmr,uart,wifi,tls
 build  built on: 2017-01-06 19:39
 powered by Lua 5.1.4 on SDK 2.0.0(656edbf)
```

Hardware

NodeMCU devkit v2

bug

All 127 comments

I'm not sure if this is the same issue, but I'm unable to make any https requests at all on the 2.0.0 firmware. This worked on the previous version. Any time I try a GET or POST to a https URL I get HTTP client errors. Am I missing something here?

=http.get('https://google.com')
> HTTP client: Disconnected with error: 8
HTTP client: Connection timeout
HTTP client: Connection timeout

NodeMCU version:

NodeMCU custom build by frightanic.com
    branch: master
    commit: b96e31477ca1e207aa1c0cdc334539b1f7d3a7f0
    SSL: true
    modules: file,gpio,http,net,node,tmr,uart,wifi,tls
 build  built on: 2017-02-20 23:43
 powered by Lua 5.1.4 on SDK 2.0.0(656edbf)

> I'm not sure if this is the same issue

I'm quite convinced it is. My table with URLs also contains a Google URL.

@marcelstoer I'm relieved that I'm not the only one having issues with the http library and HTTPS requests. I have not had _any_ success at all on the 2.0.0 SDK (built from master) but it does work ok on 1.5.4.1. Do you have any indication where the problem may be? I'm happy to help as much as I can but I'm not sure of the bug reporting/fixing process for this project. It looks like you've contributed a ton to NodeMCU development (thanks for this!!) so can you let me know how else I can help get this resolved?

> It looks like you've contributed a ton to NodeMCU development (thanks for this!!) so can you let me know how else I can help get this resolved?

Sorry Nate, I do pretty much everything around here - except firmware coding 😞 The last _substantial_ contributions to the module were from @pjsg and @luismfonseca.

> I'm happy to help as much as I can but I'm not sure of the bug reporting/fixing process for this project.

Well, the Lua veneer for the HTTP library is here: https://github.com/nodemcu/nodemcu-firmware/blob/dev/app/modules/http.c. The library itself is here: https://github.com/nodemcu/nodemcu-firmware/tree/dev/app/http. The last commit that tinkered with HTTPS is a592af7ab1da5dab0b656f2365d60548df1e49b4 from @djphoenix but it still looks fine to me.
If you want to dig into code and start building firmware have a look at http://nodemcu.readthedocs.io/en/latest/en/build/ (Linux/Linux VM or Docker from yours truly).

@marcelstoer Thanks for the pointers. I looked around in the code and see that SSL support is flagged by the presence of the CLIENT_SSL_ENABLE constant. It _seems like_ maybe this constant is not getting defined when building but I haven't verified this yet. It looks like the constant is commented out here: https://github.com/nodemcu/nodemcu-firmware/blob/master/app/include/user_config.h#L67

How does your cloud build service work? Is there some process that un-comments this definition when the SSL/TLS package is enabled? Are we sure that's working? I noticed in the tools/pr-build.sh script there's something that uncomments the line, but maybe that's only for dev builds?

> Is there some process that un-comments this definition when the SSL/TLS package is enabled?

Correct.

> Are we sure that's working?

Yes, with near certainty. I'm quite convinced that I tested with images built from my cloud builder and with manually built images (Docker). I will test again though just to be sure.

> I noticed in the tools/pr-build.sh script there's something that uncomments the line, but maybe that's only for dev builds?

Almost. This is triggered from .travis.yml when someone creates a pull request. Having an automatic CI build with all modules and SSL enabled gives us greater confidence that a PR won't break stuff even though we don't have any unit or integration tests.

I tried building the firmware myself tonight with your docker build project and had the same problems. So it seems there is an actual bug somewhere (not in the build process).

> even though we don't have any unit or integration tests.

😢

I captured some output from debug mode that I wanted to share. My app is doing a https POST to a https-only API, https://graph-na02-useast1.api.smartthings.com. This is the result every time:

client handshake ok!
client's data invalid protocol
Reason:[-0x7880]
HTTP client: Disconnected with error: 8
HTTP client: Connection timeout

I'm able to reproduce the problem with a simple GET to https://google.com:

=http.get("https://google.com")
> client handshake start.
please start sntp first !
please start sntp first !
client handshake ok!
client's data invalid protocol
Reason:[-0x7880]
HTTP client: Disconnected with error: 8
HTTP client: Connection timeout
HTTP client: Connection timeout

Here you can see that error 0x7880 is "The peer notified us that the connection is going to be closed." I'm not really sure where to look from here. Any ideas?

Is there documentation somewhere about the espconn error codes that are passed to http_error_callback?

I'm afraid this isn't an appropriate place to ask this question, but I'm getting desperate.
I'm having this same issue - https requests aren't working. I'd like to just downgrade and get a build using 1.5.1 if I can, because the scripts I'm using were developed under that SDK, but I am not sure how to do that. This product was originally made using a build from nodemcu-build.com, but I don't have access to that bin file and I'm not sure how to recreate it. It looks like that site doesn't let you specify anything other than the current master or dev branch. I found a "how to compile NodeMCU" page, but those instructions use Linux (which I have access to if required, but it's not one of my strengths). Help!

@Kaiser442 The simplest thing to do to get an older firmware is use @marcelstoer's Docker build container: https://hub.docker.com/r/marcelstoer/nodemcu-build/

You can check out whatever branch or tag of the firmware that you want from git and then edit the app/include/user_modules.h and app/include/user_config.h files to choose the packages/options you want. Then build the firmware using the Docker image.

The "client's data invalid protocol" message appears to come from the SDK's libssl.a. Does anyone have the ability to get a packet trace of an attempted connection? Ideally in a way that gives you the ephemeral keys so we can look inside the encrypted stream. It might also be an idea to double-check the available cipher suites against what the relevant servers are configured to use/accept.

I'm completely not in ESP{8266,32}-land at the moment, but if someone has traces I can probably find some time to have a quick look.

@marcelstoer espconn error codes can be found in sdk/*/include/espconn.h, though they're not very informative. In this case, 8 = connection aborted. The Reason: -0x7880 is from mbedTLS, and is simply the notification from the remote side that it's going to terminate the connection.
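For reading the logs in this thread, the `Reason:` values can be mapped back to their mbedTLS names with a small lookup table. This is only a sketch: the helper name `describe_reason` is made up for illustration, the names are the `MBEDTLS_ERR_SSL_*` constants from mbedTLS's `ssl.h`, and the table covers only the values reported in this issue.

```lua
-- mbedTLS SSL error codes reported in this issue, keyed by magnitude
-- (the log prints them as negative hex, e.g. "Reason:[-0x7880]").
MBEDTLS_SSL_ERRORS = {
  [0x7200] = "MBEDTLS_ERR_SSL_INVALID_RECORD",
  [0x7780] = "MBEDTLS_ERR_SSL_FATAL_ALERT_MESSAGE",
  [0x7880] = "MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY",
  [0x7F00] = "MBEDTLS_ERR_SSL_ALLOC_FAILED",
}

-- Hypothetical helper: returns the symbolic name for a "Reason:[-0x....]"
-- log line, or nil if the line does not contain such a code.
function describe_reason(line)
  local hex = line:match("Reason:%[%-0x(%x+)%]")
  if not hex then return nil end
  local code = tonumber(hex, 16)
  return MBEDTLS_SSL_ERRORS[code] or string.format("unknown (-0x%04X)", code)
end

print(describe_reason("Reason:[-0x7880]"))
```

The function is plain Lua, so it can be run on a desktop interpreter against a captured serial log as well as on the device.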

Another idea would be for someone to play with openssl s_server in seriously verbose mode, and have NodeMCU connect to it - that might give some good clues.

@jmattsson there are debugging flags in user_mbedtls.h so you can enable each log level that you want. I can't explain it in detail right now. I suspect there are fatal architecture issues in the http module, because it "doesn't play nice" for plain requests either. I can debug it more carefully later (maybe this weekend).

@djphoenix Have you found something interesting?

This is a key issue for us. Can anyone help resolve this? I don't know if this is appropriate, but would a bounty help improve the priority of getting this fixed?

I have been using the frozen 1.5.4.1 branch for now due to this problem. Even sending REST api calls via TCP connections will fail (in many cases) using this branch. So there isn't really an alternative. Both TCP and HTTP connections don't work well for secured connections (it's a hit & miss depending on the server you connect to)

I would contribute $$ to a bounty to get this fixed. I'm also stuck on the 1.5.4.1 branch for my project due to this issue.

I will put up a $500 bounty to get this fixed. This is really important to us. I have to believe that it is a serious issue for anyone who needs to send secure data.

@heythisisnate will you add something to the bounty to see if we can get this issue resolved?

Wow @Jonathan411 that's quite a generous bounty! I can't afford _that_ much right now, but I'd be willing to chip in 0.035 BTC (~$40 USD) to the developer that gets a PR merged that fixes this issue.

Hey guys, I will also contribute $50 USD to the bounty for anyone who can solve this issue... though I am hoping that maybe the next SDK will address it. Refer to issue #1810.

@heythisisnate, @dtran123 thanks for the bounty support, every bit helps!

Hi,

I've looked into the issue a little bit because I was experiencing similar errors. In fact I couldn't use the tls module in a consistent fashion. I mostly tried the dev branch of the firmware (with some tests on the 'master' branch and some tests with the 2.0.0.0 tag from February) and I used the docker build process. DEVELOP_VERSION was set to true. I've added some extra os_printf's to the mbedtls_parse_internal function in mbedtls's espconn_mbedtls.c. I also commented out lines 88 and 89 in mbedtls's debug.c (this enables verbose debugging for mbedtls, which seems to be badly implemented, or I didn't use MBEDTLS_DEBUG_C properly somehow). I flashed the integer binary.

First of all, to do a decent SSL certificate chain verification, I needed the correct time. This requires the rtctime, rtcmem and sntp modules. After compiling and flashing, the below commands can set up time (with a 1000 second recurring sync according to the documentation):

sntp.sync("pool.ntp.org",function(sec,us,server,info) print ("Seconds: "..sec.." Server: "..server.." Stratum: "..info.stratum) end, function(errorcode,info) print ("SNTP errorcode: "..errorcode.." Info: "..info) end, true)

After running this command, you should see the "Seconds: ..." line filled with the current GMT time in EPOCH format. The "rtctime.get()" command should give the same result. (Use epochconverter.com to convert to a readable format.)

After this, I tried to add the certificate authorities to the trusted certificates list using tls.cert.verify, like this:

tls.cert.verify([[
-----BEGIN CERTIFICATE-----
MIImyfirstcertificatedata...
-----END CERTIFICATE-----
]],[[
-----BEGIN CERTIFICATE-----
MIImysecondcertificatedata...
-----END CERTIFICATE-----
]])

Note that the documentation states that multiple CA certificates can be added as comma-separated strings. This might be essential, because I don't know how mbedtls validates the certificates. (Here's an interesting entry about this here.)

Afterwards you can use the tls module to open a secure connection. I hooked the "receive" and "connection" events, but none of them got called because the connection attempt fails at the SSL handshake. Namely the mbedtls_ssl_handshake(&TLSmsg->ssl) request in line 880 in espconn_mbedtls.c (within the mbedtls_parse_internal function) returns with an error message. Before I set up time, I usually received a MBEDTLS_ERR_SSL_INVALID_RECORD (0x7200) and when I realized that time might be essential and set up sntp and the rtc clock, it changed into a MBEDTLS_ERR_SSL_ALLOC_FAILED (0x7F00). So, by enabling all those required modules, I might have run out of memory... Savage...

At this point I simply moved over to MicroPython, to try to do SSL validation there. I can't believe there's no working example on the Internet about this critical feature (provided you want to develop something that hooks to a network).
I might come back to this, because it intrigues me (and MicroPython doesn't have an ntp library for time-keeping or any examples either...)

Off the top of my head, I might need to run the floating point firmware instead of the integer one, but it's a weak argument at this point. (It should be noted in the docs if this were the case.) If this sparked any ideas for anyone, please share.

EDIT1: Notably, I'm interested if you have to add all intermediate certificates to the TLS trusted certificates store or if mbedtls will go through the chain received from the server and it's enough to store the root CA.

EDIT2: So my current suspect is the mbedtls library itself. The http module might have issues too, but unless mbedtls is consistent, there's no point in trying to fix the http module.

Regards,
Greg

@Greg-Szabo I agree with your EDIT2 that there is no point in looking into the http issue before we address secured TLS TCP socket connections. I have many projects that work well (socket connection is set up successfully) with branch 1.5.4.1 (based on the previous SDK), but the same code will not work on the master or dev branches.

_Another side of research (yay, hello there?)_

I started work with #1700, and there are no issues with memory, but...

  • TLS handshakes take a looong time, so I wrapped the handshake steps with
    system_soft_wdt_stop();
    int res = mbedtls_ssl_handshake_step(&pcb->ctx);
    system_soft_wdt_restart();
  • Now my wrapper works for the handshake and for reading, but there are still some problems with writes; I'm searching for a solution.

So I'll share some code soon. Stay tuned.

So I have something working now (for the tls module only). And I finally worked out why httpbin.org cannot be loaded: it doesn't work at all without SNI. The new module has a fix for this.

For early pre-alpha source see my tlswrap branch

@dtran123 @Greg-Szabo regarding problems in mbedtls: I suspect (in fact I'm fairly sure) that the key problem is not in the mbedtls library itself but in the wrapper layer, espconn. Without espconn everything works better :)

Can you post some examples here for more accurate testing?

Hi djphoenix,

Thanks for all the hard work on this module so far and thanks for looking into this problem.
You might be right about the wrapper, but I couldn't get to the bottom of it.
The domain I was trying is using SNI, so I guess that's why it didn't work.

Below you'll find the code I'm using. Special features of this example:

  1. Let's Encrypt certificate - good for 90 days only, so checking expiration is important
  2. SNI
  3. root CA + intermediate CA chain (This seems the default now on most web servers.)

I've skipped the code for wifi setup...

  sntp.sync("pool.ntp.org",function(sec,us,server,info)
    print ("Seconds: "..sec.." Server: "..server.." Stratum: "..info.stratum)
    end,
    function(errorcode,info)
    print ("SNTP errorcode: "..errorcode.." Info: "..info)
    end,
    true)

base="GET / HTTP/1.1\r\nHost: demo.philosobear.com\r\nConnection: keep-alive\r\nAccept: */*\r\n\r\n"

interca=[[
-----BEGIN CERTIFICATE-----
MIIEkjCCA3qgAwIBAgIQCgFBQgAAAVOFc2oLheynCDANBgkqhkiG9w0BAQsFADA/
MSQwIgYDVQQKExtEaWdpdGFsIFNpZ25hdHVyZSBUcnVzdCBDby4xFzAVBgNVBAMT
DkRTVCBSb290IENBIFgzMB4XDTE2MDMxNzE2NDA0NloXDTIxMDMxNzE2NDA0Nlow
SjELMAkGA1UEBhMCVVMxFjAUBgNVBAoTDUxldCdzIEVuY3J5cHQxIzAhBgNVBAMT
GkxldCdzIEVuY3J5cHQgQXV0aG9yaXR5IFgzMIIBIjANBgkqhkiG9w0BAQEFAAOC
AQ8AMIIBCgKCAQEAnNMM8FrlLke3cl03g7NoYzDq1zUmGSXhvb418XCSL7e4S0EF
q6meNQhY7LEqxGiHC6PjdeTm86dicbp5gWAf15Gan/PQeGdxyGkOlZHP/uaZ6WA8
SMx+yk13EiSdRxta67nsHjcAHJyse6cF6s5K671B5TaYucv9bTyWaN8jKkKQDIZ0
Z8h/pZq4UmEUEz9l6YKHy9v6Dlb2honzhT+Xhq+w3Brvaw2VFn3EK6BlspkENnWA
a6xK8xuQSXgvopZPKiAlKQTGdMDQMc2PMTiVFrqoM7hD8bEfwzB/onkxEz0tNvjj
/PIzark5McWvxI0NHWQWM6r6hCm21AvA2H3DkwIDAQABo4IBfTCCAXkwEgYDVR0T
AQH/BAgwBgEB/wIBADAOBgNVHQ8BAf8EBAMCAYYwfwYIKwYBBQUHAQEEczBxMDIG
CCsGAQUFBzABhiZodHRwOi8vaXNyZy50cnVzdGlkLm9jc3AuaWRlbnRydXN0LmNv
bTA7BggrBgEFBQcwAoYvaHR0cDovL2FwcHMuaWRlbnRydXN0LmNvbS9yb290cy9k
c3Ryb290Y2F4My5wN2MwHwYDVR0jBBgwFoAUxKexpHsscfrb4UuQdf/EFWCFiRAw
VAYDVR0gBE0wSzAIBgZngQwBAgEwPwYLKwYBBAGC3xMBAQEwMDAuBggrBgEFBQcC
ARYiaHR0cDovL2Nwcy5yb290LXgxLmxldHNlbmNyeXB0Lm9yZzA8BgNVHR8ENTAz
MDGgL6AthitodHRwOi8vY3JsLmlkZW50cnVzdC5jb20vRFNUUk9PVENBWDNDUkwu
Y3JsMB0GA1UdDgQWBBSoSmpjBH3duubRObemRWXv86jsoTANBgkqhkiG9w0BAQsF
AAOCAQEA3TPXEfNjWDjdGBX7CVW+dla5cEilaUcne8IkCJLxWh9KEik3JHRRHGJo
uM2VcGfl96S8TihRzZvoroed6ti6WqEBmtzw3Wodatg+VyOeph4EYpr/1wXKtx8/
wApIvJSwtmVi4MFU5aMqrSDE6ea73Mj2tcMyo5jMd6jmeWUHK8so/joWUoHOUgwu
X4Po1QYz+3dszkDqMp4fklxBwXRsW10KXzPMTZ+sOPAveyxindmjkW8lGy+QsRlG
PfZ+G6Z6h7mjem0Y+iWlkYcV4PIWL1iwBi8saCbGS5jN2p8M+X+Q7UNKEkROb3N6
KOqkqm57TH2H3eDJAkSnh6/DNFu0Qg==
-----END CERTIFICATE-----
]]

rootca=[[
-----BEGIN CERTIFICATE-----
MIIDSjCCAjKgAwIBAgIQRK+wgNajJ7qJMDmGLvhAazANBgkqhkiG9w0BAQUFADA/
MSQwIgYDVQQKExtEaWdpdGFsIFNpZ25hdHVyZSBUcnVzdCBDby4xFzAVBgNVBAMT
DkRTVCBSb290IENBIFgzMB4XDTAwMDkzMDIxMTIxOVoXDTIxMDkzMDE0MDExNVow
PzEkMCIGA1UEChMbRGlnaXRhbCBTaWduYXR1cmUgVHJ1c3QgQ28uMRcwFQYDVQQD
Ew5EU1QgUm9vdCBDQSBYMzCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEB
AN+v6ZdQCINXtMxiZfaQguzH0yxrMMpb7NnDfcdAwRgUi+DoM3ZJKuM/IUmTrE4O
rz5Iy2Xu/NMhD2XSKtkyj4zl93ewEnu1lcCJo6m67XMuegwGMoOifooUMM0RoOEq
OLl5CjH9UL2AZd+3UWODyOKIYepLYYHsUmu5ouJLGiifSKOeDNoJjj4XLh7dIN9b
xiqKqy69cK3FCxolkHRyxXtqqzTWMIn/5WgTe1QLyNau7Fqckh49ZLOMxt+/yUFw
7BZy1SbsOFU5Q9D8/RhcQPGX69Wam40dutolucbY38EVAjqr2m7xPi71XAicPNaD
aeQQmxkqtilX4+U9m5/wAl0CAwEAAaNCMEAwDwYDVR0TAQH/BAUwAwEB/zAOBgNV
HQ8BAf8EBAMCAQYwHQYDVR0OBBYEFMSnsaR7LHH62+FLkHX/xBVghYkQMA0GCSqG
SIb3DQEBBQUAA4IBAQCjGiybFwBcqR7uKGY3Or+Dxz9LwwmglSBd49lZRNI+DT69
ikugdB/OEIKcdBodfpga3csTS7MgROSR6cz8faXbauX+5v3gTt23ADq1cEmv8uXr
AvHRAosZy5Q6XkjEGB5YGV8eAlrwDPGxrancWYaLbumR9YbK+rlmM6pZW87ipxZz
R8srzJmwN0jP41ZL9c8PDHIyh8bwRLtTcm1D9SZImlJnt1ir/md2cXjbDaJWFBM5
JDGFoqgCWjBH4d1QB7wCCZAA62RjYJsWvIjJEubSfZGL+T0yjWW06XyxV3bqxbYo
Ob8VZRzI9neWagqNdwvYkQsEjgfbKbYK7p2CNTUQ
-----END CERTIFICATE-----
]]

print(tls.cert.verify(interca,rootca))

srv = tls.createConnection()

srv:on("receive", function(sck, c) print(c) end)
srv:on("connection", function(sck, c)
  -- Wait for connection before sending.
  -- `base` already ends with the blank line that terminates the headers.
  sck:send(base)
end)

srv:connect(443,"demo.philosobear.com")
--srv:close()

I would be grateful if you gave any help with how you're troubleshooting the wrapper and mbedtls. As mentioned above, I've added extra os_printf messages to the wrapper which worked fairly well, but it didn't yield any results.
In the mbedtls library I tried to turn on debugging, but for some reason it doesn't print anything. It's supposed to print a few arrows at least when the handshake starts and finishes, but nothing. Eventually I manually added import osapi.h and started adding extra os_printf() commands everywhere where I wanted to debug, but it's a hassle.

Regards,
Greg

@Greg-Szabo OK, with my tlswrap (which replaces the espconn layer entirely) this test code works for me:

function test(host, path)
    local sck = tls.createConnection()
    sck:on('connection', function(s)
        print('CONN')
        s:send(
            'GET ' .. path .. ' HTTP/1.0\r\n' ..
            'Connection: close\r\n' ..
            'Host: ' .. host .. '\r\n' ..
            '\r\n'
        )
    end)
    sck:on('receive', function(s, d)
        print(d)
    end)
    sck:on('disconnection', function(s)
        print('CLOSE')
    end)
    sck:on('reconnection', function(s, e)
        print('ERR', e)
    end)
    sck:connect(443, host)
end

test('demo.philosobear.com', '/')

Two issues still exist:

  • I haven't finished CA verification yet: all certificates are treated as valid. This is on my TODO list.
  • With a 4K incoming buffer, there is a problem with servers that send large TLS records. The TLS specification allows records of up to 16K, which means each connection should have two 16K buffers. Sad story for the ESP... So it works for small messages (under 4K) anyway, and with servers that support the max_fragment_length TLS extension.
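A rough back-of-the-envelope check of the buffer problem, as a sketch only. The ~44k free-heap figure is the approximate value from the original report at the top of this issue, the function name is made up, using 4K for both directions is an assumption, and all per-connection overhead beyond the two record buffers is ignored.

```lua
-- Hypothetical helper: bytes needed for one TLS connection's record
-- buffers (one inbound, one outbound), ignoring all other overhead.
function tls_buffer_bytes(record_size)
  return 2 * record_size
end

local free_heap = 44000                    -- approximate, from the original report
local full = tls_buffer_bytes(16 * 1024)   -- spec-sized 16K records
local small = tls_buffer_bytes(4 * 1024)   -- assumed 4K in both directions

print("16K records need " .. full .. " bytes, leaving " .. (free_heap - full))
print("4K records need " .. small .. " bytes, leaving " .. (free_heap - small))
```

Even before counting the Lua VM's own needs, spec-sized buffers would take most of the heap, which is why the reduced buffer size and the max_fragment_length extension matter here.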

In all of that let's not forget that actually verifying the server certificate is _optional_ in establishing a connection to the server. It's an important security aspect but optional. So, I suggest we stick with the old "make it run, then make it secure, make it fast, and make it pretty" mantra 😉
As for SNI, yes axTLS doesn't support SNI but since Espressif switched to mbed TLS one should expect this to work fine.

Agree. My secured tcp connections stopped working (for a number of servers such as IFTTT) between the previous 1.5.4.1 build and the current master & dev. The code does not use certificates. So yes we should address the basic scenario without certificates. It's a hit & miss depending on which server you connect to.

Just to let everyone know that unfortunately the upgrade to SDK 2.1.0 did not fix this secure tcp socket issue. So I am still stuck with 1.5.4.1 branch till this issue is resolved as secure tcp is important for me.

@dtran123 is https working in 1.5.4.1?

//offtopic
Would it be OK if we created a Telegram group to share experiences with NodeMCU, keep up with the latest news, solve some problems, and get interesting info about this project? Is anyone interested? Just like this post and I will edit it with a link.

Telegram NodeMcu Group: @NodeMcu

I have found that making REST HTTPS requests via a TCP socket is more reliable than using the http module, and this approach is less memory hungry. For my needs I don't necessarily need to use the http module. It's more code and less elegant, but for now it is less troublesome, at least until we have a solid http module that reliably supports HTTPS with or without certificates.
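The hand-rolled approach boils down to building the request string yourself and picking the status code out of the first received chunk. A minimal sketch in plain Lua follows; `build_request` and `parse_status` are made-up names, and the hard-coded JSON content type is an assumption for illustration.

```lua
-- Hypothetical helper: builds a minimal HTTP/1.0 request string.
-- Content-Length is computed from the body exactly as it is sent.
function build_request(method, host, path, body)
  local lines = {
    method .. " " .. path .. " HTTP/1.0",
    "Host: " .. host,
    "Connection: close",
  }
  if body then
    lines[#lines + 1] = "Content-Type: application/json"
    lines[#lines + 1] = "Content-Length: " .. #body
  end
  -- A blank line ends the header section; the body (if any) follows it.
  return table.concat(lines, "\r\n") .. "\r\n\r\n" .. (body or "")
end

-- Hypothetical helper: extracts the numeric status code from the first
-- chunk of a response, or nil if it does not start with a status line.
function parse_status(chunk)
  local code = chunk:match("^HTTP/%d%.%d (%d+)")
  return code and tonumber(code) or nil
end

print(build_request("GET", "httpbin.org", "/ip"))
print(parse_status("HTTP/1.1 200 OK\r\n"))
```

On the device, the string returned by `build_request` would be passed to `sck:send()` from the "connection" callback, as in the test code posted earlier in this thread.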

The http module doesn't work for me at all. I tried the tls workaround, and it works...about 5 times. Then I get E:Med. I've verified that my timer-based loop does not leak memory by removing the call to my https function and seeing that the heap never changes. But with it included, every time my function is called, I've lost about 700 bytes of heap. I close the socket--and see the confirmation message before the timer fires again.

Heap at start of routine:
send_data heap=29464
send_data heap=27952
send_data heap=27328
send_data heap=26792
send_data heap=26000
send_data heap=25472 -- Start getting E:M here
send_data heap=24760
send_data heap=23736
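To put a number on the leak, the readings above can be reduced to per-call deltas and an average. A small sketch in plain Lua; `average_loss` is a made-up helper name. For these eight readings the average works out to a little over 800 bytes per call.

```lua
-- Heap readings copied from the log above.
local readings = {29464, 27952, 27328, 26792, 26000, 25472, 24760, 23736}

-- Hypothetical helper: average drop between consecutive readings.
function average_loss(values)
  if #values < 2 then return 0 end
  return (values[1] - values[#values]) / (#values - 1)
end

-- Print the drop for each individual call, then the average.
for i = 2, #readings do
  print(string.format("call %d: -%d bytes", i - 1, readings[i - 1] - readings[i]))
end
print("average loss per call: " .. average_loss(readings))
```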

send_data calls https():

function https(host, path, data)
    local sck = tls.createConnection()
    sck:on('connection', function(s)
        print('connect')
        local lines = 'PUT ' .. path .. ' HTTP/1.0\r\n' ..
            'Host: ' .. host .. '\r\n' .. 
            'Connection: close\r\n' ..
            'Accept: */*\r\n' ..
            'Content-Type: application/json\r\n' ..
            'Content-Length: ' .. string.len(data) .. '\r\n' ..
            '\r\n' ..
             data .. '\r\n'
       print(lines)
       s:send(lines)
    end)
    sck:on('receive', function(s, d)
        print(d)
    end)
    sck:on('sent', function(s)
        print('sent')
        print('before sck:close heap=' ..node.heap())
        sck:close()
        print('after sck:close heap=' ..node.heap())
    end)
    sck:on('disconnection', function(s)
        print('close')
    end)
    sck:on('reconnection', function(s, e)
        print('ERR', e)
    end)
    sck:connect(443, host)
end

NodeMCU custom build by frightanic.com
branch: master
commit: c8ac5cf
SSL: true
modules: dht,file,gpio,net,node,rtctime,sntp,tmr,uart,wifi,tls
build built on: 2017-05-25 01:57
powered by Lua 5.1.4 on SDK 2.1.0(116b762)

@TerryE @pjsg @marcelstoer @devsaurus @djphoenix

Hey guys, this is a big issue. Does anyone have some time to spend on investigating and hopefully fixing this? I currently don't, much as I'd like to.

A possibly related SDK issue just created by @davydnorris is https://github.com/espressif/ESP8266_NONOS_SDK/issues/10.

Oh interesting!

If you haven't already please head over to the espressif forum and add your input to the bug I have logged there, and link this thread as well.

There's someone who has hit the exact same issue as I have trying to use the AT command firmware, so now it's been seen and is affecting people in AT, NodeMCU and NonOS SDK builds

I have had a reply from Espressif - the bug I hit is known, and they recommend using the mbedTLS library instead of SSL. Apparently it's a drop in replacement for the ESP functions - remove libssl from your list and add mbedtls

http://www.espressif.com/en/support/download/sdks-demos

@davydnorris we have moved to mbedTLS since 2.0.0...

any update?

I started having secured https problems when master and dev got upgraded to 2.0.0 SDK.
Since we moved to mbedTLS in 2.0.0, it appears that either mbedTLS is breaking things that used to work when we had the SSL lib, or something else in the new SDK version is. That is why I am still stuck with 1.5.4.1.

Hello. Has anyone solved this problem (I use the NodeMCU firmware)? I want to connect to the Telegram API, but I get errors:
client handshake start.
please start sntp first !
please start sntp first !
client handshake failed!
Reason:[-0x7200]

I am also having the same issue...

http.get("https://httpbin.org/ip", nil, function(code, data)
  if (code < 0) then
    print("HTTP request failed")
  else
    print(code, data)
  end
end)

> client handshake start.
please start sntp first !
client handshake failed!
Reason:[-0x7780]
HTTP client: Disconnected with error: 9
HTTP client: Connection timeout

Do you have sntp running? Is your local clock set correctly? TLS requires a reasonable degree of synchronization between the local and remote clock.

If you just build the firmware with rtctime and sntp, then you just need to do

sntp.sync()

once you have network connectivity. If you are going to be running for a long time, then you may want to do

sntp.sync(nil, nil, nil, 1)

which will continually keep the clock in sync. You might argue that this should be the default behavior (and I might well agree with that!)

The answer is in the error messages posted. HTTPS can't validate a connection without a valid clock to verify certificates. You'll need to run an SNTP sync before making the connection. Hence the error message

please start sntp first !
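Because `sntp.sync()` returns immediately and sets the clock later, the request has to be started from the success callback rather than on the next line. A minimal sketch of that ordering, assuming the `sntp` and `http` modules are in the build; `get_after_sync` is a made-up name. This only removes the ordering problem and is not a fix for the handshake failures reported in this issue.

```lua
-- Sketch: only start the HTTPS request once the clock has been set.
-- `get_after_sync` is a hypothetical helper; sntp.sync and http.get are
-- the NodeMCU module functions already used in this thread.
function get_after_sync(ntp_server, url, on_done)
  sntp.sync(ntp_server,
    function(sec, usec, server)
      -- Success callback: the clock is set, so the request can start.
      http.get(url, nil, function(code, data)
        on_done(code, data)
      end)
    end,
    function(err)
      -- SNTP failed: report it using a negative code, the same
      -- convention http.get uses for failed requests.
      on_done(-1, "SNTP sync failed: " .. tostring(err))
    end)
end
```

A call would look like `get_after_sync("pool.ntp.org", "https://httpbin.org/ip", function(code, data) print(code, data) end)`, issued once WiFi has an IP address.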

Thanks for the speedy response @pjsg and @karrots!

I did see the please start sntp first ! and assumed that it was a sync error. However, the problem still exists :(

I've tried this...

sntp.sync()
http.get("https://now.httpbin.org/", nil, function (code, resp) print(code, resp) end)

and this was the response...

> client handshake start.
please start sntp first !
client handshake failed!
Reason:[-0x7780]
HTTP client: Disconnected with error: 9
HTTP client: Connection timeout

@karrots, @pjsg
Yes, I start sntp.sync() and then try tslconn:connect(443,"api.telegram.org") but I get these errors.

First time use of sntp.sync() requires a server to be specified.

https://nodemcu.readthedocs.io/en/master/en/modules/sntp/

@karrots

I've tried this and it still throws an error...

sntp.sync('pool.ntp.org',
  function(sec, usec, server)
    print("Clock Synced: "..sec..", "..usec..", "..server)
    end,
  function(error_code)
    print("Clock Sync Failed: "..sntp_connect_status_codes[error_code])
  end
)

-- wait for clock to sync

http.get("https://now.httpbin.org/", nil, function (code, resp) print(code, resp) end)

@karrots
I have two lua files.
1) sntp.lua

sntp.sync("193.27.209.20",
  function(sec, usec, server, info)
    print('sync', sec, usec, server)
  end,
  function()
   print('failed!')
  end
)

2) testHTTPS.lua

-- Start GET
tslconn = nil
tslconn = tls.createConnection()
tslconn:on("receive", function(tslconn, payloadout)
    if (string.find(payloadout, "200 OK") ~= nil) then
        print("# Receive OK.")
        print(payloadout)
    else
        print("# Bad receive.")
        print(payloadout)
    end
end)


tslconn:on("connection", function(tslconn, payloadout)
    print("# Connection OK.")
    tslconn:send("GET /" .. "ip"
        .. " HTTP/1.0\r\n"
        .. "Host: httpbin.org\r\n"
        .. "Connection: close\r\n"
        .. "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9) Gecko/2008052906 Firefox/3.0\r\n"
        .. "Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7\r\n"
        .. "\r\n")
end)

 tslconn:on("disconnection", function(tslconn, payloadout)
    tslconn:close();
    collectgarbage();
--Print Heap
    print("Heap: " .. node.heap())
--
end)

tslconn:connect(443,"httpbin.org")
--

I connect to WiFi, then run the first file, and then the second.

dofile("sntp.lua")
> sync  1497805055  366941  193.27.209.20

 DYNARR_DBG(dynarr_remove): element(0x3ffefff0) removed from array
dofile("testHTTPS.lua")
testHTTPS.lua loaded.
> client handshake start.
please start sntp first !
client handshake failed!
Reason:[-0x7780]

_,_,gw = wifi.sta.getip()
sntp.sync(gw,
   function(sec, usec, server, info)
   tm = rtctime.epoch2cal(rtctime.get() + 10800)
        print(string.format("NTP: %04d-%02d-%02d %02d:%02d:%02d", tm["year"], tm["mon"], tm["day"], tm["hour"], tm["min"], tm["sec"]))
   end)
http.get("https://now.httpbin.org/", nil, function (code, resp) print(code, resp) end)

NTP: 2017-06-18 20:42:37
http.get("https://now.httpbin.org/", nil, function (code, resp) print(code, resp) end)
HTTP client: Disconnected with error: 9
HTTP client: Connection timeout
HTTP client: Connection timeout

This is getting longer and longer w/o any more clarity - quite the opposite actually. The last ten or so comments all dealt with a (seemingly obvious) timing/SNTP issue just to arrive at the same conclusion as I did back in early January. I'm really frustrated. I even caught myself thinking a split second about dumping all those comments which divert focus...won't do that of course.

Apologies @marcelstoer. I think I probably started that one. But are we any closer to an answer for this bug? I cannot even get it to work on the 1.5.4.1-final branch...?
I understand the issue lies with espressif?

@lorol: do you have anything useful to contribute?

Alright, I finally found some time to throw at this. Using @djphoenix's experimental tlswrap branch and a few tweaks on top (see https://github.com/djphoenix/nodemcu-firmware/tree/tlswrap and https://github.com/nwf/nodemcu-firmware/tree/dev-ssl), I was able to make progress!

I was testing HTTP using `function hg(url) http.get(url, nil, function(...) print(...) end) end` and TLS with

function sg(...)
  srv = tls.createConnection()
  srv:on("receive", function(sck, c) print(c) end)
  srv:on("disconnection", function() print("DISCON") end)
  srv:on("connection", function(sck, c) print("CONN") sck:send("GET / HTTP/1.1\r\nConnection: keep-alive\r\nAccept: */*\r\n\r\n") end)
  srv:connect(...)
  return srv
end

I observed the following results:

| URL | HTTP | tls |
| ------------------------------- | ----------------------------- | --------------- |
| https://blog.cloudflare.com | HTTP response too long | OK |
| https://pskreporter.info | SSL bad message length | OK |
| https://pskreporter.info/gen404 | OK | n/a |
| https://google.com | SSL bad message length | error on hangup |
| https://nodemcu-build.com | SSL bad message length, crash | OK |

Getting
"https://raw.githubusercontent.com/espressif/esptool/master/MANIFEST.in"
also works fine.

The error on hangup for google.com is really quite distressing:

> srv:close()
> HANDSHAKE ERROR: 7100
3fff8618 already freed
3fff2b48 already freed

It looks like TLS is behaving better, but not perfectly so. Despite my efforts to limit the maximum SSL segment size, it looks like that is not being respected by the HTTP module (via espconn's wrapper of mbedtls). There are obviously some other problems lingering, too; it looks like tlswrap's memory management is a little, uh, iffy.

Thanks @nwf and @djphoenix !! We have at least two people who tested the tlswrap temporary fix. It seems to have helped. Would it be possible to submit this change in the dev branch for some of us to test it out too ? I would be very anxious to test it out as soon as it is made available (I don't have the skills to make a special build using the tlswrap branch). Unless someone can give me a link to the build so I can test it.

Given the memory handling errors in tlswrap, I do not think it should be merged as is. I will attempt to understand and fix it, unless @djphoenix beats me to it. :)

ETA: That said, the move to the newer mbedtls might be sufficiently orthogonal that it could be merged independently. When I get a minute I'll test with that configuration too.

OK, doing some more testing. So others can follow along: I have cherry-picked @djphoenix's 802b92c7c7d2d0b495f1c66f3dd0ec20a1e20dd0 and 4958a4a12a16d91d58ec73652a0b1ddc4df4f6fa but not a57a80a5c3b5d6e2cfa0b403383840c909dad84d. I applied a few local patches to get mbedtls debugging working, kicked on CLIENT_SSL_ENABLE, set MBEDTLS_SSL_MAX_CONTENT_LEN to 5120, and taught app/mbedtls/app/espconn_mbedtls.c to set a maximum fragment length of MBEDTLS_SSL_MAX_FRAG_LEN_4096 (in both places where it builds an mbedtls config). Because I care about ECC ciphers, I kicked on MBEDTLS_ECDSA_C, MBEDTLS_KEY_EXCHANGE_ECDHE_ECDSA_ENABLED, and MBEDTLS_KEY_EXCHANGE_ECDH_ECDSA_ENABLED. I am using the Lua test code as in my previous comment. With all that in place, I can now report:

| URL | HTTP | tls |
| ------------------------------- | ------------------------------------- | ------------------------- |
| https://blog.cloudflare.com | HTTP client: Response too long (8215) | OK, error on close |
| https://pskreporter.info | ssl_tls.c:3525 bad message length | OK, error on close-notify |
| https://pskreporter.info/gen404 | OK | n/a |
| https://google.com | HTTP client: Response too long (7006) | OK |
| https://nodemcu-build.com | ssl_tls.c:3525 bad message length | OK, error on close-notify |

hg("https://raw.githubusercontent.com/espressif/esptool/master/MANIFEST.in") is OK, but again has an error on close notify.

When I write "OK", I mean "data appears to have transferred successfully"; it shouldn't be taken as a comprehensive test. "error on close notify" means that the remote server sent us a close notify alert and freaked out the local stack, which responded with its own and returned an error:

TLS<2>: ssl_tls.c:3961 got an alert message, type: [1:0]
TLS<2>: ssl_tls.c:3976 is a close notify message
TLS<1>: ssl_tls.c:3744 mbedtls_ssl_handle_message_type() returned -30848 (-0x7880)
TLS<1>: ssl_tls.c:6577 mbedtls_ssl_read_record() returned -30848 (-0x7880)
client's data invalid protocol
TLS<2>: ssl_tls.c:6906 => write close notify
TLS<2>: ssl_tls.c:4032 => send alert message
TLS<2>: ssl_tls.c:2701 => write record
TLS<2>: ssl_tls.c:1258 => encrypt buf
TLS<2>: ssl_tls.c:1560 <= encrypt buf
TLS<2>: ssl_tls.c:2416 => flush output
TLS<2>: ssl_tls.c:2435 message length: 31, out_left: 31
TLS<2>: ssl_tls.c:2441 ssl->f_send() returned 31 (-0xffffffe1)
TLS<2>: ssl_tls.c:2460 <= flush output
TLS<2>: ssl_tls.c:2850 <= write record
TLS<2>: ssl_tls.c:4045 <= send alert message
TLS<2>: ssl_tls.c:6922 <= write close notify
Reason:[-0x7880]
TLS<2>: ssl_tls.c:7064 => free
TLS<2>: ssl_tls.c:7129 <= free

"error on close" is some other error when the connection is shut down, such as:

HTTP client: Sending request header
HTTP client: All sent
TLS<2>: ssl_tls.c:6522 => read
TLS<2>: ssl_tls.c:3728 => read record
TLS<2>: ssl_tls.c:2208 => fetch input
TLS<2>: ssl_tls.c:2366 in_left: 0, nb_want: 5
TLS<2>: ssl_tls.c:2390 in_left: 0, nb_want: 5
TLS<2>: ssl_tls.c:2391 ssl->f_recv(_timeout)() returned 5 (-0xfffffffb)
TLS<2>: ssl_tls.c:2403 <= fetch input
TLS<2>: ssl_tls.c:2208 => fetch input
TLS<2>: ssl_tls.c:2366 in_left: 5, nb_want: 99
TLS<2>: ssl_tls.c:2390 in_left: 5, nb_want: 99
TLS<2>: ssl_tls.c:2391 ssl->f_recv(_timeout)() returned 94 (-0xffffffa2)
TLS<2>: ssl_tls.c:2403 <= fetch input
TLS<2>: ssl_tls.c:1576 => decrypt buf
TLS<2>: ssl_tls.c:2051 <= decrypt buf
TLS<2>: ssl_tls.c:3753 <= read record
TLS<2>: ssl_tls.c:6762 <= read
TLS<2>: ssl_tls.c:6522 => read
TLS<2>: ssl_tls.c:3728 => read record
TLS<2>: ssl_tls.c:2208 => fetch input
TLS<2>: ssl_tls.c:2366 in_left: 0, nb_want: 5
TLS<2>: ssl_tls.c:2390 in_left: 0, nb_want: 5
client's data invalid protocol
TLS<2>: ssl_tls.c:6906 => write close notify
TLS<2>: ssl_tls.c:4032 => send alert message
TLS<2>: ssl_tls.c:2701 => write record
TLS<2>: ssl_tls.c:1258 => encrypt buf
TLS<2>: ssl_tls.c:1560 <= encrypt buf
TLS<2>: ssl_tls.c:2416 => flush output
TLS<2>: ssl_tls.c:2435 message length: 31, out_left: 31
TLS<2>: ssl_tls.c:2441 ssl->f_send() returned 31 (-0xffffffe1)
TLS<2>: ssl_tls.c:2460 <= flush output
TLS<2>: ssl_tls.c:2850 <= write record
TLS<2>: ssl_tls.c:4045 <= send alert message
TLS<2>: ssl_tls.c:6922 <= write close notify
Reason:[-0x7880]
TLS<2>: ssl_tls.c:7064 => free
TLS<2>: ssl_tls.c:7129 <= free
HTTP client: Disconnected with error: 8
HTTP client: Connection timeout
HTTP client: Calling disconnect
HTTP client: manually Calling disconnect callback due to error -12
HTTP client: Disconnected
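
The `Reason:[...]` values in these logs are negated mbedtls error codes. As a rough aid for reading them, here is a minimal sketch that maps the three codes seen in this thread to the names used in mbedtls' `ssl.h`. The helper name `decode_reason` and the table are made up for illustration and are not part of NodeMCU.

```lua
-- mbedtls SSL error codes that show up in this thread (names from mbedtls' ssl.h).
local MBEDTLS_SSL_ERRORS = {
  [0x7200] = "MBEDTLS_ERR_SSL_INVALID_RECORD",
  [0x7780] = "MBEDTLS_ERR_SSL_FATAL_ALERT_MESSAGE",
  [0x7880] = "MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY",
}

-- Hypothetical helper: extract the hex code from a "Reason:[-0x....]" console
-- line and look up its name. Returns nil if the line has no such code.
local function decode_reason(line)
  local hex = line:match("Reason:%[%-0x(%x+)%]")
  if not hex then return nil end
  local code = tonumber(hex, 16)
  return MBEDTLS_SSL_ERRORS[code] or string.format("unknown (-0x%04X)", code)
end

print(decode_reason("Reason:[-0x7880]"))  -- MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY
```

Read this way, `-0x7880` is simply the peer's close notify being reported as an error, while `-0x7200` (seen elsewhere in this thread) is an invalid-record error.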

I was able to successfully connect, send, and receive from an HTTP server on my desktop using ECDSA keys (nodemcu-side verification is, of course, off at the moment), though it gave an error on close (in fact, the outcome above).

It looks like I am able to at least hobble along and exchange short messages over TLS and HTTP. What more would be useful for people to know?

@nwf good stuff, thanks! Do you have any idea what might cause the

ssl_tls.c:3525 bad message length

I first thought it might be related to SNI but it can't be as https://pskreporter.info fails while https://pskreporter.info/gen404 doesn't. That's irritating.

Certificate-wise both servers look fine:
https://www.ssllabs.com/ssltest/analyze.html?d=nodemcu-build.com
https://www.ssllabs.com/ssltest/analyze.html?d=pskreporter.info

This may be completely irrelevant but I have seen the "bad message length" error when clients attempt to perform normal comms on a secure connection and don't understand the encrypted message or handshake returned.

It couldn't be anything as obvious as that, could it?

@marcelstoer I don't yet know. Some of the earlier "bad message length" messages are from servers that don't understand the maximum fragment length option (e.g. google), which is why I had to raise the buffer size from 4096 to 5120, but that doesn't look to be the case in the tests I've run most recently.

I am not in a very good position to capture the packets from the board for analysis, so I've been doing mostly black-box testing. I'll try again when I'm on a different network and see if there's anything interesting reported by wireshark's decoders.

@davydnorris An interesting possibility, to be sure, and maybe the cause of some, but it doesn't explain the pskreporter discrepancy above.

Hi there.
Firstly, thanks to @nwf for following up on my first bits of tlswrap.
I am currently very busy with my primary work, so I can't spend enough time on my favourite open-source project. So sad...

So, about "buffer size" and "message length": the important thing is that "max frame size negotiation" has very poor server-side support. Most servers use 16K buffers for TLS I/O (as in the specs), so any TLS connection should have at least 32K of buffer, and there is no way around it.
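
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes one inbound and one outbound record buffer of the configured size each, ignores per-record overhead (headers, MAC, padding), and takes the ~44k free heap mentioned at the top of this issue as a starting point. The function names are made up for illustration.

```lua
-- Approximate heap taken by the TLS record buffers, assuming one inbound and
-- one outbound buffer of max_content_len bytes each (overhead ignored).
local function tls_buffer_cost(max_content_len)
  return 2 * max_content_len
end

-- Heap left over for Lua after the buffers are allocated.
local function heap_left(free_heap, max_content_len)
  return free_heap - tls_buffer_cost(max_content_len)
end

local FREE_HEAP = 44000  -- assumption: roughly the "~44k" reported earlier
for _, len in ipairs({4096, 5120, 16384}) do
  print(len, tls_buffer_cost(len), heap_left(FREE_HEAP, len))
end
```

Under these assumptions the spec-sized 16K buffers would leave only about 11K of heap, which is why the smaller values are being discussed at all.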

@djphoenix Hi! Yeah, I see that. I think there is utility in keeping TLS's utilization as small as possible, but perhaps the docs should note that the implementation in nodemcu is not fully spec conforming (by default, unless one raises the MBEDTLS_SSL_MAX_CONTENT_LEN) and should, therefore, not be assumed to be able to communicate with arbitrary servers. Fortunately, it doesn't seem like nodemcu is often asked to fetch from servers unknown to the author, so perhaps this is an acceptable compromise. Perhaps a maximum fragment length overrun is also cause to scream a little louder on the console?

In any case, I will throw some more time at this next week; ideally testing will become automated and will grow to include MQTT (at least, against my own broker) and cert verification. :)

Hi,

if there is a need for a large buffer that doesn't really fit in memory, I
have a (crazy) idea: replace the pure memory buffer with a kind of software
swap file on the flash. From the point of view of the buffer user (in this
case, mbedtls), the buffer would look bigger than it really is. From the
point of view of real usage, you could specify how much of the total buffer
is really kept in memory at any given time (e.g.: 2KB in memory, while the
total buffer is e.g.: 16KB).

Of course, it would not be easy at all to implement, especially in C, and
would require changes to how the buffer is accessed (who knows where inside
mbedtls), and also it would be slow due to flash I/O times, but maybe it
would make it viable within the limitations of the ESP.

Thoughts?

@nwf Were you able to make progress this past week ? Also, would it be possible to send us a link to the latest custom build so we can try it out. thx.

Great to see this gaining traction!
Massive thank you to all that are helping out, it is very much appreciated! :)

If there is anything I can do to help with testing, please let me know. I have a Microsoft Azure MQTT Hub set up (which will only work with devices over a secure SSL connection) as well as a Mosquitto MQTT broker available for trying the new patches.

Also - I wondered if the older, frozen version (1.5.4.1) has this SSL issue? If not, is it possible to use that whilst the new firmware version gets fixed? I haven't had much luck getting it working (using the custom firmware builder) and I'm still faced with the same errors...?

@georeb SSL seems to work fine in 1.5.4.1 build for both tcp socket connections and mqtt. Since the upgrade to the new 2.x SDK, secured connections stopped working (for the most part).

I have not had much time, but I spent a(n) (un)pleasant evening digging through debug messages and TLS packets. If anyone wants to follow along, my tree is up as commit nwf/nodemcu-firmware@d7f83d3305225675c5543c9a4055b165362b61d6 and the several preceding. Some of those may be worth carving out as separate PRs, especially the "-Wall reduction" one. (It is merely a reduction; the tree is not -Wall clean afterwards.) Please expect this tree to be ephemeral and rewritten as I find time.

ETA: The reason for all the munging about with clocks is that I am now able to verify the timestamp on my test ECDSA self-signed cert. That's a kind of progress, I guess. It's still fragile, tho'.

I have encountered the same problem - http requests go through just fine, but https do not. Sorry it's not helpful, I just want to bump this thread and to let everyone know it's still a problem. I use the 2.1.0 build.

I'm glad I found this thread, I was pulling my hair out trying to use the DarkSky API for sunrise and sunset times and it requires HTTPS. Is there anything that I can capture to help with this?

@kirkryan If you need sunrise/sunset times why don't you let the ESP calculate it itself?
I have a module for this https://github.com/vsky279/nodemcu-firmware/blob/sun/app/modules/sun.c
I did not try to pull it into dev but if there was interest I could prepare a PR.

@kirkryan If you are equipped to build your own firmware images, you can try my experimental branch as mentioned in #2049 and peer into the TLS state machine. It is possible that the bumped version of mbedtls will solve the problem for you, but do note that since @djphoenix did his import of the later version, CVEs have been found in mbedtls which, I think, impact nodemcu's usage of it. Nobody, so far as I know, has attempted further updates.

@nwf Is there any way that we can safely submit a version of your experimental branch fix into the dev branch so we all have a chance to test it out ? Is there a lot of work to derisk it for dev branch use ?

@dtran123, you have a chance of testing it out by fetching the branch that nwf mentions into your own local docker image and doing the build there.

Thanks @TerryE . I followed your advice and learned how to use the docker build utility. I cloned @nwf's dev-ssl branch and successfully built a NodeMCU firmware image.

I am happy to report that secure mqtt and secure tcp connections seem to work better !

Before: secure mqtt fails to publish to cloudmqtt server.
After: secure mqtt publish success to cloudmqtt server.

Before: secure connection to thingspeak failed
After: secure connection to thingspeak passed

However, I am still having problems with IFTTT. Here is the lua code and traceback:

Lua code:

print(wifi.sta.getip())
srv = tls.createConnection()
srv:on("receive", function(sck, c)
  print("received!")
  print(c)
  srv:close()
end)
srv:on("sent", function(sck)
  print("sent!")
end)
srv:on("connection", function(sck, c)
  print("connected!")
  srv:send("GET /trigger/motion/with/key/encryptedhereIQmf2uum7XOJxjEQWPBVMDfQaWgDoUqL HTTP/1.1\r\nHost: maker.ifttt.com\r\nConnection: keep-alive\r\nAccept: */*\r\n\r\n")
end)
srv:on("disconnection", function(conn, c) print("disconnected!") end)
srv:on("reconnection", function(conn, c) print("reconnected!") end)
srv:connect(443, 'maker.ifttt.com')

TRACEBACK:

dofile("test_ifttt.lua")
192.168.0.108 255.255.255.0 192.168.0.1
client handshake start.
TLS<2> (heap=22328): ssl_tls.c:6337 => handshake
TLS<2> (heap=22328): ssl_cli.c:3279 client state: 0
TLS<2> (heap=22328): ssl_tls.c:2416 => flush output
TLS<2> (heap=22328): ssl_tls.c:2428 <= flush output
TLS<2> (heap=22328): ssl_cli.c:3279 client state: 1
TLS<2> (heap=22328): ssl_tls.c:2416 => flush output
TLS<2> (heap=22328): ssl_tls.c:2428 <= flush output
TLS<2> (heap=22328): ssl_cli.c:717 => write client hello
mbedtls_time: 0
TLS<2> (heap=22328): ssl_tls.c:2701 => write record
TLS<2> (heap=22328): ssl_tls.c:2416 => flush output
TLS<2> (heap=22328): ssl_tls.c:2435 message length: 293, out_left: 293
TLS<2> (heap=20712): ssl_tls.c:2441 ssl->f_send() returned 293 (-0xfffffedb)
TLS<2> (heap=20712): ssl_tls.c:2460 <= flush output
TLS<2> (heap=20712): ssl_tls.c:2850 <= write record
TLS<2> (heap=20712): ssl_cli.c:1049 <= write client hello
TLS<2> (heap=20712): ssl_cli.c:3279 client state: 2
TLS<2> (heap=20712): ssl_tls.c:2416 => flush output
TLS<2> (heap=20712): ssl_tls.c:2428 <= flush output
TLS<2> (heap=20712): ssl_cli.c:1410 => parse server hello
TLS<2> (heap=20712): ssl_tls.c:3728 => read record
TLS<2> (heap=20712): ssl_tls.c:2208 => fetch input
TLS<2> (heap=20712): ssl_tls.c:2366 in_left: 0, nb_want: 5
TLS<2> (heap=20712): ssl_tls.c:2390 in_left: 0, nb_want: 5
TLS<2> (heap=20712): ssl_tls.c:6347 <= handshake
TLS<2> (heap=22328): ssl_tls.c:6337 => handshake
TLS<2> (heap=22328): ssl_cli.c:3279 client state: 2
TLS<2> (heap=22328): ssl_tls.c:2416 => flush output
TLS<2> (heap=22328): ssl_tls.c:2428 <= flush output
TLS<2> (heap=22328): ssl_cli.c:1410 => parse server hello
TLS<2> (heap=22328): ssl_tls.c:3728 => read record
TLS<2> (heap=22328): ssl_tls.c:2208 => fetch input
TLS<2> (heap=22328): ssl_tls.c:2366 in_left: 0, nb_want: 5
TLS<2> (heap=22328): ssl_tls.c:2390 in_left: 0, nb_want: 5
TLS<2> (heap=22328): ssl_tls.c:2391 ssl->f_recv(_timeout)() returned 5 (-0xfffffffb)
TLS<2> (heap=22328): ssl_tls.c:2403 <= fetch input
TLS<2> (heap=22328): ssl_tls.c:2208 => fetch input
TLS<2> (heap=22328): ssl_tls.c:2366 in_left: 5, nb_want: 70
TLS<2> (heap=22328): ssl_tls.c:2390 in_left: 5, nb_want: 70
TLS<2> (heap=22328): ssl_tls.c:2391 ssl->f_recv(_timeout)() returned 65 (-0xffffffbf)
TLS<2> (heap=22328): ssl_tls.c:2403 <= fetch input
TLS<2> (heap=22328): ssl_tls.c:3753 <= read record
mbedtls_time: 0
TLS<2> (heap=22328): ssl_cli.c:1671 server hello, total extension length: 21
TLS<2> (heap=22328): ssl_cli.c:1859 <= parse server hello
TLS<2> (heap=22328): ssl_cli.c:3279 client state: 3
TLS<2> (heap=22328): ssl_tls.c:2416 => flush output
TLS<2> (heap=22328): ssl_tls.c:2428 <= flush output
TLS<2> (heap=22328): ssl_tls.c:4223 => parse certificate
TLS<2> (heap=22328): ssl_tls.c:3728 => read record
TLS<2> (heap=22328): ssl_tls.c:2208 => fetch input
TLS<2> (heap=22328): ssl_tls.c:2366 in_left: 0, nb_want: 5
TLS<2> (heap=22328): ssl_tls.c:2390 in_left: 0, nb_want: 5
TLS<2> (heap=22328): ssl_tls.c:2391 ssl->f_recv(_timeout)() returned 5 (-0xfffffffb)
TLS<2> (heap=22328): ssl_tls.c:2403 <= fetch input
TLS<1> (heap=22328): ssl_tls.c:3525 bad message length: in_msglen=4758
TLS<1> (heap=22328): ssl_tls.c:3734 mbedtls_ssl_read_record_layer() returned -29184 (-0x7200)
TLS<1> (heap=22328): ssl_tls.c:4261 mbedtls_ssl_read_record() returned -29184 (-0x7200)
TLS<2> (heap=22328): ssl_tls.c:6347 <= handshake
client handshake failed!
Reason:[-0x7200]
TLS<2> (heap=25824): ssl_tls.c:7065 => free
TLS<2> (heap=36664): ssl_tls.c:7130 <= free
reconnected!

Can someone check if they have a similar problem with secured connections to IFTTT?

> learned how to use the docker build utility

Hope that was a pleasant experience 😉

> I am happy to report that secure mqtt and secure tcp connections seem to work better !

Great, thanks for testing! Working _better_ is better than nothing - even if the fix isn't complete. @nwf any chance to carve out a PR of what you already have anytime soon?

Haven't got into debugging TLS yet, but two points:

             srv:send("GET /trigger/ ...

Use the parameter, sck in this case, and not the upval srv. This can create memory leaks.
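
A minimal sketch of what that looks like, assuming the same `tls.createConnection()`, `:on()`, `:send()`, `:connect()` and `:close()` calls used in the code above. The `fetch` helper and its `tls_module` parameter are hypothetical; the module is passed in only so the sketch can be exercised off-device, and on NodeMCU you would pass the global `tls`.

```lua
-- Hypothetical helper: open a TLS connection, send one GET request, hand the
-- first chunk of the response to on_data, then close.
local function fetch(tls_module, host, path, on_data)
  local conn = tls_module.createConnection()
  conn:on("connection", function(sck)
    -- Use the socket passed to the callback, not the outer `conn`, so the
    -- closure does not hold the connection object as an upvalue.
    sck:send("GET " .. path .. " HTTP/1.1\r\nHost: " .. host ..
             "\r\nConnection: close\r\nAccept: */*\r\n\r\n")
  end)
  conn:on("receive", function(sck, data)
    on_data(data)
    sck:close()
  end)
  conn:connect(443, host)
end
```

On the device this would be called as something like `fetch(tls, "maker.ifttt.com", "/trigger/...", print)`, with the path left for you to fill in. None of the callbacks refer to `conn`, so nothing in Lua keeps the socket alive once the firmware is done with it.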

> TLS<1> (heap=22328): ssl_tls.c:3525 bad message length: in_msglen=4758

The length seems suspicious to me. Check whether the ESP8266 mbedtls implementation can handle records of this size. I suspect not.

> TLS<1> (heap=22328): ssl_tls.c:3525 bad message length: in_msglen=4758
>
> The length seems suspicious to me.

You're expecting something smaller than the MTU size, right (I do)?

I'm thrilled to see some movement on this issue. Thanks @dtran123!
I'm increasing my bounty offer to $500 USD because I _really_ want to see this issue resolved in the latest firmware.

Note that my lua code above works fine if I swap the URL from IFTTT to Thingspeak, so the memory leak issue doesn't seem to cause a problem in other cases.

I am hoping to spend the next few days further characterizing @nwf's dev-ssl branch to see how robust the improvement is for various destination servers.

It is really my wish that we finally support secured tcp http connections on the ESP8266. I can always use the ESP32 for that but it would be awesome that the ESP8266 could do it too reliably (or at least for the core usecases).

For those interested in getting a custom build with the @nwf improvement to test, let me know. It would be my pleasure to share it :)

Forgot to give a great thanks to @marcelstoer for providing such a great tool. The docker util is great for someone like me who is a casual hobbyist programmer. It allows me to avoid the toolchains and such and tweak a few things. Also the documentation was great...I was able to follow it and get my first custom build. BTW, I did it on MS Windows OS and it was fine.

OK, based on a quick static analysis -- don't have time to get into testing this; sorry -- the error above comes from a check here that there is enough room left in the SSL buffer to accept the current message which is 4,758 bytes long in this case.

The buffer is allocated according to the constant MBEDTLS_SSL_BUFFER_LEN in ssl_internal.h, which is based on MBEDTLS_SSL_MAX_CONTENT_LEN. That defaults to 16K in the standard package (see ssl.h), but is overridden by the setting in include/user_mbedtls.h, which is 4K, and hence the package barfs on a 4,758-byte message.
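
To make that check concrete, here is a simplified sketch in plain Lua. It parses the 5-byte TLS record header (the `nb_want: 5` reads in the log) and compares the announced length with the configured maximum. The function names are made up, the real mbedtls check also allows for MAC and padding overhead on encrypted records, and the header bytes below are constructed by hand to match the 4758 in the log rather than captured from the wire.

```lua
-- Parse a TLS record header: content type (1 byte), protocol version
-- (2 bytes), payload length (2 bytes, big-endian).
local function parse_record_header(header)
  assert(#header >= 5, "need at least 5 bytes")
  local ctype, major, minor, hi, lo = header:byte(1, 5)
  return { type = ctype, major = major, minor = minor, length = hi * 256 + lo }
end

-- Simplified length check: does the announced record fit the configured limit?
local function record_fits(header, max_content_len)
  return parse_record_header(header).length <= max_content_len
end

-- Hand-built header: handshake record (type 22), version bytes 3,3 (assumed),
-- length 0x1296 = 4758 as in the IFTTT log.
local hdr = string.char(22, 3, 3, 0x12, 0x96)
print(parse_record_header(hdr).length)                     -- 4758
print(record_fits(hdr, 4096), record_fits(hdr, 5120))      -- false   true
```

The record is rejected as soon as its header has been read, before any of the payload arrives, which matches the log: the failure comes right after the 5-byte fetch.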

You can't easily request a maximum returned message size at the HTTP layer (see this StackOverflow discussion). There is also a related discussion in esp8266/Arduino#1375.

So in short your maker.ifttt.com request is returning too much data for your configured buffer size. IMO, this is a bit of an odd thing to do for an IoT integration service. Does the API allow you to fragment / batch returned data in multiple smaller returns? Alternatively can you restructure your request to return less data?

You could try upping the max content size, but 22 KB of heap isn't really enough to run a serious Lua app.

This isn't really a bug per se, as the software is doing what it is supposed to do in these circumstances. It is more of an application configuration bumping against system resource constraints of the ESP8266 chipset.

As a codicil to my previous post -- this teaches me to jump into an issue with a point response without reading through the entire thread. I note that @nwf has upped this setting in some of his tests to 5120. You could try that. But the inherent issue is that the ESP8266 architecture only has 48 KB of RAM to play with and you can't fit a quart into a pint pot.

Thanks @TerryE . That is exactly the point I made in your PR #2068. Your work on increasing RAM on the ESP8266 will give us hope that we can push the limits of this module to enable secured connections. Otherwise the ESP8266 is doomed for those of us who care about security.

Thanks for your insights on how to tweak the constants. I will try that. In my case, it would be more fun for me to creatively squeeze my lua application to gain the needed security.

Happy to report that setting MBEDTLS_SSL_MAX_CONTENT_LEN = 5120 did the trick !!!
I now can connect to IFTTT as well as mnubo.com API. Awesome !

I feel like I will have more fun now with the ESP8266. I will definitely keep an eye out for this type of error when secured connections fail. I agree with @nwf that there should be a log message when we have busted the limit.

In my opinion, if there was a way to change this constant on the fly (or have it take effect after a reboot), it would allow people to allocate enough buffer for their specific needs. A note in the documentation would indicate the inherent tradeoff: less memory for the lua app.

> In my case, it would be more fun for me to creatively squeeze my lua application to gain the needed security.

Before I chucked it all in and became a retired gentleman doing this stuff for pleasure, I used to be a CTO for one of EDS (and the later HP) services divisions, so I am very much aware of the security issues. My approach is somewhat different.

None of my household devices have direct access to the internet. I effectively run two networks in the house: a WiFi service dedicated to IoT devices, and my normal house LAN / WiFi that all of my PCs and mobile devices connect to. The only link between the two is a battery-backed RPi with HDD that runs my MQTT broker and Node Red, which is connected to both and locked down. I also have VPN into my house network so that I can connect to it from my tablet / mobile phone whilst away, and I'll be adding a limited portal through my public website in due course.

Directly connecting any IoT device to cloud-based resources and services is intrinsically vulnerable to exploitation and attack, IMO. Security is more than just HTTPS / TLS to external resources.

Totally agree. There are a few scenarios which for simplicity will avoid the need of gateways in the house. Some products sell really well like the Belkin Wemo or the Nest which do not rely on any gateways. I am thinking of some possible commercial opportunities. Having some basic security is better than nothing for non-critical IoT devices. But in the end, it is a cool thing to see a 1$ module that can handle basic security with less than 64K of RAM :)

@dtran123 just to clarify -- is your message above suggesting that the only thing needed to make TLS connections work and solve this months long issue is setting a constant to a higher value? There's no branch or PR that also needs to be merged?

@heythisisnate Nate, there is an underlying issue: if you issue a TLS request to a server which replies with a response larger than your configured MBEDTLS_SSL_MAX_CONTENT_LEN then the receive will barf.
In @dtran123's case, the maker.ifttt.com responses were only slightly larger than the standard 4 KB, so upping this meant that his application was no longer hitting this constraint.

If something changes so that in 8 months, say, ifttt responses grow to over 5120 bytes, then his application will start to fail again.

Thanks, I will try this. In my case I have control over the destination server response.

Please use the dev-ssl branch from @nwf's nodemcu fork (see above for link). I think he did more than just change the value of a constant. I am hoping that we can merge @nwf's ssl changes into the dev branch.

Can someone develop a feature to change the MBEDTLS_SSL_MAX_CONTENT_LEN on the fly ? I don't mind if a restart of the module is required (similar to what we did with the adc library for the battery voltage vs adc voltage). I see a parameter in one of the TLS functions to change the max value. This way for the majority of people, they can avoid the hassle of creating a custom build.

> I am hoping that we can merge @nwf ssl changes into the dev branch.

@dtran123, a good step on the way to this is for another independent tester -- say you -- to test this branch and confirm that (a) it doesn't seem to break anything that works in dev; and (b) it solves a specific issue.

We ain't looking for perfect; just stepwise better :wink:

Well...I did further testing to narrow down which changes are absolutely required. Guess what ?? I re-cloned the dev branch from the main nodemcu repo and changed MBEDTLS_SSL_MAX_CONTENT_LEN to 5120. That was all that was required to make my secured connections work (thingspeak, ifttt, mnubo, mqtt to cloudmqtt server). So in theory, I don't specifically need the other changes in @nwf's dev-ssl branch. It is possible @nwf made some other changes for his own specific needs. It looks like changing only the constant could be a lower-risk step that only involves increasing the size of the I/O buffers. I will further characterize this scenario and report back. But so far I see that we could merge a change with only the size increase for now. I think that should address 80% of the use cases.

Hello everyone. Thanks for the tests of my dev-ssl branch; I'm glad to see things are better.

I'm currently away from my ESP8266s, but real soon now I'm going to push another version that bumps to mbedtls 2.6.1 and so should fix some CVEs in our TLS layer. I'll carve out PRs for

  • the mbedtls updates and some improvements to mbedtls debugging
  • the bump of MBEDTLS_SSL_MAX_CONTENT_LEN to 5120
  • general tidying around the tree

Those should be non-controversial and, as @dtran123 indicates, are likely to improve matters sufficiently for most people's use cases. That leaves open

  • the possibility of dynamically setting MBEDTLS_SSL_MAX_CONTENT_LEN
  • the requisite work of integrating rtctime/sntp support into mbedtls to better support certificate verification.

and, of course, a set of scripts people can run to test TLS functionality in isolation. It'd be nice to have a list of services people are likely to care about (several of which are mentioned in this issue) and tests for each in various configurations (esp. varying values of MBEDTLS_SSL_MAX_CONTENT_LEN).

These last three are probably non-blocking.

  • People can do their own testing as it stands.
  • It should be OK to disable timestamp verification, as most people probably know what certs, or at least, root certs, they are expecting to see, if they turn on verification. (We should probably scream if they leave it off, tho', as there's minimal utility in such an unverified TLS connection.)
  • 5120 is probably a more generally tolerable default than we have now and I don't think I'd want it to be much bigger anyway.

BTW, so far, I have been doing all my tests without certificates. I wanted to validate this first as a basis. Then, I will test with certificates.

How many web servers today support the max fragment length extension? Once it becomes widespread, then using 4k buffers should just work.

Support for this extension has just been merged to master for openssl, so there is more hope now.

Maybe pressure should be exerted upon the IoT server people (like IFTTT) to use a stack that supports the max fragment length extension.

[I'm assuming that MBEDTLS does actually request this]
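
For reference, the extension (RFC 6066 `max_fragment_length`) only lets the client ask for one of four sizes, and a server that does not support it simply leaves it out of its reply and may keep sending records of up to 16K. A minimal sketch of choosing the code for a given buffer size; the helper name is made up for illustration:

```lua
-- RFC 6066 max_fragment_length codes and the fragment sizes they stand for.
local MAX_FRAG_LEN = { [1] = 512, [2] = 1024, [3] = 2048, [4] = 4096 }

-- Hypothetical helper: pick the largest code whose fragment size fits into a
-- buffer of buffer_len bytes. Returns nil if even 512 bytes does not fit.
local function pick_max_frag_code(buffer_len)
  local best
  for code = 1, 4 do
    if MAX_FRAG_LEN[code] <= buffer_len then best = code end
  end
  return best
end

print(pick_max_frag_code(5120))  -- 4, i.e. a 4096-byte fragment
```

So with a 4096 or 5120 byte buffer the most a client can request is code 4, and whether the server honours it is up to the server.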

So... I don't want to hassle you guys, but is this any closer to a resolution? I can make HTTPS requests on the 1.5.4.1 branch, but not master/dev, and the issue has been open nearly a year.

If there's no solution in sight I'll plan to use 1.5.4.1, otherwise I'll wait for a fix for this - I'm only wanting to receive < 1KB of JSON data, from a server which only talks HTTPS.

@kieranc HTTPS works in the dev and master branches, but you may need to tweak MBEDTLS_SSL_MAX_CONTENT_LEN (via a custom build) depending on which server you connect to (e.g. the IFTTT server requires higher values). For MQTT connections, make sure you set up certificates on your mqtt server. This has fixed some of my failing secured connections to mqtt.

@dtran123 I get my builds from nodemcu-build.com so I can't tweak much, I'll stick with 1.5.4.1 for the time being. Thanks.

@kieranc Getting set up with the docker build tool (https://hub.docker.com/r/marcelstoer/nodemcu-build/) was pretty straightforward (took me 30 mins). I think it is a good investment of time to learn it since it comes in handy in cases like this.

@dtran123 I understand, if it were just for me I'd set up a build environment, but I'm trying to make a simple process for others to follow to recreate a project, and 'learn to use docker' doesn't really fit in the mandate. Plus, 1.5.4.1 works fine for my needs.

I finally had a chance to try this again. I tested bumping the MBEDTLS_SSL_MAX_CONTENT_LEN value to 5120 as suggested. This _might_ be helping because the test request is successful, but not before spewing two errors and about a 1-second delay.

Here's a very simple example (tested both on dev and master) with the same results:

=http.get("https://api.github.com/repos/nodemcu/nodemcu-firmware/releases/latest", {}, function(code,data) print(code, data) end)
> HTTP client: Disconnected with error: 8
HTTP client: Connection timeout
200 {"url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/releases/7514562","assets_url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/releases/7514562/assets","upload_url":"https://uploads.github.com/repos/nodemcu/nodemcu-firmware/releases/7514562/assets{?name,label}","html_url":"https://github.com/nodemcu/nodemcu-firmware/releases/tag/2.1.0-master_20170824","id":7514562,"tag_name":"2.1.0-master_20170824","target_commitish":"master","name":"2.1.0-master_20170824","draft":false,"author":{"login":"marcelstoer","id":624195,"avatar_url":"https://avatars0.githubusercontent.com/u/624195?v=4","gravatar_id":"","url":"https://api.github.com/users/marcelstoer","html_url":"https://github.com/marcelstoer","followers_url":"https://api.github.com/users/marcelstoer/followers","following_url":"https://api.github.com/users/marcelstoer/following{/other_user}","gists_url":"https://api.github.com/users/marcelstoer/gists{/gist_id}","starred_url":"https://api.github.com/users/marcelstoer/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/marcelstoer/subscriptions","organizations_url":"https://api.github.com/users/marcelstoer/orgs","repos_url":"https://api.github.com/users/marcelstoer/repos","events_url":"https://api.github.com/users/marcelstoer/events{/privacy}","received_events_url":"https://api.github.com/users/marcelstoer/received_events","type":"User","site_admin":false},"prerelease":false,"created_at":"2017-08-24T19:26:33Z","published_at":"2017-08-24T19:32:27Z","assets":[],"tarball_url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/tarball/2.1.0-master_20170824","zipball_url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/zipball/2.1.0-master_20170824","body":"Primarily a patch release to fix a fundamental bug in the cron module.\r\n- Fix for #2080\r\n- New native module for DS18B20, #2003 \r\n- Rewrite of the Lua DS18B20 module, #1996 \r\n\r\nTake a look at the ['2.1 follow-up patch' 
milestone](https://github.com/nodemcu/nodemcu-firmware/milestone/7) for the full list of all issues included in this release."}

As you can see, after issuing the http.get, I always get the HTTP client: Disconnected with error: 8 error. Then, after about a second pause, the request is successful and the callback is executed. I see that @FrankX0 observed the same in testing #2214 (related to this issue).

I'm able to reproduce the same behavior with several other API hosts (not just github.com).

Does anyone have an idea of what this HTTP client: Disconnected with error: 8 really means and where it is coming from? I'm confused about what the error means when the request is ultimately successful.

> Anyone have an idea of what this HTTP client: Disconnected with error: 8 really means and where it is coming from.

Well, it originates from httpclient's http_error_callback(), which is invoked as the reconnection callback. The error code 8 is not a defined value/parameter in espconn.h and is probably nonsense, I presume.

Out of curiosity I ported the espconn apps for lwip and mbedtls from SDK's third_party in master. @heythisisnate's testcase passes smoothly now:

http.get("https://api.github.com/repos/nodemcu/nodemcu-firmware/releases/latest", {}, function(code,data) print(code, data) end)
> 200   {"url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/releases/7514562","assets_url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/releases/7514562/assets","upload_url":"https://uploads.github.com/repos/nodemcu/nodemcu-firmware/releases/7514562/assets{?name,label}","html_url":"https://github.com/nodemcu/nodemcu-firmware/releases/tag/2.1.0-master_20170824","id":7514562,"tag_name":"2.1.0-master_20170824","target_commitish":"master","name":"2.1.0-master_20170824","draft":false,"author":{"login":"marcelstoer","id":624195,"avatar_url":"https://avatars0.githubusercontent.com/u/624195?v=4","gravatar_id":"","url":"https://api.github.com/users/marcelstoer","html_url":"https://github.com/marcelstoer","followers_url":"https://api.github.com/users/marcelstoer/followers","following_url":"https://api.github.com/users/marcelstoer/following{/other_user}","gists_url":"https://api.github.com/users/marcelstoer/gists{/gist_id}","starred_url":"https://api.github.com/users/marcelstoer/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/marcelstoer/subscriptions","organizations_url":"https://api.github.com/users/marcelstoer/orgs","repos_url":"https://api.github.com/users/marcelstoer/repos","events_url":"https://api.github.com/users/marcelstoer/events{/privacy}","received_events_url":"https://api.github.com/users/marcelstoer/received_events","type":"User","site_admin":false},"prerelease":false,"created_at":"2017-08-24T19:26:33Z","published_at":"2017-08-24T19:32:27Z","assets":[],"tarball_url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/tarball/2.1.0-master_20170824","zipball_url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/zipball/2.1.0-master_20170824","body":"Primarily a patch release to fix a fundamental bug in the cron module.\r\n- Fix for #2080\r\n- New native module for DS18B20, #2003 \r\n- Rewrite of the Lua DS18B20 module, #1996 \r\n\r\nTake a look at the ['2.1 follow-up patch' 
milestone](https://github.com/nodemcu/nodemcu-firmware/milestone/7) for the full list of all issues included in this release."}

That might be a good indication - I haven't done any further tests, though.
If anybody is interested: you'll find this quick&dirty hack at https://github.com/devsaurus/nodemcu-firmware/tree/sdk_v2.2_preview. For convenience the http module is already enabled and MBEDTLS_SSL_MAX_CONTENT_LEN bumped to 5120. Start make from a clean source tree.

I did some testing with the examples mentioned in this issue. There appears to be no regression for http; the only change is that the HTTP client: Disconnected with error: 8 message is gone.
I'll start a dedicated PR for this update as it improves things and is independent from any SDK release.

Can anyone else reproduce this failure? An http.get call that was working fine before on 1.5.4.1 is now not working at all. I can't connect to any GitHub API from the NodeMCU. For example, this request that worked fine before now fails:

http.get("https://api.github.com/repos/nodemcu/nodemcu-firmware/releases/latest", {}, function(code,data) print(code, data) end)
> -1 nil

I can't figure out what changed. Is it just me?

@heythisisnate What version, exactly, are you testing against? If you're able to produce your own images, debug information would likely be illuminating.

@nwf I'm on branch 1.5.4.1-final. I rebuilt the fw with debug enabled. Here's the error:

=http.get("https://api.github.com/repos/nodemcu/nodemcu-firmware/releases/latest", nil, function(code,data) print(code, data) end)
HTTP client: hostname=api.github.com
HTTP client: port=443
HTTP client: method=GET
HTTP client: path=/repos/nodemcu/nodemcu-firmware/releases/latest
HTTP client: DNS request
HTTP client: DNS pending
> HTTP client: DNS found api.github.com 192.30.253.117
net_socket_received is called.
client handshake start.
client handshake failed
Error: SSL error 70
HTTP client: Disconnected
http_status=-1
-1  nil

It looks like Error: SSL error 70 is the important part here. I wonder if something changed on Github's infrastructure. I don't think that this is related directly to the topic of this issue, but it's interesting because the Github API was one of our test cases above and it was working fine on 1.5.4.1 until recently.

On the current dev build it is working though:

200 {"url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/releases/7514562","assets_url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/releases/7514562/assets","upload_url":"https://uploads.github.com/repos/nodemcu/nodemcu-firmware/releases/7514562/assets{?name,label}","html_url":"https://github.com/nodemcu/nodemcu-firmware/releases/tag/2.1.0-master_20170824","id":7514562,"tag_name":"2.1.0-master_20170824","target_commitish":"master","name":"2.1.0-master_20170824","draft":false,"author":{"login":"marcelstoer","id":624195,"avatar_url":"https://avatars0.githubusercontent.com/u/624195?v=4","gravatar_id":"","url":"https://api.github.com/users/marcelstoer","html_url":"https://github.com/marcelstoer","followers_url":"https://api.github.com/users/marcelstoer/followers","following_url":"https://api.github.com/users/marcelstoer/following{/other_user}","gists_url":"https://api.github.com/users/marcelstoer/gists{/gist_id}","starred_url":"https://api.github.com/users/marcelstoer/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/marcelstoer/subscriptions","organizations_url":"https://api.github.com/users/marcelstoer/orgs","repos_url":"https://api.github.com/users/marcelstoer/repos","events_url":"https://api.github.com/users/marcelstoer/events{/privacy}","received_events_url":"https://api.github.com/users/marcelstoer/received_events","type":"User","site_admin":false},"prerelease":false,"created_at":"2017-08-24T19:26:33Z","published_at":"2017-08-24T19:32:27Z","assets":[],"tarball_url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/tarball/2.1.0-master_20170824","zipball_url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/zipball/2.1.0-master_20170824","body":"Primarily a patch release to fix a fundamental bug in the cron module.\r\n- Fix for #2080\r\n- New native module for DS18B20, #2003 \r\n- Rewrite of the Lua DS18B20 module, #1996 \r\n\r\nTake a look at the ['2.1 follow-up patch' 
milestone](https://github.com/nodemcu/nodemcu-firmware/milestone/7) for the full list of all issues included in this release."}

@heythisisnate If you're experiencing issues with 1.5.4.1-final, please open a new ticket. This one is for the dev branch.

@FrankX0 it is _kinda_ working on the dev branch ... returns successfully, but not until after a Disconnected with error: 8 message and a timeout. This is still what's holding me back from moving to the dev branch.

=http.get("https://api.github.com/repos/nodemcu/nodemcu-firmware/releases/latest", nil, function(code,data) print(code, data) end)
HTTP client: hostname=api.github.com
HTTP client: port=443
HTTP client: method=GET
HTTP client: path=/repos/nodemcu/nodemcu-firmware/releases/latest
HTTP client: DNS request
HTTP client: DNS pending
> HTTP client: DNS found api.github.com 192.30.253.117
client handshake start.
please start sntp first !
please start sntp first !
client handshake ok!
HTTP client: Connected
HTTP client: Sending request header
HTTP client: All sent
Reason:[-0x7880]
HTTP client: Disconnected with error: 8
HTTP client: Connection timeout
HTTP client: Calling disconnect
HTTP client: manually Calling disconnect callback due to error -12
HTTP client: Disconnected
http_status=200
strlen(full_response)=3123
response={"url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/releases/7514562","assets_url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/releases/7514562/assets","upload_url":"https://uploads.github.com/repos/nodemcu/nodemcu-firmware/releases/7514562/assets{?name,label}","html_url":"https://github.com/nodemcu/nodemcu-firmware/releases/tag/2.1.0-master_20170824","id":7514562,"tag_name":"2.1.0-master_20170824","target_commitish":"master","name":"2.1.0-master_20170824","draft":false,"author":{"login":"marcelstoer","id":624195,"avatar_url":"https://avatars0.githubusercontent.com/u/624195?v=4","gravatar_id":"","url":"https://api.github.com/users/marcelstoer","html_url":"https://github.com/marcelstoer","followers_url":"https://api.github.com/users/marcelstoer/followers","following_url":"https://api.github.com/users/marcelstoer/following{/other_user}","gists_url":"https://api.github.com/users/marcelstoer/gists{/gist_id}","starred_url":"https://api.github.com/users/marcelstoer/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/marcelstoer/subscriptions","organizations_url":"https://api.github.com/users/marcelstoer/orgs","repos_url":"https://api.github.com/users/marcelstoer/repos","events_url":"https://api.github.com/users/marcelstoer/events{/privacy}","received_events_url":"https://api.github.com/users/marcelstoer/received_events","type":"User","site_admin":false},"prerelease":false,"created_at":"2017-08-24T19:26:33Z","published_at":"2017-08-24T19:32:27Z","assets":[],"tarball_url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/tarball/2.1.0-master_20170824","zipball_url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/zipball/2.1.0-master_20170824","body":"Primarily a patch release to fix a fundamental bug in the cron module.\r\n- Fix for #2080\r\n- New native module for DS18B20, #2003 \r\n- Rewrite of the Lua DS18B20 module, #1996 \r\n\r\nTake a look at the ['2.1 follow-up patch' 
milestone](https://github.com/nodemcu/nodemcu-firmware/milestone/7) for the full list of all issues included in this release."}<EOF>
200 {"url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/releases/7514562","assets_url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/releases/7514562/assets","upload_url":"https://uploads.github.com/repos/nodemcu/nodemcu-firmware/releases/7514562/assets{?name,label}","html_url":"https://github.com/nodemcu/nodemcu-firmware/releases/tag/2.1.0-master_20170824","id":7514562,"tag_name":"2.1.0-master_20170824","target_commitish":"master","name":"2.1.0-master_20170824","draft":false,"author":{"login":"marcelstoer","id":624195,"avatar_url":"https://avatars0.githubusercontent.com/u/624195?v=4","gravatar_id":"","url":"https://api.github.com/users/marcelstoer","html_url":"https://github.com/marcelstoer","followers_url":"https://api.github.com/users/marcelstoer/followers","following_url":"https://api.github.com/users/marcelstoer/following{/other_user}","gists_url":"https://api.github.com/users/marcelstoer/gists{/gist_id}","starred_url":"https://api.github.com/users/marcelstoer/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/marcelstoer/subscriptions","organizations_url":"https://api.github.com/users/marcelstoer/orgs","repos_url":"https://api.github.com/users/marcelstoer/repos","events_url":"https://api.github.com/users/marcelstoer/events{/privacy}","received_events_url":"https://api.github.com/users/marcelstoer/received_events","type":"User","site_admin":false},"prerelease":false,"created_at":"2017-08-24T19:26:33Z","published_at":"2017-08-24T19:32:27Z","assets":[],"tarball_url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/tarball/2.1.0-master_20170824","zipball_url":"https://api.github.com/repos/nodemcu/nodemcu-firmware/zipball/2.1.0-master_20170824","body":"Primarily a patch release to fix a fundamental bug in the cron module.\r\n- Fix for #2080\r\n- New native module for DS18B20, #2003 \r\n- Rewrite of the Lua DS18B20 module, #1996 \r\n\r\nTake a look at the ['2.1 follow-up patch' 
milestone](https://github.com/nodemcu/nodemcu-firmware/milestone/7) for the full list of all issues included in this release."}

@heythisisnate IIRC I tried your command on #2269, and the Disconnected with error: 8 message is gone with this PR.

Ok awesome. I'm cheering for #2269 to be merged soon then so we can put this issue to rest. As of now my program that uses the Github API isn't working properly on either 1.5.4.1 or dev 😢
Is there anything I can do to help with that?

  • NodeMCU 2.1.0 built with Docker provided by frightanic.com
  • branch: dev
  • SSL: true

If I open this URL in a browser:

https://api.telegram.org/bot446398269:AAF7jqBWZhj9oJlWT2f8mCS8_X5_LxJGLRA/sendMessage?chat_id=211698604&text=Hello!

the response comes back quickly.

If I send the same request from NodeMCU:

http.get(text_req, nil, function(code, data)
  if (code < 0) then
    print("telegram bot failed")
    print(code, data)
  else
    print(code, data)
  end
end)

I get:

HTTP client: hostname=api.telegram.org
HTTP client: port=443
HTTP client: method=GET
HTTP client: path=/bot446398269:AAF7jqBWZhj9oJlWT2f8mCS8_X5_LxJGLRA/sendMessage?chat_id=211698604&text=Hello!
HTTP client: DNS request
HTTP client: DNS pending
HTTP client: DNS found api.telegram.org 149.154.167.220
HTTP client: Connection timeout
HTTP client: Calling disconnect
HTTP client: manually Calling disconnect callback due to error -12
HTTP client: Disconnected

How can I track down this error?

Just a thought, but looking at the code, it does a DNS resolve in the loop, so to speak, and if this takes too long then your client might time out. However, the espconn stack caches the last 4 resolutions, so have you tried doing a

  net.dns.resolve('api.telegram.org', funcToDoTheGet)

and that way the DNS name is resolved and cached before you start the HTTP dialogue.
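A minimal sketch of the pre-resolve idea above, assuming the `net.dns.resolve(host, function(sk, ip) ... end)` callback form of the NodeMCU `net` module. The helper name `get_after_dns` and its arguments are hypothetical, chosen only for illustration. The request is issued only after the resolver has answered, so the lookup should no longer count against the HTTP client's timeout.

```lua
-- Hypothetical helper: resolve the host first, then start the HTTPS request.
-- Assumes the NodeMCU `net` and `http` modules are compiled into the firmware.
function get_after_dns(host, path, callback)
  net.dns.resolve(host, function(sk, ip)
    if ip == nil then
      -- Resolution failed; report it the same way http.get reports failures.
      callback(-1, nil)
      return
    end
    -- The name should now be cached (per the comment above), so the lookup
    -- inside http.get is expected to return quickly.
    http.get("https://" .. host .. path, nil, callback)
  end)
end
```

It could be called as `get_after_dns("api.telegram.org", "/", function(code, data) print(code, data) end)`. Whether this avoids the timeout depends on DNS actually being the slow step.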

@AlexSmok this is the exact same issue I'm seeing. Allegedly it's fixed in SDK 2.2 which is waiting on this PR: #2269

@TerryE It does not work that way either.

@AlexSmok, @heythisisnate : I can confirm that the above example is indeed fixed by #2269.

HTTP client: hostname=api.telegram.org
HTTP client: port=443
HTTP client: method=GET
HTTP client: path=/bot446398269:AAF7jqBWZhj9oJlWT2f8mCS8_X5_LxJGLRA/sendMessage?chat_id=211698604&text=Hello!
HTTP client: DNS request
HTTP client: DNS pending
HTTP client: DNS found api.telegram.org 149.154.167.220
client handshake start.
client handshake ok!
HTTP client: Connected
HTTP client: Sending request header
HTTP client: All sent
Reason:[-0x7880]
HTTP client: Disconnected
http_status=200
strlen(full_response)=698
response={"ok":true,"result":{"message_id":91,"from":{"id":446398269,"is_bot":true,"first_name":"\u041c\u043e\u0438 \u0434\u0430\u0442\u0447\u0438\u043a\u0438","username":"smok_sensors_bot"},"chat":{"id":211698604,"first_name":"Alex","last_name":"Smok","username":"alexsmok","type":"private"},"date":1521057955,"text":"Hello!"}}<EOF>

This will again make things better, but there are still secure websites which cannot be accessed correctly.
Example:

HTTP client: hostname=nodemcu-build.com
HTTP client: port=443
HTTP client: method=GET
HTTP client: path=/
HTTP client: DNS request
HTTP client: DNS pending
HTTP client: DNS found nodemcu-build.com 217.26.52.40
client handshake start.
client handshake ok!
HTTP client: Connected
HTTP client: Sending request header
HTTP client: All sent
client's data invalid protocol
Reason:[-0x7200]
HTTP client: Disconnected with error: -114
HTTP client: Connection timeout
HTTP client: Calling disconnect
HTTP client: manually Calling disconnect callback due to error -12
HTTP client: Disconnected
HTTP client: Chunk Size:8192
HTTP client: Chunk Size:0

This even results in a reset of the device.

0x7200 is, in particular, MBEDTLS_ERR_SSL_INVALID_RECORD and is the return code when a TLS message is longer than the buffer allocated. See the chatter above around MBEDTLS_SSL_MAX_CONTENT_LEN. It would be great if you could crank up the SSL debug level as well, though, to confirm.
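To make the `Reason:[-0x....]` lines in these logs easier to read, here is a small sketch that maps them to symbolic names. The helper `decode_reason` is hypothetical. The 0x7200 entry comes from the explanation above; the 0x7880 entry (MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY) is taken from the mbedTLS `ssl.h` header rather than from this thread. The table is intentionally partial.

```lua
-- Partial table of mbedTLS SSL error codes seen in this thread
-- (magnitudes, without the sign). Extend it from mbedtls/ssl.h as needed.
local MBEDTLS_ERRORS = {
  [0x7200] = "MBEDTLS_ERR_SSL_INVALID_RECORD",
  [0x7880] = "MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY",
}

-- Hypothetical helper: map a "Reason:[-0x7200]" debug line to a symbolic name.
-- Returns nil if the line is not a Reason line, "unknown" for unlisted codes.
function decode_reason(line)
  local hex = line:match("Reason:%[%-0x(%x+)%]")
  if not hex then return nil end
  return MBEDTLS_ERRORS[tonumber(hex, 16)] or "unknown"
end
```

Read this way, the `-0x7880` lines in the successful requests above would simply indicate that the server closed the connection after sending its response.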

The reset of the device seems more likely a bug in the http module than anything SSL-specific. The 8192 and 0 chunk sizes sure seem suspicious.

Strange.

If I call this code:

http.get(text_req, nil, function(code, data)
  if (code < 0) then
    print("telegram bot failed")
    print(code, data)
  else
    print(code, data)
  end
end)

before the HTTP server starts (immediately after receiving the IP address), then it works like this:

HTTP client: hostname=api.telegram.org
HTTP client: port=443
HTTP client: method=GET
HTTP client: path=/bot446398269:AAF7jqBWZhj9oJlWT2f8mCS8_X5_LxJGLRA/sendMessage?chat_id=211698604&text=Hello!
HTTP client: DNS request
HTTP client: DNS pending
HTTP client: DNS found api.telegram.org 149.154.167.220
client handshake start.
please start sntp first !
please start sntp first !
client handshake ok!
HTTP client: Connected
HTTP client: Sending request header
HTTP client: Sending request body
Reason:[-0x7880]
HTTP client: Disconnected with error: 8
HTTP client: Connection timeout
HTTP client: Calling disconnect
HTTP client: manually Calling disconnect callback due to error -12
HTTP client: Disconnected
http_status=200
strlen(full_response)=699
response={"ok":true,"result":{"message_id":102,"from":{......

If I call it after the HTTP server has started, I get a timeout.

UPD.
The HTTP server is not to blame. I wrote a script that starts this timer immediately after receiving the IP address:

tmr.create():alarm(7000, tmr.ALARM_AUTO, telegram)

and watched the output.

I did not notice any pattern.

The request either works many times in a row or times out.

@nwf: the 0x7200 error when accessing nodemcu-build.com is indeed caused by the SSL buffer being too small. Increasing it to at least 8192 results in the following (without a reset):

HTTP client: hostname=nodemcu-build.com
HTTP client: port=443
HTTP client: method=GET
HTTP client: path=/
HTTP client: DNS request
HTTP client: DNS pending
> HTTP client: DNS found nodemcu-build.com 217.26.52.40
client handshake start.
client handshake ok!
HTTP client: Connected
HTTP client: Sending request header
HTTP client: All sent
E:M 6320
HTTP client: Response too long (6311)
E:M 6320
HTTP client: Response too long (6311)
E:M 5752
HTTP client: Response too long (5743)
client's data invalid protocol
Reason:[-0x45]
HTTP client: Disconnected with error: -69
HTTP client: Connection timeout
HTTP client: Calling disconnect
HTTP client: manually Calling disconnect callback due to error -12
HTTP client: Disconnected
http_status=-1

But I guess this shows that the limited amount of RAM is too small for this website (sorry @marcelstoer).

Maybe now is the time to close this topic and merge SDK 2.2?

One addition.
I encountered a site which also results in a 0x7200 error, but this is due to a minor version mismatch (with debugging enabled). The version reported: [3:4]. Does this suggest TLS v1.3?
If this is true, how can I set the maximum version supported to v1.2?

I have no idea how to pin the maximum version... I would have guessed that version negotiation would have done the right thing here. This suggests something broken about mbedTLS, espressif's use of it, or the remote server. What is the remote, speaking of? It may be useful to point https://www.ssllabs.com/ssltest/ at it and see what it says?

Please open a separate bug for the device reset induced by the http client.

I would be in favor of this particular bug being closed; I think it has outlived its usefulness.

The mbed TLS developers confirmed that the issue I found (minor version mismatch) is due to a problem in the client. So my assumption is that somehow the received data is malformed.
I would like to dig into this further, but to do so I need to get access to the data actually received by the ESP8266.
Is there some debug option I can enable so I can get access to the network data received during setting up of the secure connection?

Frank, the LFS patch will be out in a week or so, after our next staging of dev -> master, and it includes an enhanced remote debug facility. This gist gives you an idea of the sorts of things that you can do with this. Basically it's the standard GDB debugger but with one arm chopped off: annoying but usable. In your case, debugging setup stacks, you will need to raise the timeouts to minutes.

It's either that or lots of lua_debug() print statements through the code.

@FrankX0 tcpdump is always available and not ESP-specific, if all you need is the TLS records and encrypted payloads (which, I suspect, is enough for this case). There's also, after #2267, tls.setDebug() to engage the existing mbedTLS debug infrastructure. If you call tls.setDebug(4), you should get really quite verbose chatter about the library's operation, which should include the raw bytes exchanged over the wire as well as information of utility to mbedTLS upstream.
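A minimal sketch of the `tls.setDebug(4)` suggestion, assuming a firmware build that includes #2267 along with the `tls`, `http` and `node` modules. The wrapper name `debug_get` is hypothetical, and passing 0 to switch the output off again is an assumption based on how mbedTLS debug thresholds behave.

```lua
-- Hypothetical helper: run one HTTPS GET with mbedTLS debug output enabled.
-- tls.setDebug is guarded because older builds do not provide it.
function debug_get(url, callback)
  if tls and tls.setDebug then
    tls.setDebug(4) -- most verbose level mentioned above
  end
  print("heap before request:", node.heap())
  http.get(url, nil, function(code, data)
    print("heap in callback:", node.heap())
    if tls and tls.setDebug then
      tls.setDebug(0) -- assumed to silence the library again
    end
    callback(code, data)
  end)
end
```

Printing the heap alongside the TLS chatter is optional, but it helps tell buffer-size failures apart from plain memory exhaustion.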

Some issues are fixed with SDK 2.2, some can't be fixed on this platform, and for all others there should be dedicated issues -> closing
