Nodemcu-firmware: dangling pointer bug in uzlib_deflate.c causes lua.cross to sometimes crash

Created on 29 Jan 2019  路  9Comments  路  Source: nodemcu/nodemcu-firmware

Expected behavior

not crash, generate a flash image

Actual behavior

crashes, leaving a 0-length flash image file

Test code

lua.cross.exe -f -m 0x20000 -o lfs.img _init.lua dummy_strings.lua inspect.lua kosbo_lfs.lua

I have already debugged the code and have a resolution posted below in the Details section

NodeMCU version

master
HEAD: 11592951b90707cdcb6d751876170bf4da82850d
Merge pull request #2582 from nodemcu/dev
Arnim L盲uger
Friday, December 07, 2018 3:47:49 PM

Hardware

I am using a NodeMCU-dev board, rev 0.9, but that is not relevant for lua.cross

Details

appuzlibuzlib_deflate.c:571, a buffer is freed:
FREE(dynamicTables);

However, there are pointers into this buffer that are left dangling. Specifically, oBuf, which is used at line 575.

You can see the memory layout as it was allocated at line 221, set into dynamicTables at line 217, and where oBuf was set to point within that allocated block at line 224.

To resolve this, I would move the FREE at line 571 to after the control block at 574-582. (i.e., just before returning from the function.)

Since the Uzlib seems to be third party code, it might be useful to see if it was already fixed upstream.

This bug may be unnoticed because the freed memory block is being accessed quickly after freeing, and it's contents have likely not been damaged yet. However it was crashing reliably for me in debug builds, because the debug malloc on MSVC blitzes freed blocks with a pattern. In release builds, it usually ran OK, but occasionally did not.

Most helpful comment

Okeydoke. I'll do the following:

  • I'll make a fork

  • set up one branch that is for this specific bug only. It's the proverbial 'one-line fix'. I think this has merit for inclusion on it's own because it is a fundamental bug that affects all toolchains. I'll generate a PR from that branch, and that PR should resolve this issue #2632

  • I'll set up a separate branch for the MSVC stuff. I will improve beyond my current work with appropriate #ifdef around MSVC-specific stuff. I just did a diff, and it looks like there is minor surgery in nine files. I hadn't built spiffsimg yet, since it turned out that I didn't need to, but I'll go for that as well. That one looks even easier, anyway. I'll do a separate PR for that stuff.

That way we can get the bugfix for this issue in, and then separately if y'all think the MSVC support is too much, then you can just leave that in a branch or reject it or whatever.

My motivation for doing the msvc build is simply that the only other tool I needed was the luac for the LFS stuff. All the other stuff I can do through the online build system. Since I had msvc on-hand, and because I did not yet know about the mingw alternative, I decided to give it a go. It might be useful to someone else. Incidentally, for the curious: the free MS Visual Studio Community Edition is fully capable.

All 9 comments

@ziggurat29 thanks for this but i am bit confused by your explanation. The struct dynTables *dynamicTables is used to contain the deflate huffman decode tables and constants. The malloc at L215 allocates space for it and the dt->hashTable and dt->hashChain vectors and the storage is as you say freed at L571, but none of these dynamic tables are used between L572 and the return at L584; specifically oBuf->buffer is a separate output byte array, so the order of these frees (or re-ordering them) should not change the execution.

If you are getting crashes as you suggest then IMO a more probable underlying issue is that I am walking of the end of tables and invalidly stomping out of scope RAM. Though I also note that you say that you are using MSVC which isn't supported for firmware builds, so are you using uzlib for your own application? We got no issues with this as the code is released under an open-source licence, but this does also add other possible failure modes sich as the out-of-bounds issue being in your application code.

Incidentally pieces of the code are taken from or based on the attributed sources, but a large part is a fresh implementation to work within ESP8266 Ram constraints

@TerryE Terry from what I can see @ziggurat29 is has ported lua.cross to native windows without mingw.
I would very much like to get a copy of that if not the project file to be able to build my own version.

I looked at the code myself and found that
in line 216 the memory for oBuf and dynTables is allocated together with dynTables hashChain and hashTable.

in line 572 it is freed altogether again.
oBuf is the accessed below as stated by @ziggurat29 .

@ziggurat29 does it work if you move the line as you suggested?
Maybe create a pull request if it works. If you have problems doing so I will assist you.

@TerryE Terry from what I can see @ziggurat29 is has ported lua.cross to native windows without mingw.

Using Cygwin was trivial. Supporting mingw would involve a bit of work but is doable and would produce a native windows executable. Using native MSVC is quite a lot more work because of the intrinsic incompatibilities resulting from the lack of decent POSIX support and needing to use the native Windows APIs. Not for the faint-hearted, IMO.

I apologize for not having visited this issue for the past week, and for responding so late. My recollection of the problem has since been paged out, but I think I can do a more detailed walk-though.

Looking at the code, you can see that at about line 224:

  oBuf          = (struct outputBuf *)(dt+1);

so, oBuf is pointing /into/ the dt chunk. Also note, dt is an alias of dynamicTables.

Later, around line 571 we do:

  FREE(dynamicTables);

This obviously releases the entire malloc'd block that was assigned to dynamicTables, but recall that single block includes several contiguous objects, especially oBuf which is _pointing into_ that block -- it's not a separately allocated object. So when dynamicTables is freed, oBuf is implicitly freed as well. So now it is now pointing into a free'ed memory block. If no one touches oBuf anymore it's not a problem, but we do touch it several times right away. E.g. a few lines later:

    uchar *trimBuf = realloc(oBuf->buffer, oBuf->len);

(and other places).
If the free'ed block has not been altered, then those things will still work (even though oBuf refers to freed memory), but if the block /has/ been altered, then there will be problems because things like

  oBuf->buffer and oBuf->len

will contain junk.

On MSVC, if you do a debug build, then all the mallocs and frees cause the memory to be filled with a pattern. This is to make these sorts of issues more obvious (it really is helpful). In the case of MSVC, a free() will cause the memory to first be overwritten with 0xdd before doing the actual free(). In the case of the code mentioend, that will cause, say,

    uchar *trimBuf = realloc(oBuf->buffer, oBuf->len);

to resolve to:

  uchar *trimBuf = realloc(0xdddddddd, 0xdddddddd);

and so crashing ensues.

If you need it, I can revert my change and stimulate a crash and give you the stack trace, but I think it is already apparent that all the cases of oBuf being used after the FREE(dynamicTables) are instances of accessing memory after it has been free'ed, and that moving the FREE(dynamicTables) until /after/ those statements have executed will resolve the issue.

@HHHartmann , yes, I built it with MSVC. You are welcome to the modified project if you want it. Aside from setting up the .vcxproj file, I did have to modify some source for all the compiler-specific stuff; e.g. changing __attribute() to functionally equivalent __declspec() stuff, and a little linker legerdemain to produce a working IN_RODATA_AREA(). It really wasn't _too_ bad, but it did involve maybe 5 source files. Presently, these changes are not wrapped in #ifdefs as I normally would if I were to submit the changes for inclusion in the main codebase, but I think those sort of improvements could be done with ease.

At any rate, the changes needed for MSVC support of luac.cross wound up being 'simple' largely because luac.cross itself happens to be a simple application. If it were a more complex application then it might have been more of a mess, but MSVC can handle things like printf, fopen, and longjmp just fine. Accommodating the linker for placement of the rotables was maybe the most tricky, but that's intrinsically a toolchain-specific concern and doesn't have anything to do with libc or OS's.

As @TerryE mentioned, there is now a Cygwin build option. I was unaware of that option until _after_ I had gone through the effort of building on MSVC (lol, it pays to read more, but isn't that a halting problem?). However, this bug as filed is a fundamental one regardless of toolchain, runtime libs, or OS. I believe it's effect has been masked so far simply due to peculiarities of the current toolchain's libc implementation, hence why it became visible when I used a different one (from MSVC). The bug is deterministically fixed by moving the 'free' as I described in the original post.

@ziggurat29, If you do an issues search (is:issue luac in:title), then you will see that I am very much aware of this gap, and there is an open issue on the stufff that you are offering: #2315.

Most of the active developers here do so in a Posix environment. In my case I used to be actively involved with MS tools and have been to Redmond a few times, but I stopped using MS stuff including WinX when I retired early from HP. I have an old laptop which still dual-boots into Win7, but I am not willing to pay for MSVC out of my own pocket, when I regard the free GNU toolchain as superior for my needs.

This all underlines to me that you are bringing what seems to be rare skills to the project -- being an active MS developer who is willing to work on and contribute to the project, so I welcome and will support your contribution.

What I ask is that you fork our repo on github and commit your version to your fork, so that I can review and integrate your suggested changes. Whilst it is too problematic adding Windows build support for the entire project, having this for the limited host-based tools such as luac and spiffsimg is highly desirable, IMO.

We can talk about this further by email (my mailbox using my userid at apache.org still works) or on our gitter thread.

PS. I've been laid up for this last week with a _horrible_ respiratory virus which makes me feel sh*t. My head hasn't been clear enough to work on the project, but I am on the mend now and will try to catch up.

Okeydoke. I'll do the following:

  • I'll make a fork

  • set up one branch that is for this specific bug only. It's the proverbial 'one-line fix'. I think this has merit for inclusion on it's own because it is a fundamental bug that affects all toolchains. I'll generate a PR from that branch, and that PR should resolve this issue #2632

  • I'll set up a separate branch for the MSVC stuff. I will improve beyond my current work with appropriate #ifdef around MSVC-specific stuff. I just did a diff, and it looks like there is minor surgery in nine files. I hadn't built spiffsimg yet, since it turned out that I didn't need to, but I'll go for that as well. That one looks even easier, anyway. I'll do a separate PR for that stuff.

That way we can get the bugfix for this issue in, and then separately if y'all think the MSVC support is too much, then you can just leave that in a branch or reject it or whatever.

My motivation for doing the msvc build is simply that the only other tool I needed was the luac for the LFS stuff. All the other stuff I can do through the online build system. Since I had msvc on-hand, and because I did not yet know about the mingw alternative, I decided to give it a go. It might be useful to someone else. Incidentally, for the curious: the free MS Visual Studio Community Edition is fully capable.

OK, PR #2661 is in to resolve this problem.

Also, I submitted PR #2665 that has all the MSVC stuff for luac.cross. I didn't know if it was appropriate for me to file an enhancement issue to associate with the PR, so I didn't. It is dependent on this fix being in-place, though.

spiffsimg turns out to be more work than expected, so I will submit that PR separately when it is working. I'm not sure why one needs spiffsimg as a _user_ of nodemcu (luac.cross is critical for LFS usage), but I don't mind adapting it, too.

I'm not sure why one needs spiffsimg as a _user_ of nodemcu

If you want to "factory install" NodeMCU + Lua app, then using SPIFFSimg makes this simpler, as no tty interactive session is needed.

I'll take a look at the MS stuff tomorrow.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

HHHartmann picture HHHartmann  路  7Comments

sivix picture sivix  路  4Comments

vsky279 picture vsky279  路  7Comments

ShAzmoodeh picture ShAzmoodeh  路  6Comments

dicer picture dicer  路  6Comments