I am using ESP.getSketchMD5() to manage software releases.
I have noticed that if I use the same Arduino IDE, but in a different place in the filesystem the sketch MD5 is different.
I have used a portable version of the IDE to be sure that the same files are used.
For example loading a sketch that prints MD5 shows this behaviour.
Is this a known thing?
void setup() {
Serial.begin(115200);
Serial.println();
Serial.println( ESP.getSketchMD5());
}
void loop() {
}
using the strings command on a compiled binary shows some occurrence of the source path of the core, it would be nice if this could be avoided.
If it's about __FILE__ gcc macro related, then I'd like to have to solution too.
There's a GCC patch from Debian which can be used for this purpose:
https://lists.alioth.debian.org/pipermail/reproducible-builds/Week-of-Mon-20170814/009124.html
On Thu, Mar 15, 2018, 20:03 david gauchard notifications@github.com wrote:
If it's about __FILE__ gcc macro related, then I'd like to have to
solution too.—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/esp8266/Arduino/issues/4520#issuecomment-373353465,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AEJcei-jH7x29EYYgMGHi63B5UGJ0Ol2ks5telilgaJpZM4Sr5bq
.
The use of ESP.getSketchMD5() is interesting to me as well, so after running strings on my binaries I saw the long absolute paths being stored. Since I compile my projects on Linux using makeEspArduino.mk makefile instead of the Arduino IDE, it was trivial to edit my config.mk from:
ESP_ROOT=/home/user/Arduino/hardware/esp8266com/esp8266
LIBS=~/Arduino/hardware/esp8266com/esp8266/libraries/ ~/Arduino/libraries/
to be:
ESP_ROOT=../hardware/esp8266com/esp8266
LIBS=../hardware/esp8266com/esp8266/libraries/ ../libraries/
Then, I compiled my current project on two different machines where the working directory is different, and the resulting md5sums matched. That's cool!
However, the more interesting result for me was that my binary size was now smaller:
from:
Ram: 61220 bytes
Flash: 375172 bytes
to:
Ram: 61096 bytes
Flash: 374200 bytes
I'm always working to save a byte here and there, so this was a nice gift for very little effort.
Edit: And yes, running strings now shows all the paths stored are relative.
@igrr thanks for the interesting reading !
While it should be feasible to backport it to (xtensa-)gcc-4.8.2 (at which cost?), @Eszartek's solution with makeEspArduino.mk seems very good at little cost.
Not to threadjack, but if we do rebuild gcc please add in the "#define JUMP_TABLES_IN_TEXT 1" change, too, to save extra heap at the same time.
Where is the actual patch referenced in the email thread? I can't seem to find the changes themselves.
In my comment just above yours I put a link to the changes
@d-a-v The links themselves go to the mailing list archive, which talks about the patch. But where are the actual patch files they proposed? I'm not seeing any link to it.
@earlephilhower in the bottom of the link in my comment, reproduced here:
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00182.html
this is the link to the 3rd patch containing the __FILE__ management gcc patch.
Thanks, @d-a-v . I saw the links but for some reason didn't see the actual changes. I'll see if they can apply to the GCC toolchain I think we're using now: https://github.com/jcmvbkbc/gcc-xtensa .
There are several branches on this repo.
I guess there is a good reason, still I don't know at what cost we could have -flto.
@d-a-v -flto, unfortunately, is not only a matter of rebuilding the toolchain. For simple programs it works pretty well, but for larger programs where you assign certain function to specific sections (like, put some stuff into IRAM and some into flash), LTO has problems tracking the correct sections to use.
I can't quantify flto benefit. I guess it could be huge especially for us.
Do you mean that the section-mixing flto issue is always true ?
Or is there a way to selectively use this option at the cost of splitting source files by hand to isolate and group objects from the same section together ?
FWIW when testing the JUMP_TABLES_IN_ROM setting I was able to build and replace all the gcc executables dir using branch call0-4.8.2. Everything "just worked."
Whether that is, or is not, the branch being distributed I can't really say.
However, if you wanted to do FLTO tests, @d-a-v , it seems like it would be a safe starting point.
@d-a-v It looks like the GCC patch could be applied by hand, with probably only minor mods, to the GCC-4.x in the repo. The meat of the change is in libcpp/macro.c which seems to not have undergone much editing from gcc4 to gcc7 (case BT_FILE:). Edit the local const char *name somehow (I think environment variables are not a good choice as it's hidden and not something easily used under both Linux/Mac and Windows) to remove the /.../Arduino/hardware/esp8266/xxx/ bit and you're golden.
Whether this is worth the grief, considering you'd need to rebuild the toolchain(this is a libcpp.so file) and use it on all supported platforms, is a question above my pay grade.
--edit--
How cool is it to see this at the top of a file you're looking at editing?
Adapted to ANSI C, Richard Stallman, Jan 1987
I will apply the patch when building the new GCC version. CI builds for all platforms are set up (including ARM), just need to prepare the patch set.
Hi
I have a small team (3 developers) working with Visual Studio + Visual Micro to program ESP8266.
Sometimes they work in the same proejct... So to avoid some problems I would like to check the binary file built from the same version. With the same source files I'm getting differents MD5 values from the "same" firmware. So, how can I trust which one I should use to send to production? We also work with PIC here, but we don't have this behaviour using their compiler.
What is your advice to figure out if the different binary, in fact are the same???
Now that we have GDB support (or will once the pull is merged), I don't think the GCC patch listed above is a good idea as it replaces the file path everywhere (including debug info). I don't think you'd be able to run GDB with symbols w/o serious manual effort remapping the files back to proper paths.
Remapping __FILE__, though, would not affect debugging info while making the final .bin file match between builds in different locations. The .elf will still not match, but will be usable with GDB (and we're concerned w/sketchMD5() anyway, not the ELF itself).
In GCC/gcc/final.c there's a BUILT_IN_FILE accessor we can patch using the same ideas as the above.
This patch actually works, unlike the prior one, to strip paths from __FILE__ wherever it may lie, while leaving the ELF debug info untouched. Just built a test sketch with __FILE__ macros, exited Arduino, renamed the global Arduino dir, rebuilt in the new dir, and the BINs have 0 diffs while GDB has the full paths to source files.
````
diff --git a/libcpp/macro.c b/libcpp/macro.c
index 6d46027e4ea..0390b4c0414 100644
--- a/libcpp/macro.c
+++ b/libcpp/macro.c
@@ -275,9 +275,16 @@ _cpp_builtin_macro_text (cpp_reader *pfile, cpp_hashnode *node)
const char *name;
uchar *buf;
Latest git head has this patch in it (2.5.0-4 toolchain). Built same sources, one in ~/sketch_jul21b, one in ~/Arduino/sketch_jul21b. Same bins
earle@server:~/Arduino$ sha256sum /tmp/arduino_build_909419/sketch_jul21b.ino.bin
8f2770e97627615c1753678474a282dabf6e9eb0a8d9b6db33cf1353acdf2db2 /tmp/arduino_build_909419/sketch_jul21b.ino.bin
earle@server:~/Arduino$ sha256sum /tmp/arduino_build_531665/sketch_jul21b.ino.bin
8f2770e97627615c1753678474a282dabf6e9eb0a8d9b6db33cf1353acdf2db2 /tmp/arduino_build_531665/sketch_jul21b.ino.bin
Most helpful comment
This patch actually works, unlike the prior one, to strip paths from
__FILE__wherever it may lie, while leaving the ELF debug info untouched. Just built a test sketch with__FILE__macros, exited Arduino, renamed the global Arduino dir, rebuilt in the new dir, and the BINs have 0 diffs while GDB has the full paths to source files.````
diff --git a/libcpp/macro.c b/libcpp/macro.c
index 6d46027e4ea..0390b4c0414 100644
--- a/libcpp/macro.c
+++ b/libcpp/macro.c
@@ -275,9 +275,16 @@ _cpp_builtin_macro_text (cpp_reader *pfile, cpp_hashnode *node)
const char *name;
uchar *buf;
name = linemap_get_expansion_filename (pfile->line_table,
pfile->line_table->highest_line);
+#if defined (_WIN32)
+#endif
else
{
name = _cpp_get_file_name (pfile->main_file);
````