The last few builds, there were lots of issues related to crashes, freezes, and other kinds of mayhem in ESPeasy-world.
A first indication is the report of CRC errors while reading the settings file, but also loading previously saved settings will fail.
The storage of settings is just a very basic binary dump of memory starting at the address of the Settings object.
That should be fine, as long as all items in that struct are the same size.
However, they are not. It is ranging from booleans(1 bit) to unsigned long (32 bit) to arrays of bytes/chars and 2-dimensional arrays of unsigned int.
To complicate stuff even more, are there compilers which do apply padding and make objects align at addresses multiple of 4.
This means that storing a struct with:
May (or may not) use less bytes than:
And then there are differences in compilers and optimization flags.
Simply stated, the current way of storing data is fine as long as you're always using the same compiler with the same compiler settings and the same struct.
Now back to reality....
People will be using pre-built versions and every now and then build their own.
We're adding members to the settings struct.
So in other words, for now, please use clean settings when upgrading to another version.
And we should add a proper import/export option for settings to make sure the updates are as smooth as can be.
At this moment, it looks like we have to consider all settings possibly tainted, due to the reasons mentioned earlier.
I could try for now, as a quick work-around, make the settings appear stable again and refuse to read previously stored settings. (simply refusing to read when CRC fails)
Please comment on the offered ideas, or offer some new suggestions.
Great round up @TD-er I really like how you explain it to a noob like me 😉
I think we should go with the FAIL=skip old setting.
But I will probably need to update info on wiki, post on the forum and maybe add something in the release notes?
Yep and I am not sure how far back the settings will be correct.
The thing is you really cannot tell what the compiler does to the positioning of the data,
So the thing is you have to either do it yourself, by packing booleans to 32-bit integers, etc.
Or simply not store it like this and expect it to be re-usable on newer builds.
Maybe import/export to some format, or store it in text or whatever.
For starters we can make a new format and make sure all members are packed into 32 bit chunks.
But does this also explain that on a unit that was cleared and reconfigured the settings CRC still fails?
Writing and reading was done with the same firmware in this case.
The unit is working ok inspite of the CRC failing btw
Yes, because there is something "strange" in the function to compute the checksum.
This is the relevant part:
https://github.com/letscontrolit/ESPEasy/blob/b8fa42925f0c73ba08a3571e4d7688166f8d03e6/src/Misc.ino#L700-L717
As you can see, the amount of data added to the MD5Builder object is based on the sizeof(Settings) - 16.
But that only is correct if the address of Settings.md5 starts at the last 16 bytes of that struct.
What if the compiler made some optimization to locate the md5 member to another position in the struct (another position regarding how the struct is used in memory)?
Then the saved checksum did alter the data used to compute the checksum and thus the checksum will never match.
The checksum is currently only an indication and not used in the code to reject the settings.
As long as you're using the same binary, the settings will work just fine.
But as soon as you're using another binary firmware, there is no guarantee the order of the data in the struct remains the same and thus when reading stored settings, the settings may get corrupted in memory.
For example the boolean deepSleepOnFail may be set based on bit number X of member DST_Start
(just taken some random members as example)
Ok, I understand now. Thnx for your explanation.
I've continued reading on the subject.
See this (bit technical) well written article about this subject: Order your members
In short:
The order of members should not be changed by the compiler, but adding new members may result in unexpected behavior due to padding and alignment of members.
And we're probably throwing away quite a lot of memory by not ordering our members based on their size.
On the other hand, adding new members inbetween existing members would make it impossible to read older versions.
When we may change the settings struct, I would advice to set the checksum at the first addresses, add some version number (16 bit) and a length (16 bit) to list the size of the struct when the checksum was computed.
That will allow to continue extending settings and continue to be able to read older ones.
Hi,
I am not sure I understand.
You are suggesting to clean the settings every time there is an updated version of the firmware from one version to another?
So practically whenever there is a new release (from the pre-build versikns) we should upload first a 1MB blank file, then upload the new version and fjnally recreate the devices, rules, etc?
Or these procedures apply only when switching releases compiled from different sources?
How will OTA work in this case?
Sorry if this is a stupid question.
There are no stupid questions, only stupid answers and I hope I'm not making one of those right now ;)
If we should consider a setting invalid, then we can do that in the code itself. Then you do not have to clear the stored settings first. Just do not use the data and clean it in the code to start over.
However, that's a rather drastic solution and not one I prefer.
I would really like to have the ability to keep reading the stored settings when upgrading.
At the moment, the checksum is stored as last object in the settings, which means you have to move the checksum data when adding new members to the settings.
Also the checksum does not know what bytes were used to compute the settings' checksum.
So if you are switching from version B to A (older version) and back to B there is the possibility the checksum inserted by version A overwrites some data used by settings of version B.
This is the main problem I guess and that should be dealt with.
So we can do 3 things as a fix:
I think option 1 is the best one in the end, but may take some time to do well.
In the end it may lead to less memory use, more flexible setting and some other benefits.
But the major drawback is that you invalidate all existing settings up-to this point when the new settings are introduced.
The other two options have as only drawback that you will run into issues when switching back and forth to previous settings. And ofcourse still use more RAM than needed.
But that's the easiest to implement for now.
I don't think the blank file is necessary. The new version may ignore the settings when the version (that is part of settings) is different. I agree it is not an ideal situation. I guess the only other option is to store the settings in another format before upgrade (key-value for instance) but that probably requires a lot of memory.
Edit: TD-er already answered the question.
I think there were ideas to make the plugin set more flexible (either runtime or compile time).
Would that affect the way to store the settings or was the idea abandoned anyway?
Those ideas are still present and actually may become more active now we're experiencing these issues.
They were mainly put aside because implementing them would take quite some work and affects all parts of the code.
But it would make things a lot more flexible and easier to add new things to the settings.
Also a redesign would allow some settings to be changed. For example the topic length for MQTT, user name/password length increase, etc.
I would go for option 3 for now (if it cannot be done good, do it the most simple way).
I understand the reason for the CRC is to mitigate flash read/write errors but the risk of those occurring does not seem very big (at least I have never seen it on my 6 units) and even if the CRC check fails ESPEasy does not act on it.
The CRC for the binary can remain of course.
TD-er: thanks for the clarification.
A few more questions to better understand the problem:
When you say “setting “ you mean the settings file, right?
And in case of invalidation of the settings file, will the IP address. be retained?
So the problem is present only if there is change in the settings structure.
If my above assumptions are correct my suggestion would be to clearly state in the name of the release the version of the settings structure so at least I would know if I want to upgrade and in case I could take note of the config and also would know which versions are compatible with the current settings.
Example: 20180422-dev-settings01
Currently the CRC for the security settings is also not causing any issues, because that struct has not been extended for a while
@giig1967g
Yes, I meant the "SettingsStruct", which is present in the code as "Settings" and also stored in the settings file.
In the settings file, the IP config is also stored, as well as the configuration for the controllers and the devices (called "tasks" in the source code)
Actually everything that's not fit for Rules, Notifications or SecuritySettings. Those are the other files.
From a user point of view then it's really important that he knows before upgrading that the operation will destroy his settings file and that he will not be able to connect to the web server.
So the only way for user to be aware of the change is the firmware file name .
@wdonker : but removing the CRC will solve the problem? I mean, will I be able to move back and forward among different versions?
I thought of a new approach, to simply add a StructSize member to the settings.
This StructSize can also be used to check the stored checksum, but for now the checksum of the settings file will be disabled.
There is also a mechanism to patch settings, which I will also use to set some of the newer members.
I also noticed that some of the more recently added parameters to the settings were never initialized.
OK, curious that I was able to pin this down to after 'a specific build' the problem was introduced,
prior to that was OK..
Are you able to say 'what change' resulted in the 'crc fail' ?
Changing from Rxx build to V2 builds the firmware issues a reset command. (if I interpret it properly..)
Is this code able to be used to help in this case ?
I would prefer one config file for all settings including Rules, Notifications and SecuritySettings based on text form, perhaps something like CSV file protected by CRC instead of current binary form. That could bring more flexibility and hopefully even more compatibility with earlier versions. Maybe the structure could contain variable index, type and value for most items and a special sections for Rules, Notifications and SecuritySettings. Every new variable should just have an index incremented. Import to earlier versions could just ignore unknown indexes (above maximum for that FW release). It could be easy to prepare / maintain settings templates (edit variables offline in text editor and update a recalculated CRC). It should be also possible to remove items from config file which should remain unchanged (import procedure could allow missing indexes when CRC is OK so just some changes could be easily updated). What do you think about it?
My preference is to have some form of very compact storage on the node itself and import/export via JSON.
I am looking at https://github.com/USCiLab/cereal , which allows to do just that.
Only thing is that it depends on a somewhat recent compiler and I am not yet sure it will work together with Arduino IDE.
We simply have to test that.
And if it can be used on both build platforms, the question remains if it is light-weight enough for our already stuffed memory.
So that's a route not to be taken "in a day or two".
The plugin settings are somewhat in binary form, at least for some plugins. So that's also something to look into.
Maybe those plugins should implement their own import/export function. Not sure yet.
@Oxyandy The settings used by Rxx and V2/Mega builds are totally different.
And I am quite sure when the CRC fail was introduced.
That was when the useRTOS flag was added.
It was something nobody anticipated it could have such a big impact.
I prever the 1. way TD-er explaint. I've done some privat projects with a similar problem. My solution was a checksum and or a version number at start.
It has some additional advantages. If you have a versionnumber in the first byte (s), you can fast read it first. No need to read the whole file. Then some migration can be done first.
I prever a binary file format too. Text based files need to much memory, more time to read and parse and more code for parsing.
Most helpful comment
My preference is to have some form of very compact storage on the node itself and import/export via JSON.
I am looking at https://github.com/USCiLab/cereal , which allows to do just that.
Only thing is that it depends on a somewhat recent compiler and I am not yet sure it will work together with Arduino IDE.
We simply have to test that.
And if it can be used on both build platforms, the question remains if it is light-weight enough for our already stuffed memory.
So that's a route not to be taken "in a day or two".
The plugin settings are somewhat in binary form, at least for some plugins. So that's also something to look into.
Maybe those plugins should implement their own import/export function. Not sure yet.
@Oxyandy The settings used by Rxx and V2/Mega builds are totally different.
And I am quite sure when the CRC fail was introduced.
That was when the useRTOS flag was added.
It was something nobody anticipated it could have such a big impact.