kdb mount test.ini user/test ini
echo "\xEF\xBB\xBF[sec]\x0Aa=b" > `kdb file user/test`
That BOM is interpreted like a whitespace (simply ignored) and would not confuse the parser.
kdb ls user/test
#> user/test/sec/a
It seems like BOM+[ confuses the parser so that the section is not recognized.
kdb ls user/test
#> user/test/a
#> user/test/[sec]
Hi! Can you please assign me to this issue? I have picked 3 more ini related problems.
Actually this is a generic problem where maybe several parsers will choke. E.g. I quickly tried yajl, which has a similar problem:
kdb mount test.yajl user/test yajl
echo "{\"a\": \"b\"}" > `kdb file user/test` # works
echo "\xEF\xBB\xBF{\"a\": \"b\"}" > `kdb file user/test` # fails
To solve this generically, we could implement a remove/bom in the plugin filecheck (simply stripping the BOM from the file) or simply reject files which have BOM (use the already existing reject/bom feature).
@sanssecours What do you think? Do your storage plugins support BOM?
What do you think?
Rejecting a file with a BOM sounds like a good idea. Stripping the BOM (and maybe re-adding it later) sounds good in theory, but would potentially require communication between the BOM and storage plugin in the set direction.
I think the best solution would be to add the iconv plugin to the backend and convert the data in the get direction into UTF-8 (without BOM) and then back to whatever encoding the file used before. If there was BOM, then iconv should just add it again in the set direction. This way storage plugins could always just assume UTF-8 encoding. For all files that use the proper format (UTF-8 without BOM) the overhead would be minimal, since iconv does not need to change the file content at all.
Do your storage plugins support BOM?
YAML CPP does support BOMs:
kdb mount config.yaml user/tests/yaml yaml
printf "\xEF\xBB\xBFkey: value" > `kdb file user/tests/yaml`
kdb ls user/tests/yaml
#> user/tests/yaml/key
. All other plugins that I wrote do not, as far as I can tell.
Yes, I fully agree! To fix this issue here, it is enough if all storage plugins require "filecheck" and we reject all these different malformed files.
Later (if someone really needs config files with BOM or UTF-16) we can introduce a new fileiconv plugin.
@sanssecours Should we only reject BOM or also files with inconsistent line endings or any other setting filecheck has?
@alexsaber To solve the issue, you would add filecheck in infos/needs (in basically all storage plugins) and add /reject/bom (and maybe also check/lineending, depending on @sanssecours) to be default if no config is given for filecheck.
Note: Currently the filecheck plugin always goes over the full content, even if only /reject/bom is active. If we decide that /reject/bom is default, we should optimize this case. This should be trivial.
Thank you for the hints and comments!
Am Do., 21. März 2019 um 18:19 Uhr schrieb markus2330 <
[email protected]>:
Yes, I fully agree! To fix this issue here, it is enough if all storage
plugins require "filecheck" and we reject all these different malformed
files.Later (if someone really needs config files with BOM or UTF-16) we can
introduce a new fileiconv plugin.@sanssecours https://github.com/sanssecours Should we only reject BOM
or also files with inconsistent line endings or any other setting filecheck
has?@alexsaber https://github.com/alexsaber To solve the issue, you would
add filecheck in infos/needs (in basically all storage plugins) and add
/reject/bom (and maybe also check/lineending, depending on @sanssecours
https://github.com/sanssecours) to be default if no config is given for
filecheck.Note: Currently the filecheck plugin always goes over the full content,
even if only /reject/bom is active. If we decide that /reject/bom is
default, we should optimize this case. This should be trivial.—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/ElektraInitiative/libelektra/issues/1942#issuecomment-475324338,
or mute the thread
https://github.com/notifications/unsubscribe-auth/APkYvvuRg0CC0SQKDZFF1Xis5UhDZ10qks5vY789gaJpZM4TlW_A
.
Should we only reject BOM or also files with inconsistent line endings or any other setting filecheck has?
Most storage plugins assume Unix line endings (\n). I think it would be great, if we reject all files that use Windows (\n\r) or Classic Mac OS (\r) line endings for those plugins. Rejecting files which contain null bytes or unprintable characters does also make sense for a lot of storage plugins.
Yes, most likely most of them will not support anything else other than \n. For csvstorage, however, we now add support for DOS line endings, see #2508. For the others I am fine with rejecting BOM and non-UNIX line endings.
INI is not important anymore due to TOML