Hi,
I just noticed something odd when I build an XML document:
require "xml"

string = XML.build(indent: " ", encoding: "utf-8") do |xml|
  xml.element("person", id: 1) do
    xml.element("firstname") { xml.text "Jane" }
    xml.element("lastname") { xml.text "Doe" }
  end
end

puts string
<?xml version="1.0" encoding="utf-8U"?>
<person id="1">
<firstname>Jane</firstname>
<lastname>Doe</lastname>
</person>
The 'encoding' attribute is utf-8U instead of utf-8.
Why is a 'U' appended to the character encoding I specified?
Crystal 0.24.1 on Debian 9
This issue is also reproducible in the Crystal playground.
This looks really strange. It happens with other encodings as well, but not all of them (for example, ISO-8859-1 is unmodified), and the result differs depending on capitalization. For example, ASCII becomes ASCIIU, but ascii becomes asciiV.
I could not identify any potential issues in the Crystal bindings. XML::Builder just passes xmlTextWriterStartDocument a char pointer to the bytes of the string, including the terminating zero byte (Bytes[117_u8, 116_u8, 102_u8, 45_u8, 56_u8, 0_u8]), and that's how it is supposed to work.
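For anyone who wants to check those bytes themselves, here is a minimal sketch. It assumes only that a Crystal String stores a zero byte right after its contents, which is what allows `to_unsafe` to be handed to C functions expecting a `char*`. The variable names are mine and are not taken from XML::Builder.

```crystal
encoding = "utf-8"

# Pointer to the first byte of the string's data (Pointer(UInt8)).
ptr = encoding.to_unsafe

# View bytesize + 1 bytes so the trailing zero byte is included.
bytes = Slice.new(ptr, encoding.bytesize + 1)

p bytes
```

If the last byte printed is 0, the string is properly terminated on the Crystal side, which would suggest the stray character is added somewhere after the pointer is handed over.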
When using the C API directly, everything works as expected: https://carc.in/#/r/3rri
My example uses xmlBufferCreate and xmlNewTextWriterMemory instead of xmlOutputBufferCreateIO and xmlNewTextWriter, so I guess the problem has something to do with the IO-based buffer used by XML::Builder.
It's https://github.com/crystal-lang/crystal/pull/5587. Fixed in master.