Elixir: Show BOM on inspected strings

Created on 2 Dec 2017 · 6 comments · Source: elixir-lang/elixir

Environment

Erlang/OTP 20 [erts-9.1] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:10] [hipe] [kernel-poll:false]
Elixir 1.5.2
Ubuntu 17.10

Current behavior

Strings produced with the ~s sigil sometimes contain a byte order mark (BOM, U+FEFF). When that happens, comparing them with the same string written in plain double quotes fails.

Reproduction:

iex(50)> s1 = ~s(2017-11-29\t2017-12-02\t"QUOTED"\t"ABCDE"\t"EFG"\t"HIJKLMON"\tPQR)
"2017-11-29\t2017-12-02\t\"QUOTED\"\t\"ABCDE\"\t\"EFG\"\t\"HIJKLMON\"\tPQR"
iex(51)> s2 = "2017-11-29\t2017-12-02\t\"QUOTED\"\t\"ABCDE\"\t\"EFG\"\t\"HIJKLMON\"\tPQR"
"2017-11-29\t2017-12-02\t\"QUOTED\"\t\"ABCDE\"\t\"EFG\"\t\"HIJKLMON\"\tPQR"
iex(52)> s1 == s2
false

The printed terms look identical, but the BOM shows up in the byte size and the raw representation reported by i:

iex(53)> i s1
Term
  "2017-11-29\t2017-12-02\t\"QUOTED\"\t\"ABCDE\"\t\"EFG\"\t\"HIJKLMON\"\tPQR"
Data type
  BitString
Byte size
  62
Description
  This is a string: a UTF-8 encoded binary. It's printed surrounded by
  "double quotes" because all UTF-8 encoded codepoints in it are printable. 
Raw representation
  <<239, 187, 191, 50, 48, 49, 55, 45, 49, 49, 45, 50, 57, 9, 50, 48, 49, 55, 45, 49, 50, 45, 48, 50, 9, 34, 81, 85, 79, 84, 69, 68, 34, 9, 34, 65, 66, 67, 68, 69, 34, 9, 34, 69, 70, 71, 34, 9, 34, 72, ...>>
Reference modules
  String, :binary
Implemented protocols
  IEx.Info, List.Chars, Inspect, String.Chars, Collectable
iex(54)> i s2
Term
  "2017-11-29\t2017-12-02\t\"QUOTED\"\t\"ABCDE\"\t\"EFG\"\t\"HIJKLMON\"\tPQR"
Data type
  BitString
Byte size
  59
Description
  This is a string: a UTF-8 encoded binary. It's printed surrounded by
  "double quotes" because all UTF-8 encoded codepoints in it are printable. 
Raw representation
  <<50, 48, 49, 55, 45, 49, 49, 45, 50, 57, 9, 50, 48, 49, 55, 45, 49, 50, 45, 48, 50, 9, 34, 81, 85, 79, 84, 69, 68, 34, 9, 34, 65, 66, 67, 68, 69, 34, 9, 34, 69, 70, 71, 34, 9, 34, 72, 73, 74, 75, ...>>
Reference modules
  String, :binary
Implemented protocols
  IEx.Info, List.Chars, Inspect, String.Chars, Collectable
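The three extra bytes at the start of s1's raw representation, 239, 187, 191 (0xEF 0xBB 0xBF), are the UTF-8 encoding of U+FEFF, which is why the byte sizes differ by exactly three (62 vs 59). A minimal sketch for detecting and dropping that prefix, assuming you only care about a BOM at the very start of the string; the BomCheck module and its functions are hypothetical helpers, not part of Elixir's standard library:

```elixir
defmodule BomCheck do
  # A UTF-8 encoded U+FEFF is exactly the three bytes 0xEF 0xBB 0xBF
  # (239, 187, 191), the same bytes that lead s1's raw representation.

  # True when the binary starts with a UTF-8 byte order mark.
  def has_bom?(<<0xEF, 0xBB, 0xBF, _rest::binary>>), do: true
  def has_bom?(bin) when is_binary(bin), do: false

  # Drops one leading BOM, if present; all other bytes are kept as-is.
  def strip_bom(<<0xEF, 0xBB, 0xBF, rest::binary>>), do: rest
  def strip_bom(bin) when is_binary(bin), do: bin
end

s1 = "\uFEFF2017-11-29\tPQR"
s2 = "2017-11-29\tPQR"

IO.inspect(s1 == s2)                          # false
IO.inspect(BomCheck.has_bom?(s1))             # true
IO.inspect(byte_size(s1) - byte_size(s2))     # 3
IO.inspect(BomCheck.strip_bom(s1) == s2)      # true
```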

Expected behavior

I would expect the string comparison to pass. I hit this exact case in an ExUnit test: the test failed, printed two identical-looking strings and reported that they were not equal, which is very confusing.
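When an assertion prints two identical-looking strings, locating the first differing byte usually exposes the invisible character. The sketch below does that with Erlang's :binary.longest_common_prefix/1; the TextDiff module, its name and its return shape are made up for illustration:

```elixir
defmodule TextDiff do
  # Returns nil when the binaries are equal. Otherwise returns
  # {offset, bytes_from_a, bytes_from_b}: the byte offset of the first
  # mismatch plus up to 4 raw bytes from each side starting there.
  def first_difference(a, b) when a == b, do: nil

  def first_difference(a, b) when is_binary(a) and is_binary(b) do
    offset = :binary.longest_common_prefix([a, b])
    {offset, peek(a, offset), peek(b, offset)}
  end

  # Up to 4 bytes of bin starting at offset, as a list of integers.
  defp peek(bin, offset) do
    len = min(4, byte_size(bin) - offset)
    :binary.bin_to_list(bin, offset, len)
  end
end
```

For s1 and s2 from the reproduction it should report offset 0, with [239, 187, 191, 50] on one side and [50, 48, 49, 55] on the other, pointing straight at the BOM.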

Elixir Enhancement Starter

All 6 comments

I don’t think the sigil is adding the BOM; more likely your editor is. It is the same as “é”, which can be written in two different ways in Unicode, and the two won’t be equal. Elixir won’t convert between the two forms for you; that is up to you and your editor. The same goes for any zero-width spaces you may add.

Maybe ExUnit could show better diffs in this case, but I don’t think it is an issue with sigils.
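The two spellings of “é” are the precomposed U+00E9 and "e" followed by U+0301 COMBINING ACUTE ACCENT. A short sketch using explicit escapes (so the difference survives copy-paste), normalizing both sides to NFC with Erlang's :unicode.characters_to_nfc_binary/1 (available since OTP 20) before comparing:

```elixir
composed = "\u00E9"      # é as one code point (LATIN SMALL LETTER E WITH ACUTE)
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT; renders the same

IO.inspect(composed == decomposed)                        # false
IO.inspect({byte_size(composed), byte_size(decomposed)})  # {2, 3}

# Normalize both sides to NFC before comparing.
nfc = &:unicode.characters_to_nfc_binary/1
IO.inspect(nfc.(composed) == nfc.(decomposed))            # true
```

Normalization leaves U+FEFF in place, so it does not help with the BOM case from this issue.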

I'm able to reproduce this by copy-pasting the s1 string from Chrome into GNOME Terminal, with all settings at their defaults. But as you noted, this is environment-related: if I type the same string with the sigil straight into the iex session, the BOM is not added.

I guess the issue can be closed, though I'm really unhappy about this.

Yeah, I totally understand that. It is the same issue as this:

iex(1)> "é" == "é"
false

or with a zero-width space:

iex(1)> "a" == "a"
false

It makes you pull your hair out until you figure out what is really happening. It is not behaviour specific to Elixir either.
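A mismatch like "a" == "a" can be made visible from IEx or a test by looking at the raw bytes or the code points. A sketch, assuming the invisible character is a ZERO WIDTH SPACE (U+200B); the sneaky variable is only an illustration:

```elixir
plain = "a"
sneaky = "a\u200B"  # assumed example: "a" followed by ZERO WIDTH SPACE

IO.inspect(plain == sneaky)                 # false
# Raw bytes: 97 is "a"; 226, 128, 139 is U+200B encoded as UTF-8.
IO.inspect(sneaky, binaries: :as_binaries)  # <<97, 226, 128, 139>>
# Code points in hex, which makes the invisible one obvious.
sneaky
|> String.to_charlist()
|> Enum.map(&Integer.to_string(&1, 16))
|> IO.inspect()                             # ["61", "200B"]
```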

For now, let's show \uFEFF on inspected strings (instead of showing nothing). That should at least make it more obvious without changing its representation.
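The snippet below is not the change proposed for Elixir's Inspect implementation; it is a standalone sketch of the same idea, printing an escape instead of nothing. It assumes valid UTF-8 input and a hand-picked, non-exhaustive list of invisible code points; Reveal.visible/1 is a hypothetical name:

```elixir
defmodule Reveal do
  # Illustrative, non-exhaustive set of invisible code points:
  # ZERO WIDTH SPACE, ZWNJ, ZWJ, WORD JOINER and U+FEFF (BOM / ZWNBSP).
  @invisible [0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF]

  # Rewrites each invisible code point as a literal \uXXXX escape and
  # copies every other code point through unchanged.
  def visible(string) when is_binary(string) do
    for <<cp::utf8 <- string>>, into: "" do
      if cp in @invisible do
        "\\u" <> String.pad_leading(Integer.to_string(cp, 16), 4, "0")
      else
        <<cp::utf8>>
      end
    end
  end
end

IO.puts(Reveal.visible("\uFEFF2017-11-29"))  # prints \uFEFF2017-11-29
```

Rewriting the string for display, rather than changing the string itself, matches the intent of the proposal: the data keeps its representation, only what you see changes.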

I'd like to give this issue a go if it's up for grabs.

Man, you guys are quick. I was about to write that I'd happily contribute a PR.
