https://travis-ci.org/haskell-pushbot/cabal-binaries/builds/258926423
$ Cabal/unit-tests
[snip]
Distribution.Utils.ShortText
ShortText Id: OK
+++ OK, passed 100 tests.
ShortText Ord: OK
+++ OK, passed 100 tests.
ShortText Monoid: OK
+++ OK, passed 100 tests.
ShortText BinaryId: FAIL
*** Failed! Falsifiable (after 40 tests and 5 shrinks):
"\65534"
Use --quickcheck-replay=551066 to reproduce.
ping @ezyang
CC @hvr who introduced this test in 993d20a2e9b8fb29aefaa2c266f31177a00a5ee6
Wikipedia says \65534 is U+FFFE:
FFFE and FFFF are not unassigned in the usual sense, but guaranteed not to be a Unicode character at all. They can be used to guess a text's encoding scheme, since any text containing these is by definition not a correctly encoded Unicode text. Unicode's U+FEFF Byte order mark character can be inserted at the beginning of a Unicode text to signal its endianness: a program reading such a text and encountering 0xFFFE would then know that it should switch the byte order for all the following characters.
Whoa.
I need to look into why U+FFFE (the byte-swapped BOM, which btw makes no sense whatsoever for UTF-8 encodings) doesn't round-trip properly. IIRC I specifically tested such corner cases in the implementation of http://hackage.haskell.org/package/text-short
PS: I just noticed this is with the GHC 7.6.3 configuration, so this may be a problem with the legacy fallback...
After some investigation, the issue is in fact in the String-backed legacy fallback, whose Binary instance relies on the round-trip property of Distribution.Utils.String.{encode,decode}StringUtf8, which fails for U+FFFE:
> decodeStringUtf8 ( encodeStringUtf8 "\65534")
"\65533"
because decodeStringUtf8 (imho rightfully) considers U+FFFE invalid in a UTF-8 stream, and maps it to the replacement character (U+FFFD, i.e. "\65533").
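To make the failure mode concrete, here is a minimal, self-contained sketch of a UTF-8 encoder/decoder pair. It is not Cabal's actual code: the names encodeUtf8, decodeUtf8, roundTrips and the rejectNonchars flag are hypothetical and exist only for illustration. With the flag set, the decoder replaces U+FFFE/U+FFFF with U+FFFD and the round-trip property breaks; with the flag cleared, it holds. Surrogate code points are not handled by the encoder in this sketch.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical encoder: String to UTF-8 bytes (no special surrogate handling).
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap (map fromIntegral . enc . ord)
  where
    enc :: Int -> [Int]
    enc c
      | c < 0x80    = [c]
      | c < 0x800   = [0xC0 .|. (c `shiftR` 6), cont c]
      | c < 0x10000 = [0xE0 .|. (c `shiftR` 12), cont (c `shiftR` 6), cont c]
      | otherwise   = [ 0xF0 .|. (c `shiftR` 18), cont (c `shiftR` 12)
                      , cont (c `shiftR` 6), cont c ]
    cont x = 0x80 .|. (x .&. 0x3F)

-- Hypothetical decoder. When rejectNonchars is True, the noncharacters
-- U+FFFE and U+FFFF are mapped to U+FFFD, as are overlong encodings,
-- surrogates, and malformed sequences (regardless of the flag).
decodeUtf8 :: Bool -> [Word8] -> String
decodeUtf8 rejectNonchars = go . map fromIntegral
  where
    replacement = '\xFFFD'

    go :: [Int] -> String
    go [] = []
    go (b:bs)
      | b < 0x80           = chr b : go bs
      | b .&. 0xE0 == 0xC0 = multi 1 0x80    (b .&. 0x1F) bs
      | b .&. 0xF0 == 0xE0 = multi 2 0x800   (b .&. 0x0F) bs
      | b .&. 0xF8 == 0xF0 = multi 3 0x10000 (b .&. 0x07) bs
      | otherwise          = replacement : go bs

    -- Read n continuation bytes; a structurally complete sequence is
    -- consumed as a whole even if its code point is then rejected.
    multi :: Int -> Int -> Int -> [Int] -> String
    multi n minVal acc bs
      | length conts == n && all isCont conts =
          (if valid minVal cp then chr cp else replacement) : go rest
      | otherwise = replacement : go bs
      where
        (conts, rest) = splitAt n bs
        cp = foldl (\a c -> (a `shiftL` 6) .|. (c .&. 0x3F)) acc conts

    isCont c = c .&. 0xC0 == 0x80

    valid minVal cp =
         cp >= minVal && cp <= 0x10FFFF
      && not (cp >= 0xD800 && cp <= 0xDFFF)
      && not (rejectNonchars && (cp == 0xFFFE || cp == 0xFFFF))

-- The property a Binary instance built on encode/decode depends on.
roundTrips :: Bool -> String -> Bool
roundTrips strict s = decodeUtf8 strict (encodeUtf8 s) == s

main :: IO ()
main = do
  print (decodeUtf8 True (encodeUtf8 "\65534"))  -- prints "\65533"
  print (roundTrips True  "\65534")              -- prints False
  print (roundTrips False "\65534")              -- prints True
```

The lenient variant trades strictness about noncharacters for the guarantee that any String without surrogates survives an encode/decode cycle, which is what a serialization format needs.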
I'll take a stab at harmonizing the decodeStringUtf8 semantics with the more round-tripping friendly ones from text and text-short.
I'm confident this one's been fixed via #4928: I ran unit-tests -p BinaryId --quickcheck-tests 999999, compiled for GHC 7.6.3, a few times, and also tried the replay value; everything passed so far.