I'm reading an audio stream (a web radio) with ICY metadata from a URL. The source is a bit old: it uses Airtime and sends its metadata in a non-UTF-8 encoding. In our case, it's mostly French songs.
When retrieving metadata, ExoPlayer replaces every accented character (é, à, ë, â, ç, ...) with the same replacement character � (U+FFFD). Because they all collapse to that one character, it is then impossible to tell which character was originally sent.
Ideally, ExoPlayer should adapt to these sources and decode the correct characters, for example based on a given Locale or charset. Or at least, it should provide the raw data it parsed (for example as a byte array holding each byte read from the ICY metadata) so that programmers can deal with these characters themselves.
You can clone and run my radio app: https://github.com/yattoz/Tsumugi-app
All you need to do is press play and wait. Songs with accents are not very frequent, so you may need to wait quite some time.
The metadata code is located in the file RadioService.kt.
The audio source used is: https://radio.mahoro-net.org/streams/tsumugi
The URL is in plain text in the app and is accessible from anywhere in the world.
You'll find attached the bug report from my physical device, a Motorola Moto G5 Plus ("potter"). I captured it right when a song with an accent appeared (see the Logcat excerpt just below).
bugreport-potter_n-OPS28.85-17-6-2-2019-12-11-22-38-04.zip
In addition, here is what my Log displays in the callback registered with addMetadataOutput when I print, respectively, the title, the character codes of the title, and the raw metadata entries:
E/fr.forum_thalie.tsumugi: ======RadioService=====onMetadata: Title ----> France Gall - R�siste
E/fr.forum_thalie.tsumugi: [70, 114, 97, 110, 99, 101, 32, 71, 97, 108, 108, 32, 45, 32, 82, 65533, 115, 105, 115, 116, 101]
E/fr.forum_thalie.tsumugi: raw: entries=[ICY: title="France Gall - R�siste", url="null", rawMetadata="StreamTitle='France Gall - R�siste';"]
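To illustrate why the original character can't be recovered from that output, here is a minimal Java sketch. It assumes (as suggested later in this thread) that the station sends ISO-8859-1, where é is the single byte 0xE9. In UTF-8, 0xE9 starts a three-byte sequence; since the following 's' is not a valid continuation byte, Java's lenient `new String(bytes, UTF_8)` substitutes U+FFFD (65533) and keeps going. The class and method names are hypothetical, and this is not ExoPlayer code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyUtf8Demo {
    // Decodes bytes as UTF-8 the way String(byte[], Charset) does:
    // malformed sequences are silently replaced with U+FFFD.
    public static String decodeLenientUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Returns the Unicode code points of a String, like the second log line above.
    public static int[] codePoints(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // Assumption: the title arrives as ISO-8859-1, so 'é' is the single byte 0xE9.
        byte[] title = "France Gall - R\u00e9siste".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.toString(codePoints(decodeLenientUtf8(title))));
        // prints [70, 114, 97, 110, 99, 101, 32, 71, 97, 108, 108, 32, 45, 32, 82, 65533, 115, 105, 115, 116, 101]

        // 'é' (0xE9) and 'à' (0xE0) both collapse to the same replacement character,
        // so the decoded String no longer says which one was sent.
        String e = decodeLenientUtf8(new byte[] {'R', (byte) 0xE9, 's'});
        String a = decodeLenientUtf8(new byte[] {'R', (byte) 0xE0, 's'});
        System.out.println(e.equals(a)); // prints true
    }
}
```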
I am using ExoPlayer 2.11.0 (the latest release at the time of writing).
I saw the same behaviour on 2.10.6.
This has been reproduced on:
I haven't tried modifying ExoPlayer myself, but I quickly read through the files related to ICY metadata parsing: https://github.com/google/ExoPlayer/tree/release-v2/library/core/src/main/java/com/google/android/exoplayer2/metadata/icy
The problem might be that IcyDecoder forces the byte array to be decoded as UTF-8:
https://github.com/google/ExoPlayer/blob/76962d50f1d80941d6768e4e765fa4ff010705e7/library/core/src/main/java/com/google/android/exoplayer2/metadata/icy/IcyDecoder.java#L42
The method it calls simply decodes a String with the given charset: https://github.com/google/ExoPlayer/blob/76962d50f1d80941d6768e4e765fa4ff010705e7/library/core/src/main/java/com/google/android/exoplayer2/util/Util.java#L545
Or, as I said before, if there's no good alternative, it might help to store and expose this byte array so that developers can deal with special characters themselves.
Thank you very much for your hard work!
Thanks for the report! I wasn't able to reproduce after watching the provided stream for ~1 hour, but I can see that we're assuming a UTF-8 character encoding without any concrete evidence, and it looks like the encoding isn't strictly defined for ICY.
I'll have a look into how we can best handle this.
I'm going to mark this as an enhancement: since the ICY spec is pretty under-defined, it's hard to really call this a bug in ExoPlayer. We currently do a sensible-ish thing in an ambiguous situation :)
In case it helps, I noticed the following when listening to this stream with Foobar2000 on Windows.
Résiste is displayed as something like R漢iste (notice how the first s has disappeared). In that case, of course, Japanese is not the right way to decode this stream. But if some Japanese stream used a non-Unicode encoding and relied only on that "legacy" encoding, ExoPlayer simply wouldn't be able to decode it at all. I don't know if that helps, but that's what I noticed.
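The disappearing 's' is consistent with a double-byte legacy charset. A minimal Java sketch, assuming the player fell back to a Japanese code page such as Shift_JIS (a guess from the symptom, not something the thread confirms) and that the title bytes are ISO-8859-1: 0xE9 is a Shift_JIS lead byte, so it is combined with the next byte ('s', 0x73) into a single kanji. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ShiftJisDemo {
    // Decodes bytes with the named charset (lenient: invalid input becomes U+FFFD).
    public static String decodeAs(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Assumption: the stream sends "Résiste" as ISO-8859-1 bytes: 52 E9 73 69 73 74 65.
        byte[] title = "R\u00e9siste".getBytes(StandardCharsets.ISO_8859_1);
        // Shift_JIS reads E9 73 as one double-byte character, swallowing the 's'.
        String decoded = decodeAs(title, "Shift_JIS");
        System.out.println(decoded.length());     // prints 6: 'R', one kanji, then "iste"
        System.out.println(decoded.substring(2)); // prints iste
    }
}
```

The same byte sequence decoded as UTF-8 yields U+FFFD instead, which is why neither guess recovers the é without knowing the actual charset.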
It seems your stream metadata is encoded in ISO-8859-1.
It looks like this is the default for at least one ICY server:
https://github.com/savonet/liquidsoap/issues/411#issuecomment-288759200
I've updated IcyDecoder to fall back to ISO-8859-1 if UTF-8 decoding fails; with this change, accents in your stream are rendered correctly in LogCat by the demo app.
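A minimal Java sketch of the fallback strategy described above (not the actual IcyDecoder change; the class and method names are hypothetical). It decodes with a strict UTF-8 `CharsetDecoder` that reports malformed input instead of replacing it, and only on failure decodes as ISO-8859-1. The fallback itself cannot fail, because ISO-8859-1 maps every byte 0x00-0xFF directly to U+0000-U+00FF, so 0xE9 becomes é.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class IcyTitleDecoder {
    // Tries strict UTF-8 first; if the bytes are not valid UTF-8, falls back to ISO-8859-1.
    public static String decodeWithFallback(byte[] bytes) {
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Every byte is a valid ISO-8859-1 character, so this decode never loses data.
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        String title = "France Gall - R\u00e9siste";
        byte[] latin1 = title.getBytes(StandardCharsets.ISO_8859_1); // é = E9
        byte[] utf8 = title.getBytes(StandardCharsets.UTF_8);        // é = C3 A9
        System.out.println(decodeWithFallback(latin1).equals(title)); // prints true
        System.out.println(decodeWithFallback(utf8).equals(title));   // prints true
    }
}
```

One limitation of this heuristic: ISO-8859-1 text that happens to also be valid UTF-8 (for example the two characters "Ã©", bytes C3 A9) is decoded as UTF-8. In practice, accented Latin-1 text followed by ordinary ASCII letters, as in this stream, is not valid UTF-8, so the fallback kicks in.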