Exoplayer: Support non-UTF-8 ICY metadata

Created on 11 Dec 2019 · 3 comments · Source: google/ExoPlayer

[REQUIRED] Issue description

I read an audio stream from a URL (it's a webradio) with ICY metadata. The source is a bit old, it uses Airtime, and it sends its metadata in a non-UTF-8 format. In our case, it's mostly French songs.

Current behaviour:
  • When retrieving metadata, ExoPlayer replaces every accented character (like é, à, ë, â, ç...) with a single replacement character � (code 0xFFFD). It is then impossible to tell which character this � replaced.

Desired behaviour:
  • Ideally, ExoPlayer should adapt to these sources and parse the correct characters by relying on a given Locale. Or at least, it should provide the raw data it parsed (for example as an array of uint8, one per byte read from the ICY metadata) to allow programmers to deal with these characters themselves.

[REQUIRED] Reproduction steps

You can clone and run my radio app: https://github.com/yattoz/Tsumugi-app
All you need to do is press play and wait. Songs with accents are not very frequent so you may need to wait quite some time.
The metadata code is located in the file RadioService.kt.

[REQUIRED] Link to test content

The audio source used is: https://radio.mahoro-net.org/streams/tsumugi
It's in plain text in the app, and is accessible from anywhere in the world.

[REQUIRED] A full bug report captured from the device

You'll find attached the bug report from my physical device, a Motorola Moto G5 Plus "potter". I captured it right when a song with an accent appeared (see the logcat extract just below).
bugreport-potter_n-OPS28.85-17-6-2-2019-12-11-22-38-04.zip

In addition, here is what my log displays when I print, respectively:

  • the title
  • the title as Int values
  • the Metadata object used in addMetadataOutput

E/fr.forum_thalie.tsumugi: ======RadioService=====onMetadata: Title ----> France Gall - R�siste
E/fr.forum_thalie.tsumugi: [70, 114, 97, 110, 99, 101, 32, 71, 97, 108, 108, 32, 45, 32, 82, 65533, 115, 105, 115, 116, 101]
E/fr.forum_thalie.tsumugi: raw: entries=[ICY: title="France Gall - R�siste", url="null", rawMetadata="StreamTitle='France Gall - R�siste';"]
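
The 65533 in the Int values above is what a UTF-8 decoder produces when it meets a byte that is not valid UTF-8. Below is a minimal sketch in plain Java, not ExoPlayer code. It assumes the server sent é as the single byte 0xE9, which the log alone cannot confirm; the class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

final class ReplacementCharDemo {
    // Assumption: "Résiste" as a single-byte (Latin-1 style) server would send it,
    // with é encoded as the lone byte 0xE9.
    static final byte[] LATIN1_TITLE = {'R', (byte) 0xE9, 's', 'i', 's', 't', 'e'};

    static String decodeAsUtf8(byte[] raw) {
        // new String(bytes, charset) never throws on bad input:
        // malformed sequences are silently replaced with U+FFFD.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String title = decodeAsUtf8(LATIN1_TITLE);
        StringBuilder codes = new StringBuilder();
        for (int i = 0; i < title.length(); i++) {
            codes.append((int) title.charAt(i)).append(' ');
        }
        // The second value printed is 65533 (U+FFFD), as in the log.
        System.out.println(codes.toString().trim());
    }
}
```

Once the byte has been replaced, the original value 0xE9 is gone from the String, which is why the character cannot be recovered afterwards.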

[REQUIRED] Version of ExoPlayer being used

I am using ExoPlayer 2.11.0 (the latest release at the time of writing).
I saw the same behaviour on 2.10.6.

[REQUIRED] Device(s) and version(s) of Android being used

This has been reproduced on:

  • Android emulators for Android 5, 6, 7.1, 9 and 10.
  • Motorola Moto G5 Plus, Android 8.1
  • Xiaomi Redmi 6A, Android 8.1 then 9 (after update)
  • Blackberry Q5 (API18 - equivalent Android 4.3)

Comment:

I haven't tried modifying ExoPlayer myself, but I did quickly read the files related to ICY metadata parsing: https://github.com/google/ExoPlayer/tree/release-v2/library/core/src/main/java/com/google/android/exoplayer2/metadata/icy

The problem might be that IcyDecoder forces the byte array to be decoded as UTF-8:
https://github.com/google/ExoPlayer/blob/76962d50f1d80941d6768e4e765fa4ff010705e7/library/core/src/main/java/com/google/android/exoplayer2/metadata/icy/IcyDecoder.java#L42
The method it calls is just a String decoding with a given charset: https://github.com/google/ExoPlayer/blob/76962d50f1d80941d6768e4e765fa4ff010705e7/library/core/src/main/java/com/google/android/exoplayer2/util/Util.java#L545

Or as I said before, if there's no good alternative, it might be helpful to store and expose this byte array to let the developer deal with special characters.
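
To make the "expose the byte array" idea concrete, here is a rough sketch of what such an API could look like. RawIcyMetadata and its methods are hypothetical names invented for this illustration; they are not part of ExoPlayer.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

/** Hypothetical holder (not an ExoPlayer class) that keeps the undecoded ICY bytes. */
final class RawIcyMetadata {
    private final byte[] rawBytes;

    RawIcyMetadata(byte[] rawBytes) {
        // Defensive copy so later changes to the caller's buffer do not leak in.
        this.rawBytes = Arrays.copyOf(rawBytes, rawBytes.length);
    }

    /** Returns a copy of the bytes exactly as they were read from the stream. */
    byte[] getRawBytes() {
        return Arrays.copyOf(rawBytes, rawBytes.length);
    }

    /** Lets the application pick the charset it believes the station uses. */
    String decode(Charset charset) {
        return new String(rawBytes, charset);
    }
}
```

With something like this, an app that knows its station sends a legacy single-byte encoding could call decode(StandardCharsets.ISO_8859_1) instead of receiving a String that already contains �.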

Thank you very much for your hard work!

enhancement

All 3 comments

Thanks for the report! I wasn't able to reproduce after watching the provided stream for ~1 hour - but I can see how we're assuming a UTF-8 character encoding without any concrete evidence, and it looks like it's not strictly defined for ICY.

I'll have a look into how we can best handle this.

I'm going to mark this as an enhancement: since the ICY spec is pretty under-defined, it's hard to really call this a bug in ExoPlayer. We currently do a sensible-ish thing in an ambiguous situation :)

If this can help, I noticed the following when listening to this stream using Foobar2000 on Windows.

  • If I explicitly set in Regional Settings that non-Unicode text should be interpreted with the French locale, Foobar2000 displays these characters correctly.
  • If I select another language in Regional Settings, say Japanese, Foobar2000 interprets the special characters the way they would be interpreted in Japanese: it takes the special byte plus the next byte (16 bits in total) and decodes those two bytes as a single Japanese character. So a text like Résiste is displayed as something like R漢iste (notice how the first s has disappeared). (In that case, of course, Japanese is not the right way to decode this stream. But if we imagine a Japanese stream that uses a non-Unicode encoding and relies only on this "legacy" encoding, then ExoPlayer simply won't be able to decode it at all.)
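
The "swallowed s" effect described in the second bullet can be reproduced outside Foobar2000. This is only a sketch: it assumes the title bytes are single-byte Latin-1 with é as 0xE9, and that the JVM ships a Shift_JIS charset (it is not one of the charsets Java guarantees on every platform). The class name is made up for the example.

```java
import java.nio.charset.Charset;

final class LegacyCharsetMismatch {
    // Assumption: "Résiste" in a single-byte encoding, é sent as 0xE9.
    static final byte[] LATIN1_TITLE = {'R', (byte) 0xE9, 's', 'i', 's', 't', 'e'};

    static String decodeAs(String charsetName, byte[] raw) {
        // Charset.forName throws UnsupportedCharsetException if the JVM lacks the charset.
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String french = decodeAs("ISO-8859-1", LATIN1_TITLE);
        String japanese = decodeAs("Shift_JIS", LATIN1_TITLE);
        // In Shift_JIS, 0xE9 starts a two-byte character, so the decoder also
        // consumes the following 's' and the result is one character shorter.
        System.out.println(french.length() + " vs " + japanese.length());
    }
}
```

The same bytes are valid in both encodings, which is why no decoder can tell the two apart without outside information such as a locale or a per-station setting.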

I don't know if that could help, but that's what I noticed.

It seems your stream metadata is encoded in ISO-8859-1.

It looks like this is the default for at least one ICY server:
https://github.com/savonet/liquidsoap/issues/411#issuecomment-288759200

I've updated IcyDecoder to fall back to ISO-8859-1 if UTF-8 decoding fails - now accents in your stream are rendered correctly in LogCat by the demo app.
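
For readers who want to apply the same idea in their own code, the fallback described above could look roughly like the following. This is a sketch of the general technique using only the JDK, not the actual ExoPlayer change; IcyCharsetFallback is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class IcyCharsetFallback {
    /** Tries strict UTF-8 first, then falls back to ISO-8859-1. */
    static String decode(byte[] raw) {
        // REPORT makes the decoder throw on invalid input instead of
        // substituting U+FFFD, so a failed decode can be detected.
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Every byte value maps to a character in ISO-8859-1,
            // so this decode cannot fail.
            return new String(raw, StandardCharsets.ISO_8859_1);
        }
    }
}
```

Because ISO-8859-1 accepts any byte sequence, this order matters: UTF-8 has to be tried first. The fallback yields readable text for Latin-1 streams like this one, but a stream in another legacy encoding would still come out wrong, just without � characters.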
