Beets: Discogs: problem with hidden left-to-right marks?

Created on 9 Dec 2016 · 11 Comments · Source: beetbox/beets

Problem

I'm trying to import this album:
https://www.discogs.com/Palle-Mikkelborg-Radiojazzgruppen-The-Mysterious-Corona/release/5292307

The resulting folder will be named Hector Bingert ‎– Don Menza

I've noticed that when importing albums with multiple artists from Discogs, a left-to-right mark is present in the artist string. In this case it sits just before the en dash.

This causes havoc in my terminal, and the symbol is also present in the artist tag. Could this symbol be stripped away during the import, somehow?

Setup

$ uname -a
Linux x 4.4.0-53-generic #74-Ubuntu SMP Fri Dec 2 15:59:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ beet version
beets version 1.4.1
Python version 2.7.12
plugins: badfiles, chroma, copyartifacts, discogs, embedart, fetchart, info, missing, scrub, web

All 11 comments

It may be worth sanitizing media items in general, removing stuff such as null bytes and other invisible characters. What do you think @sampsyo? Would this be best left to plugins or part of beets itself?
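A minimal sketch of what such general sanitizing could look like, assuming Python 3 and a hypothetical helper named strip_invisible (this is not an existing beets function). It drops every character in the Unicode "control" (Cc) and "format" (Cf) categories, which covers null bytes and the left-to-right mark. Note that Cf also contains characters such as the zero-width joiner that are meaningful in some scripts, so a blanket filter like this may be too aggressive for real tags.

```python
import unicodedata

# Hypothetical helper, not part of beets: removes control (Cc) and
# format (Cf) characters, but keeps tabs and newlines.
KEEP = {"\t", "\n"}

def strip_invisible(text):
    return "".join(
        ch for ch in text
        if ch in KEEP or unicodedata.category(ch) not in ("Cc", "Cf")
    )

dirty = "Hector Bingert \u200e\u2013 Don Menza\x00"
# ascii() makes any remaining non-ASCII characters visible as escapes.
print(ascii(strip_invisible(dirty)))
# prints 'Hector Bingert \u2013 Don Menza'
```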

Wow; that is weird! I was able to confirm this on my machine. I tagged that specific album and then pasted the string into Python:

>>> u'Hector Bingert ‎– Don Menza - El Encuentro'
u'Hector Bingert \u200e\u2013 Don Menza - El Encuentro'

And that would be U+200E: http://www.fileformat.info/info/unicode/char/200e/index.htm
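For anyone who wants to check other strings for similar hidden characters, here is a small sketch using the standard unicodedata module. It assumes Python 3, and describe_hidden is a made-up name for illustration. It reports the position, code point, and Unicode name of every control or format character it finds.

```python
import unicodedata

def describe_hidden(text):
    """Return (index, code point, name) for each Cc/Cf character in text."""
    found = []
    for index, ch in enumerate(text):
        if unicodedata.category(ch) in ("Cc", "Cf"):
            name = unicodedata.name(ch, "<unnamed>")
            found.append((index, "U+%04X" % ord(ch), name))
    return found

album = "Hector Bingert \u200e\u2013 Don Menza - El Encuentro"
print(describe_hidden(album))
# prints [(15, 'U+200E', 'LEFT-TO-RIGHT MARK')]
```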

I'm undecided about where exactly to put this. My first instinct says that, since this seems to be an issue with the Discogs API, let's solve it there. Other web services are unlikely to have the same strange problem. But it would also be nice to know that we've gotten all the weird characters that Discogs might return.
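If the fix were kept local to the Discogs code path, one narrower option is to remove only the two bidirectional marks rather than every invisible character. The sketch below is an assumption-heavy illustration: it works on a plain decoded JSON structure, clean_response is a hypothetical name, and the sample dictionary is hand-written to mirror the string above rather than taken from a real response. It does not reflect how the beets Discogs plugin actually handles data.

```python
# Translation table that deletes U+200E (left-to-right mark) and
# U+200F (right-to-left mark) and nothing else.
BIDI_MARKS = {0x200E: None, 0x200F: None}

def clean_response(obj):
    """Recursively strip bidi marks from every string in a JSON-like value."""
    if isinstance(obj, str):
        return obj.translate(BIDI_MARKS)
    if isinstance(obj, list):
        return [clean_response(item) for item in obj]
    if isinstance(obj, dict):
        return {key: clean_response(value) for key, value in obj.items()}
    return obj  # numbers, booleans, None are returned unchanged

# Hand-written sample data for illustration only.
sample = {
    "artists_sort": "Hector Bingert \u200e\u2013 Don Menza",
    "artists": [{"name": "Hector Bingert", "join": "\u200e\u2013"}],
    "year": 0,
}
print(ascii(clean_response(sample)["artists_sort"]))
# prints 'Hector Bingert \u2013 Don Menza'
```

Targeting just these two code points leaves everything else Discogs returns untouched, at the cost of missing any other odd characters that might turn up later.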

Maybe it would even make sense to file a bug with the Discogs people? This can't be what they're intending to do…

Maybe it would even make sense to file a bug with the Discogs people? This can't be what they're intending to do…

It is a bit odd for them to be returning special control characters like that.

I've opened a ticket with Discogs; I'll update if/when I get a response.

Awesome; thank you!

I got a reply from Discogs today:

Hi Jack,

Thanks for contacting our support team. I'm passing this issue directly to our developers so that they can take a look and react as appropriate. They'll be sure to take any further action as promptly as possible, although unfortunately I'm not able to provide a solid timetable on a possible resolution. Thank you very much for your patience and your understanding, and my sincere thanks for letting us know about this issue.

Best wishes!

I guess it's better than nothing! 😄

@jackwilsdon did you ever hear back from Discogs about the API issue? I'm not sure if it's still an issue on their end. Maybe it's worth sanitising data fetched from third-party APIs in beets more generally?

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@arcresu the last I heard from them was the following:

Thanks for bringing this up. It looks like an issue at the data persistence level, so we'll have to investigate a bit before coming to any particular solution. I have added a ticket for us to investigate this further.

I was given a ticket number of 257929, but it seems like their system no longer recognises this ticket.

The problem still does seem to be present:

$ curl -s https://api.discogs.com/releases/2372867 | jq .artists_sort | xxd
00000000: 2248 6563 746f 7220 4269 6e67 6572 7420  "Hector Bingert 
00000010: e280 8ee2 8093 2044 6f6e 204d 656e 7a61  ...... Don Menza
00000020: 220a                                     ".

Note that the bytes e2 80 8e are still present (at the start of the second line of output); this is the UTF-8 encoding of U+200E, the aforementioned left-to-right mark.
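To double-check the reading of the hex dump, the six bytes at the start of the second xxd line can be decoded by hand in Python 3. This is only a verification sketch; the byte string is copied from the output above.

```python
import unicodedata

# First six bytes of the second xxd line: e2 80 8e e2 80 93
raw = bytes.fromhex("e2808ee28093")
text = raw.decode("utf-8")

for ch in text:
    print("U+%04X" % ord(ch), unicodedata.name(ch))
# prints:
# U+200E LEFT-TO-RIGHT MARK
# U+2013 EN DASH
```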

Is it worth us opening a new ticket with Discogs seeing as this is still a problem?

Is it worth us opening a new ticket with Discogs seeing as this is still a problem?

If you don't mind @jackwilsdon I'd think it's worth it to ask them again...
