I've been using beets for a couple of years now and I love it. There's a minor annoyance for me that I've noticed since the beginning and have more or less ignored, but I thought I'd finally ask if there's anything I can do about it. Apologies if I've missed an existing solution in the config guide or setup.
When beets imported my library, it Unicode-ified a lot of previously plain ASCII tags and filenames. For example, "El-P - I'll Sleep When You're Dead" becomes "El‐P - I’ll Sleep When You’re Dead" (in both files and tags).
These look almost the same, but the punctuation is Unicode-ified:
This isn't beets' doing: if I download the JSON results for the MusicBrainz link, these UTF-8 characters are used there.
Similar things apply to other punctuation marks; this is just a good example as it has two of them. :)
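To see exactly which characters differ, a small standard-library snippet like the following can list the non-ASCII characters in a tag. It is only a sketch: `describe_non_ascii` is a hypothetical helper name, and the sample title assumes the characters involved are U+2010 (hyphen) and U+2019 (right single quotation mark).

```python
import unicodedata


def describe_non_ascii(text):
    """Return (char, 'U+XXXX', Unicode name) for each non-ASCII character."""
    return [
        (ch, "U+{:04X}".format(ord(ch)), unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


# Assumed sample: the title as it might come back from MusicBrainz.
title = "El\u2010P - I\u2019ll Sleep When You\u2019re Dead"
for ch, code, name in describe_non_ascii(title):
    print(code, name)
```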
The annoyance is:
I know that I can fix this for files by enabling "asciify", and it looks like this was dealt with for the Lyrics plugin in #270. However, as well as Latin-character albums, I also have a bunch with names in non-Latin scripts, so I actually want Unicode for things which I can't effectively represent in ASCII.
I guess my dream feature would be a "sanitise punctuation" option where these almost-the-same-as-an-ASCII-character punctuation glyphs get swapped for their ASCII versions in both tags and filenames, but anything else gets left as UTF-8.
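As a rough illustration of what such a "sanitise punctuation" step could look like, here is a minimal Python sketch built on `str.translate`. The function name and the choice of characters in the table are my own assumptions, not anything beets provides; the table would need extending for real use.

```python
# Hypothetical mapping of ASCII-lookalike punctuation to ASCII.
PUNCT_MAP = str.maketrans({
    "\u2010": "-",    # hyphen
    "\u2011": "-",    # non-breaking hyphen
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2026": "...",  # horizontal ellipsis
})


def sanitise_punctuation(text):
    """Swap lookalike punctuation for ASCII; leave everything else alone."""
    return text.translate(PUNCT_MAP)
```

Because only the characters listed in the table are touched, names in non-Latin scripts pass through unchanged.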
I understand that this is a lot more to do with the design of Unicode than the design of beets, and that some people actually care about the distinction between hyphen-minus and hyphen. I just don't care in this case!
I'd be happy to look into writing a patch for a feature like the above, if that's potentially acceptable. The approach discussed in #270 for lyrics (i.e. find-and-replace) seems applicable.
Hi! Thanks for the discussion. This is a fairly frequent question, but it's not usually as clearly elaborated as it is here.
It sounds like there are two separate issues:
A version of %asciify{} that only affects punctuation, called %asciify_punct{} or something.
Does that sound like an accurate synopsis?
As a stopgap, you may be interested in the "replace" section of config.yaml. It works solely on paths and not tags. The backslash may not be needed; I adapted this from my own config, which uses many weird escape characters.
replace:
    '[\‐]': -
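For anyone unsure what a rule like this does, the idea is a regular-expression substitution applied to each path component. The sketch below mimics that in plain Python; it is not beets' actual code, and `REPLACEMENTS` and `sanitize_component` are made-up names for illustration.

```python
import re

# Hypothetical (pattern, replacement) pairs, in the spirit of "replace".
REPLACEMENTS = [
    (re.compile("[\u2010\u2011]"), "-"),  # Unicode hyphens -> hyphen-minus
    (re.compile("[\u2018\u2019]"), "'"),  # curly single quotes -> apostrophe
]


def sanitize_component(name):
    """Apply every replacement rule, in order, to one path component."""
    for pattern, repl in REPLACEMENTS:
        name = pattern.sub(repl, name)
    return name


print(sanitize_component("El\u2010P"))
```

Since this only runs on path components, the tags inside the files would still carry the original characters.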
Sampsyo's summary is great. #1 looks like the way to go, especially with asciify_punct. I'm not a beets contributor / maintainer, so my opinion isn't as important as the people who dig into the code and make it work. Then in the long term, 488 would also be awesome, but if it were easy it probably would be done already.
Hi @sampsyo & @RollingStar,
Thanks for the great synopsis, @sampsyo, and the suggestion, @RollingStar.
I think the synopsis is accurate, inasmuch as those two changes would solve this for me perfectly. I hadn't seen 488, thanks for the heads-up.
Cool. I'm marking this as a feature request for the first part: a version of "asciify" that only affects punctuation.
Any news on this? I'd like to see it affect tags as well, as Last.FM does not seem to auto-correct U+2019 to U+0027 or vice versa.
Apologies for bumping this issue, but it would really be great to have this working as the previous comment suggests.
Thanks for the great tool!
@imiric I have the same desire, and have a hacky fix that works for my purposes. I have a local version of the beets repo that I have patched with these changes:
--- a/beets/autotag/__init__.py
+++ b/beets/autotag/__init__.py
@@ -26,6 +26,9 @@ from .hooks import AlbumInfo, TrackInfo, AlbumMatch, TrackMatch # noqa
 from .match import tag_item, tag_album, Proposal # noqa
 from .match import Recommendation # noqa
+from unidecode import unidecode
+
 # Global logger.
 log = logging.getLogger('beets')
@@ -35,10 +38,12 @@ log = logging.getLogger('beets')
 def apply_item_metadata(item, track_info):
     """Set an item's metadata from its matched TrackInfo object.
     """
-    item.artist = track_info.artist
+    item.artist = unidecode(track_info.artist)
     item.artist_sort = track_info.artist_sort
     item.artist_credit = track_info.artist_credit
-    item.title = track_info.title
+    item.title = unidecode(track_info.title)
     item.mb_trackid = track_info.track_id
     if track_info.artist_id:
         item.mb_artistid = track_info.artist_id
@@ -62,14 +67,16 @@ def apply_metadata(album_info, mapping):
     """Set the items' metadata to match an AlbumInfo object using a
     mapping from Items to TrackInfo objects.
     """
     for item, track_info in mapping.items():
         # Album, artist, track count.
         if track_info.artist:
-            item.artist = track_info.artist
+            item.artist = unidecode(track_info.artist)
         else:
-            item.artist = album_info.artist
-        item.albumartist = album_info.artist
-        item.album = album_info.album
+            item.artist = unidecode(album_info.artist)
+        item.albumartist = unidecode(album_info.artist)
+        item.album = unidecode(album_info.album)
         # Artist sort and credit names.
         item.artist_sort = track_info.artist_sort or album_info.artist_sort
@@ -102,7 +109,7 @@ def apply_metadata(album_info, mapping):
                     item[suffix] = value
         # Title.
-        item.title = track_info.title
+        item.title = unidecode(track_info.title)
This ensures things like dashes, quotes, etc. are simplified to ASCII.
The post above was a great starting point for me. My copy calls a little utility function that only decodes the punctuation:
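from unidecode import unidecode

def pundecode(text):
    """Transliterate only the non-alphabetic characters of text to ASCII."""
    result = u""
    for character in text:
        if character.isalpha():
            # Keep letters as-is, including those in non-Latin scripts.
            result += character
        else:
            # Punctuation, digits, and symbols go through unidecode.
            result += unidecode(character)
    return result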
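One caveat with the `isalpha()` check is that it also sends non-ASCII digits and symbols through the transliteration. If that matters, a narrower variant could test the Unicode general category instead. This is just a sketch under my own assumptions: `is_punctuation` and `punct_only` are hypothetical names, and the `decode` argument stands in for `unidecode` (or any other per-character transliteration function).

```python
import unicodedata


def is_punctuation(ch):
    """True for non-ASCII characters in a punctuation (P*) or space (Zs) category."""
    if ord(ch) < 128:
        return False
    category = unicodedata.category(ch)
    return category.startswith("P") or category == "Zs"


def punct_only(text, decode):
    """Apply decode() to punctuation-like characters only."""
    return "".join(decode(ch) if is_punctuation(ch) else ch for ch in text)
```

With unidecode installed, this would be called as `punct_only(track_info.title, unidecode)`; letters, digits, and symbols such as fractions are left untouched.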