Beets: Ä and Ö characters aren't detected on Windows, "no such file or directory" error

Created on 20 Apr 2019  ·  15 Comments  ·  Source: beetbox/beets

Problem

beets doesn't recognize at least the Ä and Ö characters, which prevents it from managing music whose filenames contain them. Examples of artists with a diaeresis include Motörhead and Mötley Crüe.

Running beet -vv import -A "E:\CUETools\Motörhead" in verbose (-vv) mode:

user configuration: C:\Users\user\AppData\Roaming\beets\config.yaml
data directory: C:\Users\user\AppData\Roaming\beets
plugin paths: C:\Users\user\beets\myplugins
Sending event: pluginload
library database: C:\Users\user\AppData\Roaming\beets\library.db
library directory: C:\Users\user\Music
Sending event: library_opened
error: no such file or directory: E:\CUETools\Motrhead

Setup

  • OS: Windows 10 64-bit
  • Python version: Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] on win32
  • beets version: 1.4.7
  • Turning off plugins made problem go away (yes/no): no

My configuration (output of beet config) is:

directory: C:\Users\user\Music\
library: C:\Users\user\AppData\Roaming\beets\library.db

import:
    copy: no
    write: no
    resume: ask
    quiet_fallback: skip
    timid: no
    log: beetslog.txt
ignore: .cue .log .pdf .accurip .m3u8 .m3u .txt .nfo
art_filename: cover

plugins: convert
pluginpath: ~/beets/myplugins
threaded: yes

ui:
    color: yes

paths:
    default: $albumartist/$album/$track $title
    singleton: single songs/$artist - $title
    comp: $album/$track $title
    albumtype:soundtrack: soundtracks/$album/$track $title
convert:
    copy_album_art: yes
    embed: no
    never_convert_lossy_files: yes
    format: opus
    formats:
        opus:
            command: ffmpeg -i $source -acodec libopus -b:a 128k $dest
            extension: opus
        aac:
            command: ffmpeg -i $source -y -vn -acodec aac -aq 1 $dest
            extension: m4a
        alac:
            command: ffmpeg -i $source -y -vn -acodec alac $dest
            extension: m4a
        flac: ffmpeg -i $source -y -vn -acodec flac $dest
        mp3: ffmpeg -i $source -y -vn -aq 2 $dest
        ogg: ffmpeg -i $source -y -vn -acodec libvorbis -aq 3 $dest
        wma: ffmpeg -i $source -y -vn -acodec wmav2 -vn $dest
    dest:
    pretend: no
    threads: 8
    max_bitrate: 500
    auto: no
    tmpdir:
    quiet: no

    paths: {}
    no_convert: ''
    album_art_maxwidth: 0

These kinds of encoding issues are really hard to debug on Windows. (To be clear, this typically isn't a problem on Unix OSes.) It's hard to say what's going on here, but any chance you could try fiddling around with your codepage settings for cmd.exe?

Beyond that, anybody who runs Windows is hereby invited to go spelunking to see if they can reproduce and narrow down what locale business might be causing this for you.

any chance you could try fiddling around with your codepage settings for cmd.exe?

I use PowerShell; I'll try changing the encoding setting tomorrow.

What about passing your import directory like:

\\?\<Drive>:\<directory to be imported>

That is the format I use for my music location when music is to be moved. So far it has gotten me past encoding and path-related issues on WIN10_X64.
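The \\?\ form is Windows' extended-length path prefix: it tells the Win32 file APIs to pass the rest of the string to the file system without the usual parsing and normalization (and lifts the MAX_PATH limit). A minimal sketch of adding it in Python follows; add_long_path_prefix is a hypothetical helper, not part of beets, and it uses ntpath so Windows path rules apply even when run on another OS.

```python
import ntpath

LONG_PATH_PREFIX = "\\\\?\\"  # the literal four characters \\?\


def add_long_path_prefix(path: str) -> str:
    """Hypothetical helper: return the extended-length form of a Windows path."""
    if path.startswith(LONG_PATH_PREFIX):
        return path  # already prefixed, leave untouched
    path = ntpath.abspath(path)  # prefixed paths must be absolute
    if path.startswith("\\\\"):
        # UNC paths: \\server\share\... becomes \\?\UNC\server\share\...
        return LONG_PATH_PREFIX + "UNC\\" + path[2:]
    return LONG_PATH_PREFIX + path


print(add_long_path_prefix("E:\\CUETools\\Motörhead"))
```

Note that the prefix only changes how Windows parses a path; it cannot restore a character that was already lost before the path reached the file API, which is consistent with the result reported in the next comment.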

Assuming I did this correctly, running beet import -A \\?\E:\CUETools\Motörhead gives:

error: no such file or directory: \\?\E:\CUETools\Motrhead

I changed the PowerShell encoding to UTF-8, but I still have the same issue.

https://stackoverflow.com/questions/40098771/changing-powershells-default-output-encoding-to-utf-8

$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'

Verified with $PSDefaultParameterValues['Out-File:Encoding']:

utf8
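Because the failure depends on which encodings Python picks on a given machine, the sketch below prints the ones it is using, via standard-library calls only. On Python 3.6+ for Windows, PEP 529 makes the filesystem encoding UTF-8, while the preferred (locale) encoding is typically the ANSI code page, such as cp1252. The Out-File:Encoding default above applies to PowerShell's Out-File cmdlet, so it is not expected to change these values. Which of them beets consults internally is not shown here.

```python
import locale
import sys


def report_encodings() -> dict:
    """Collect the encodings this Python process is using."""
    return {
        # Codec used to convert between str paths and OS-level bytes paths.
        "filesystem": sys.getfilesystemencoding(),
        # Locale's preferred encoding (often cp1252 on Western Windows setups).
        "preferred": locale.getpreferredencoding(False),
        # Encoding of whatever stdout is attached to (can be None).
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in report_encodings().items():
        print(f"{name}: {value}")
```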

What happens if you cd to the folder and use a relative import? beet import -A .

Alright, so it seems like I've more or less worked out what's happening:

  1. Arguments in Python 3 are Unicode strings, which are passed to beets.ui.commands.import_func during import.
  2. The Unicode string is converted to the system's argument encoding (cp1252 on my Windows install).

    • This changes the Unicode argument to a byte string, with \xf6 representing the ö: b'Mot\xf6rhead'

  3. The converted string is passed to beets.util.normpath, which passes it to beets.util.syspath.
  4. beets attempts to decode it as UTF-8, which promptly fails because a lone \xf6 byte is not valid UTF-8 (the correct UTF-8 representation of ö is \xc3\xb6).

    • Interestingly enough, ö is code point U+00F6 (0x00F6 in UTF-16), but I don't think that's relevant.

  5. beets tries to decode it again using the filesystem encoding, or the default system encoding if there is no filesystem encoding.

    • This is again UTF-8 on my system, but because path.decode(..., 'replace') is used, it doesn't fail and instead replaces the ö with a � (U+FFFD, the replacement character).

  6. beets tries to use the path Mot�rhead, which fails spectacularly.
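Steps 2, 4 and 5 can be reproduced in isolation with plain codec calls. This sketch does not call into beets; it assumes cp1252 as the argument encoding and UTF-8 as the filesystem encoding, as on the reporter's machine, and simulate_mangling is a hypothetical name.

```python
def simulate_mangling(arg: str, arg_encoding: str = "cp1252",
                      fs_encoding: str = "utf-8") -> str:
    """Hypothetical helper: encode with one codec, decode with another."""
    # Step 2: the str argument becomes bytes in the argument encoding.
    raw = arg.encode(arg_encoding)
    try:
        # Step 4: a strict UTF-8 decode fails on the lone 0xF6 byte.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Step 5: decoding again with errors='replace' does not fail; it
        # silently substitutes U+FFFD for each undecodable byte.
        return raw.decode(fs_encoding, "replace")


print("Motörhead".encode("cp1252"))          # b'Mot\xf6rhead'
print("Motörhead".encode("utf-8"))           # b'Mot\xc3\xb6rhead'
print(ascii(simulate_mangling("Motörhead")))  # 'Mot\ufffdrhead'
```

ascii() is used only so the replacement character prints safely on any console.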

I'm not really sure what the right course of action is, but it certainly seems that we shouldn't be encoding the arguments using cp1252 and then decoding them using UTF-8.

Sorry if the text above is a bit incoherent; I wrote it as I worked through the code. It might be better to discuss this in our new Gitter.

TL;DR: the culprit seems to be encoding arguments in cp1252 and decoding them in UTF-8.

What happens if you cd to the folder and use a relative import? beet import -A .

Then the import succeeds.

Sorry, the command beet -vv import -A "E:\CUETools\Motörhead" in my original report was missing the ¨ because I was experimenting with whether I could pass the command without the diaeresis, and I copied the wrong line.

I don't think the quotes make any difference.

Wow! Very nice work investigating this, @jackwilsdon. It seems like we need to somehow remember that, on Windows, we either (a) need to preserve the Unicode command-line arguments as-is, or (b) re-decode them later using the argument encoding to recover the original filename.
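Options (a) and (b) can be illustrated with plain codec calls. This is a sketch, not beets code: ARG_ENCODING stands in for whatever argument encoding is in effect (cp1252 on the reporter's machine), and both helper names are hypothetical.

```python
ARG_ENCODING = "cp1252"  # assumption: the reporter's argument encoding


def keep_as_is(arg: str) -> str:
    """Option (a): never turn the str argument into bytes at all."""
    return arg


def redecode(raw: bytes, encoding: str = ARG_ENCODING) -> str:
    """Option (b): decode with the same codec that produced the bytes."""
    return raw.decode(encoding)


arg = "Motörhead"
raw = arg.encode(ARG_ENCODING)  # what step 2 of the analysis produces
assert keep_as_is(arg) == arg
assert redecode(raw) == arg     # lossless when one codec is used both ways
```

Option (b) can only recover names whose characters exist in the argument encoding: for example, "東京".encode("cp1252") raises UnicodeEncodeError before any decoding happens.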

Doing this in a cross-platform way is absolutely crazy-making! I'm really not sure what a clean solution is, but it will need a lot of platform-specific special cases…

I sent this in Gitter, but I thought it was worth putting here too and fleshing out a bit:

What are your thoughts on somewhat "abstracting" I/O so that we use Unicode internally within beets and delegate converting to the system-native encoding to some other layer? As an initial phase, we could have some form of "Filesystem" layer that handles all of this, and later expand it into a layer that also handles the arguments passed into the process.

I'm not sure to what extent we currently use Unicode strings vs. platform-native strings within beets, but I think it would greatly simplify the logic if we could move all of the encoding handling elsewhere and keep the core of beets working purely in Unicode.
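A minimal sketch of the proposed "Filesystem" layer, assuming the core works only with str and every conversion to bytes happens at this boundary via os.fsencode/os.fsdecode. The class and its methods are hypothetical, not an existing beets API.

```python
import os
import tempfile


class Filesystem:
    """Hypothetical boundary layer: the core passes str; bytes exist only here."""

    def to_native(self, path: str) -> bytes:
        # Uses the filesystem encoding and error handler (surrogateescape on
        # POSIX, surrogatepass on Windows since Python 3.6).
        return os.fsencode(path)

    def from_native(self, raw: bytes) -> str:
        return os.fsdecode(raw)

    def listdir(self, path: str) -> list:
        # A str argument makes os.listdir return str names, so callers
        # never have to handle bytes.
        return sorted(os.listdir(path))


if __name__ == "__main__":
    fs = Filesystem()
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "Motörhead.flac"), "w").close()
        print(ascii(fs.listdir(d)))
```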

A smart abstraction (something like pathlib) might be really nice! However, there is a downside to representing all paths as Unicode—namely, that paths on Unix are not guaranteed to follow any particular Unicode encoding and in practice often do not. (I followed up on Gitter—still trying to get used to it! :smiley:)
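The Unix downside mentioned here is what Python's surrogateescape error handler (PEP 383) addresses: bytes that are invalid in the chosen encoding are mapped to lone surrogates, so a str path can still round-trip to the exact original bytes. The byte string below is a made-up example, decoded with an explicit codec rather than through beets.

```python
# Bytes of a filename written by a Latin-1 program, read on a UTF-8 system
# (made-up example).
raw = b"Mot\xf6rhead"

# Strict UTF-8 decoding fails, as in the analysis above.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)  # strict: invalid start byte

# surrogateescape maps the bad byte 0xF6 to the lone surrogate U+DCF6...
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))                # 'Mot\udcf6rhead'

# ...and encoding with the same handler restores the original bytes exactly.
assert name.encode("utf-8", "surrogateescape") == raw
```

The resulting str is not displayable text (printing it directly to a UTF-8 console raises UnicodeEncodeError), which is part of why representing every path as Unicode is not free on Unix.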

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
