Tridactyl: Encoding of non-Latin-1 characters entered in external editor gets messed up with Unicode-based external editors on Windows

Created on 30 Jul 2018  ·  5 Comments  ·  Source: tridactyl/tridactyl

  • Brief description of the problem: When opening the content of a text input box in an external editor, the encoding of any non-ASCII characters entered there gets messed up.

  • Steps to reproduce:

    1. I tested this on Windows with native_main.bat and native_main.exe.
    2. I'm using <C-i>, i.e., :editor, in a text input box, which does not contain non-ASCII characters. (For the case of the text containing non-Latin characters, see #878.)
    3. I tried to enter, e.g., the characters ä–“→…×.
    4. Then close the external editor (in this case Emacs, configured to expect the temporary file to be UTF-8 encoded).
    5. Here's what I got back in Tridactyl, in the text input box: Ã¤â€“â€œâ†’â€¦Ã—. Judging from the Ã, this looks like a charset conversion that mistakes Latin-1 for UTF-8 or vice versa, but I haven't been able to figure it out exactly (and thus to implement a workaround).
  • Tridactyl version (:version): says :REPLACE_ME_WITH_THE_VERSION_USING_SED – the latest non-experimental release of today or yesterday.

  • Firefox version (Top right menu > Help > About Firefox): 60.1.0esr (64-bit Windows)

  • URL of the website the bug happens on: doesn't matter

  • Config (in a new tab, run :viewconfig, copy the url and paste it somewhere like gist.github.com): seems irrelevant to me (but will be happy to provide more details) – in particular choosing, via :set editorcmd, a different external editor, which is configured to open the respective file assuming US-ASCII or Latin-1 encoding, doesn't solve the problem, because then one wouldn't even be able to type any non-Latin characters into the file.

  • Contents of ~/.tridactylrc or ~/.config/tridactyl/tridactylrc (if they exist): I think it doesn't contain anything specific to my problem but will be happy to provide more details.

  • Operating system: Windows 10 64-bit 15063.1155

  • Result of running :! echo $PATH in tridactyl: (BTW this should be :! echo %PATH% on Windows) – really nothing special. Starts with C:\Program Files\Mozilla Firefox, contains all directories containing external editor executables (I tried primarily emacsclientw but also gvim and notepad.)

P2 bug windows

All 5 comments

This looks Windows specific: I can ä–“→…× to my heart's content on Linux.

Edit: it works fine in Emacs on Linux, too.

Tridactyl version (:version): says :REPLACE_ME_WITH_THE_VERSION_USING_SED

You might be running a broken tridactyl version if this is your version number. This is unlikely to fix your issue but could you try manually updating? Your version number should be >=1.13.1-218.

This is probably caused by Python's string handling on Windows. Could you run the following excmd: jsb tri.native.pyeval("sys.getdefaultencoding()").then(cmd => fillcmdline(cmd.content)) and tell us what it returns?

It might be a good idea to specify "utf-8" in the native messenger, e.g. here: https://github.com/cmcaine/tridactyl/blob/876050d410d7ce5bb5b63d2b4720298bd6caf53f/native/native_main.py#L433-L438 and here: https://github.com/cmcaine/tridactyl/blob/876050d410d7ce5bb5b63d2b4720298bd6caf53f/native/native_main.py#L455
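As a minimal sketch of what specifying "utf-8" could look like: the helper names write_editor_file and read_editor_file below are hypothetical and are not the actual functions in native_main.py. The only point is the explicit encoding argument to open(), so that the result no longer depends on the platform's default code page.

```python
import os
import tempfile


def write_editor_file(path, content):
    """Write the text handed to the external editor, always as UTF-8."""
    # Hypothetical helper. newline="" keeps line endings exactly as given.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(content)


def read_editor_file(path):
    """Read the editor's result back, always as UTF-8."""
    with open(path, "r", encoding="utf-8", newline="") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tridactyl_input.txt")
        write_editor_file(path, "ä–“→…×")
        # True if the text survives the round trip unchanged.
        print(read_editor_file(path) == "ä–“→…×")
```

This only helps if the external editor also writes the file as UTF-8, which is the setup described in the report.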

@glacambre regarding my version number, I'll try updating later. I received this version by Firefox's default updating mechanism.
The output of jsb tri.native.pyeval("sys.getdefaultencoding()").then(cmd => fillcmdline(cmd.content)) on my system is utf-8.
Note that my native messenger is the EXE-wrapped version provided by Tridactyl.
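A side note on this check, as a hedged sketch: in Python 3, sys.getdefaultencoding() reports "utf-8" on every platform, so that answer does not by itself rule out a code page problem. When open() is called without an encoding argument, it normally falls back to the locale's preferred encoding, which on Windows is typically the ANSI code page (e.g. cp1252). Whether that is what happens in this bug is an assumption; report_encodings is a hypothetical helper used only to show the two values side by side.

```python
import locale
import sys


def report_encodings():
    """Return the two encodings that are easy to confuse in this context."""
    return {
        # The str/bytes default of the interpreter: "utf-8" on Python 3.
        "sys_default": sys.getdefaultencoding(),
        # What open() normally uses when no encoding= argument is given.
        "open_default": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    print(report_encodings())
```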

@clange @glacambre @bovine3dom

I can reproduce this on Windows. #876 and #878 are very likely due
to the Unicode handling _in-between_ JavaScript and Python (not
specific to JavaScript or Python on Windows).

I will try to have a look. Thank you.

Just to confirm what @gsbabil wrote: the issue is in the JavaScript code itself. It should create the temporary file using UTF-8, but currently it doesn't. If the external editor (e.g. Vim) writes UTF-8 into this file, the file is read back as CP1252 (or presumably whatever the current code page is). For the characters that are representable in CP1252, this can be worked around by setting fileencoding in Vim, but the others just can't be used...
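The misread described here can be reproduced in isolation. The following is a small sketch, not Tridactyl code: it encodes the sample text from the report as UTF-8 (what the editor writes) and decodes it as CP1252 (the suspected misread), then lists which sample characters have no CP1252 representation at all. The function names are made up for illustration.

```python
SAMPLE = "ä–“→…×"


def misread_utf8_as_cp1252(text):
    """Encode as UTF-8, then decode the same bytes as CP1252."""
    return text.encode("utf-8").decode("cp1252")


def fits_in_cp1252(ch):
    """Return True if the character can be encoded in CP1252."""
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    garbled = misread_utf8_as_cp1252(SAMPLE)
    # The garbled text as it would appear in the input box.
    print(garbled)
    # Reversing the misread recovers the original text for this sample.
    print(garbled.encode("cp1252").decode("utf-8") == SAMPLE)
    # Characters that a CP1252 fileencoding workaround cannot cover.
    print([ch for ch in SAMPLE if not fits_in_cp1252(ch)])
```

Of the sample characters, only the arrow falls outside CP1252, which matches the observation that setting fileencoding in Vim covers some characters but not others.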
