Node: Investigate non-ASCII build file path on Windows

Created on 16 Nov 2017  Â·  18Comments  Â·  Source: nodejs/node

  • Version: master
  • Platform: Windows
  • Subsystem: build

If there's a non-ASCII character in the path to the node directory, the build will break. This issue is only happening on Windows as far as I'm aware (there's a similar bug on POSIX systems, but it was already fixed). See the discussion on https://github.com/nodejs/node/pull/16735 and the errors below:

image
(https://gist.github.com/anonymous/875aac0ab3e4dfa219854b691f8d3431)

addons build i18n-api windows

All 18 comments

It's ironic that ICU doesn't support Unicode properly.

The issue here is that icutrim.py fails with a non-ASCII path:

C:\Users\Nikolai\WindĂ’o\node\tools\icu>python icutrim.py -P C:\Users\Nikolai\WindĂ’o\node\Debug\ -D ..\..\deps\icu-small\source\data\in\icudt60l.dat --delete-tmp -T C:\Users\Nikolai\WindĂ’o\node\Debug\obj\global_intermediate\icutmp -F icu_small.json -O icudt60l.dat -v -L en,root
Options: {'verbose': 1, 'filterfile': 'icu_small.json', 'toolpath': 'C:\\Users\\Nikolai\\Wind\xd2o\\node\\Debug\\', 'deltmpdir': 1, 'outfile': 'icudt60l.dat', 'datfile': '..\\..\\deps\\icu-small\\source\\data\\in\\icudt60l.dat', 'locales': 'en,root', 'endian': 'little', 'tmpdir': 'C:\\Users\\Nikolai\\Wind\xd2o\\node\\Debug\\obj\\global_intermediate\\icutmp'}
icu_small.json: icutrim.py config: Trim down ICU to just a certain locale set, needed for node.js use.
* converters: 0 items
* stringprep: 0 items
* translit: 2 items
* brkfiles: 0 items
* brkdict: 0 items
* confusables: 0 items
* brkitr: 0 items
* coll: 1 items
* curr: 1 items
* lang: 0 items
* rbnf: 0 items
* region: 0 items
* ROOT: 1 items
* unit: 0 items
* zone: 1 items

###WARNING: Encountered abnormal bytes while converting input stream to target encoding: U_ILLEGAL_CHAR_FOUND
        Pre-context: \Users\Nikolai\Win
        Context: ĂŠ
        Post-context: o\node\Debug\iculs
An error occurred processing file C:\Users\Nikolai\WindĂŠo\node\Debug\obj\global_intermediate\icutmp\lang/\res_index.txt. Error: U_ILLEGAL_CHAR_FOUND
FAILED: C:\Users\Nikolai\WindĂŠo\node\Debug\genrb -d C:\Users\Nikolai\WindĂŠo\node\Debug\obj\global_intermediate\icutmp\lang/ -s C:\Users\Nikolai\WindĂŠo\node\Debug\obj\global_intermediate\icutmp\lang/ res_index.txt

Who maintains this file? It contains a link to http://bugs.icu-project.org/, but its history consists only of Node.js-local changes. cc @srl295

Welcome to the irony of my life. You can workaround with full-icu as an option. Otherwise will look at this next week when I am at a windows box. And sorry.

@seishun I can't even get this far. I get:

C:\Users\StevenLoomis\Documents\GitHub\NĂ’DE>vcbuild.bat
ERROR: The system was unable to find the specified registry key or value.
ERROR: The system was unable to find the specified registry key or value.
Cannot determine current version of Node.js

C:\Users\StevenLoomis\Documents\GitHub\NĂ’DE>

… and then an editor is popped up to _edit_ getnodeversion.py

(running node --version returns v8.9.4)

I ran this from in powershell and cmd.exe, in and out of the visual studio command tools for 2015 and 2017.

@mmarchini what is the actual path of your node dir ? I see both:

C:UsersNikolaiWindĂ’o

and

C:UsersNikolaiWindĂŠo

@srl295 That looks like you don't have Python installed (see https://github.com/nodejs/node/issues/16864).

what is the actual path of your node dir ? I see both:

I reproduced it by placing it in "WindĂ’o". I don't know why Python outputs "WindĂŠo".

Thx. Will retry when back at that box.

@srl295 do you think you'll have to time to look into it any time soon? I could try fixing it myself, but I might not have enough knowledge about it.

@seishun I'll have to see, I just reinstalled my windows env and have some things coming up. So not necessarily soon.

The issue is in iculslocs. When run from a non-ASCII path, it generates a file with invalid UTF-8, which genrb chokes on.

To reproduce, run:

C:\<non-ascii path>\iculslocs -i icudt60l.dat -N icudt60l -T lang -b res_index.txt

@srl295 how would you suggest to proceed with this? It seems iculslocs is a third-party project.

edit I wrote iculslocs for node, but thought that it might be used upstream in ICU. As things stand now, node is the only user and I have not maintained any changes upstream.

@seishun

how would you suggest to proceed with this

it can be fixed by a normal PR. Still did not get to investigate.

It looks like you posted an unfinished comment.

@seishun can you attach the res_index.txt file here? There are multiple files under ./out/Release/gen/icutmp/

You mean the file iculslocs generates? Here you go: res_index.txt

^ that would still be interesting, but I think I found the problem.

iculslocs writes out:

(BOM)// -*- Coding: utf-8; -*-
//
// Warning this file is automatically generated
// Updated by ../../out/Release/iculslocs based on icudt60l:res_index.txt

… but the path given is based on argv[0] which is in process encoding, not necessarily utf-8. The path doesn't really add any value here. I'll change it to the following (note that the ../.. path is gone)

// -*- Coding: utf-8; -*-
//
// NOTE: This file was generated during the build process.
// Generator: tools/icu/iculslocs.cc
// Input package-tree/item: icudt60l-lang/res_index.res

(package icudt60l, tree lang, item res_index.res - probably only of interest to someone debugging )

@seishun are you able to test the commit I made to https://github.com/srl295/node/tree/fix-iculslocs ?

@seishun perfect, that confirms what I thought. I think the above fix will solve the issue. Thanks for your investigation.

For the record, your file had broken utf-8 in the comment mentioned above.

// Updated by C:\Users\Nikolai\Wind“o\node\Debug\iculslocs based on icudt60l-lang:res_index.txt

That fixes the issue with iculslocs, but doesn't completely fix the build issue because there is another similar issue with "protocol_generated_sources" from V8.

In addition, fixing both issues would allow builds from Latin-1 paths, but not for all Unicode paths. Fixing it generally probably requires moving to a different build system (https://github.com/nodejs/node/issues/18560#issuecomment-363120101).

@seishun OK. this fix should at least make the ICU tooling subsystem not dependent on the path (whether latin1 or all unicode). I'll open a PR.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

danielstaleiny picture danielstaleiny  Â·  3Comments

Brekmister picture Brekmister  Â·  3Comments

stevenvachon picture stevenvachon  Â·  3Comments

seishun picture seishun  Â·  3Comments

srl295 picture srl295  Â·  3Comments