Node: detect "full-icu" module

Created on 21 Oct 2015  Â·  67Comments  Â·  Source: nodejs/node

Followup to https://github.com/nodejs/node-v0.x-archive/issues/8996

  • The npm module full-icu will at postinstall time make sure node_modules/full-icu contains the appropriate icudt*.dat file needed for full ICU support.

    • it also prints out instructions on how to fire up Node with full ICU.

  • Proposal: somehow detect the special (?) module name full-icu in any of the global search paths or local module search paths. If found && the correct icudt file is present, _automatically set Node's ICU path to point to that icudt_.dat file*

This will make getting full data as easy as npm install [-g] full-icu

feature request i18n-api

Most helpful comment

This issue has been dormant for a long time. Closing.

npm install full-icu Just Works(TM) these days, by the way.

All 67 comments

@jasnell

hmmm... trying it now...

npm install icu4c-data@55l (Node 4.0.0 and small-icu 55) -> icudt55l.dat
npm WARN unmet dependency /usr/local/lib/node_modules/yeoman requires bower@'~0.3.0' but will load
npm WARN unmet dependency /usr/local/lib/node_modules/bower,
npm WARN unmet dependency which is version 1.5.3
npm WARN unmet dependency /usr/local/lib/node_modules/npm/node_modules/npm-registry-client/node_modules/npm-package-arg requires semver@'4' but will load
npm WARN unmet dependency /usr/local/lib/node_modules/npm/node_modules/semver,
npm WARN unmet dependency which is version 5.0.1
[email protected] /usr/local/lib/node_modules/icu4c-data
 •  (no icudt55l.dat)
/usr/local/lib/node_modules/full-icu/install-spawn.js:51
                        throw Error('Somehow failed to install ' + icudat);
                        ^

Error: Somehow failed to install icudt55l.dat
    at Error (native)
    at npmInstallNpm (/usr/local/lib/node_modules/full-icu/install-spawn.js:51:10)
    at Object.<anonymous> (/usr/local/lib/node_modules/full-icu/postinstall.js:62:2)
    at Module._compile (module.js:434:26)
    at Object.Module._extensions..js (module.js:452:10)
    at Module.load (module.js:355:32)
    at Function.Module._load (module.js:310:12)
    at Function.Module.runMain (module.js:475:10)
    at startup (node.js:117:18)
    at node.js:951:3

Tagging this for tsc discussion. This is quite interesting but requires hard-coding in some this we haven't done anything like before.

Feel free to remove if you don't think it qualifies for it (yet).

Having it autodiscover is good in theory but does introduce a startup performance concern. It would be good to have a PoC implementation that we can benchmark.

It's worth noting that this creates a coupling between node and npm (something that hasn't really existed before — node just bundled npm thus far), or at least a coupling between node and ${npm_prefix}/lib/node_modules (which never has never been in any search path, to my knowledge).
I'm not sure that's a thing we want to do/have.

In addition, it obscures an application's true dependencies, by violating "dependency locality".
I think that's an even more serious concern than npm coupling.

@jasnell the issue with installing as -g has been solved, https://github.com/icu-project/full-icu-npm/issues/2

This is fine to discuss at TSC. It's been rattling around as an idea for a while now.

In addition, it obscures an application's true dependencies, by violating "dependency locality".

I assume this is not just related to the auto discover but the full-icu package itself. Do you have any suggestions as to the approach? To just get the data file, the "sub-dependency" wouldn't have to be an NPM dependency- post install could just wget fetch the data from somewhere.

Anyways - the entire approach is most certainly open to discussion.

BTW, is building with full-icu by default out of question?

@silverwind this was discussed in detail during the call: https://soundcloud.com/node-foundation/tsc-meeting-2015-10-21

Decision was to move back to GH though.

@silverwind #3460 is about detecting the full data _if_ you are built in small mode, which would remain relevant unless small mode was removed. I will reply to your question over on #3476 which is about how ICU is built itself.

Rather than having node searching through node_modules, how about having the full-icu module put the data somewhere in the user folder, and have node look for it there at startup? It's a similar approach but it decouples the mechanism used by node to find the data from the tool used to install it.

@orangemocha that could work. Suggestions on what the 'somewhere' should look like? The only precedent for this that I know of is the modules dirs. Maybe something like:

  • {CWD}/icu-data/icudt56l.dat
  • {HOME or profile}/.icu-data/icudt56l.dat
  • {Node install dir}/../lib/icu-data-icudt56l.dat

I put the specific file name because node will look for that one file that it needs. Three stat calls (in this example) and then done.

Yes, I would follow a parallel with npm global modules. They are under %APPDATA%\npm on Windows. I believe the Linux equivalent is ~/.appname. I would use node as the app name assuming this data is only useful to node.

So:

  • On *nix: ~/.node/icu-data/icudt56l.dat
  • On Windows: %APPDATA%\node\icu-data\icudt56l.dat

Mac would be something like ~/Library/application support/node

assume this is only applicable to node

Oooo are we going there? I guess we are going there.. I could see if
there's interest in this upstream in ICU. Right now a path is allowed for
data search. The node specific trick is that normally a data path can't
override linked in data. So node explicitly has to decide whether it is
installing the "small Icu" data or not at startup time.

So this model could generalize as 1.) a convention and/or 2.) an ICU API
with some sort of "Boolean isFullIcuDataFoundSomewhere()" or maybe an Icu
Api to calculate the platform path.-- all without actually loading ICU.

Removing the tsc-agenda tag for now.

Resolution from today's CTC meeting:

  1. No objections to the suggested approach.
  2. Follow up with working code.
  3. Mark experimental for a major cycle or two to shake out issues.

Does that sound about right?

@bnoordhuis yes, thanks. There have been so many different opinions on this, appreciate the quick review

FWIW, from the standpoint of making deb and rpm packages, you'd want to put the .dat file in

/usr/share/node/icudt56l.dat

We could then include basically just that file in a separate package called something like nodejs-intl-full_5.1.0.XXX.deb which would depend on the actual nodejs package, and it should all "just work" like people expect it to.

:+1: for @chrislea about the way to distribute binary packages.

Also guys agrred on the fact that distributing node ATM with icu_small is an issue for non english intl support... for example we have an app supporting 3 different languages so we can never benefit from icu in node and we always use the intl polyfil.

Please tell me how I can help. I can be at least a tester.

I agree with what @orangemocha and @srl295 said in an above comment.

I also agree with @nathan7 that we shouldn't be coupling node with npm. npm is just a tool to make life easier. By creating a set of directories for node to look at for the full-icu files, we will also allow people to download those icu files directly (without npm) and place them in any predefined directories.

The full-icu module would then use the postinstall script to install the files in the most local or closest location possible (unless it detects the -g flag).

So, I would suggest:

  • Without the -g flag: {CWD}/icu-data/icudt56l.dat
  • With the -g flag, any of:

    • {Node install dir}/../lib/icu-data-icudt56l.dat

    • ~/.node/icu-data/icudt56l.dat

    • %APPDATA%\node\icu-data\icudt56l.dat


To play devil's advocate though...

I don't necessarily believe that having node try to "somehow detect the special (?) module name full-icu" in any of the node_module search paths as @srl295 suggested is a means of coupling node with npm... Node's require command already takes advantage of this. So, if anything, npm is coupled with node (obviously), not the other way around.

So, I don't really see anything bad with what Steven suggested in the first comment. require already does this.

^^ FYI. PoC of search path stuff.

You might imagine:

  • npm install -g full-icu that installs to /usr/local/kittens or
  • npm install full-icu that installs to ./kittens the appropriate files.

("kittens" here is just a stand-in for the exact right directory. )

cc: @nodejs/intl @jasnell

  • [x] record CTC comments here, writeup

removing ctc-agenda label for now

CTC comments: (loosely transcribed)

search ./node_modules/.lib/node-icu/ or perhaps ./node_modules/.node-icu
search {execPath}/../lib/node-icu

@orangemocha can you elaborate more on "this doesn't belong on node_modules"?

can you elaborate more on "this doesn't belong on node_modules"?

I was just observing that putting the ICU data in there is overloading the use of node_modules. Things in node_modules are things that you can require, have a package.json, and npm has some understanding of. It might not break anything to put ICU data in there, but I wouldn't completely exclude the possibility either.

What is the benefit of putting it in node_modules? Is it just not having to create a new folder?

One benefit is that we don't risk clobbering any existing name in cwd that someone might be using, perhaps there is a .node-icu in use out there already. node_modules is basically up for grabs since it's all generated by tooling anyway. Anything starting with a . can't be a package, .bin is already a thing, so the rules are clear.
Another is that rm -rf node_modules does a proper reset and npm i node-icu would reset it back to the way it should be, everything is nicely in sync.

@rvagg @orangemocha so… what about this search path:

./node_modules/full-icu
/usr/local/lib/node_modules/full-icu

Where full-icu is _already_ an actual, bona fide, node module…  And this search would be searching the same way the module loader would search once the v8 engine is up and running.

As @rvagg notes, rm -rf node_modules does a proper reset.

Things in node_modules are things that you can (1) require, (2) have a package.json, and (3) npm has some understanding of.

  1. full-icu can be required, it has some utilities for telling you which versino of ICU is there
  2. it has a package.json
  3. npm has knowledge of it.

@srl295 : is it conceivable that we'll want to install the icu data by other means? For example (Windows), as an optional feature in the MSI? If it's always going to be installed as an npm module, then I am fine with Node depending on it being in node_modules.

  1. ./node_modules/full-icu
  2. /usr/local/lib/node_modules/full-icu
  3. {execPath}/../lib/full-icu ( changed this to full-icu for consistency)

@orangemocha as an optional feature in the MSI (or other package) that would probably either use npm i -g which would apply to (2) or install to something relative to execPath as in (3)

there's plenty of precedent in post-install actions for just doing it in node_modules/full-icu, currently the most popular of these precedents is in downloading and unpacking prebuilt binaries for native addons and that seems like a similar use-case to what you're proposing here.
What's the process with full-icu anyway? I had a quick look and can't figure it out, is it downloading a .dat to install within that package?

@rvagg full-icu's postinstall.js:

  1. Looks at your version of node and its configuration. If you already have full data installed, or it is not installable, exit gracefully.
  2. Determines the ICU version and endianness required for data
  3. performs the correct npm install, such as npm install icu4c-data@54l (l for little endian)
  4. hardlinks the ICU data file into the full-icu directory

Net result is that node_modules/full-icu (whether local or global) contains the appropriate ICU data file.

@srl295, please bear with me as I am struggling to understand all the possible consequences of this design choice. As noted, Node having knowledge of and depending on a specific module is uncharted territory. Not to say it shouldn’t be done, but you are sort of a pioneer :-)

One of the foreseeable evolutions of such scenario is that at some point we might want to include it in the Node installer itself, hence the questions about MSI. I think that having the MSI install it as plain files is likely to become a source of problems if the user updated the module via npm. It would probably be safer to install and uninstall with npm through a custom action. /cc @joaocgreis

@orangemocha struggling with the same here also.
Looking at https://docs.npmjs.com/files/folders - both node and npm have knowledge of ./node_modules and {somewhere global}/node_modules as I understand it. (How does the module loader work? Does node call into npm when require() is used - I assume not?

how about using …/node_modules/.lib/icu instead of …/node_modules/full-icu in the paths above? This could be created by installers or npm (calling a post install script), or manually copied, etc. Deleting node_modules recursively would reset to a clean state as @rvagg noted.

@srl295 The .../node_modules/.lib/icu approach sounds intriguing. Ideally this would also (symetrically?) support {node install dir}/.lib/icu (or wherever npm install --global goes).

In my particular situation, a corporate docker/container world, I might have a read-only {node install dir} (with all appropriate software already installed) with app-specific images built on top of that. I'm just presenting this as a use case, might be a corner case though. The .lib/icu seems like a clean specification to which different kinds of tooling could be written.

(BTW, at my job we've found that npm module postinstall scripts are a major cause for concern, since they can do anything. A few download arbitrary binary executables which makes the security team sad.)

Another thought is to make require() "smarter" so that the application can just do something like require(/path/to/icudt*.dat) at startup. Hmmm... I'm not sure if the icu library is structured to allow "late" loading of data like this. Overloading require() to do this might be inappropriate/messy so perhaps a separate function should be created for this.

On the other hand, --icu-data-dir already probably provides equivalent functionality.

@drewfish - Yes, I was using "…" as shorthand for either the current directory OR the global node_modules.

I'm not sure if the icu library is structured to allow "late" loading of data like this

Not in this case. It all needs to be sorted out before v8 starts up.

@srl295 the module loader doesn't call into npm. It does search for modules ./node_modules but it's quite a bit more involved than that. See https://nodejs.org/api/modules.html#modules_all_together for a pseudo-code explanation. And https://github.com/nodejs/node/blob/de91e9a8a7cebb4543c13f60d5670beeb12b8ec9/lib/module.js#L223-L275 and https://github.com/nodejs/node/blob/de91e9a8a7cebb4543c13f60d5670beeb12b8ec9/lib/module.js#L456-L484 for the actual implementation.

Trying to summarize the requirements for ICU data location:

  1. MSI (Windows) and distribution packages (UNIX) must be able to install Node with full ICU data.
  2. If not installed with node, the Administrator/root user should be able to install for the whole system.
  3. A user should be able to add it globally for his account.
  4. A user should be able to add it to some packages but not others.
  5. A package should be able to list it as a dependency.

So, which ones do we really want? Is anything missing?

@srl295 will each version of node work with a single data file? Should this be a mechanism to update the data file without updating node, or should we be thinking of avoiding problems if there is an update somehow?

The module based approach discussed here could work, with some tweaks. It does not give us 2. on Windows and 3. on Unix. There's some discussion about globally and system installed modules in npm: https://github.com/npm/npm/issues/4534 and https://github.com/npm/npm/issues/6413 , but it's not likely to be resolved soon. The npm global prefix on Windows is different for each user.

What about searching an environment variable? The search sequence would become something like:

  1. ./node_modules/full-icu
  2. ${NODE_FULL_ICU}
  3. {execPath}/full-icu (Windows, easily installed by MSI) or /usr/local/lib/node_modules/full-icu and {execPath}/../lib/full-icu (Unix, does this make sense?)

@joaocgreis

will each version of node work with a single data file?
Yes. It can only work with a single data file.

So here is my current proposal for a change to the behavior of node when compiled with configure … --with-intl=small-icu :

If env NODE_ICU_DATA or option --icu-data-dir is present, use that for ICU. _(This is the current behavior. These parameters can take a complex search path and not just a single directory. )_

else, set the ICU data path to the first of these contain the file icudt99x.dat _(where 99x is replaced by a compiletime-calculated value)_:

  1. {cwd}/node_modules/.node-icu
  2. {home}/.node_modules/.node-icu (== userprofile)
  3. {execpath}/../../lib/node/node_modules/.node-icu (this is /usr/lib/node/node_modules/.node-icu etc.)

Here .node-icu is not a module name, (avoids requiring a reified module name) but would be installed by a module || installer || … and also node_modules/.node-icu gets cleaned up when node_modules is deleted.

Also note that this does not do a deep traversal of node_modules/…/node_modules/… directories.

We don't need to use the @nodejs/ctc call to bikeshed on the .node-icu part ,but I'd like to get a +1 on the approach.

^ work in progress that implements above search path.

@srl295 Left some comments on 75ae8a4.

@srl295 the proposed strategy looks good. Note that for Windows, the MSI will only be able to add files below {execpath}, so it is necessary to search in {execpath}/.node-icu or equivalent.

Thanks JoĂŁo. Perhaps an #ifdef for the WINDOWS case?

I thought {execpath} includes the binary, so {execPath}/.node-icu would be something like C:\Program Files\node\bin\node\.node-icu\… which wouldn’t work because “node” is a file, not a directory.

@srl295 I'm ok with #ifdef.

process.execPath includes the binary indeed. I thought it didn't, sorry for the confusion, because somefile/.. is not usually valid (it is on Windows cmd). The node executable on Windows is by default C:\Program Files\nodejs\node.exe, without the additional bin folder, so {execpath}/../../lib would be C:\Program Files\lib. Hence, please consider my suggestion as {execpath}/../.node-icu.

@joaocgreis OK. Makes sense.

(My code can't literally use process.execPath because C++, but the equivalent (which calls uv_exepath).

TODO:

  • [ ] incorporate above comments
  • [ ] test on Windows

removed ctc-agenda label again, let's try and sort it out here, when you're not getting push-back then it suggests you can simply move forward with it as proposed, if you need help getting enough attention for opinions then I'll try and help round some up for feedback.

ok, with some other issues out of the way, will start looking at this again as per above. any NPM folks like @othiym23 have thoughts on above? especially the node_modules/.node-icu concept.

Anything starting with a . can't be a package, .bin is already a thing, so the rules are clear.

The primary npm registry won't let you publish a package with a leading period in its name, and npm uses period-prefixed directories in a couple places for its own internal machinery (.bin, .staging during the npm@3 install process), but it's more of an ad hoc convention than a rule.

The fact that you can't use it as the beginning of a package name means that it's probably safe to claim .node-icu as a directory for Node's own use without worrying about conflicts with existing practice on the registry, and as far as npm is concerned, Node is free to reserve whatever directories under node_modules it likes, as node_modules is explicitly flagged (in the name, even) for Node's own use.

Perfect, thanks. So a special sub directory is better than a special module name (which was an old proposal).

S

I would like to mention that we any folder we pick can be accessed by various versions of node.js and related to that the language variant we provide will need to be fit for the node.js version running. Do we need to specify versions of the translation in the path where we download the file. If so: should the data be fully backwards compatible?

@martinheidegger - Thanks for asking. The files are versioned based on ICU version in node, so they won't collide. Example names are icudt55l.dat and icudt57b.dat for ICU 55 little-endian and ICU 57 big-endian respectively. Note that this is no different than current behavior - the various env vars and paths aren't versioned, but point to files which have version numbers in them.

Will update the full-icu module to:

  • if version ≥ v7.x : copy data to ./node_modules/.node-icu (local) or {execpath}/../../lib/node/node_modules/.node-icu (-g) and print a message that no extra config is needed (yay!)
  • if version === v7.0.0-pre - print a message "You may still need to pass in a path until #3460 lands…

I'll just ask this here also. @othiym23 / @nathan7 /@iarna - any thoughts about the full-icu npm module's post install doing something like

  • "if parent directory is node_modules, then create ../.node-icu and copy data there

This way if installed locally ( mycoolapp/node_modules/full-icu ) or globally ( /usr/local/lib/node_modules/full-icu ) it will actually create ../.node-icu, but if you git clone full-icu ~/src/full-icu && npm install it wouldn't try to create ~/src/full-icu

and, also,

  • should full-icu attempt to uninstall / remove ../.node-icu/<file>.dat when it is uninstalled? if that's even possible. @iarna mentioned (thanks!) that npm manages the files in node_modules/.bin and removes them if the original module is removed… wonder if there's a way to manage node_modules/.node-icu files similarly?

PS: there's now a full-icu-test module for testing the full-icu module.

@srl295 Should this issue remain open?

@Trott yes please…

ping @srl295 ... just checking in

This issue has been dormant for a long time. Closing.

npm install full-icu Just Works(TM) these days, by the way.

@bnoordhuis yeah— makes sense.

@bnoordhuis What do you mean with "just works"? I still need to configure the loading process either via env variable or cli flag.

@StarpTech If you scroll through the comments, you'll see people remarking they had problems installing the module. Those issues have been fixed.

If you're asking about auto-detection: unlikely to happen, or it would have happened already.

FYI, if you use full-icu please see a proposed change here: https://github.com/unicode-org/full-icu-npm/issues/36

We (Fedora/Red Hat Enterprise Linux) have the following use-case:

  • We want to support developers and multiple deployment scenarios.
  • Not all users require full internationalization support and would prefer to avoid the disk-space usage when possible. (Example: container environments like OpenShift)
  • Most deployments do want full internationalization support.
  • We don't want to build Node.js twice (once for small-icu, once for full-icu) so as to avoid shipping different /usr/bin/node binaries.
  • We don't want users to be required to explicitly set NODE_ICU_DATA or pass --icu-data-dir. We want the presence or absence of the data to be based on the presence or absence of an RPM package.

What we would like to see is a mode to build Node.js with small-icu bundled but also build the icudt6X[bl].dat from the full-icu. We would then like for there to be a well-known location under $PREFIX that will always be checked as if the NODE_ICU_DATA was set to it (if it's not manually set).

We would then ship two subpackages for Node.js:

  • nodejs-i18n-data which contains only the data files to put in the well-known location
  • nodejs which provides the interpreter and Recommends: nodejs-i18n-data. (Recommends: is an RPM dependency specification which means that the listed package will be installed by default unless --no-recommends are passed, such as in a container environment. And also that removing it later does not result in removing the dependent package.)

@sgallagher Hello. The full-icu npm module will download the icudt.dat file for you. Right now it downloads from NPM, but I want to change it to download from ICU's GitHub release directly. Node.js doesn't need to build icudt.dat file separately. Starting with Node.js 13, --with-intl=full-icu is the default. But you can still build with --with-intl=small-icu explicitly, and I recommend doing so starting now if that is your preference as a packager.

We would then like for there to be a well-known location under $PREFIX that will always be checked as if the NODE_ICU_DATA was set to it (if it's not manually set).

Yes, that was the premise of this issue when I opened it four years ago. However, it was never implemented beyond a proof of concept phase. It didn't work on all platforms, etc, etc. Someone could still implement this, but I don't plan to work on it.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

cong88 picture cong88  Â·  3Comments

jmichae3 picture jmichae3  Â·  3Comments

stevenvachon picture stevenvachon  Â·  3Comments

willnwhite picture willnwhite  Â·  3Comments

sandeepks1 picture sandeepks1  Â·  3Comments