Followup to https://github.com/nodejs/node-v0.x-archive/issues/8996
postinstall time make sure node_modules/full-icu contains the appropriate icudt*.dat file needed for full ICU support.full-icu in any of the global search paths or local module search paths. If found && the correct icudt file is present, _automatically set Node's ICU path to point to that icudt_.dat file*This will make getting full data as easy as npm install [-g] full-icu
@jasnell
hmmm... trying it now...
npm install icu4c-data@55l (Node 4.0.0 and small-icu 55) -> icudt55l.dat
npm WARN unmet dependency /usr/local/lib/node_modules/yeoman requires bower@'~0.3.0' but will load
npm WARN unmet dependency /usr/local/lib/node_modules/bower,
npm WARN unmet dependency which is version 1.5.3
npm WARN unmet dependency /usr/local/lib/node_modules/npm/node_modules/npm-registry-client/node_modules/npm-package-arg requires semver@'4' but will load
npm WARN unmet dependency /usr/local/lib/node_modules/npm/node_modules/semver,
npm WARN unmet dependency which is version 5.0.1
[email protected] /usr/local/lib/node_modules/icu4c-data
• (no icudt55l.dat)
/usr/local/lib/node_modules/full-icu/install-spawn.js:51
throw Error('Somehow failed to install ' + icudat);
^
Error: Somehow failed to install icudt55l.dat
at Error (native)
at npmInstallNpm (/usr/local/lib/node_modules/full-icu/install-spawn.js:51:10)
at Object.<anonymous> (/usr/local/lib/node_modules/full-icu/postinstall.js:62:2)
at Module._compile (module.js:434:26)
at Object.Module._extensions..js (module.js:452:10)
at Module.load (module.js:355:32)
at Function.Module._load (module.js:310:12)
at Function.Module.runMain (module.js:475:10)
at startup (node.js:117:18)
at node.js:951:3
Tagging this for tsc discussion. This is quite interesting but requires hard-coding in some this we haven't done anything like before.
Feel free to remove if you don't think it qualifies for it (yet).
Having it autodiscover is good in theory but does introduce a startup performance concern. It would be good to have a PoC implementation that we can benchmark.
It's worth noting that this creates a coupling between node and npm (something that hasn't really existed before — node just bundled npm thus far), or at least a coupling between node and ${npm_prefix}/lib/node_modules (which never has never been in any search path, to my knowledge).
I'm not sure that's a thing we want to do/have.
In addition, it obscures an application's true dependencies, by violating "dependency locality".
I think that's an even more serious concern than npm coupling.
@jasnell the issue with installing as -g has been solved, https://github.com/icu-project/full-icu-npm/issues/2
This is fine to discuss at TSC. It's been rattling around as an idea for a while now.
In addition, it obscures an application's true dependencies, by violating "dependency locality".
I assume this is not just related to the auto discover but the full-icu package itself. Do you have any suggestions as to the approach? To just get the data file, the "sub-dependency" wouldn't have to be an NPM dependency- post install could just wget fetch the data from somewhere.
Anyways - the entire approach is most certainly open to discussion.
BTW, is building with full-icu by default out of question?
@silverwind this was discussed in detail during the call: https://soundcloud.com/node-foundation/tsc-meeting-2015-10-21
Decision was to move back to GH though.
@silverwind #3460 is about detecting the full data _if_ you are built in small mode, which would remain relevant unless small mode was removed. I will reply to your question over on #3476 which is about how ICU is built itself.
Rather than having node searching through node_modules, how about having the full-icu module put the data somewhere in the user folder, and have node look for it there at startup? It's a similar approach but it decouples the mechanism used by node to find the data from the tool used to install it.
@orangemocha that could work. Suggestions on what the 'somewhere' should look like? The only precedent for this that I know of is the modules dirs. Maybe something like:
/icu-data/icudt56l.dat/.icu-data/icudt56l.dat/../lib/icu-data-icudt56l.datI put the specific file name because node will look for that one file that it needs. Three stat calls (in this example) and then done.
Yes, I would follow a parallel with npm global modules. They are under %APPDATA%\npm on Windows. I believe the Linux equivalent is ~/.appname. I would use node as the app name assuming this data is only useful to node.
So:
~/.node/icu-data/icudt56l.dat%APPDATA%\node\icu-data\icudt56l.datMac would be something like ~/Library/application support/node
assume this is only applicable to node
Oooo are we going there? I guess we are going there.. I could see if
there's interest in this upstream in ICU. Right now a path is allowed for
data search. The node specific trick is that normally a data path can't
override linked in data. So node explicitly has to decide whether it is
installing the "small Icu" data or not at startup time.
So this model could generalize as 1.) a convention and/or 2.) an ICU API
with some sort of "Boolean isFullIcuDataFoundSomewhere()" or maybe an Icu
Api to calculate the platform path.-- all without actually loading ICU.
Removing the tsc-agenda tag for now.
Resolution from today's CTC meeting:
Does that sound about right?
@bnoordhuis yes, thanks. There have been so many different opinions on this, appreciate the quick review
FWIW, from the standpoint of making deb and rpm packages, you'd want to put the .dat file in
/usr/share/node/icudt56l.dat
We could then include basically just that file in a separate package called something like nodejs-intl-full_5.1.0.XXX.deb which would depend on the actual nodejs package, and it should all "just work" like people expect it to.
:+1: for @chrislea about the way to distribute binary packages.
Also guys agrred on the fact that distributing node ATM with icu_small is an issue for non english intl support... for example we have an app supporting 3 different languages so we can never benefit from icu in node and we always use the intl polyfil.
Please tell me how I can help. I can be at least a tester.
I agree with what @orangemocha and @srl295 said in an above comment.
I also agree with @nathan7 that we shouldn't be coupling node with npm. npm is just a tool to make life easier. By creating a set of directories for node to look at for the full-icu files, we will also allow people to download those icu files directly (without npm) and place them in any predefined directories.
The full-icu module would then use the postinstall script to install the files in the most local or closest location possible (unless it detects the -g flag).
So, I would suggest:
-g flag: {CWD}/icu-data/icudt56l.dat-g flag, any of:{Node install dir}/../lib/icu-data-icudt56l.dat~/.node/icu-data/icudt56l.dat%APPDATA%\node\icu-data\icudt56l.datTo play devil's advocate though...
I don't necessarily believe that having node try to "somehow detect the special (?) module name full-icu" in any of the node_module search paths as @srl295 suggested is a means of coupling node with npm... Node's require command already takes advantage of this. So, if anything, npm is coupled with node (obviously), not the other way around.
So, I don't really see anything bad with what Steven suggested in the first comment. require already does this.
^^ FYI. PoC of search path stuff.
You might imagine:
npm install -g full-icu that installs to /usr/local/kittens or npm install full-icu that installs to ./kittens the appropriate files.("kittens" here is just a stand-in for the exact right directory. )
cc: @nodejs/intl @jasnell
removing ctc-agenda label for now
CTC comments: (loosely transcribed)
search ./node_modules/.lib/node-icu/ or perhaps ./node_modules/.node-icu
search {execPath}/../lib/node-icu
@orangemocha can you elaborate more on "this doesn't belong on node_modules"?
can you elaborate more on "this doesn't belong on node_modules"?
I was just observing that putting the ICU data in there is overloading the use of node_modules. Things in node_modules are things that you can require, have a package.json, and npm has some understanding of. It might not break anything to put ICU data in there, but I wouldn't completely exclude the possibility either.
What is the benefit of putting it in node_modules? Is it just not having to create a new folder?
One benefit is that we don't risk clobbering any existing name in cwd that someone might be using, perhaps there is a .node-icu in use out there already. node_modules is basically up for grabs since it's all generated by tooling anyway. Anything starting with a . can't be a package, .bin is already a thing, so the rules are clear.
Another is that rm -rf node_modules does a proper reset and npm i node-icu would reset it back to the way it should be, everything is nicely in sync.
@rvagg @orangemocha so… what about this search path:
./node_modules/full-icu
/usr/local/lib/node_modules/full-icu
Where full-icu is _already_ an actual, bona fide, node module… And this search would be searching the same way the module loader would search once the v8 engine is up and running.
As @rvagg notes, rm -rf node_modules does a proper reset.
Things in node_modules are things that you can (1) require, (2) have a package.json, and (3) npm has some understanding of.
full-icucan berequired, it has some utilities for telling you which versino of ICU is there- it has a
package.jsonnpmhas knowledge of it.
@srl295 : is it conceivable that we'll want to install the icu data by other means? For example (Windows), as an optional feature in the MSI? If it's always going to be installed as an npm module, then I am fine with Node depending on it being in node_modules.
./node_modules/full-icu/usr/local/lib/node_modules/full-icu{execPath}/../lib/full-icu ( changed this to full-icu for consistency)@orangemocha as an optional feature in the MSI (or other package) that would probably either use npm i -g which would apply to (2) or install to something relative to execPath as in (3)
there's plenty of precedent in post-install actions for just doing it in node_modules/full-icu, currently the most popular of these precedents is in downloading and unpacking prebuilt binaries for native addons and that seems like a similar use-case to what you're proposing here.
What's the process with full-icu anyway? I had a quick look and can't figure it out, is it downloading a .dat to install within that package?
@rvagg full-icu's postinstall.js:
npm install icu4c-data@54l (l for little endian)full-icu directoryNet result is that node_modules/full-icu (whether local or global) contains the appropriate ICU data file.
@srl295, please bear with me as I am struggling to understand all the possible consequences of this design choice. As noted, Node having knowledge of and depending on a specific module is uncharted territory. Not to say it shouldn’t be done, but you are sort of a pioneer :-)
One of the foreseeable evolutions of such scenario is that at some point we might want to include it in the Node installer itself, hence the questions about MSI. I think that having the MSI install it as plain files is likely to become a source of problems if the user updated the module via npm. It would probably be safer to install and uninstall with npm through a custom action. /cc @joaocgreis
@orangemocha struggling with the same here also.
Looking at https://docs.npmjs.com/files/folders - both node and npm have knowledge of ./node_modules and {somewhere global}/node_modules as I understand it. (How does the module loader work? Does node call into npm when require() is used - I assume not?
how about using …/node_modules/.lib/icu instead of …/node_modules/full-icu in the paths above? This could be created by installers or npm (calling a post install script), or manually copied, etc. Deleting node_modules recursively would reset to a clean state as @rvagg noted.
@srl295 The .../node_modules/.lib/icu approach sounds intriguing. Ideally this would also (symetrically?) support {node install dir}/.lib/icu (or wherever npm install --global goes).
In my particular situation, a corporate docker/container world, I might have a read-only {node install dir} (with all appropriate software already installed) with app-specific images built on top of that. I'm just presenting this as a use case, might be a corner case though. The .lib/icu seems like a clean specification to which different kinds of tooling could be written.
(BTW, at my job we've found that npm module postinstall scripts are a major cause for concern, since they can do anything. A few download arbitrary binary executables which makes the security team sad.)
Another thought is to make require() "smarter" so that the application can just do something like require(/path/to/icudt*.dat) at startup. Hmmm... I'm not sure if the icu library is structured to allow "late" loading of data like this. Overloading require() to do this might be inappropriate/messy so perhaps a separate function should be created for this.
On the other hand, --icu-data-dir already probably provides equivalent functionality.
@drewfish - Yes, I was using "…" as shorthand for either the current directory OR the global node_modules.
I'm not sure if the icu library is structured to allow "late" loading of data like this
Not in this case. It all needs to be sorted out before v8 starts up.
@srl295 the module loader doesn't call into npm. It does search for modules ./node_modules but it's quite a bit more involved than that. See https://nodejs.org/api/modules.html#modules_all_together for a pseudo-code explanation. And https://github.com/nodejs/node/blob/de91e9a8a7cebb4543c13f60d5670beeb12b8ec9/lib/module.js#L223-L275 and https://github.com/nodejs/node/blob/de91e9a8a7cebb4543c13f60d5670beeb12b8ec9/lib/module.js#L456-L484 for the actual implementation.
Trying to summarize the requirements for ICU data location:
So, which ones do we really want? Is anything missing?
@srl295 will each version of node work with a single data file? Should this be a mechanism to update the data file without updating node, or should we be thinking of avoiding problems if there is an update somehow?
The module based approach discussed here could work, with some tweaks. It does not give us 2. on Windows and 3. on Unix. There's some discussion about globally and system installed modules in npm: https://github.com/npm/npm/issues/4534 and https://github.com/npm/npm/issues/6413 , but it's not likely to be resolved soon. The npm global prefix on Windows is different for each user.
What about searching an environment variable? The search sequence would become something like:
./node_modules/full-icu${NODE_FULL_ICU}{execPath}/full-icu (Windows, easily installed by MSI) or /usr/local/lib/node_modules/full-icu and {execPath}/../lib/full-icu (Unix, does this make sense?)@joaocgreis
will each version of node work with a single data file?
Yes. It can only work with a single data file.
So here is my current proposal for a change to the behavior of node when compiled with configure … --with-intl=small-icu :
If env NODE_ICU_DATA or option --icu-data-dir is present, use that for ICU. _(This is the current behavior. These parameters can take a complex search path and not just a single directory. )_
else, set the ICU data path to the first of these contain the file icudt99x.dat _(where 99x is replaced by a compiletime-calculated value)_:
{cwd}/node_modules/.node-icu{home}/.node_modules/.node-icu (== userprofile){execpath}/../../lib/node/node_modules/.node-icu (this is /usr/lib/node/node_modules/.node-icu etc.)Here .node-icu is not a module name, (avoids requiring a reified module name) but would be installed by a module || installer || … and also node_modules/.node-icu gets cleaned up when node_modules is deleted.
Also note that this does not do a deep traversal of node_modules/…/node_modules/… directories.
We don't need to use the @nodejs/ctc call to bikeshed on the .node-icu part ,but I'd like to get a +1 on the approach.
^ work in progress that implements above search path.
@srl295 Left some comments on 75ae8a4.
@srl295 the proposed strategy looks good. Note that for Windows, the MSI will only be able to add files below {execpath}, so it is necessary to search in {execpath}/.node-icu or equivalent.
Thanks JoĂŁo. Perhaps an #ifdef for the WINDOWS case?
I thought {execpath} includes the binary, so {execPath}/.node-icu would be something like C:\Program Files\node\bin\node\.node-icu\… which wouldn’t work because “node” is a file, not a directory.
@srl295 I'm ok with #ifdef.
process.execPath includes the binary indeed. I thought it didn't, sorry for the confusion, because somefile/.. is not usually valid (it is on Windows cmd). The node executable on Windows is by default C:\Program Files\nodejs\node.exe, without the additional bin folder, so {execpath}/../../lib would be C:\Program Files\lib. Hence, please consider my suggestion as {execpath}/../.node-icu.
@joaocgreis OK. Makes sense.
(My code can't literally use process.execPath because C++, but the equivalent (which calls uv_exepath).
TODO:
removed ctc-agenda label again, let's try and sort it out here, when you're not getting push-back then it suggests you can simply move forward with it as proposed, if you need help getting enough attention for opinions then I'll try and help round some up for feedback.
ok, with some other issues out of the way, will start looking at this again as per above. any NPM folks like @othiym23 have thoughts on above? especially the node_modules/.node-icu concept.
Anything starting with a . can't be a package, .bin is already a thing, so the rules are clear.
The primary npm registry won't let you publish a package with a leading period in its name, and npm uses period-prefixed directories in a couple places for its own internal machinery (.bin, .staging during the npm@3 install process), but it's more of an ad hoc convention than a rule.
The fact that you can't use it as the beginning of a package name means that it's probably safe to claim .node-icu as a directory for Node's own use without worrying about conflicts with existing practice on the registry, and as far as npm is concerned, Node is free to reserve whatever directories under node_modules it likes, as node_modules is explicitly flagged (in the name, even) for Node's own use.
Perfect, thanks. So a special sub directory is better than a special module name (which was an old proposal).
S
I would like to mention that we any folder we pick can be accessed by various versions of node.js and related to that the language variant we provide will need to be fit for the node.js version running. Do we need to specify versions of the translation in the path where we download the file. If so: should the data be fully backwards compatible?
@martinheidegger - Thanks for asking. The files are versioned based on ICU version in node, so they won't collide. Example names are icudt55l.dat and icudt57b.dat for ICU 55 little-endian and ICU 57 big-endian respectively. Note that this is no different than current behavior - the various env vars and paths aren't versioned, but point to files which have version numbers in them.
Will update the full-icu module to:
./node_modules/.node-icu (local) or {execpath}/../../lib/node/node_modules/.node-icu (-g) and print a message that no extra config is needed (yay!)v7.0.0-pre - print a message "You may still need to pass in a path until #3460 lands…I'll just ask this here also. @othiym23 / @nathan7 /@iarna - any thoughts about the full-icu npm module's post install doing something like
node_modules, then create ../.node-icu and copy data thereThis way if installed locally ( mycoolapp/node_modules/full-icu ) or globally ( /usr/local/lib/node_modules/full-icu ) it will actually create ../.node-icu, but if you git clone full-icu ~/src/full-icu && npm install it wouldn't try to create ~/src/full-icu
and, also,
full-icu attempt to uninstall / remove ../.node-icu/<file>.dat when it is uninstalled? if that's even possible. @iarna mentioned (thanks!) that npm manages the files in node_modules/.bin and removes them if the original module is removed… wonder if there's a way to manage node_modules/.node-icu files similarly?PS: there's now a full-icu-test module for testing the full-icu module.
@srl295 Should this issue remain open?
@Trott yes please…
ping @srl295 ... just checking in
This issue has been dormant for a long time. Closing.
npm install full-icu Just Works(TM) these days, by the way.
@bnoordhuis yeah— makes sense.
@bnoordhuis What do you mean with "just works"? I still need to configure the loading process either via env variable or cli flag.
@StarpTech If you scroll through the comments, you'll see people remarking they had problems installing the module. Those issues have been fixed.
If you're asking about auto-detection: unlikely to happen, or it would have happened already.
FYI, if you use full-icu please see a proposed change here: https://github.com/unicode-org/full-icu-npm/issues/36
We (Fedora/Red Hat Enterprise Linux) have the following use-case:
--icu-data-dir. We want the presence or absence of the data to be based on the presence or absence of an RPM package.What we would like to see is a mode to build Node.js with small-icu bundled but also build the icudt6X[bl].dat from the full-icu. We would then like for there to be a well-known location under $PREFIX that will always be checked as if the NODE_ICU_DATA was set to it (if it's not manually set).
We would then ship two subpackages for Node.js:
nodejs-i18n-data which contains only the data files to put in the well-known locationnodejs which provides the interpreter and Recommends: nodejs-i18n-data. (Recommends: is an RPM dependency specification which means that the listed package will be installed by default unless --no-recommends are passed, such as in a container environment. And also that removing it later does not result in removing the dependent package.)@sgallagher Hello. The full-icu npm module will download the icudt.dat file for you. Right now it downloads from NPM, but I want to change it to download from ICU's GitHub release directly. Node.js doesn't need to build icudt.dat file separately. Starting with Node.js 13, --with-intl=full-icu is the default. But you can still build with --with-intl=small-icu explicitly, and I recommend doing so starting now if that is your preference as a packager.
We would then like for there to be a well-known location under $PREFIX that will always be checked as if the NODE_ICU_DATA was set to it (if it's not manually set).
Yes, that was the premise of this issue when I opened it four years ago. However, it was never implemented beyond a proof of concept phase. It didn't work on all platforms, etc, etc. Someone could still implement this, but I don't plan to work on it.
Most helpful comment
This issue has been dormant for a long time. Closing.
npm install full-icuJust Works(TM) these days, by the way.