I've found in multiple projects that accessing process.env within a hot section of the code leads to major slowdown.
This really hurts in React server-side rendering (issue) and has caused them to rearrange how they access the env.
It would sense to cache already-accessed properties rather than reaching out to the actual environment. And of course, update the cache on assignment.
I would be happy to contribute a patch if anyone could point me in the right direction to get started, as I'm new to Node core dev.
Thanks @ChALkeR. I understand the argument for keeping it as-is, but if it is kept this way, I think it would help to document that values should be cached whenever possible as serious performance problems can result.
Alternatively, a function like process.env[freeze | unfreeze]() would be great to create this cache from inside our programs in case of misbehaving modules.
Is it actually possible to change the env externally once the process is running? If not, why not cache this?
@Fishrock123 Any native module (or a library that is used by it) could call setenv directly. Though the likelihood of this is near zero, it's still possible. And it should affect process.env.
process.env behaves as it should, and caching it on the Node.js side will break it. There is no actual problem — anything that uses it in hot code path could cache it if it's fine with that.
-1 on caching, +1 on the minor note in the docs.
While we are here, the docs for the process.env should state things a bit more clear about how does the env work, because now «changes won't be reflected outside of your process» could be misunderstood. As well as the normal env, changes affect new child processes.
Also keep in mind that there are modules that assign to process.env.
@STRML why not clone process.env to a local object once in your application?
@silverwind It's really more of a problem with child modules, like React itself (which is finally caching process.env properly new versions). And I've seen many more that do this in hot paths.
I think @ChALkeR has the right approach; I underestimated how much potential breakage there is. But documenting it properly and making it known that process.env accesses are _slow_ would be great.
Perhaps it should have been function getter/setters, which would have been more expected to have slow behavior. Too late now of course.
@silverwind
Also keep in mind that there are modules that assign to process.env.
If that was the only possible way to set the env from node, then it could be covered by updating the cache in the setter. But it's not.
Doesn't seem worthwhile. Closing, please cache it yourself if need be. :)
Just wanted to ping this, we're having quite a hard time handling this in React, and caching NODE_ENV results in a 100% (!) speedup in benchmark code. In real-world testing, I see roughly 50-70%. It's significant.
I'm getting sick of issuing PRs to repos caching access to NODE_ENV, and there really isn't a good solution for modules that need to hide features behind debug flags and want them eliminated in bundling. If we cache the environment lookup, UglifyJS is unable to eliminate the dead code in bundling unless we use a const, which then breaks older browsers.
In reality, is it really useful to rely on external changes to the env (say, from native code)? What proportion of users really need this, and is it a valuable tradeoff compared to the performance benefits we could get ecosystem-wide?
Would the Node core contributors consider a semver-major patch? My thoughts are to make process.env a plain object of the env available at boot (still global and able to be set internally), then added either a process.refreshEnv() method or a process.liveEnv object that had the old semantics.
@STRML could you detail what problems you are having caching it locally?
@STRML Another option, make a module that does this and load it in -r/--require:
const env = process.env
delete process.env // might not be necessary
process.env = env
I've been doing:
process.env = JSON.parse(JSON.stringify(process.env));
Which seems to work fine.
Okay - so the issue with a library like React caching locally is that there are three main usage paths and two goals.
The goals:
process.env in Node environments.As a library author, you usually deliver pre-bundled source (dist.js, dist.min.js), built source (say, with babel, in a lib/ folder), and sometimes pre-built source (ES2015 etc).
There are three main usage paths:
require('mylibrary/dist/lib.min.js')); this is the easiest, we can easily stub process.env.NODE_ENV or __DEV__ flags with constants in our own build steps. Goal 1 and 2 are easily attained.src/ to lib/ that's replacing __DEV__ flags or calls to process.env.NODE_ENV to cache them so process.env.NODE_ENV is not accessed in hot functions. That's what this PR does. You can do the same by simply hoisting yourself in smaller projects. Goal 1 can be attained with some fiddling, goal 2 is irrelevant.const declaration, which is not portable to < IE11, so it's not viable in most situations. It _will not_ be able to figure out: // This would work if it were a const
var NODE_ENV = process.env.NODE_ENV;
//...
if(NODE_ENV !== 'production') { // this will not be eliminated
// debug code
}
Closure compiler has a /** @const */ comment which works. I want to port this to UglifyJS which would solve the problem.
As a result, if we solve Path 2 (NodeJS), we produce larger bundles for Path 3 (Webpack/Browserify) with UglifyJS. In the case of React, the bundle can be as much as 40% larger gzipped, so it's very significant (38 vs 53kb).
If process.env didn't have this performance penalty, then we could just go on writing if (process.env.NODE_ENV !== 'production'), as most library authors already do, envify and Uglify can handle this, and there'd be no problem.
Is there a real reason to use env for this? You can use global.__REACT_ENV the same way
It's possible. It appears to be the attitude of the devs that they want to avoid polluting globals whenever possible.
Despite that, this is a more widespread problem that doesn't just affect React.
The performance penalty is still not mentioned in the docs. On the contrary, because changes don't affect the shell, the docs seem to imply that env is just like any other JS object.
because changes don't affect the shell
They do, child processes created with the child_process module inherit the modified environment unless told otherwise.
Are you thinking of seeing changes to process.env reflected in the parent shell? That's not how setenv(3) and friends work. No program works like that.
Sorry, I'm not much of a shell programmer. I just would rather not have to discover the performance penalty when it's a problem. Why can't the docs say reading process.env makes system calls (if that's what it's doing)? Why do we have to discover that process.env is not a normal JS object the hard way?
We accept pull requests, you're welcome to have a stab at improving the documentation.
Having said that, any JS object can be "magic" and really be a bunch of C++ code; process.env is not special in that respect.
I'd have to investigate the node.js code to have confidence making a statement. Several people in this discussion are already claiming to know how it works. I'm hoping that someone who wants to be helpful wouldn't mind taking a minute to update the repo they already have checked out.
It looks like Node is caching process.env nowadays:
⌚ 15:46:00
$ export FOO=one
⌚ 15:46:07
$ export BAR=two
⌚ 15:46:09
$ export BAZ=three
⌚ 15:46:12
$ node
> console.time("logEnv"); console.log(process.env.FOO); console.timeEnd("logEnv");
one
logEnv: 0.849ms
undefined
> console.time("logEnv"); console.log(process.env.BAR); console.timeEnd("logEnv");
two
logEnv: 0.224ms
undefined
> console.time("logEnv"); console.log(process.env.BAZ); console.timeEnd("logEnv");
three
logEnv: 0.076ms
undefined
> console.time("logEnv"); console.log(process.env.FOO); console.timeEnd("logEnv");
one
logEnv: 0.085ms
undefined
> console.time("logEnv"); console.log(process.env.BAZ); console.timeEnd("logEnv");
three
logEnv: 0.077ms
undefined
> console.time("logEnv"); console.log(process.env.BAR); console.timeEnd("logEnv");
two
logEnv: 0.085ms
undefined
$ node --version
v10.6.0
I'd like to warn you that previous comment is not the case. Node.js does not cache env vars, so each time you use process.env.VAR you actually call getnev(). This might be a performance issue if you have lots of env vars (this is especially important in case you run your node.js app in k8s since k8s creates env vars for all endpoints in a namespace, see https://github.com/kubernetes/kubernetes/issues/60099 for details)
Proof (running node.js which prints some env var in an endless loop, attaching to the node.js process via gdb and changing this var):
root@it100msk:~# nodejs -v
v10.11.0
root@it100msk:~# cat test.js
while (true) {
console.log(process.env.var1);
Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, 3000);
}
root@it100msk:~# export var1=1
root@it100msk:~# nodejs test.js
1
1
1
1
1
another console on the same host
root@it100msk:~# ps auxf | grep node
root 20226 0.0 0.0 15648 1032 pts/2 S+ 02:16 0:00 | \_ grep --color=auto node
root 20200 0.6 0.1 557612 30580 pts/1 Sl+ 02:16 0:00 \_ nodejs test.js
root@it100msk:~# gdb
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) attach 20200
Attaching to process 20200
[New LWP 20201]
[New LWP 20202]
[New LWP 20203]
[New LWP 20204]
[New LWP 20205]
[New LWP 20206]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f8cc38b4ed9 in futex_reltimed_wait_cancelable (private=<optimized out>, reltime=0x7ffe46e9cd00, expected=0, futex_word=0x4498250) at ../sysdeps/unix/sysv/linux/futex-internal.h:142
142 ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb) call putenv("var1=2")
$1 = 0
(gdb) detach
Detaching from program: /usr/bin/node, process 20200
(gdb) quit
back to the first console
2
2
2
2
2
Most helpful comment
Sorry, I'm not much of a shell programmer. I just would rather not have to discover the performance penalty when it's a problem. Why can't the docs say reading process.env makes system calls (if that's what it's doing)? Why do we have to discover that process.env is not a normal JS object the hard way?