Node: Cache accesses to process.env?

Created on 28 Sep 2015 · 21Comments · Source: nodejs/node

I've found in multiple projects that accessing process.env within a hot section of the code leads to major slowdown.

This really hurts in React server-side rendering (issue) and has caused them to rearrange how they access the env.

It would sense to cache already-accessed properties rather than reaching out to the actual environment. And of course, update the cache on assignment.

I would be happy to contribute a patch if anyone could point me in the right direction to get started, as I'm new to Node core dev.

process

Source

STRML

Most helpful comment

Sorry, I'm not much of a shell programmer. I just would rather not have to discover the performance penalty when it's a problem. Why can't the docs say reading process.env makes system calls (if that's what it's doing)? Why do we have to discover that process.env is not a normal JS object the hard way?

jtlapp on 1 Jun 2016

👍4

All 21 comments

See https://github.com/nodejs/node/pull/1648.

ChALkeR on 28 Sep 2015

Thanks @ChALkeR. I understand the argument for keeping it as-is, but if it is kept this way, I think it would help to document that values should be cached whenever possible as serious performance problems can result.

Alternatively, a function like process.env[freeze | unfreeze]() would be great to create this cache from inside our programs in case of misbehaving modules.

STRML on 28 Sep 2015

Is it actually possible to change the env externally once the process is running? If not, why not cache this?

Fishrock123 on 28 Sep 2015

@Fishrock123 Any native module (or a library that is used by it) could call setenv directly. Though the likelihood of this is near zero, it's still possible. And it should affect process.env.

process.env behaves as it should, and caching it on the Node.js side will break it. There is no actual problem — anything that uses it in hot code path could cache it if it's fine with that.

-1 on caching, +1 on the minor note in the docs.

While we are here, the docs for the process.env should state things a bit more clear about how does the env work, because now «changes won't be reflected outside of your process» could be misunderstood. As well as the normal env, changes affect new child processes.

ChALkeR on 28 Sep 2015

Also keep in mind that there are modules that assign to process.env.

@STRML why not clone process.env to a local object once in your application?

silverwind on 28 Sep 2015

@silverwind It's really more of a problem with child modules, like React itself (which is finally caching process.env properly new versions). And I've seen many more that do this in hot paths.

I think @ChALkeR has the right approach; I underestimated how much potential breakage there is. But documenting it properly and making it known that process.env accesses are _slow_ would be great.

Perhaps it should have been function getter/setters, which would have been more expected to have slow behavior. Too late now of course.

STRML on 28 Sep 2015

@silverwind

Also keep in mind that there are modules that assign to process.env.

If that was the only possible way to set the env from node, then it could be covered by updating the cache in the setter. But it's not.

ChALkeR on 28 Sep 2015

Doesn't seem worthwhile. Closing, please cache it yourself if need be. :)

Fishrock123 on 16 Oct 2015

Just wanted to ping this, we're having quite a hard time handling this in React, and caching NODE_ENV results in a 100% (!) speedup in benchmark code. In real-world testing, I see roughly 50-70%. It's significant.

I'm getting sick of issuing PRs to repos caching access to NODE_ENV, and there really isn't a good solution for modules that need to hide features behind debug flags and want them eliminated in bundling. If we cache the environment lookup, UglifyJS is unable to eliminate the dead code in bundling unless we use a const, which then breaks older browsers.

In reality, is it really useful to rely on external changes to the env (say, from native code)? What proportion of users really need this, and is it a valuable tradeoff compared to the performance benefits we could get ecosystem-wide?

Would the Node core contributors consider a semver-major patch? My thoughts are to make process.env a plain object of the env available at boot (still global and able to be set internally), then added either a process.refreshEnv() method or a process.liveEnv object that had the old semantics.

STRML on 2 Dec 2015

@STRML could you detail what problems you are having caching it locally?

Fishrock123 on 2 Dec 2015

@STRML Another option, make a module that does this and load it in -r/--require:

const env = process.env
delete process.env // might not be necessary
process.env = env

Fishrock123 on 2 Dec 2015

I've been doing:

process.env = JSON.parse(JSON.stringify(process.env));

Which seems to work fine.

Okay - so the issue with a library like React caching locally is that there are three main usage paths and two goals.

The goals:

Eliminate the runtime penalty from accessing process.env in Node environments.
Successfully optimize out dev/debug flags so entire branches can be removed during a minification step.

As a library author, you usually deliver pre-bundled source (dist.js, dist.min.js), built source (say, with babel, in a lib/ folder), and sometimes pre-built source (ES2015 etc).

There are three main usage paths:

Path 1: Pre-bundled source in the browser (or even via require('mylibrary/dist/lib.min.js')); this is the easiest, we can easily stub process.env.NODE_ENV or __DEV__ flags with constants in our own build steps. Goal 1 and 2 are easily attained.
Path 2. Inside NodeJS. We need to have a build step from src/ to lib/ that's replacing __DEV__ flags or calls to process.env.NODE_ENV to cache them so process.env.NODE_ENV is not accessed in hot functions. That's what this PR does. You can do the same by simply hoisting yourself in smaller projects. Goal 1 can be attained with some fiddling, goal 2 is irrelevant.
Path 3. Through browserify/webpack by bundling source. This presents a problem in combination with Path 2 because we've just hoisted our env accesses, so the minifier needs to be smart enough to still be able to do dead code elimination. UglifyJS is smart enough to do this, but only if you use a const declaration, which is not portable to < IE11, so it's not viable in most situations. It _will not_ be able to figure out:

// This would work if it were a const
var NODE_ENV = process.env.NODE_ENV;
//...
if(NODE_ENV !== 'production') { // this will not be eliminated
  // debug code
}

Closure compiler has a /** @const */ comment which works. I want to port this to UglifyJS which would solve the problem.

As a result, if we solve Path 2 (NodeJS), we produce larger bundles for Path 3 (Webpack/Browserify) with UglifyJS. In the case of React, the bundle can be as much as 40% larger gzipped, so it's very significant (38 vs 53kb).

If process.env didn't have this performance penalty, then we could just go on writing if (process.env.NODE_ENV !== 'production'), as most library authors already do, envify and Uglify can handle this, and there'd be no problem.

STRML on 2 Dec 2015

👍4

Is there a real reason to use env for this? You can use global.__REACT_ENV the same way

vkurchatkin on 2 Dec 2015

It's possible. It appears to be the attitude of the devs that they want to avoid polluting globals whenever possible.

Despite that, this is a more widespread problem that doesn't just affect React.

STRML on 2 Dec 2015

The performance penalty is still not mentioned in the docs. On the contrary, because changes don't affect the shell, the docs seem to imply that env is just like any other JS object.

jtlapp on 1 Jun 2016

because changes don't affect the shell

They do, child processes created with the child_process module inherit the modified environment unless told otherwise.

Are you thinking of seeing changes to process.env reflected in the parent shell? That's not how setenv(3) and friends work. No program works like that.

bnoordhuis on 1 Jun 2016

jtlapp on 1 Jun 2016

👍4

We accept pull requests, you're welcome to have a stab at improving the documentation.

Having said that, any JS object can be "magic" and really be a bunch of C++ code; process.env is not special in that respect.

bnoordhuis on 1 Jun 2016

I'd have to investigate the node.js code to have confidence making a statement. Several people in this discussion are already claiming to know how it works. I'm hoping that someone who wants to be helpful wouldn't mind taking a minute to update the repo they already have checked out.

jtlapp on 1 Jun 2016

It looks like Node is caching process.env nowadays:

⌚ 15:46:00
$ export FOO=one

⌚ 15:46:07
$ export BAR=two

⌚ 15:46:09
$ export BAZ=three

⌚ 15:46:12
$ node
> console.time("logEnv"); console.log(process.env.FOO); console.timeEnd("logEnv");
one
logEnv: 0.849ms
undefined

> console.time("logEnv"); console.log(process.env.BAR); console.timeEnd("logEnv");
two
logEnv: 0.224ms
undefined

> console.time("logEnv"); console.log(process.env.BAZ); console.timeEnd("logEnv");
three
logEnv: 0.076ms
undefined

> console.time("logEnv"); console.log(process.env.FOO); console.timeEnd("logEnv");
one
logEnv: 0.085ms
undefined

> console.time("logEnv"); console.log(process.env.BAZ); console.timeEnd("logEnv");
three
logEnv: 0.077ms
undefined

> console.time("logEnv"); console.log(process.env.BAR); console.timeEnd("logEnv");
two
logEnv: 0.085ms
undefined

$ node --version
v10.6.0

JDiPierro on 20 Aug 2018

👍1

I'd like to warn you that previous comment is not the case. Node.js does not cache env vars, so each time you use process.env.VAR you actually call getnev(). This might be a performance issue if you have lots of env vars (this is especially important in case you run your node.js app in k8s since k8s creates env vars for all endpoints in a namespace, see https://github.com/kubernetes/kubernetes/issues/60099 for details)

Proof (running node.js which prints some env var in an endless loop, attaching to the node.js process via gdb and changing this var):

root@it100msk:~# nodejs -v
v10.11.0
root@it100msk:~# cat test.js
while (true) {
    console.log(process.env.var1);
    Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, 3000);
}
root@it100msk:~# export var1=1
root@it100msk:~# nodejs test.js
1
1
1
1
1

another console on the same host
root@it100msk:~# ps auxf | grep node
root     20226  0.0  0.0  15648  1032 pts/2    S+   02:16   0:00  |       \_ grep --color=auto node
root     20200  0.6  0.1 557612 30580 pts/1    Sl+  02:16   0:00          \_ nodejs test.js
root@it100msk:~# gdb
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) attach 20200
Attaching to process 20200
[New LWP 20201]
[New LWP 20202]
[New LWP 20203]
[New LWP 20204]
[New LWP 20205]
[New LWP 20206]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f8cc38b4ed9 in futex_reltimed_wait_cancelable (private=<optimized out>, reltime=0x7ffe46e9cd00, expected=0, futex_word=0x4498250) at ../sysdeps/unix/sysv/linux/futex-internal.h:142
142 ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb) call putenv("var1=2")
$1 = 0
(gdb) detach
Detaching from program: /usr/bin/node, process 20200
(gdb) quit

back to the first console
2
2
2
2
2