Node.js doesn't seem to have a way to read environment variables that aren't valid unicode. For both keys and values. e.g.
$ env -i $'F\xa5B=BAR' node -e 'console.log(process.env)'
{ 'F�B': undefined }
$ env -i $'FOO=B\xa5R' node -e 'console.log(process.env.FOO.codePointAt(1))'
65533
Issue seems to stem from assumption that env vars are utf8. e.g. In the getter and during enumeration
UTF-8 is assumed in a lot of other places, environment variables aren't the exception.
The behavior of process.env can't change because of backwards compatibility (i.e., we can't change the encoding to latin-1) and this seems too minor to add a new API for.
Do you have a real-world example where this is an issue and that can't be solved another way?
UTF-8 is assumed in a lot of other places, environment variables aren't the exception.
Most places in e.g. fs also accept a Buffer/Uint8Array.
The behavior of process.env can't change because of backwards compatibility
I think the current behaviour is broken in a way no one expects. One solution might be that unrepresentable keys/values are returned as a Buffer instead of a string with unicode replacement characters?
Do you have a real-world example where this is an issue and that can't be solved another way?
This was found while working on fengari, a port of lua (including lua standard libraries) to javascript. A test found that some env vars were being read in differently in fengari to normal lua. The code path goes past here: https://github.com/fengari-lua/fengari/blob/0a8bc0b30644c514d822fb32a184318f7353a285/src/loslib.js#L180
Different use case:
Passing encryption keys (e.g. a private EC key) into a container via env var. They'd work fine 80% of the time, but once the secret (binary) key contains the wrong sequence, then you get hard to debug failures.
I think the current behaviour is broken in a way no one expects.
I don’t think this is true; It’s a single data point, but I have never seen environment variables that aren’t conceptually character strings.
One solution might be that unrepresentable keys/values are returned as a
Bufferinstead of a string with unicode replacement characters?
Sometimes returning a string and sometimes returning a Buffer from the getter is a non-starter, I think.
Passing encryption keys (e.g. a private EC key) into a container via env var. They'd work fine 80% of the time, but once the secret (binary) key contains the wrong sequence, then you get hard to debug failures.
80 % of the time sounds like a very optimistic estimate – A nice property of UTF-8 is that invalid UTF-8 will usually be recognizable as such. Even if you read as little as 8 bytes that are ~ uniformly random (like you’d expect from a private key) you get a 99 % chance of the result not being valid UTF-8.
Also, this is a great use case for base64.
I’m not -1 on Node supporting this, but if it does, that should happen through a separate API.
I guess we could add API for that, something like process.env.get(key[, encoding]), where key can be a Buffer and encoding corresponds to encoding of the result, which is a Buffer by default.
process.env.get is not an option, IMO - that would break existing code that tries to read a property called 'get'.
It also makes the environment accessors slow and awkward since they have to check for these magic properties every time.
What might be acceptable is os.getenv() and os.setenv() methods but then again, it seems too minor to merit new APIs.
The binary data example is flawed and will never work; envvars are C strings, nul bytes terminate them.
What might be acceptable is os.getenv() and os.setenv() methods
Also need to be able to iterate env vars.
What about just pushing environ as a Buffer? And it's up to users to parse it themselves?
I'd be okay with a set of functions that work with Buffers.
This currently also leads to an issue with roundtripping right now, which is bad:
$ env -i $'FOR=abc' node -p 'process.env[Object.keys(process.env)[0]]'
abc
$ env -i $'F\xa5R=abc' node -p 'process.env[Object.keys(process.env)[0]]'
undefined
What about just pushing environ as a Buffer? And it's up to users to parse it themselves?
I've toyed with that idea. Constructing an ArrayBuffer mapping environ and moving parsing to JS land is definitely an option, that would let us remove about 150 lines of finicky C++ code.
Some special-casing is needed for Windows because its environment is UCS-2 but otherwise it's pretty straightforward.
I've toyed with that idea. Constructing an ArrayBuffer mapping environ and moving parsing to JS land is definitely an option, that would let us remove about 150 lines of finicky C++ code.
Is anyone working on this? Should I start work on it? (I'll have to set up a node dev environment...)
No one's working on it, but on second thought, you also need a way to sync changes from the ArrayBuffer back to the global environ.
It's probably easier to keep using getenv() / GetEnvironmentVariableW() / etc. but expose (for example) a process.envb that's a class extends Map that only accepts Buffer and Uint8Array instances as keys and values and internally calls out to the platform-native APIs.
class extends Map that only accepts Buffer and Uint8Array instances as keys and values and internally calls out to the platform-native APIs.
But Maps consider different Uint8Array/Buffer keys to be different keys.
I think this is a false-equivalence that is assured to confuse....
That's a fair point.
I've put this into https://github.com/nodejs/node/projects/13 backlog
Put into https://github.com/nodejs/node/projects/13 backlog
Is there some more context on this I could read?
Closing issues in a backlog seems like an odd process.
Hello @daurnimator,
It's actually a new thing I'm suggesting. As I see it we have two issues this can solve: (1) We have too many open issues (2) Among them there are feature requests like this, that have no objections, but also isn't picked up by anyone.
The idea behind the list in https://github.com/nodejs/node/projects/13 is to surface these, both the Collaborators and to members of the community who are looking to contribute. The "close" is just to separate these from bugs.
If GitHub would give us a third item type, that would have been optimal. Until then we are trying to find the best way to use the available metadata options to have visibility into what's going on.
Yet another approach could be to follow what Python 3 does with its surrogateescape method, such as: "\xff" -> "\udcff". This has the advantage of having to neither introduce a new API nor break compatibility with existing codes. Currently node does substitute them with \uFFFD (REPLACEMENT CHARACTER), which indeed has information loss issue compared to the aforementioned method. Of course Windows uses Unicode natively for them, but then we have the option of detecting what OS the application is running on.
Most helpful comment
process.env.getis not an option, IMO - that would break existing code that tries to read a property called 'get'.It also makes the environment accessors slow and awkward since they have to check for these magic properties every time.
What might be acceptable is
os.getenv()andos.setenv()methods but then again, it seems too minor to merit new APIs.The binary data example is flawed and will never work; envvars are C strings, nul bytes terminate them.