Many internal modules call methods defined on prototype objects that can be modified (or monkey-patched) in userland. For example, by overriding `Object.prototype.getOwnProperty`, it is possible to affect async hooks.
One way to fix this is to keep copies of these functions at bootstrap, and use these untampered copies inside internal modules.
Optionally, these copies can be exported through a new internal module so that user code can get hold of untampered copies if it wants to.
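The bootstrap-copy idea can be sketched as follows (illustrative only; the `cached` object and its property names are made up for the example):

```javascript
// Capture untampered built-ins once at bootstrap, before any user code runs.
const cached = {
  getOwnPropertyNames: Object.getOwnPropertyNames,
  hasOwnProperty: Object.prototype.hasOwnProperty,
  apply: Reflect.apply,
};

// Later, userland monkey-patches a built-in:
Object.getOwnPropertyNames = () => ['patched'];

// Internal code that uses the cached copy keeps its original behavior:
console.log(cached.getOwnPropertyNames({ a: 1 })); // [ 'a' ]
console.log(Object.getOwnPropertyNames({ a: 1 })); // [ 'patched' ]
```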
Advantages of this approach:
Advantages of current code:
I think it is desirable to have consensus over whether internal modules should be affected by monkey-patching. This does not have to be an either/or decision.
Coming from the JavaScript spec, there are JavaScript built-ins that are intentionally affected by monkey-patching too. For example, by overriding `RegExp.prototype.exec`, the behavior of other RegExp built-ins can be changed. Yet another example is the species constructor. Generally, though, JavaScript built-ins call into internal builtins that cannot be altered.
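For instance, `RegExp.prototype.test` is specified in terms of the `exec` property, so patching `exec` observably changes it:

```javascript
// Per the language spec, RegExp.prototype.test looks up and calls the `exec`
// property if it is callable, so overriding it on an instance changes `test`.
const re = /nothing-matches-this/;
re.exec = () => ['faked match'];

// test() now reports a match even though the pattern cannot match:
console.log(re.test('anything')); // true
```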
I think it is important to clearly specify and document which global state (i.e. the result of monkey-patching) affects the behavior of internal modules, rather than having implicit APIs through monkey-patching. Personally, I think that, unless specified otherwise, monkey-patching should not affect internal modules. It needs to be a conscious design choice.
FWIW this problem would become very apparent if Node.js had a formal specification for its internal modules :)
Ref: https://github.com/nodejs/node/pull/18773
CC: @benjamingr @devsnek
I think one of the most important points that you made in https://github.com/nodejs/node/pull/18773 is:
All APIs that V8 offers in C++ use the original builtin, including `v8::Object::GetOwnPropertyNames`. Let's say, just hypothetically, Node.js core wants to move the implementation of some function from JS to C++; it then must not use `v8::Object::GetOwnPropertyNames`, but instead load and call `global.Object.getOwnPropertyNames`. Otherwise it would observably change behavior.
That is, every time we move code from or to C++, we break any patching userland has applied to the call in question.
Now, another good thing to consider is that there have been cases where overriding built-ins has been tremendously useful when polyfilling - for example https://github.com/drses/weak-map/blob/master/weak-map.js (the WeakMap polyfill).
if users want to patch `fs.readFile` or something that's cool (although I'm personally opposed to it without hooks) but changing `Object.prototype.toString` shouldn't change how `fs.readFile` works
I'm in favor of Node core's internals being resistant to monkey patching. I don't think we need to expose the untampered functions to users though.
@devsnek
if users want to patch `fs.readFile` or something that's cool (although I'm personally opposed to it without hooks) but changing `Object.prototype.toString` shouldn't change how `fs.readFile` works
Probably, but it's more subtle than that. Going back to the WeakMap example case: users changed `getOwnPropertyNames` because a new ("hidden") key was added to an object. That key had to be "secret" and not impact other methods, so if `util.inspect` was "tamper proof" then it would have been impossible to polyfill WeakMap entirely.
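The polyfill technique being referenced works roughly like this (a simplified sketch, not the actual polyfill code; `HIDDEN` is a made-up key name):

```javascript
// A "hidden" key stores per-object state, and Object.getOwnPropertyNames is
// patched to filter it out so the key stays invisible to other code.
const HIDDEN = '___hidden_state___';
const originalGOPN = Object.getOwnPropertyNames;
Object.getOwnPropertyNames = function (obj) {
  return originalGOPN(obj).filter((name) => name !== HIDDEN);
};

const target = {};
Object.defineProperty(target, HIDDEN, { value: 42, enumerable: false });
console.log(Object.getOwnPropertyNames(target)); // [] — the key is hidden
```

If an internal `util.inspect` used a cached original instead of the patched function, the "secret" key would leak into its output, which is exactly the tension described above.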
Note: Another thing we can do is to explicitly specify that changing built-ins is dangerous and doing so is not supported in Node.js.
I'm not sure that's a good idea, as it violates "least surprise" and the current behavior is unspecified anyway. Note that fixing this and ensuring built-ins are used would be a pretty big code change and will create overhead in maintaining the code, since basically built-in instance methods won't be used.
If we do this we need to make sure it doesn't impact performance (or improves it, since technically it's provable which function it is now), and that it's not problematic maintenance wise.
I feel like this is a deep rabbit hole. Anyone could override anything, so now any and all object/prototype methods need to be saved up front, which is crazy because there are so many and they're used throughout the codebase. For example: `string.trim()`, `string.toLowerCase()`, `string.slice()`, `Object.keys()`, `Number.isNaN()`, etc.
Even if it was decided it's a good idea to do this, we'd need to also maintain a giant list of methods we use anywhere in node core.
@mscdex are you concerned about memory use or time spent upon startup? One way to do it is to simply shallow clone these prototype objects.
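The shallow-clone suggestion might look roughly like this (a sketch only, not a concrete proposal; `snapshotPrototype` is a made-up helper name):

```javascript
// Copy every own data property of a prototype into a frozen, null-prototype
// object at bootstrap. Internal code then calls methods off the snapshot.
function snapshotPrototype(proto) {
  const copy = Object.create(null);
  for (const key of Object.getOwnPropertyNames(proto)) {
    const desc = Object.getOwnPropertyDescriptor(proto, key);
    if (desc && 'value' in desc) copy[key] = desc.value;
  }
  return Object.freeze(copy);
}

const StringProto = snapshotPrototype(String.prototype);

// Userland tampering after bootstrap does not affect the snapshot:
String.prototype.trim = () => 'tampered';
console.log(Reflect.apply(StringProto.trim, '  hi  ', [])); // 'hi'
```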
@hashseed I'm not even considering performance implications at this point. Just having to touch almost every line of code in node core, getting new collaborators to understand the change, maintaining these methods/prototypes, etc.
I wonder if there is a way from V8 land to "lock" the way Node core uses these objects - but I can't think of anything out of the box.
Just having to touch almost every line of code in node core
I don't have a problem with that if it's for the greater good. Most of it is rote search-and-replace, perhaps even mechanical replace.
Could make for a good Code-and-Learn exercise :-)
My position on this aligns with @mscdex's. There are some parts that are not completely unintuitive, because people have likely seen `hasOwnProperty.call(obj, 'something')` before, but then there's stuff like `Promise.prototype.then` monkey-patching and needing to have `PromiseThen`, which is honestly going to be a nightmare to explain to new contributors.
Also, it's very likely that `uncurryThis` carries a performance penalty, so switching the entirety of core over to it will lead to a broad performance penalty for the whole of Node.js.
If we're going down this route, my preference is to try and find a path that leads to core being able to lock or have our own versions of these primitives, as mentioned by @benjamingr:
I wonder if there is a way from V8 land to "lock" the way Node core uses these objects - but I can't think of anything out of the box.
I know this is a more complicated route (that might not be feasible with what we have available right now), but at least it would keep the code maintainable and easy to understand for new contributors. Having to write `PromiseThen(promise, fn);` is, to me, not an appealing proposition.
(And if we're comparing this to the C++ layer, having our own versions of Object, Promise, etc. would more closely resemble the C++ side than storing references to all the namespaced & prototype functions.)
Also, to further elaborate, this would still need to go hand in hand with being more defensive about anything passed in by users. For example, if a user is allowed to pass in a settings object, then we shouldn't assume that it has `hasOwnProperty` or that it even has a non-null prototype. This aligns with the recent transition to using `Reflect.apply` on user-provided callbacks.
having our own versions of Object, Promise, etc. would more closely resemble the C++ side than storing references to all the namespaced & prototype functions
This does not work if you take user-defined objects as arguments or leak internal versions via return values.
This does not work if you take user-defined objects as arguments
As I mentioned above, we would still need to be more defensive with user-provided objects, but that's a much smaller change than having to be defensive about the entirety of internal Object, String, etc. usage. User-provided callbacks were one of the main culprits, and then we have some infrequent `hasOwnProperty` access, but not much more than that.
leak internal versions via return values.
Hence me saying this might not be possible currently. That said, if our internal versions could be frozen but without the currently present performance impact, then it would technically be possible, no?
The fact is that it's hard to know where to draw the line. What about Buffer? It's something that's provided by node itself but it's crucial to the inner workings of at least half of core. And if Buffer is included, what else is?
In my mind I kind of split this into two example cases:
a) util.inspect getting property keys
b) repl using string replacement
For (a) I would use unsaved prototype methods to allow users to heck with it, but for (b) there is no benefit to the repl (in fact it can become impossible to use) if someone overrides `String.prototype.replace` for some reason, so we would use a saved prototype method here.
Like @hashseed said, it's ok to continue to have planned monkey-patching, but it shouldn't just randomly happen.
In my mind I kind of split this into two example cases:
I think @hashseed's point is that this is generally underspecified across our whole API and that we should probably consider specifying it. (Feel free to comment if I misrepresented your intention.)
That is, that an effort to go over Node's API in its entirety and specify (at least for internal use) what methods may impact it would be beneficial. I agree with that.
I think we should also discuss how we should do this (split work?) and get some initial data.
b) repl using string replacement
I'm really not sure; I honestly can't think of a case where I'd want to override this, but then I wouldn't have foreseen the WeakMap case for `util.inspect` either.
The issue is that this affects _older_ versions of Node (new versions of Node rarely need polyfills).
One other thing to keep in mind here is the non-trivial burden that going all in with this will create for backporting changes to LTS lines if this mechanism is not also backported.
I don't know how the shallow cloning of prototypes would work, but currently even if you save `hasOwnProperty` early, someone could still override `Function.prototype.call` after the fact and still completely disrupt your `hasOwnProperty.call(...)`.
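This is straightforward to demonstrate; a cached `Reflect.apply` sidesteps `Function.prototype` entirely, which is why `call(hasOwnProperty, ...)`-style helpers are more robust:

```javascript
// Even a cached hasOwnProperty is unsafe if invoked via .call, because the
// .call lookup goes through Function.prototype, which users can replace.
const hasOwnProperty = Object.prototype.hasOwnProperty;
const ReflectApply = Reflect.apply;

const originalCall = Function.prototype.call;
Function.prototype.call = () => 'hijacked'; // hostile user code

const viaDotCall = hasOwnProperty.call({ a: 1 }, 'a');            // 'hijacked'
const viaReflect = ReflectApply(hasOwnProperty, { a: 1 }, ['a']); // true

Function.prototype.call = originalCall; // restore, to keep the demo tidy
console.log(viaDotCall, viaReflect);
```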
@mscdex `call(hasOwnProperty, ...)` :D Although I see your point
One other thing to keep in mind here is the non-trivial burden that going all in with this will create for backporting changes to LTS lines if this mechanism is not also backported.
Yes, this was kind of what I was hinting at earlier with having to touch the entire codebase.
@mscdex `call(hasOwnProperty, ...)` :D Although I see your point
Right, and having to code like that would be pretty absurd IMHO, even if you wrote a wrapper for it.
As a reference, I created https://gist.github.com/devsnek/76aab1dcd169c96b952fbe8c74404475. Around 350 functions are saved. Line 27 of the gist exposes it to users.
We would also need to store the constructors to avoid `global.Array = function() {/* user code */}` and pass them into the internal wrappers.
Another way I can think of is to store copies of builtin methods under symbols. In internal modules you would invoke builtins via symbols.
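A minimal sketch of the symbol idea (with the caveat that `Object.getOwnPropertySymbols` can still discover such keys, so this is obfuscation rather than true locking):

```javascript
// Stash the original method on the prototype under a symbol; internal code
// invokes via the symbol, so tampering with the public name has no effect.
const kStartsWith = Symbol('startsWith');
String.prototype[kStartsWith] = String.prototype.startsWith;

// Userland tampers with the public name:
String.prototype.startsWith = () => false;

// Internal code is unaffected:
console.log('Hi Matteo'[kStartsWith]('Hi')); // true
console.log('Hi Matteo'.startsWith('Hi'));   // false (tampered)
```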
One possible approach that carries along a whole range of its own complexities is to use a separate context (e.g. as in `vm.runInNewContext()`) for internal code vs. user code. While it would be disruptive in a number of cases, it would prevent a number of the issues being discussed here.
One possible approach that carries along a whole range of its own complexities is to use a separate context
Unfortunately that does not work. When you call `arg.getOwnProperty`, you call the one passed with the `arg` object on the prototype, not the one on the object prototype from the current context.
I just checked with @bmeurer that TurboFan handles `Reflect.apply`. With the following micro-benchmark I observe some regression on Node 8, but no difference at all on current Node master.
```js
const ReflectApply = Reflect.apply;

function uncurryThis(func) {
  return (thisArg, ...args) => ReflectApply(func, thisArg, args);
}

var StringStartsWith = uncurryThis(String.prototype.startsWith);
var s = "Hi Matteo";

function test1() {
  var r = 0;
  console.time("monkey-patchable");
  for (var i = 0; i < 1E7; i++) {
    if (s.startsWith(i)) r++;
  }
  console.timeEnd("monkey-patchable");
  return r;
}

function test2() {
  var r = 0;
  console.time("cached original");
  for (var i = 0; i < 1E7; i++) {
    if (StringStartsWith(s, i)) r++;
  }
  console.timeEnd("cached original");
  return r;
}

test1();
test2();
```
Is it really that much more unreadable to rename the function and move the receiver to the first argument? You don't have to use some `.call` pattern here, and there seems to be no performance penalty.
Having to write `PromiseThen(promise, fn);` is, to me, not an appealing proposition.
I'd argue that you should be writing async/await in most cases anyways.
Is anyone still -1 on this? @mcollina, is this still a TSC agenda item?
@devsnek I have not put this as tsc-agenda (nor has anyone). Maybe you are referring to a different issue?
I'm +0 on this change. It's unrealistic to think we could put Node into complete lockdown, as there will be exceptions. It will also complicate our codebase significantly, and it will make backporting harder.
I'm 👍 in exposing untampered versions of the globals. This can be helpful anyway.
I would also recommend using them when there is a need, rather than doing this _everywhere_.
I think what we should do at this point is _research_ about the impact of overriding globals at the moment and then formulate a more concrete opinion that will lead to a resolution of either:
I took a slightly different approach for my implementation. I tried adding builtin generics, much like the primordials concept. It took a while and did complicate things. There was also some perf overhead for all the indirection.
I unwound it a considerable bit by having two contexts instead of one: one for internals and the public exposed one. I then only used the Generic classes, like `GenericArray`, on things that are public-facing. This allowed me to write in the familiar JS style as before while isolating the code from tampering. Along with Generic classes for interacting with user-tamperable data, I also expose an `__external__` to the isolated context so that I can create values like `new ExObject` or `new ExArray` for things exposed to users.
This has allowed me to avoid rewriting hundreds of method calls, and enabled me to include third-party code, like acorn, while having less than a dozen points where I handle the internal/external boundary.
@jdalton that's a good idea; however, as @hashseed mentioned:
Unfortunately that does not work. When you call `arg.getOwnProperty`, you call the one passed with the `arg` object on the prototype, not the one on the object prototype from the current context.
If there is a way around that, it would be optimal.
@benjamingr
Unfortunately that does not work. When you call `arg.hasOwnProperty`, you call the one passed with the `arg` object on the prototype, not the one on the object prototype from the current context.
That's not a problem; that's a value coming from one context into another. In this case I have a `has` helper that is created in the isolated context, so I can do `has(arg, "foo")` and it doesn't matter if `arg` is from the exposed context or the isolated context, or if `arg` has its own custom `hasOwnProperty` method, or if `arg` has a `null` [[Prototype]].
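A `has` helper along these lines can be built from cached originals (a sketch under those assumptions, not @jdalton's actual implementation):

```javascript
// Cache the real built-ins once, then apply them explicitly so the check
// never depends on what the incoming object carries.
const hasOwnProperty = Object.prototype.hasOwnProperty;
const has = (obj, key) => Reflect.apply(hasOwnProperty, obj, [key]);

// Works for null-prototype objects:
const nullProto = Object.create(null);
nullProto.foo = 1;
console.log(has(nullProto, 'foo')); // true

// Ignores a hostile own hasOwnProperty:
console.log(has({ hasOwnProperty: () => true }, 'missing')); // false
```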
One of the other nice things is that primitives handled in the isolated context, even if created from a method in the exposed context, will be boxed in the isolated context too.
I'm 👍 in exposing untampered versions of the globals. This can be helpful anyway.
-0 on this; maybe if they were abstracted through util or something (`util.primordials.ArrayMap`), and they would definitely need to be locked down as much as possible.
I would also recommend using them when there is a need, rather than doing this everywhere.
Definitely. I would like to get a team of people together to audit the codebase and find places where we want this.
generics
Generics are a tempting idea, but as you said yourself they have a perf hit, and their behavior might be less clear than explicit calls of well-known methods from JavaScript's primordials.
Generics are a tempting idea, but as you said yourself they have a perf hit, and their behavior might be less clear than explicit calls of well-known methods from JavaScript's primordials.
The terminology of primordials and generics is interchangeable here. Instead of the coined term "primordials" I went with "generics", because it's easier to type and has historical precedent. Their implementations are essentially the same. So when I say there was performance overhead with generics, you can read that as performance overhead with primordials.
In the context of this conversation, I think primordials are the big global objects that JavaScript (and we) create (Array, Symbol), and generics are the methodology from JS 1.6 where you could use `Array.every` (which doesn't exist anymore) on anything iterable. In this case generics are going to be much slower than the `uncurryThis` we are looking at currently, because V8 appears to have a specific perf case for `uncurryThis`, and generics have to work with many types, calling down to other behaviors.
I'm referring to primordials in the context of the Node helpers written to handle methods from unknown origins. The term used for those helpers was primordials. I pointed to the docs for the source of the name "generics", not suggesting use of the actual `Array.every` (which doesn't exist in V8). I'm saying I implemented generics in the same way as `uncurryThis`. Who knew the name would throw folks off here :yum:
I think I'll pick this up at the next open source event we're doing and get 10 or so community members messing with functions and reporting how Node behaves in different scenarios - as well as searching GitHub and NPM for initial data.
I think that should get us the initial data we need to present to the TSC in order to make a push for a more consistent API in terms of overriding builtins.
This will likely be mid-late March (and sounds like a fun activity people unfamiliar with our code base can help with), if anyone has an earlier time frame feel free to "steal" this.
I'm going to self assign, I don't think this needs TSC attention until we have said data but feel free to disagree or bring other opinions.
@benjamingr is this something you're interested in doing at a community event?
@devsnek still interested in this but have been a little overwhelmed.
There is a proposal by @domenic that would address this.
@hashseed thanks, that's useful. I should have probably gotten around to this by now but I ended up working on promise-use-cases in the last event. I'll do my best to get to this in the next event (June 16th).
(Random note: I _think_ domenic asked not to be pinged in the nodejs repos by the way - so we should probably not ping him without checking)
That proposal isn’t solely necessary, fwiw - node could load a single file at the start of the process that caches all the intrinsics and provides a module to expose them internally. The challenge - with that or with domenic’s proposal - would be enforcing the pattern’s usage via linting and code review.
@ljharb see https://github.com/domenic/get-originals/issues/14 - I'm thinking about automatic conversion here rather than changing our code.
The proposal is something we'd potentially want to use after we've mapped how our internals react to these changes.
I'm not sure how you could reliably automatically convert prototype methods; babel has the same challenge.
@ljharb I'm not sure what you mean? If you have a concern regarding automatic conversion please do speak up in the get-originals repo.
Should this remain open? Or is this a discussion that has run its course and can be closed?
I’d still like to see a policy decision that trends towards robustness, if possible.
Given that there has not been any further discussion on this in 1.5 years, and given that we've started moving to the use of `primordials` and protecting key paths from monkey-patching, I believe this issue can be closed.
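For reference, the style the `primordials`-based internals converged on looks roughly like this (the real `primordials` object is internal to Node core and not available to user code; this sketch mimics its shape with `uncurryThis`):

```javascript
// Build uncurried, cached prototype methods once, then call them
// receiver-first, so later monkey-patching cannot affect internal code.
const ReflectApply = Reflect.apply;
const uncurryThis = (fn) => (thisArg, ...args) => ReflectApply(fn, thisArg, args);

const primordials = {
  ArrayPrototypeFilter: uncurryThis(Array.prototype.filter),
  StringPrototypeStartsWith: uncurryThis(String.prototype.startsWith),
};

const { ArrayPrototypeFilter, StringPrototypeStartsWith } = primordials;
console.log(
  ArrayPrototypeFilter(['_internal', 'public'], (n) => StringPrototypeStartsWith(n, '_'))
); // [ '_internal' ]
```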