From PR #697, each ECMAScript function object now has a [[SourceText]] internal slot. Table 27 says that its type is String. However, most of the steps where it is set are of the form:
Set F.[[SourceText]] to the source text matched by |Nonterminal|.
and source text is "a sequence of [Unicode] code points", which is not a String.
Presumably, [[SourceText]] should get the result of UTF-16 encoding the source text.
(Alternatively, you could defer the UTF-16 encoding to the point where [[SourceText]] is used, in Function.prototype.toString. So the type of [[SourceText]] would be something like "Unicode code points". But then you'd have to UTF-16 decode _sourceText_ in CreateDynamicFunction, which seems silly.)
What if we changed the definition of "source text matched by" to include UTF-16 encoding?
That would pretty much conflict with other uses of the term "source text".
cc @michaelficarra
I would continue to keep the span of code points in [[SourceText]], UTF-16 encode [[SourceText]] in Function.prototype.toString, and WTF-16 decode the sourceText string built in CreateDynamicFunction.
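For illustration, the decoding step on the CreateDynamicFunction side might look like the following. This is only a sketch, not spec text: the helper name `stringToCodePoints` is hypothetical, and it assumes that an unpaired surrogate code unit should simply become a code point with the same numeric value.

```javascript
// Hypothetical helper (not a spec operation): turn a String into a list of
// code points. Well-formed surrogate pairs become one supplementary code
// point; an unpaired surrogate is kept as a code point of the same value.
function stringToCodePoints(str) {
  const codePoints = [];
  // The String iterator yields one element per code point and yields a
  // lone surrogate code unit as its own one-unit string.
  for (const ch of str) {
    codePoints.push(ch.codePointAt(0));
  }
  return codePoints;
}
```

For example, `stringToCodePoints("a\uD800")` keeps the lone surrogate as the second element rather than rejecting or replacing it.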
@michaelficarra: Interesting. Why do you prefer that alternative?
Actual UTF-16 encoding won't work because source text can contain unpaired surrogates, but using UTF16Encoding would be fine.
Right, but note that you can't just pass the whole source text to UTF16Encoding, because it only takes a single code point.
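To make the single-code-point restriction concrete, here is a rough JavaScript sketch of encoding one code point at a time and concatenating the results. The function names are illustrative only; `utf16Encoding` is modeled on the spec's UTF16Encoding operation, and `codePointsToString` is a hypothetical wrapper that applies it to each code point in turn.

```javascript
// Modeled on UTF16Encoding: takes ONE code point and returns its code units.
function utf16Encoding(cp) {
  if (cp <= 0xFFFF) {
    // BMP code points, including surrogate code points, pass through as-is.
    return [cp];
  }
  const offset = cp - 0x10000;
  const lead = Math.floor(offset / 0x400) + 0xD800;
  const trail = (offset % 0x400) + 0xDC00;
  return [lead, trail];
}

// Hypothetical wrapper: the String whose code units are the UTF16Encoding
// of each code point of the given sequence, in order.
function codePointsToString(codePoints) {
  let result = "";
  for (const cp of codePoints) {
    result += String.fromCharCode(...utf16Encoding(cp));
  }
  return result;
}
```

Because each code point is handled independently, a surrogate code point in the source text survives as a single unpaired code unit instead of causing an encoding error.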
If we take @michaelficarra's preferred approach ([[SourceText]] is code points), encoding only happens in one spot, so it would be enough to use the phrasing that occurs in a couple other places -- the String whose code units are the UTF16Encoding of each code point of [some source text]
But if we take the other approach ([[SourceText]] is a String), then encoding happens in 26 spots, so it might be worth defining an operation for that phrasing. But then you end up saying:
Set F.[[SourceText]] to ThatOperation(the source text matched by |Nonterminal|).
which is a bit clunky.
Instead, it might be better to define an operation that takes the Parse Node as the argument, so you get:
Set F.[[SourceText]] to Whatever(|Nonterminal|).
(I have no good suggestion for the name of either operation.)
If I can get an editorial decision on which way this should go, I'll prepare a PR.
https://github.com/tc39/ecma262/issues/1458#issuecomment-467969794 seems simplest to me, and I'd generally prefer to defer to @michaelficarra for toString questions anyways.
@michaelficarra: I'm curious about the "Type(func.[[SourceText]]) is String" condition in Function.prototype.toString. In what cases were you expecting that test to fail?
@jmdyck It's there to prevent Function.prototype.toString from returning non-string values in the event that the host decides to store a non-string in the [[SourceText]] slot.
"store a non-string in the [[SourceText]] slot."
That would be non-compliant behavior for an ordinary function, so you're talking about an exotic function that elects to have a [[SourceText]] slot, right?
Any exotic object, yes.
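As a rough model of the guard being discussed, the sketch below represents a function object as a plain JavaScript object whose properties stand in for internal slots. Everything here is an assumption made for illustration: the property names `sourceText` and `callable`, the function name `functionToStringSketch`, and the placeholder native-code string. It also leaves out the other conditions in the real algorithm (built-in functions, bound functions, host checks).

```javascript
// Simplified model, NOT the spec algorithm: `func` is a plain object whose
// `sourceText` and `callable` properties stand in for internal slots.
function functionToStringSketch(func) {
  const isObject = func !== null && typeof func === "object";
  // Only return the slot's contents when the slot exists AND holds a String,
  // so an exotic object storing something else cannot leak a non-string.
  if (isObject && "sourceText" in func && typeof func.sourceText === "string") {
    return func.sourceText;
  }
  // Callable objects without usable source text get a placeholder.
  if (isObject && func.callable === true) {
    return "function () { [native code] }";
  }
  throw new TypeError("not a function object");
}
```

In this model, an exotic object that stores a number in its `sourceText` slot falls through to the placeholder branch, so the result is always a string.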
Also, while we're in the neighborhood, I noticed that async arrow functions don't get their [[SourceText]] set. Is there a reason that wasn't added in #697?
That's likely an oversight. A PR to fix that would be great.
Done.