Reason: RFC: Towards Custom Accessor Operators

Created on 20 Nov 2018 · 8 Comments · Source: reasonml/reason

Goals:

There are two ways in which Reason's access operators could be improved.

  1. We'd like data structures to have the ability to customize the accessors - including the syntactic real estate of dataStructure[expr]. For example, it would be nice to create a Dict module that specifies the definition of [ ] access.
  2. It would be nice to provide a more intuitive syntax for FFI access (something better than the x##foo ppx extension point syntax). BuckleScript happens to use this syntax today, but it's not specific to BuckleScript - a C FFI framework could use the same ## syntax.

I have been working with @anmonteiro and @IwanKaramazow to formulate a thorough description of the options. This is the result of those discussions:

Considerations:

We want to avoid breaking any existing code unless we designate a major version change. If a major version bump is needed, we would like to begin preparing people's code while they are still on the current version, so that upgrading requires little or no work.

Challenges:

OCaml doesn't have ad-hoc polymorphism, so in any one scope a function name can only refer to one datatype's implementation of an indexing operation. This means that if x[expr] is always translated to bracketAccess(x, expr), you must ensure that the right module (the one defining bracketAccess) is open in scope.

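OCaml has recently acquired the ability to customize accessor operators (the extended indexing operators added in OCaml 4.06, such as .%{ }). However, a custom operator must carry an extra operator symbol, so the most desirable form, a bare [ ], cannot be customized. OCaml cannot change its parser as easily, because it lacks a code-migration path for such changes.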
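For comparison, here is a minimal sketch of that OCaml feature. It is written in OCaml rather than Reason because the Reason syntax proposed below does not exist yet. The module name Dict and its functions are hypothetical, and the code assumes OCaml 4.06 or later. The operator has to include a symbol (here %), so d.%{"k"} can be customized while the built-in d.(i) and s.[i] forms cannot.

```ocaml
(* Hypothetical string-keyed dictionary backed by the standard Hashtbl. *)
module Dict = struct
  type 'v t = (string, 'v) Hashtbl.t

  let make () : 'v t = Hashtbl.create 16
  let get (d : 'v t) (key : string) : 'v = Hashtbl.find d key
  let set (d : 'v t) (key : string) (v : 'v) : unit = Hashtbl.replace d key v

  (* d.%{key} is sugar for ( .%{} ) d key *)
  let ( .%{} ) = get

  (* d.%{key} <- v is sugar for ( .%{}<- ) d key v *)
  let ( .%{}<- ) = set
end

let () =
  let open Dict in
  let d = make () in
  d.%{"newField"} <- 100;
  Printf.printf "newField = %d\n" d.%{"newField"}
```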

Potential APIs:

Ideally we would be able to have an API like this:

module Dict = {
  let get = (dict, key) => ...;
  let ([]) = (dict, key) => get(dict, key);

  let set = (dict, key, v) => ...;
  let ([]=) = (dict, key, v) => set(dict, key, v);
  ...
}; 
module Vector = {
  let get = (arr, i) => ...;
  let ([]) = (arr, i) => get(arr, i);

  let set = (arr, i, v) => ...;
  let ([]=) = (arr, i, v) => set(arr, i, v);
  ...
}; 

open Dict;
let myDict = Dict.make();
myDict["newField"] = 100;
let res = myDict["newField"];

open Vector;
let arr = Vector.make(100, 0);
arr[0] = 100;
let res = arr[2];

/* Dynamic access */
open Dict;
let res = myDict[getUserSuppliedKey()];
open Vector;
let res = arr[getUserSuppliedIndex()];
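The same scoping constraint can be reproduced with OCaml's existing indexing operators. In the sketch below, which assumes OCaml 4.06+ and uses hypothetical Dict and Vector modules, .%( ) stands in for the proposed [ ]. Whichever module was opened last owns the operator, and a local open such as Dict.( ... ) serves as a per-expression escape hatch.

```ocaml
(* Hypothetical containers that both claim the same indexing operator. *)
module Dict = struct
  let make () : (string, int) Hashtbl.t = Hashtbl.create 16
  let ( .%() ) d (key : string) = Hashtbl.find d key
  let ( .%()<- ) d (key : string) v = Hashtbl.replace d key v
end

module Vector = struct
  let make n x : int array = Array.make n x
  let ( .%() ) (a : int array) i = a.(i)
  let ( .%()<- ) (a : int array) i v = a.(i) <- v
end

let () =
  let d = Dict.make () in
  let arr = Vector.make 100 0 in
  let open Dict in
  d.%("newField") <- 100;
  (* Opening Vector shadows the Dict operator: d.%("newField") would no longer typecheck. *)
  let open Vector in
  arr.%(0) <- 100;
  (* A local open brings the Dict operator back for one expression. *)
  Printf.printf "%d %d\n" Dict.(d.%("newField")) arr.%(2)
```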

Getting Greedy:

Okay, that's pretty cool. There's one main downside: you can't have two modules open at the same time that define the same accessor. There's not much we can do about that. However, one observation is that you often want one data structure in scope that is accessed exclusively with string keys. In BuckleScript, the "JS Object" is the string-keyed data structure you most often want to access using [ ] syntax. Can we somehow use this fact to create a nice syntax that doesn't have the same module scoping/conflict problem?

What if we had two operators [ ] and [" "] and let's suppose we were to somehow treat x["foo"] specially so that it parsed as ([" "])(x, "foo"). Then you could have both Dict and Vector open, right? Sort of - but with some issues:

  1. This is technically a breaking change that could break some Reason code that exists.
  2. This has weird semantic implications, even if no code breaks.
  3. In the future, if we get truly customizable [ ] the amount of code that would be broken or have weird semantic implications would increase.

For #1:
Today x[y] parses to Array.get(x, y) regardless of the nature of y. In existing code, people can define and open a module named Array that has a get function, just to use the x[y] syntax. I've done this. If we changed how we parse their code so that x["y"] becomes ([" "])(x, "y"), we would break those people's code.
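OCaml has the same hook, which makes the breakage concrete. Its parser turns a.(i) into Array.get a i and a.(i) <- v into Array.set a i v, resolving Array in the current scope. When compiling with -unsafe, it uses Array.unsafe_get and Array.unsafe_set instead. The sketch below shadows Array with a hypothetical string-keyed module named StringDict. Any parser change that routed string-literal indices elsewhere would change which function code like this calls.

```ocaml
(* a.(i) is parsed as Array.get a i, and a.(i) <- v as Array.set a i v,
   with the name Array resolved in the current scope. *)
module StringDict = struct
  (* Hypothetical: reuse array-index syntax for a string-keyed Hashtbl. *)
  module Array = struct
    let get (tbl : (string, 'v) Hashtbl.t) (key : string) : 'v =
      Hashtbl.find tbl key
    let set (tbl : (string, 'v) Hashtbl.t) (key : string) (v : 'v) : unit =
      Hashtbl.replace tbl key v
  end
end

let () =
  let tbl : (string, int) Hashtbl.t = Hashtbl.create 16 in
  (* Inside this open, .( ) now means the Hashtbl-based get/set above. *)
  let open StringDict in
  tbl.("name") <- 42;
  Printf.printf "name = %d\n" tbl.("name")
```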

For #2: This would be the only place in the entire syntax where replacing an expression with a variable bound to that same expression changes the meaning:

let tmp = "name";
let result = data[tmp];

/* Is very different from */
let result = data["name"];

For #3: It may not be common today for people to put an Array.get in scope just to get the [ ] syntax, but the semantic irregularity would exist regardless. Once people can customize [ ] to be anything they want, that irregularity becomes front and center.

Maybe We _Can_ Have It All:

What if a fairly small tweak to that proposal could mitigate most of the downsides while still avoiding lexical scoping collisions?

We will define two operators [ ] and [' '] (along with their corresponding = setter forms).

Dict would add this to its module:

let (['']) = (dict, key: string) => get(dict, key);

We would implement a special lexing rule for ['anything'], similar to JSX. By default, everything between [' and '] is treated as a string literal, without ever having to supply " quotes.

Now, in our previous example, we don't need to juggle which modules are open in scope, because both can be open simultaneously (possibly even opened by the build system).

let myDict = Dict.make();
myDict['newField'] = 100;
let res = myDict['newField'];

let arr = Vector.make(100, 0);
arr[0] = 100;
let res = arr[2];

/* Dynamic Access */
let res = ([''])(myDict, getUserSuppliedKey());
let res = arr[getUserSuppliedIndex()];

Notice that when the string key was dynamically computed, we could not embed it in the custom syntax. In that case, we had to invoke the operator directly with ([''])(myDict, getUserSuppliedKey()).

One alternative is to allow an "interpolation" syntax, just as JSX does. You could "break out" of the DSL with { }, as you already do in JSX: myDict['{getUserSuppliedKey()}'].
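Here is a hypothetical OCaml model of the desugaring that makes the proposed lexing rule concrete. It only manipulates source text. The characters between [' and '] become a string literal, {...} segments are spliced in as expressions and joined with Reason's ++, and the result is a direct call of the [''] operator with the object first, matching the (dict, key) signature above. The names part, parse_key, key_expr, and desugar are illustrative and not part of refmt.

```ocaml
(* Hypothetical model of the proposed rule: the key text is a string literal,
   and {...} escapes back into an expression, as in JSX. *)
type part =
  | Lit of string     (* literal characters *)
  | Interp of string  (* source text of an interpolated expression *)

(* Split key text such as "user_{id()}" into parts.
   Assumes every { has a matching } and that braces do not nest. *)
let parse_key (raw : string) : part list =
  let n = String.length raw in
  let rec go i acc =
    if i >= n then List.rev acc
    else if raw.[i] = '{' then begin
      let j = String.index_from raw i '}' in
      go (j + 1) (Interp (String.sub raw (i + 1) (j - i - 1)) :: acc)
    end else begin
      let j =
        match String.index_from_opt raw i '{' with Some j -> j | None -> n
      in
      go j (Lit (String.sub raw i (j - i)) :: acc)
    end
  in
  go 0 []

(* Render the key as a string expression, joining pieces with ++.
   %S quotes and escapes a literal as an OCaml (and Reason) string. *)
let key_expr (parts : part list) : string =
  let several = List.length parts > 1 in
  let render = function
    | Lit s -> Printf.sprintf "%S" s
    | Interp e -> if several then "(" ^ e ^ ")" else e
  in
  match parts with
  | [] -> "\"\""
  | _ -> String.concat " ++ " (List.map render parts)

(* The accessor desugars to a direct call of the operator, object first. *)
let desugar (obj : string) (raw : string) : string =
  Printf.sprintf "([''])(%s, %s)" obj (key_expr (parse_key raw))
```

For example, desugar "myDict" "newField" yields ([''])(myDict, "newField"), and desugar "myDict" "{getUserSuppliedKey()}" yields ([''])(myDict, getUserSuppliedKey()).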

The single quotes make all the difference in practice because:

  1. We can have two modules open in scope that both define accessor operators, as long as the one geared towards string keys defines the separate [' '] operator.
  2. We no longer have any of the weird substitution semantic inconsistencies. Why? Because it isn't possible to define a single quote string key outside of the accessors DSL! You could never do: let tmp = 'foo'; dict[tmp], now or ever in the future.
  3. Yet we still have that streamlined string-access kind of feel. Also, single quotes are the default style guide recommendation for JS apps at Facebook, and the React core source code, so they are fairly common/familiar.

FFI Access:

The x['foo'] syntax is also a good candidate for replacing the x##foo FFI access in BuckleScript. It would be a simple transform in the bsppx.

Close, But Not Perfect!

Isn't it great? We can have it all. Familiarity, lack of module juggling, total customization, and that sweet string key access feel. Most importantly, it doesn't have weird substitution/semantic issues.

There's one downside: Single characters. When writing x['c'] it isn't clear if you wanted to do ([])(x, 'c') or ([''])(x, "c"). That is, did you intend to call the "Array" indexing with the character 'c' or the [''] indexing operation with the string "c" key?

This edge case is much less common than the one that double quotes would cause, so I'm happy we've found this improvement. However, we should find the best possible disambiguation. One solution is to require that if (for whatever reason) you've redefined [] and supply single characters as keys, you write them differently, for example as [ 'c' ]. This makes sense if we think of the ['foo'] indexing operator as being [' '], just as there is a difference between [| |] and [ || ]. This still isn't the _best_ solution, but if this one concession is all we'd have to trade to get _all_ of those really awesome features, I think it would be worth it. Still, I'd like to hear if anyone has other ideas.
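The spacing-based disambiguation can be stated as a small rule. The sketch below is hypothetical OCaml that classifies the raw text between the brackets. Unpadded '...' selects the string-key operator, and a padded single character such as [ 'c' ] falls back to the ordinary [ ] operator. For brevity it ignores escape sequences such as '\n'.

```ocaml
(* Hypothetical classification of the text between [ and ] under the
   spacing rule: unpadded quotes select the string-key operator. *)
type access =
  | String_key of string  (* x['foo'] -> ([''])(x, "foo") *)
  | Char_index of char    (* x[ 'c' ] -> ([])(x, 'c') *)
  | Other of string       (* anything else goes to the ordinary [] operator *)

let is_quote c = c = '\''

let classify (inside : string) : access =
  let n = String.length inside in
  if n >= 2 && is_quote inside.[0] && is_quote inside.[n - 1] then
    (* quotes touch the brackets: string-key form *)
    String_key (String.sub inside 1 (n - 2))
  else
    let t = String.trim inside in
    if String.length t = 3 && is_quote t.[0] && is_quote t.[2] then
      (* padded form: a char literal for the ordinary [] *)
      Char_index t.[1]
    else Other inside
```

Under this rule x['c'] always means the string key "c", and code that wants [] with a char must write x[ 'c' ], which is the rewrite refmt could apply ahead of time.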

More Features:

We would also add exception raising forms:

module Dict = {
  ...
  let getExn = (dict, key) => ...;
  let (!['']) = (dict, key) => getExn(dict, key);
  let setExn = (dict, key, v) => ...;
  let (!['']=) = (dict, key, v) => setExn(dict, key, v);
}; 
module Vector = {
  ...
  let getExn = (arr, i) => ...;
  let (![]) = (arr, i) => getExn(arr, i);
  let setExn = (arr, i, v) => ...;
  let (![]=) = (arr, i, v) => setExn(arr, i, v);
}; 

let res = myDict['newField'];
let res = myDict!['This will Throw If not Present'];

let arr = Vector.make(100, 0);
let res = arr[2];
let res = arr![999];
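Here is a rough OCaml analogue of the two forms. It assumes, though the RFC does not say so, that the plain accessor returns an option while the ! form raises. The module is hypothetical. Because OCaml 4.06+ only accepts indexing operators that carry an extra symbol, .%{ } stands in for [' '] and .%!{ } stands in for ![' '].

```ocaml
(* Hypothetical Dict: .%{ } returns an option, .%!{ } raises Not_found. *)
module Dict = struct
  type 'v t = (string, 'v) Hashtbl.t

  let make () : 'v t = Hashtbl.create 16
  let get (d : 'v t) (key : string) : 'v option = Hashtbl.find_opt d key
  let getExn (d : 'v t) (key : string) : 'v = Hashtbl.find d key
  let set (d : 'v t) (key : string) (v : 'v) : unit = Hashtbl.replace d key v

  let ( .%{} ) = get       (* safe lookup, returns an option *)
  let ( .%!{} ) = getExn   (* raising lookup *)
  let ( .%{}<- ) = set     (* assignment *)
end

let () =
  let open Dict in
  let d = make () in
  d.%{"newField"} <- 100;
  (match d.%{"newField"} with
   | Some v -> Printf.printf "newField = %d\n" v
   | None -> print_endline "newField missing");
  match d.%!{"This will throw if not present"} with
  | v -> Printf.printf "unexpected %d\n" v
  | exception Not_found -> print_endline "raised Not_found"
```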

Getting There Practically

Suppose we do decide to follow through on this. How can we avoid breaking people today, when [] is currently defined as Array.get? If we must break people, how can we begin formatting their code in preparation today? Are there parts we can accomplish that don't break anything?

Two things are required:

  1. We need to clear up the syntax so that [] isn't hard-coded to Array.get. That will be a breaking change requiring a major release. Dealing with it elegantly is an open question, and we have many options if we put this custom accessor feature in the next major version and include an upgrade script. For example, if [] now means the accessor operator [] instead of Array.get, then when converting code to the next version we could place a let ([]) = Array.get at the top of any file that uses [].
  2. Implementing ['']. This is likely _not_ a breaking change, except for the smaller issue of x['c'] (char ambiguity), and I think we can start working on it right now. In the next minor release of refmt, we can begin formatting any instances of x['c'] to x[ 'c' ] (or whatever disambiguation we decide on) so that people's code is prepared. (Eager to hear other proposals for disambiguation.)


All 8 comments

It might be stupid, but how about backticks (` `) instead of single quotes? It would remove the ambiguity with chars and make the interpolation syntax closer to JS's.

With single quotes:

let a = 'a';
dict[a]  /* Ambiguity */

With backticks:

let a = `a`; /* syntax error is here */
dict[a]

@mrandri19 It's not a stupid suggestion at all.

We discussed backticks, however the conclusion was that we may want to save that particular syntax for actual string interpolation in the future.

Why not use Ldot to disambiguate instead of making subtly different operators? Something like:

open HashMap;
myHashMap["thing"]

myDict.Dict.[access]
// or maybe
myDict.Dict[access]

@jaredly I believe an LDot disambiguation is compatible with everything discussed in this RFC. That is something I would wish to add to this RFC. But the benefit of having an accessor that appears to be "string indexed" is that it is very concise for FFI:

let something = jsObj['field'];

We're trying to do better than jsObj##field here, and I think that jsObj['field'] is better, but I'm not sure that jsObj.Js.['field'] is better than just using ##.

I mean you could open Js.OperatorAccess. But I understand the desire for another syntax. Is it important that it look like array access?
e.g. we could have jsObject."field", which I think shouldn't conflict with anything we currently have, and I think has less possibility for confusion.

It's not critical that it look like Array access syntax, but there was a reason why the initial attempt was to allow x["stringKey"] to represent an improved x##stringKey. I think for dictionary style objects, having it look like [ ] is ideal, though if that is not feasible for one reason or another I think x."field" is better than x##field for example. I just don't know if x."field" is better _enough_ to justify changing anything. I'm pretty sure that a stringy looking [ ] accessor is better enough to justify changing something. Opening modules is okay but there's an opportunity to get one more thing "in scope" for free without needing modular implicits by using the syntactic real estate of ['string'].
If all we did was allow opening modules to redefine [ ] (not [' ']), much of this proposal would still be relevant. There are two parts to it: allowing anyone to implement [ ], and then allowing a custom [' '] syntax that is separate from [ ] and therefore doesn't require module scope juggling.

I'm not super sold on the "dictionary-style" thing though -- javascript object attributes (in javascript) are accessed via .attribute much more often than they are accessed via ["attribute"]. The latter is generally reserved for when the attribute name is a variable, e.g. when doing something dynamic. Given that dynamism is not allowed by reason's type system, I think it makes sense to avoid having something that looks like ["attribute"] for js attribute access.

@jordwalke just fyi BuckleScript supports obj["name"] access using the [@bs.get_index] extension. I would use it like this:

[@bs.get_index] external ($): (Js.t(_), string) => _ = "";
let bar = foo$"bar";