Reason: [RFC] `import` statement & namespacing

Created on 8 Jul 2016  ·  40 Comments  ·  Source: reasonml/reason

Updated proposal [Aug 15]

Jengaboot currently takes the namespacing approach outlined by Jane Street, where a file Baz.ml of a dependency Bar is renamed to Bar__Baz.ml, so that files wishing to reference that module do so by its qualified name. Files within the same package get an auto-opened module that aliases module Baz = Bar__Baz, so that they can refer to local modules without the qualifier.
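To make the current scheme concrete, here is a minimal single-file sketch in plain OCaml. It only simulates the idea with nested modules: Bar_aliases, Inside_bar and greet are hypothetical names, and the real build rules generate separate files rather than nested modules.

```ocaml
(* Hypothetical stand-in for Bar__Baz.ml (originally Baz.ml in package Bar). *)
module Bar__Baz = struct
  let greet name = "hello, " ^ name
end

(* Stand-in for the generated alias file that is auto-opened inside Bar. *)
module Bar_aliases = struct
  module Baz = Bar__Baz
end

(* Inside the package: the alias module is opened, so Baz is unqualified. *)
module Inside_bar = struct
  open Bar_aliases
  let message = Baz.greet "local"
end

(* Outside the package: the qualified name is used. *)
let message_from_outside = Bar__Baz.greet "external"
```

Both references resolve to the same module; only the name visible to the author differs.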

My proposal is to ditch the auto-opened alias file, and introduce an import statement syntax node thus:

// In a file that's part of the `Bar` package

import Baz from self
// let module Baz = Bar__Baz
import Boo from Foo
// let module Boo = Foo__Boo
import {awesomeFunction} from Foo.Boo
// let awesomeFunction = Foo__Boo.awesomeFunction
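As a rough illustration of what those three imports would desugar to, here is the same thing written in plain OCaml syntax (where a top-level Reason let module is spelled module). The contents of Bar__Baz and Foo__Boo are made up for the example; only the alias lines come from the proposal.

```ocaml
(* Hypothetical stand-ins for the renamed compilation units. *)
module Bar__Baz = struct
  let value = 1
end

module Foo__Boo = struct
  let awesomeFunction x = x * 2
end

(* import Baz from self *)
module Baz = Bar__Baz

(* import Boo from Foo *)
module Boo = Foo__Boo

(* import {awesomeFunction} from Foo.Boo *)
let awesomeFunction = Foo__Boo.awesomeFunction

(* The imported names are ordinary bindings from here on. *)
let result = awesomeFunction Baz.value + Boo.awesomeFunction 10
```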

Qs:

  • should the import statement only be valid at the top level? it doesn't actually have to be
  • we could add a hash to prevent people from manually writing Bar__Baz, like renaming Baz.ml to Bar__Baz__[hash of file].ml... would that be useful? it would force people to use the import syntax...

Old version

Forking from https://github.com/facebook/reason/issues/617#issuecomment-231238528

This assumes that something like #617 happens, such that in order to use values/types from another file/module, you are required to explicitly indicate those dependencies in the file. For background, the current OCaml system implicitly gives every file access to all other files & packages.

Ways of resolving imports:

  • global: a single (non-namespaced) name, regardless of directory
  • absolute: a namespaced name relative to some "base" directory
  • relative: relative to the current file

internal means "part of this package/project"
external means "from another package/project"

Various languages

  • C/C++

    • internal / external: absolute, but multiple (generally rather many) base directories

    • internal: pseudo-global (no automatic discovery, have to specify "search directories")

  • python

    • internal / external: absolute

    • internal: limited relative ~ has some weirdness

  • javascript (cjs, es6 modules)

    • external: absolute

    • internal: relative (w/ the possibility of specifying files / directories that should be global)

    • js is unique in that you can access & load arbitrary paths

  • js + haste

    • internal: global

    • external: absolute

  • java

    • internal / external: absolute

  • clojure(script)

    • internal / external: absolute

  • ocaml

    • internal: global

    • external: absolute

  • golang

    • internal - same directory: all files share a namespace & are mutually visible

    • internal - other directory: absolute or relative

    • external: absolute

  • rustlang

    • internal - same directory: relative

    • internal - other directory: absolute (indistinguishable from external package)

    • external: absolute

  • swift

    • internal: all files share a namespace 😢

    • external: global

RFC

All 40 comments

Things I want:

  • it should be apparent whether an import is internal or external
  • for internal files, it should be apparent where the file is located
  • proximity / locality -- files "close" to this file should be more accessible than files far away (either by using relative paths, or as rust does by limiting access to files in other directories)

What's your opinion about the Haskell way?

Perhaps we should make locality an IDE problem, that way we could have modules decoupled from location but still show the file location in the IDE.

For example:
import {foo} from Bar (./something/Bar);

Where (./something/Bar) is non editable and has different styling within the IDE but doesn't show up in the actual file.

What are the definitions of 'internal' and 'external'?

internal = "part of this package"
external = "from another package"

@SanderSpies but then you need globally unique module names :( having an absolute namespace or a relative path fixes that

@IwanKaramazow I'm rather unfamiliar with haskell, unfortunately :( if you want to provide a description that would be awesome!

I'm not following super closely but you can find an outline in purescript/purescript#1901 (roughly comparable to Haskell). Namespaces in other languages leave a lot to want relative to my limited understanding of OCaml modules.

@jaredly, could you say what you like/don't like about the CommonML namespacing approach? In that hybrid, you have to qualify external modules with their package name, but internal project modules can be accessed directly without needing to qualify.

It doesn't mirror the directory structure within a package, but it could be made to.

let x = React.Dom.Extra.x;

Where the package React contains the directory structure /src/dom/extra.re.

But within MyPackage, which could have directory structure src/myModule.re, I can simply access MyModule directly without having to prefix with MyPackage. This is kind of like relative requires but only in that different packages have different "views" of the world. They are expected to qualify namespaces according to their personal views.

@jordwalke x-post from https://github.com/facebook/reason/issues/617#issuecomment-231479308

What's so special about a file?

Files are units of code, they are a tool we have for organization :D I can imagine a variety of post-file worlds where they don't enter into the equation, but for now they're what we have.
Files form a natural limit to the amount of things I have to hold in my head. If inter-file dependencies are not explicit, then I have to hold the whole project in my head

Also, given that files are modules, making inter-module dependencies explicit has benefits. In the same way that it's a red flag for two OO classes to have lots of dependencies on each other (maybe you split things up wrong), the same might apply to two files/modules.

Maybe you could achieve the same without having to introduce new language semantics for import that are integrated with packaging. For example, it seems modules are sufficient to accomplish what you want.

/* You know it's external because of the leading `Require`
 * module namespace. */
let x = Require.React.Dom.x;
/* No Require means it's not from another package.
 * it's either local to the project or locally defined in scope somewhere. */
let y = SomeInternalModule.y;

The jengaboot build rules can easily set up that convention.
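A minimal sketch of that convention using nothing but ordinary modules, written in OCaml syntax. React_package and the string values are hypothetical, and in practice the Require module would be generated by the build rules rather than written by hand.

```ocaml
(* Hypothetical external package, as the build might expose it. *)
module React_package = struct
  module Dom = struct
    let x = "from the React package"
  end
end

(* Assumed convention: external dependencies are reachable under Require. *)
module Require = struct
  module React = React_package
end

(* A module that belongs to the current project. *)
module SomeInternalModule = struct
  let y = "from this project"
end

(* The leading Require marks the first reference as external. *)
let x = Require.React.Dom.x
let y = SomeInternalModule.y
```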

That's almost exactly what CommonML/jengaboot do currently - with the one exception that external dependencies (from another package) don't require the Require leading namespace qualifier. (I could see how it would be helpful, but for me, it just got annoying to always have to type Require when Merlin's Locate feature can reliably tell me the exact _line_ that any module is defined on).

In general, with CommonML's approach, my experience has been that adding new dependencies, forming new dependencies, and adding new files has been extremely lightweight and not error prone (because the static type system and Merlin help catch issues and track dependencies). I wouldn't recommend the approach for a dynamically typed language.

It helps achieve rapid development when you can just refer to React.Dom.div anywhere in your project (so long as your package.json specifies that React is a dependency) (or Require.React.Dom.div if you prefer the leading Require), without having to go through the ceremony of adding a shortcut to div at the top of the file. That ceremony is still an option if you like to organize your code that way. But sometimes I even prefer not to be required to import variables at the top of the file, and instead leave references to React.Dom.div throughout the file, so that you are aware, at the place where you use div, that it comes from the React package. I think the approach of using the module system to hint at the origins of a value allows you to achieve everything you're looking for, but without requiring that everyone else abide by your preferred conventions. Is this correct?

It helps achieve rapid development when you can just include React.Dom.div anywhere in your project (or Require.React.Dom.div if you prefer), without having to go through the ceremony of adding a shortcut to div at the top of the file.

Makes sense.

Here's a potentially useful thought: code is read more often than it is written (I think well accepted?), and it is read _in more contexts_ than it is written -- e.g. it is frequently written in an editor, but it might be read on github / in git diff / phabricator / etc.

Therefore perhaps a bit more verbosity in the serialization format [1] which can easily be done automatically by the editor [2] is worth it?

1 - e.g. requiring an import @ the top of the file
2 - you start to use React.Dom.div and it adds the import statement ~ a thing that AndroidStudio does, for example. I think @frantic made an Atom plugin to do that for JavaScript as well

Similarly, our editor could hide the Require. prefix & instead color the token differently or have a 🔗 character next to it or whatever... but then when read in the plain (without Merlin at your hip) it's still obvious.

just brainstorming here

@jordwalke Interesting idea. Maybe instead of "Require", we can even have a generic resolver that resolves the path. Something like
import "github.com/facebook/react".React.Dom

[I updated the proposal, let me know what you think!]

My proposal is to ditch the auto-opened alias file - I don't think this is possible due to compatibility with OCaml. I think we probably could help with making it more explicit by doing some of the work, but that would require us to introduce a (drumroll...) typing layer.

introduce an import statement AST node - I think you mean syntax node, the OCaml AST will not change due to compatibility.

I'm fine with any solution for import, as long as it becomes more explicit.

How about if the module is written in Reason then we ditch the auto-opened alias file?

And how do you convert OCaml code to Reason code in that case?

not sure what you mean...
I'm thinking:
when processing a .ml file, do the open {name of this package} at the top, and when processing a .re file, don't

You could in theory parse:

import Baz from Something

as

let module Baz = Something.Baz [@@import];

So that you can distinguish the bindings that were written via an import token. Then when printing that AST, you'd print it as import Baz from Something.

This maintains perfect compatibility with OCaml and doesn't add any complicated new semantics to the language - it's truly just a way to document that something should be considered imported from another module, where that could be checked as part of the lint phase.
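For reference, a small OCaml-syntax sketch of that encoding. It assumes the compiler simply ignores the non-built-in import attribute, so the attributed alias type-checks exactly like a hand-written one; Baz_plain and foo are hypothetical names, and the lint phase mentioned above is not shown.

```ocaml
module Something = struct
  module Baz = struct
    let foo = 42
  end
end

(* What `import Baz from Something` could parse to: an alias plus a marker. *)
module Baz = Something.Baz [@@import]

(* A hand-written alias; the type checker sees no difference. *)
module Baz_plain = Something.Baz

let same = (Baz.foo = Baz_plain.foo)
```

Only a tool that inspects attributes (the printer or a linter) would treat the two aliases differently.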

So, in this proposal, what happens in the following cases:

  1. Suppose after I import Baz from Something, I want to then access a _nested_ module in Baz. I would imagine people would just want to do Baz.Helpers.foo.
  2. Someone doesn't want to import something which would consume a name binding in the environment, and they just want to access it directly: Something.Baz.foo. I see this as a helpful capability because it doesn't require that you pollute the environment with a Something module name, leaving it free.
  3. Suppose I define a nested module inside of my file. Do I need to import that module before I use it (I'm assuming not). What if that nested module has a nested module. Do I need to import _that_?
let module Nested = {
   let module Nested2 = {...};
};
/* In the proposal, would I have to do this? */
import Nested2 from Nested;
Nested2.foo;

/* Or could I do this: */
Nested.Nested2.foo;

/* And/or could I do this: */
let module Nested2 = Nested.Nested2;
Nested2.foo;

I'm guessing there's _some_ cases where you want to be required to use import and other cases where you want to use let module X = Something.X. Otherwise, what would be the point of import if not to signal something unique about the fact that you're realiasing a module locally? So, I'm trying to understand just when exactly you would want to be required to use import, and what it is you're attempting to signal.

And btw: Don't worry about people accessing Foo__Bar directly. We can obscure the heck out of those module names - those names are just there for the compilation artifacts, and we can make it so that you can't even access them (the build system would hide them).

What I want to signal: accessing code that is not in this file :)

  1. Baz.Helpers.Foo is fine
  2. If you want to access Something in any way, you still have to import Something, which would be translated to let module Something = Something_{hash or sth} [@@import]
  3. Modules within the file are unobfuscated, and completely accessible.
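A runnable version of the third case in OCaml syntax, with a made-up body for Nested2, showing that modules defined in the same file need no import under this answer:

```ocaml
module Nested = struct
  module Nested2 = struct
    let foo = "nested value"
  end
end

(* Direct access through the full path. *)
let direct = Nested.Nested2.foo

(* A plain local alias, no import involved. *)
module Nested2 = Nested.Nested2
let aliased = Nested2.foo
```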

One thing I noticed, is that there's value in this proposal that is completely separate from marking something as "an outside file". That value is - that you don't have to write out items twice (they are "punned"). Here's what I mean.

Instead of having to write:

let module ReallyLongModuleName = Something.ReallyLongModuleName;

You would only have to write:

import module ReallyLongModuleName from Something;

When importing many variables I think it can make a big difference.

But here's what I'm realizing: It seems odd that you'd have that punning only when importing. I'd want that punning even for locally defined modules:

import ReallyLongSubModule from LocallyDefinedModule;
import (x, y) from LocallyDefinedModule;

How would you imagine separating the punning feature from the ability to mark something as "an outside file"?

Hmmmm I'm not such a fan of overloading "importing" to also apply to modules within the current file. Why not just allow sth like let module {ReallyLongModuleName} = Something if you're interested in punning?

Here's another possible take on the full syntax of things:

  • importing "another file in this project" for use in this scope looks like import module OtherFileName from self or import module OtherFileName as ShortName from self
  • importing something from "another file in this project" looks like import module ChildModule from self.OtherFileName or import someVar, otherVar from self.OtherFileName or import type someType from self.OtherFileName
  • importing a file from "a library" looks like import module SomeTopLevelModule from OtherLibraryName etc.

Which would translate to

let module OtherFileName = Self__OtherFileName__hash;
let module ShortName = Self__OtherFileName__hash;
let module ChildModule = Self__OtherFileName__hash.ChildModule;
let someVar = Self__OtherFileName__hash.someVar;
let otherVar = Self__OtherFileName__hash.otherVar;
type someType = Self__OtherFileName__hash.someType;
let module SomeTopLevelModule = OtherLibraryName__SomeTopLevelModule__hash;

Hmmmm I'm not such a fan of overloading "importing" to also apply to modules within the current file.

I wasn't necessarily suggesting overloading of import for locally defined modules. Just trying to avoid giving import punning ability that isn't possible for local modules, or that is wildly different syntactically from import punning.

What is your proposal for inline imports?

For example, you could import DOM from React to use it.

import DOM from React;
let toRender = DOM.div props;

But what if you just wanted to use it directly inline?

let toRender = React.DOM.div props;

I think the last example doesn't really make it clear that React is not locally defined.

Have you considered an import namespace for these scenarios such as:

let toRender = Import.React.DOM.div props;

Which would make inline use of external dependencies both clearly marked, and also terse.

To do

let toRender = React.DOM.div props;

you'd need to have

import React

above it somewhere.

the Import. is interesting, but maybe isn't worth the added complexity of "things you need to know to understand imports"?

you'd need to have

import React

above it somewhere.

Isn't it annoying to have to go all the way to the top of a large file to be able to use something? In JS, I would sometimes do an inline require('x').foo and then, only after there was more than one call site, I'd add the require to the top of the file. In JS there's also a larger incentive to add it to the top of the file because there's a runtime hit for each require(), yet even then I would occasionally do them inline. I think we should have a good inline story too, and ideally it's not much more to learn.

I'm cool with that. Import.SomeLib sounds fine

Throwing another idea out there. If we support Import.Foo inline, then:

import DOM from React;

Could just be punning sugar for:

let module DOM = Import.React.DOM;

I'm not sure I see the advantage of inline require() outside of some pretty specific cases, and those cases should probably be a bit ugly to encourage the community to write requires at the top of the file.

One benefit of that, is that it would even work for _local_ imports, inside of a large function body or submodule.

import DOM from React;

let runTestCases = fun () => {
   /* Look, only temporarily shadows within the function scope */
   import DOM from ReactMock;
   DOM.something ();
};
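In today's OCaml the same scoping can be had with a local module alias, which is presumably what a function-level import would desugar to. A sketch, with hypothetical React and ReactMock modules:

```ocaml
module React = struct
  module DOM = struct
    let something () = "real DOM"
  end
end

module ReactMock = struct
  module DOM = struct
    let something () = "mock DOM"
  end
end

(* Top-level alias, visible in the rest of the file. *)
module DOM = React.DOM

let runTestCases () =
  (* Shadows DOM only inside this function body. *)
  let module DOM = ReactMock.DOM in
  DOM.something ()

(* Outside the function, DOM still refers to React.DOM. *)
let outside = DOM.something ()
```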

The benefit of an inline Import.React.DOM is that you don't have to jump locations in your file just to start using something, and the autocomplete immediately gives you feedback about the contents of your dependencies. If people didn't think it had _some_ value, why would people ever use inline require in JS?

First, apologies if this is a stale conversation. I hope I'm not re-treading ground that's been covered elsewhere.

Second, for better or for worse, this commentary is coming from the perspective of a JavaScript developer with zero practical experience working in an OCaml-like environment. I'm aware that this may blind me to some of the benefits of the OCaml module system.

Here are the things that, as a newcomer, I find confusing about the current behaviour of Reason's import syntax (or lack thereof):

  • It's not clear to me which files will be available in a given scope.

    • If I am inside a file foo.re, can I access files in a parent directory?

    • Does this behaviour change depending on my build configuration?

  • The mapping between filename and module name is not explicit

    • I haven't been able to locate any comprehensive documentation for this mapping function (bar.re => Bar, bar/foo.re => ???).

    • (As an aside: at time of writing, the only reference to this mapping in Reason docs is in the context of an include statement. It is not clear that this reference is available without the use of the keyword include. This was pretty confusing to me!)

    • I need to know all of the semantics of this transformation feature in order to understand any given module reference in a codebase.

    • I need to look at the appropriate bsconfig.json file for that project in order to reason about references to modules in a codebase.

    • Reason is not able to lean on a cross-language standard for transforming file-names to module-names. The closest I've seen is python, but it's not an exact replica.

  • JS Interop:

    • Importing modules that were written in JS doesn't seem like it should require different syntax from importing modules written in Reason.

  • Tree shaking/minimal compilation

    • Currently, in the bucklescript environment, we need to write a bsconfig.json file which tells the compiler the full set of files which it should treat as modules and link to one another. This seems onerous in large projects. If we specify paths in import statements, then the compiler only needs an entry point. It can trace (and tree-shake!) dependencies from that module. (This would be a big bonus for Webpack interop.)

  • Namespace collisions

    • What happens if I have a folder with the name Bar and a file with the name bar.re? Can I import from both?

  • Circular dependencies

    • My understanding is that the current import behaviour cannot handle circular dependencies? How come? If we eliminate inline import syntax, does this go away?

    • Personal opinion: Circular dependencies may not be desirable in a codebase, and are tricky to support, but are quite a big bonus if Reason wants to make a case for broad production-scale adoption:

      (http://exploringjs.com/es6/ch_modules.html#sec_rationale-cyclic-dependencies).

TL;DR

I think I can boil the above down into two observations.

  1. There is an implicit file-to-module name mapping function that every Reason developer needs to learn before they can reason about dependencies in a project. This feels onerous as a newcomer. I would prefer to be able to refer to files using an established syntax, e.g. filepaths.
  2. Every file's namespace is clobbered by its project context. This seems to open the door to many of the pain-points of global namespaces. It's harder to reason about dependencies; it's easier to introduce dependencies that don't make sense; implementations get coupled to their build system; the chance of a typo accidentally accessing something external increases. (http://wiki.c2.com/?GlobalVariablesAreBad)

To me, the wins of having this implicit import syntax seem pretty small by comparison. We don't have to scroll to the top of a file and perform a few extra keystrokes. If there's something I'm missing, I would love to be enlightened!

Hey @ajhyndman thanks for being honest about this.

Skimming our docs about modules, it doesn't seem like we mention much about the filename-to-module-name transformation or about the namespacing. I totally agree with you that we clearly should add:
1) Files are accessible as modules. The only transformation done is that the first letter of the file name is capitalized (if not already).
2) All files within a codebase are globally accessible, folder structure does _not_ matter.
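Taken together, the two rules amount to a very small function. The sketch below is only an illustration of the rules as stated here (module_name_of_file is a hypothetical helper, not something the toolchain exposes): the directory part is dropped, the extension is removed, and the first letter is capitalized.

```ocaml
(* Maps a source path to the module name described by the two rules above. *)
let module_name_of_file path =
  path
  |> Filename.basename          (* folder structure does not matter *)
  |> Filename.remove_extension  (* drop .re / .ml *)
  |> String.capitalize_ascii    (* capitalize the first letter *)

let examples =
  List.map module_name_of_file [ "bar.re"; "bar/foo.re"; "Foo.re" ]
```

Under these rules bar/foo.re and Foo.re both map to Foo, which is the kind of collision raised earlier in the thread.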

When we're talking about creating a library that you publish on npm for example, you can choose what to expose to the user of that library using the bsconfig. Right now by default the user can access all of your files if you don't do anything. I would recommend having one file that you expose with all of the modules you'd like the user to have access to.

I hope this clears up things. I made https://github.com/facebook/reason/pull/1272 to fix this.

I understand the wariness related to having a global namespace. I think we're still exploring it to see "how bad" it can go in the real world. So far it's worked for all of my relatively small projects (i.e. I've never accessed the wrong thing unintentionally; the type system generally catches that). Maybe others can share their experiences here too.

@ajhyndman thanks for lending your thoughts! I wholeheartedly agree :D but so far the sentiment from @jordwalke and co seems to be

  1. How modules are resolved isn't really part of the OCaml language spec, so it can be anything we want it to be!
  2. in the meantime, though, all tooling (e.g. merlin) & build systems (ocamlbuild, jenga, bsb) adhere to the "let's just dump every file in the global scope regardless of folder structure" way of things, b/c that's how it's always been done in OCaml. (with imo a variety of unfortunate side effects)
  3. So any folder-based namespacing scheme would be something that we introduce. There's a ton of work to be done that's not that, and so those who care about it (me included) haven't gotten around to introducing it.

I've got some more ideas around how we would move forward with namespacing, which I'll hopefully get around to writing down :)

As to your question of cyclic module dependencies: This is something that the OCaml type system doesn't make easy on us. (OCaml does an amazing amount of inference, and I imagine that ruling out general cyclic dependencies probably makes that job easier/faster.) A path forward there would, I imagine, involve annotating the files that depend on each other, and then, in a pre-compile step, making a single file that creates a set of recursive module definitions, or some such thing.
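For readers who have not seen them, this is roughly what a generated set of recursive module definitions could look like in OCaml. Even and Odd are hypothetical stand-ins for two files that depend on each other; note that each module needs an explicit signature, which is the "annotating" part.

```ocaml
(* Two mutually dependent modules combined into one recursive definition. *)
module rec Even : sig
  val check : int -> bool
end = struct
  let check n = if n = 0 then true else Odd.check (n - 1)
end

and Odd : sig
  val check : int -> bool
end = struct
  let check n = if n = 0 then false else Even.check (n - 1)
end
```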

At any rate, I rather anticipate that getting a "friendly-to-javascripters" module system will require some fairly heavy lifting in the build system department.

can we move forward with some light-weight solution?

include M.(a,b,c);
include M.(xx as y, b,c);
include M.N.(xx as y);

Those are local rules, which should be pretty solid and also no new key words introduced
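One plausible reading of that local rewrite, spelled out in plain OCaml. The expansion shown is an assumption about what the sugar would produce, and the contents of M are made up; each form is wrapped in its own module only so the three examples do not shadow one another.

```ocaml
module M = struct
  let a = 1
  let b = 2
  let c = 3
  let xx = 10
  module N = struct
    let xx = 100
  end
end

(* include M.(a,b,c); *)
module First = struct
  let a = M.a
  let b = M.b
  let c = M.c
end

(* include M.(xx as y, b,c); *)
module Second = struct
  let y = M.xx
  let b = M.b
  let c = M.c
end

(* include M.N.(xx as y); *)
module Third = struct
  let y = M.N.xx
end
```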

@bobzhang That's an interesting/cool approach. Has it been proposed before? What's nice about it is that it just looks like an extension of include M.

There’s a relevant discussion happening at https://discuss.ocaml.org/t/concise-module-syntax/1344 to add a similar syntax of “selective open” to OCaml.

I don't think it is a good idea to mess with types (include type t) at the syntax level. My proposal is just a local rewrite; it is quite helpful and low-hanging fruit.
