Nim: [TODO] small amendment to style insensitivity: leading/trailing underscore disables style insensitivity; use case: avoid symbol clashing for FFI / interop

Created on 29 Aug 2018 · 13 Comments · Source: nim-lang/Nim

proposal

A small amendment to Nim’s style insensitivity:
identifiers starting or ending with _ would be compared strictly (not style-insensitively). Among other things, this would help with:

FFI
e.g. libmpfr.dylib defines both mpfr_exp2 and mpfr_exp_2; FFI wrappers could simply insert a trailing (or leading) _ to avoid the symbol clash
another example: libldc.dylib defines both exit and _exit; there are many more examples.
things like user_sort vs use_rsort : under this proposal, we'd be able to have:

user_sort
use_rsort # same as above
use_rsort_ # different from above because of trailing `_`

acronyms: parseRGB_ => won't be callable via parser_GB etc because of trailing _
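A minimal sketch of the comparison rule under discussion, assuming the usual description of Nim identifier equality (first character compared exactly, the rest compared ignoring case and underscores). The helpers `identKey`, `isStrict`, and `sameIdentProposed` are hypothetical names made up for illustration; this models the proposal on plain strings and is not compiler code.

```nim
import std/strutils

proc identKey(name: string): string =
  ## Current rule (as commonly described): keep the first character as-is,
  ## then drop underscores and lowercase everything else.
  if name.len == 0: return ""
  result.add name[0]
  for i in 1 ..< name.len:
    if name[i] != '_':
      result.add toLowerAscii(name[i])

proc isStrict(name: string): bool =
  ## Proposed amendment: a leading or trailing `_` switches to exact comparison.
  name.len > 0 and (name[0] == '_' or name[^1] == '_')

proc sameIdentProposed(a, b: string): bool =
  ## Hypothetical model of the proposal, not an existing Nim feature.
  if isStrict(a) or isStrict(b): a == b
  else: identKey(a) == identKey(b)

doAssert sameIdentProposed("user_sort", "use_rsort")       # same today
doAssert not sameIdentProposed("use_rsort_", "use_rsort")  # distinct under the proposal
doAssert not sameIdentProposed("parseRGB_", "parser_GB")
```

Note that names such as `use_rsort_` are only strings here; current Nim rejects identifiers with a leading or trailing underscore.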

NOTE: the main use case here is FFI, and I'd be willing to restrict this feature to FFI use case if it boils down to that.

implementation

IIUC, this is a very simple, localized change in the Nim compiler (and I am willing to send a PR for it if no one volunteers)

no breaking change

since leading/trailing _ is currently disallowed, this proposal introduces no breaking change

note

I originally suggested this here

@rayman22201

Those kinds of naming conventions are popular in several older C++ codebases that I have worked with (old internal banking systems, not OSS stuff) so your FFI usecase is very reasonable

RFC

timotheecour

👎8 👍1

All 13 comments

But the FFI names have no relationship with the Nim names, you can do

proc exp2() {.importc: "mpfr_exp2".}
proc expB() {.importc: "mpfr_exp_2".}

And that's really better because I have no idea what the difference between mpfr_exp2 and mpfr_exp_2 would be. Again, human beings map written words to sounds in their minds. An underscore has no sound.
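A small self-contained sketch of that point for the C backend: the Nim-side name is independent of the C symbol, so two C functions that differ only by an underscore can be given unrelated Nim names. The C functions `demo_exp2` and `demo_exp_2` are hypothetical stand-ins defined inline with `emit` so the example does not need libmpfr; a real wrapper would use `header` or `dynlib` instead.

```nim
# Two C functions whose names differ only by an underscore.
# Assumption: compiled with the C backend (nim c), where a top-level
# emit lands at file scope of the generated C code.
{.emit: """
static int demo_exp2(void)  { return 1; }
static int demo_exp_2(void) { return 2; }
""".}

# `nodecl` tells Nim not to generate its own prototype for the symbol.
proc demoExp2(): cint {.importc: "demo_exp2", nodecl.}
proc demoExpB(): cint {.importc: "demo_exp_2", nodecl.}

doAssert demoExp2() == 1   # calls the C function demo_exp2
doAssert demoExpB() == 2   # calls the C function demo_exp_2
```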

Araq on 30 Aug 2018

I would like to point out that this is not the full quote from my IRC chat. The full quote is:

@timotheecour Those kinds of naming conventions are popular in several older C++ codebases that I have worked with (old internal banking systems, not OSS stuff) so your FFI usecase is very reasonable. FFI symbol clashing is the big sell here for me, but maybe there are other ways to deal with symbol clashing. IDK if I like it elsewhere in the language. From my understanding, your first example of user_sort vs usersort vs usersort_ is something I think Araq specifically wants to be equivalent. So that would not be a selling point for the core devs I think (read: Araq). It might complicate the lexer more than it helps, but I don't have a strong opinion one way or the other.
https://irclogs.nim-lang.org/29-08-2018.html#19:52:04

The key statement that @timotheecour omitted being: but maybe there are other ways to deal with symbol clashing.

rayman22201 on 30 Aug 2018

@rayman22201 thanks for the correction. But please be specific about what you mean by "other ways"; see below.

@Araq

proc exp2() {.importc: "mpfr_exp2".}

I'm referring to automated wrapping of foreign APIs (à la SWIG), not manual wrapping, so it's not clear how the automatic wrapper is supposed to handle this:
suppose library Foo defines:
void fooBar()

the automatic wrapper wraps it as:
proc fooBar() {.importc: "fooBar".} and Nim code starts relying on this

then later Foo is updated and adds:
void foo_bar()

now the automatic wrapper can't wrap it as:
proc fooBar() {.importc: "fooBar".}
let's say it has some logic that appends a number, so it wraps it as:
proc fooBar2() {.importc: "foo_bar".}

the problem with this approach is that it's super brittle:

now what if Foo adds a symbol fooBar2?
this logic depends on which symbol was processed first by the automatic wrapper; it could change based on compiler flags, platforms, etc leading to symbol mapping ambiguities.
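The order dependence described above can be sketched as follows. `assignNames` is a hypothetical wrapper policy (keep the C name if it is free, otherwise append 2, 3, ...), and `identKey` is a simplified model of Nim's identifier comparison; neither is taken from c2nim, SWIG, or any real tool.

```nim
import std/[sets, strutils]

proc identKey(name: string): string =
  ## Simplified model: first character exact, rest lowercased without '_'.
  if name.len == 0: return ""
  result.add name[0]
  for i in 1 ..< name.len:
    if name[i] != '_':
      result.add toLowerAscii(name[i])

proc assignNames(cSymbols: openArray[string]): seq[(string, string)] =
  ## Returns (C symbol, Nim name) pairs using the numeric-suffix policy.
  var taken = initHashSet[string]()
  for sym in cSymbols:
    var candidate = sym
    var n = 2
    while identKey(candidate) in taken:
      candidate = sym & $n   # clash: try sym2, sym3, ...
      inc n
    taken.incl identKey(candidate)
    result.add((sym, candidate))

# Same three C symbols, processed in two different orders.
let a = assignNames(["fooBar", "foo_bar", "fooBar2"])
let b = assignNames(["fooBar2", "foo_bar", "fooBar"])
doAssert a[0] == ("fooBar", "fooBar")
doAssert b[2] == ("fooBar", "fooBar3")  # same C symbol, different Nim name
```

With this policy the Nim name bound to the C symbol fooBar changes purely because of processing order, which is the ambiguity the comment is worried about.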

how frequent is that?

I gave some examples above from the C world. In C++, this can get a lot worse because of namespaces and nested classes (and the fact that Nim doesn't support nested classes, see https://github.com/nim-lang/Nim/issues/7449 where I also mention FFI issues):

in following code,

namespace foo{
  void bar();
  struct Bar{};
};
void fooBar();

struct foo{
  void bar();
  struct Bar{};
};
void fooBar();

how would an automated wrapper tool handle that?
There is currently no Nim support in SWIG and supporting Nim in SWIG would run into those issues.

timotheecour on 30 Aug 2018

I think it's the job of the wrapper (automated or not) to handle this. It could spit out a warning; after all, most wrapper code should be adjusted manually until our wrapper code generators are smart enough.

Clyybber on 30 Aug 2018

I wrapped plenty of big C++ projects with c2nim, c2nim offers #mangle rules and has a --nep1 switch and it works well.

Araq on 30 Aug 2018

👍3

Style insensitivity is already a complex rule and people complain about it. An exception was made for the first letter, making the rule even more complex, for a good reason. Let us not make the style insensitivity rule even more complex.

andreaferretti on 31 Aug 2018

👍1

Yeah, sorry, this idea is never gonna fly.

Araq on 3 Sep 2018

Automated or manual, the fact that Nim naming is more restrictive than C/C++ means wrapping is a headache.

Consider an example which is on the nimterop issue tracker - https://github.com/genotrance/nimterop/issues/42:

typedef enum
{
    SDL_RANDOM_ENUM
} SDL_GLContextResetNotification;   //  <===

typedef enum
{
    SDL_GL_CONTEXT_RESET_NOTIFICATION, //  <===
} SDL_GLattr;

Neither c2nim nor nimterop generates valid names that Nim compiles. Doing anything automated or manual effectively means doing something hacky.

genotrance on 22 Jan 2019

There is no way to produce fully automated wrappers, C's typing is simply too weak for that. That's why SWIG exists. That's why c2nim has plenty of customizable options (and should have more of these). Eventually you will agree with me so I'm leaving this issue closed.

Araq on 23 Jan 2019

I don't think there's any debate around how much we can automate or what customization is needed. People have been dealing with this problem for decades. However, I don't think this has any bearing on what is being asked here.

Nim's style insensitivity is a controversial topic and while I am a proponent of its benefits (I love using tree-sitter's snake case C API in camelCase), the fact that it causes distinct symbols to look like duplicates is yet another limitation to deal with. It is hard enough dealing with the C/C++ legacy and nothing is going to help with that but perhaps Nim can help.

What is being asked here is not to break or change Nim either, but to see if there's some way to handle this naming scenario a little better. Only the compiler can enable this. If there's no appetite for it, it would simply defer the headache to the wrapper tool writer and every wrapper user. My goal is to make the lives of our users better, I don't mind the tool headache but I need help from the compiler.

@timotheecour has documented some ideas here. An interesting one is proposal 4: using backticks to help disambiguate duplicates. Proposal 5 is a variation using a pragma that marks something as style sensitive when conflicts are known.

We are only asking for some flexibility around the flexible styling :)

genotrance on 23 Jan 2019

👍1

Please no.

Let us have a language which has consistent style rules. Either let it be the way it is currently or make it case and style sensitive.
But please no to some symbols are insensitive and some are sensitive.
When using a library, I don't want to look at the definition of every symbol to check whether it is the one I think it is, or whether it means something else because it defies the rules of the language.

nc-x on 23 Jan 2019

👍1

I agree with that sentiment 100% and most of the ideas are not satisfactory due to this exact problem of exceptions to the rule. As a user, it would be a lousy experience to guess.

Right now, all we can do is have a tool complain and have the user fix it. In c2nim, the user can use the various flags or edit the source, but c2nim doesn't yet help by highlighting this up front. It generates the code, which only fails when Nim compiles it. This is still okay since it is obvious what the issue is. In nimterop, we are attempting to address it by detecting dups and complaining with clear guidance on how to fix via flags for simple cases as well as a user callback (proposal #6) for more complicated use cases. It still becomes the user's problem, which is what I'm trying to solve.
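A rough sketch of what "detecting dups" could look like, using the SDL names from the nimterop issue quoted earlier in the thread. `findClashes` and `identKey` are hypothetical helpers written for this illustration, assuming a simplified model of Nim's identifier comparison; this is not how nimterop or c2nim is actually implemented.

```nim
import std/[tables, strutils]

proc identKey(name: string): string =
  ## Simplified model: first character exact, rest lowercased without '_'.
  if name.len == 0: return ""
  result.add name[0]
  for i in 1 ..< name.len:
    if name[i] != '_':
      result.add toLowerAscii(name[i])

proc findClashes(names: openArray[string]): seq[seq[string]] =
  ## Groups C names by their Nim-equivalent key and returns every group
  ## with more than one member, in first-seen order.
  var groups = initOrderedTable[string, seq[string]]()
  for name in names:
    groups.mgetOrPut(identKey(name), newSeq[string]()).add name
  for group in groups.values:
    if group.len > 1:
      result.add group

let clashes = findClashes([
  "SDL_RANDOM_ENUM",
  "SDL_GLContextResetNotification",     # enum type name
  "SDL_GL_CONTEXT_RESET_NOTIFICATION",  # enum value in SDL_GLattr
  "SDL_GLattr"])

for group in clashes:
  echo "these C names map to one Nim identifier: ", group.join(", ")
```

A tool could print such a report before generating any code and then ask the user for a rename, instead of leaving the failure to the Nim compiler.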

I am asking the community if there's a way to solve this creatively because I'm not 100% confident that there is no other solution than dumping it on the user. We still need the user to solve scenarios where it isn't obvious what to do like char ** being a var cstring or ptr cstring like Araq mentioned but also having users deal with names just seems meh.

Is different style insensitivity across symbol types possible? Like two proc names cannot overlap but a proc and a type can? I know UFCS might make some of that a challenge but who knows.

genotrance on 23 Jan 2019

In nimterop, we are attempting to address it by detecting dups and complaining with clear guidance on how to fix via flags for simple cases as well as a user callback (proposal #6) for more complicated use cases.

This is my preferred solution to this problem. I will pick a tool with a user configurable option and a good error message over a complex language rule with exceptions every time. As a user, I want the tool to help me do the right thing, not hide the right thing from me.

rayman22201 on 23 Jan 2019

👍1
