Packages: Discussion: Scopes for keywords affecting flow and type (e.g. `function` and `class`)

Created on 15 Aug 2016  ·  19 Comments  ·  Source: sublimehq/Packages

Currently, there appears to be a discrepancy with scope names for keywords that indicate a function or class body (or similar), where some get a storage.type scope name and some a keyword scope.

Quoting myself from #550:

In Python, def and class _are_ keywords as well, but they not only indicate a different type but also have different effects on the following block (and what goes into the parens following the identifier). To me, this is a pretty sure indicator that scoping both def and class as storage.type is wrong since they actually have keyword-like effects on code. storage.type should most likely only be used for statically typed languages like C and Java.

The exception to this would be the async "keyword", which certainly should get a storage scope, although storage.modifier. Furthermore, the global and nonlocal keywords are semantically tied to referring to storage, so it makes sense for those to get a storage scope too. storage.scope or similar?

In JavaScript we have var, let and const. I would most likely mark those as storage.type and give class and function a keyword scope for the same reason as for Python.


Carrying this over from #550, which is where this was raised initially. I highly suggest reading that issue in full.

RFC

All 19 comments

My current implementations and the official Sublime Text docs are currently based on the TextMate docs, which explicitly state that keywords such as class and function should be storage.type.

It does seem that in popular usage scoping has split, where some use storage.type and others use keyword. In one sense, keyword is sort of broad, and can be argued for almost anything. Additionally, storage.type does not strictly mean a "type" as in an integer. Instead, I think about it more as "classification", or something like that. Considering storage.modifier, most of those are "keywords" also, but there is value in distinguishing them.

I think one of the major downsides of changing storage.type to keyword for class, function, trait, enum, impl, etc is the color shift among users. This will affect a broad number of users, so I don't want to take the decision too lightly.

I think it's definitely odd the way it's done currently. The biggest confusion for me is that things like def, class and such almost always color identically to primitives like int, void and such. This identical coloring stems from the fact that the major _and_ minor scopes are identical (storage.type), with the only distinction coming in the third scoping (storage.type.primitive vs storage.type.class or storage.type.function). That's super-weird and it definitely doesn't align with how non-C languages would group these keywords. It's made even worse by the fact that most color schemes just highlight storage and ignore all of the subscoping, so even moving to storage.modifier (which I agree would be an improvement from the current situation) wouldn't address the issue in practice.
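To make the subscoping point concrete, here is a minimal Python sketch of dotted-prefix selector matching. It is a simplified model and not Sublime Text's actual matching engine; the two schemes and their color names are made up purely for illustration.

```python
from typing import Dict, Optional


def selector_matches(selector: str, scope: str) -> bool:
    """True if `selector` is a dotted prefix of `scope` (simplified model)."""
    sel_parts = selector.split(".")
    return scope.split(".")[:len(sel_parts)] == sel_parts


def color_for(scope: str, scheme: Dict[str, str]) -> Optional[str]:
    """Return the color of the longest matching selector, or None."""
    matches = [sel for sel in scheme if selector_matches(sel, scope)]
    if not matches:
        return None
    best = max(matches, key=lambda sel: len(sel.split(".")))
    return scheme[best]


# Hypothetical scheme that only targets the top-level scope.
COARSE_SCHEME = {"storage": "blue", "keyword": "red"}

# Hypothetical scheme that also targets the third level.
FINE_SCHEME = {"storage": "blue", "storage.type.function": "red"}

for scope in ("storage.type.primitive", "storage.type.function"):
    print(scope, color_for(scope, COARSE_SCHEME), color_for(scope, FINE_SCHEME))
```

Under the coarse scheme both scopes resolve to the same color, which is the "def looks like int" effect described above. Only a scheme that bothers to target the third level tells them apart.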

I would be in favor of def, class and so on being scoped as keyword.declaration, and then we make it clear that keyword.declaration is meant for these sorts of things. Right now, the keyword.declaration scope is sort of under-utilized and misappropriated in cases where it _is_ utilized (e.g. Scala's case keyword is, oddly, scoped as keyword.declaration.scala). Things such as void, int and so on should remain as storage.type.primitive. Similarly, var in C# (and similar constructs) should probably remain as storage.type, since it is semantically replacing a type with a "special magic type" that automatically infers. var/val in Scala (and similar languages) would be keyword.declaration, since they're not syntactically replacing types but rather just representing declaration syntax.

I think that a keyword-derived scoping would match user intuition that these are very primitive keywords. An even better example here might be def in Clojure, which makes _absolutely_ no sense scoped as storage.type, but makes all the sense in the world as some sort of keyword. The problem of course is exactly what @wbond pointed out: nearly everything _could_ be a keyword. And as with storage.type, most color schemes don't provide special highlighting for keyword.declaration, even though they probably should.

I'm obviously in favor of making the change, but it's definitely a change which will have immediate and obvious impacts on a large set of users, with the primary benefits to the change being delayed by the lead time on color scheme adjustments.

So we had a discussion about this issue today with @wbond, @FichteFoll and @mitranim

Here is what's happening for major languages.
C++:

    ReturnType method() {}
//  ^^^^^^^^^^ no scope

Python:

    def name() -> ReturnType
#   ^^^ storage.type.function.python
#                 ^^^^^^^^^^ only meta scopes

Java, C#:

    ReturnType method() {}
//  ^^^^^^^^^^ support.class.java

Go:

    func name() ReturnType
//  ^^^^ storage.type.keyword.function.go
//              ^^^^^^^^^^ storage.type.go

The C++ and Python approaches are very lazy, and I think we can do better.

The problem with the Java approach is that there is no longer any difference between user-defined types and language-level ones. That can make sense for some users but not for others (#1795, #1803). On the other hand, it provides a visual distinction in most schemes.

The Go approach seems tedious to me, since I "just want" a distinct color for "func" and for types. My color scheme would look like:

storage.type,
storage.type.keyword:
  color: $keyword_color

storage.type.go,
storage.type.other_language_which_respect_the_new_convention,
... :
  color: $type_color

Alternative (A): mark "fn" as keyword.

    fn name() -> ReturnType
//  ^^ keyword.storage.xx
//               ^^^^^^^^^^ storage.type.xx

This will change the color of "fn" for all themes where "keyword" and "storage.type" are different.

Alternative (B): use a double scope for types

    fn name() -> ReturnType
//  ^^ storage.type.xx
//               ^^^^^^^^^^ storage.type.xx new_scope_for_type.xx

The nice thing is that if we introduce a new scope, it won't impact existing color schemes.
Color schemes will need to add a new key to specifically target this scope for users who want it to be different than now.

Alternative (C): use a nested scope for types

    fn name() -> ReturnType
//  ^^ storage.type.xx
//               ^^^^^^^^^^ storage.type.return

Note that using a scope like storage.return_type would break all schemes that don't directly target storage.

So WDYT, which alternative do you prefer?

I think that there are several intertwined questions here.

The first is what to do with fn. This is a keyword, and in a perfect world it should be scoped something like keyword.declaration.function. Given the need for backward compatibility, it might be best to use storage.type.function and agree that storage.type really means keyword.declaration or somesuch.

The second is how to mark a token that syntactically represents a type. In a perfect world, I think that variable.type would usually be appropriate. (In some cases, a constant scope would be appropriate, and in many cases a support scope should be added.) In general, variable really means "identifier or special identifier-like thing", whether or not the thing is actually a variable. In the example, ReturnType seems to be the name of a user-defined type, so a generic variable.type.python would be appropriate.

(Alternatively, we could omit type-specific highlighting and just use a generic expression. In Python, the contents of a function return type annotation are simply an expression following the same rules as any other expression. In order to apply type-specific highlighting to such an expression, we'd have to duplicate much of the existing expression code.)

The third is how to mark one or more tokens representing the return type of a function. In this example, the only token is ReturnType, but it could easily be a complex expression. Not all languages allow complex type expressions, but enough do that we should consider it a core use case. This sounds like a job for a meta scope; marking a whole expression with storage or another non-meta scope seems wrong.

It's fair to say that my opinion boils down to "give up storage as a bad idea (but keep it anyway for compatibility)." The docs say that storage is for "[t]ypes and definition/declaration keywords". This is pretty clearly written for a) languages with var-like keywords and no explicit types and b) languages like C and Java where a type name effectively substitutes for a var keyword. In that context, it kind of makes sense, but when we consider a language with declarations like var foo: int; it's obvious that var and int are very different kinds of things.

What I suggest is:

  • For a keyword like var or function, use storage.type.* as a backward-compatible substitute for the hypothetical keyword.declaration.
  • For the name of a type, use a variable scope (except where a constant scope or something would be more appropriate, and adding support as needed). Do not use storage.
  • For an entire type annotation, use a meta scope (TBD). Don't wrap an entire potentially-complex type expression in storage.
  • Keep using storage.modifier as usual.

This should be reasonably unintrusive. The biggest change is that in languages like C, the int in int x; would be recolored or even lose color. In deference to the original intent of the storage scope, and to avoid breaking color schemes, we could keep scoping int as storage.type in C/Java-like languages. (I'd rather do this while acknowledging that it doesn't seem like the best scope than try to find an interpretation in which it is the best scope.) In languages where type names do not stand in for var-like keywords, we would still remove storage from type names.

Alternative (D):

    fn name() -> ReturnType
//  ^^ storage.type.function
//            ^^^^^^^^^^^^^ meta.annotation.return-type (or something)
//               ^^^^^^^^^^ variable.type

@Thom1729 I'm inclined towards something like storage.type.function keyword.declaration.function. This gives us backwards compatibility via storage.type, and includes the storage.type.function that is occasionally used, but also provides a path forward for these special types of keywords via keyword.declaration.*. Modern color schemes can use keyword.declaration to highlight var, def, func, etc., and existing color schemes keep working.
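As a rough illustration of the dual-scope idea, the Python sketch below checks which color scheme rules match a token carrying both scopes. It assumes a simplified model in which a rule matches when its selector is a dotted prefix of any scope on the token; the two schemes are hypothetical, and how a real engine ranks several matching rules against each other is deliberately left out.

```python
from typing import Dict, List


def selector_matches(selector: str, scope: str) -> bool:
    """True if `selector` is a dotted prefix of `scope` (simplified model)."""
    sel_parts = selector.split(".")
    return scope.split(".")[:len(sel_parts)] == sel_parts


def matching_rules(scope_stack: str, scheme: Dict[str, str]) -> List[str]:
    """Return the scheme selectors that match any scope in a space-separated stack."""
    scopes = scope_stack.split()
    return [sel for sel in scheme
            if any(selector_matches(sel, scope) for scope in scopes)]


# The dual scope proposed for a keyword such as `def` or `func`.
DUAL = "storage.type.function keyword.declaration.function"

# Hypothetical older scheme: knows storage.type, has never heard of keyword.declaration.
LEGACY_SCHEME = {"storage.type": "blue", "string": "yellow"}

# Hypothetical newer scheme: targets keyword.declaration directly.
MODERN_SCHEME = {"keyword.declaration": "red", "string": "yellow"}

print(matching_rules(DUAL, LEGACY_SCHEME))  # ['storage.type']
print(matching_rules(DUAL, MODERN_SCHEME))  # ['keyword.declaration']
```

In this model the token is still picked up by a scheme that only knows storage.type, while a scheme written against keyword.declaration can target it without a storage rule.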

That does sound like the best of both worlds. What do you think about removing storage from type names (optionally leaving them in for C-like declarations)?

Let me try to formalize and enumerate the concepts involved. I apologize for the long post, and hope this will put us on a more solid ground.

Definitions

  1. _type identifier_: identifier referring to type: int, SomeType

  2. _type declaration_: keyword followed by type identifier: type (Go, Rust), data, newtype (Haskell)

  3. _type expression_: keyword followed by type body: func, struct (Go)

  4. _value identifier_: identifier referring to anything

  5. _value declaration_: keyword followed by value identifier: var, let, func

  6. _value expression_: pretty much anything

  7. _function expression_, subset of _value expression_: function keyword followed by function definition

Note: I include function and method declaration under _value declaration_. Many statically typed languages allow functions to be assigned to variables and used as values (C, Go, Rust). Many languages don't differentiate a function declaration from a value declaration where the right-hand side is a function literal (JS, Python, Clojure, Lua).

Examples

Go is a rare language where all these concepts are syntactically distinct. Most have dedicated keywords:

    type A struct {Field Type}
//  ------ type declaration
//         ------------------- type expression

    var A struct {Field Type}
//  ----- value declaration
//        ------------------- type expression

    var A Type
//  ----- value declaration
//        ---- type identifier

    func A(arg Type) Type {}
//  ------ value declaration

    func(arg Type) Type {}
//  ------------------- type expression
//  ---------------------- function expression

Some languages conflate _type expression_ with _type declaration_ by not allowing the former without the latter. Example from Rust:

    let A: struct{field: Type}; // invalid syntax
//  ----- value declaration
//         ------------------- type expression

    let A: Type;
//  ----- value declaration
//         ---- type identifier

    struct A {field: Type}
//  -------- type declaration
//           ------------- type expression

    type A = Type;
//  ------ type declaration
//           ---- type identifier

In C and derivatives, _type declarations_ tend to use special keywords, while _value declarations_ tend to be preceded by a _type identifier_. Note that in C, type identifiers may be composite due to "namespaces". Examples from C:

    struct A {};
//  -------- type declaration

    struct A some_func() {return (struct A){};}
//  -------- type identifier
//  ------------------ value declaration

    struct A some_value;
//  -------- type identifier
//  ------------------- value declaration

As shown above, well-designed static languages make it possible to syntactically differentiate _type identifiers_ from _value identifiers_. Example from Go:

    var ValueIdent TypeIdent = ValueExpr
//      ---------- value identifier
//                 --------- type identifier
//                             ---------- value expression

Dynamic languages, where types are first-class values, naturally conflate _type_ concepts with _value_ concepts.

Example of JS conflating _type declaration_, _value declaration_, _type expression_, and _value expression_:

class {method() {}}         // type expression OR value expression
class A {method() {}}       // type declaration  followed by type  expression
let A = class {method() {}} // value declaration followed by value expression

Example of JS conflating _type identifier_ and _value identifier_:

    class Type {}
//  ---------- type declaration
//        ---- type identifier

    new Type()
//  ---------- value expression
//      ---- type identifier?

    let ValueIdent = Type            // both are "types"!
//  -------------- value declaration
//      ---------- value identifier
//                   ---- value identifier

One could say that purely dynamic languages like JS or Erlang simply don't have the concept of a type. However, Python and TypeScript complicate matters by having both _runtime types_ and _static types_. Example from Python:

def is_blank(value: any) -> bool:
#                   --- static type
#                           ---- static type
    value = str(value)
#           --- runtime type
    return value == '' or value.isspace()

Languages with a function keyword tend to heavily overload it. The function keyword typically has three distinct purposes:

  • function declaration: _value declaration_ followed by function definition:
    func A(arg Type) Type {}
//  ------ value declaration
//        ------------------ function definition
  • _function expression_: function keyword followed by function definition:
    var A = func(arg Type) Type {}
//          ---------------------- function expression
  • _type expression_ for a function type:
    var A func(arg Type) Type
//        ------------------- type expression

Suggestions

Putting this together, I would lean towards:

  • have a special "type" scope, i.e. storage

  • _type identifiers_ receive this "type" scope

    • in C and derivatives, this includes type identifiers denoting a _value declaration_:
Type some_func() {}
^^^^

Type some_value;
^^^^
  • declaration keywords receive a "keyword & declaration" scope, not a "type" scope; examples:
type A = B   // keyword: type declaration
^^^^

var A = B    // keyword: value declaration
^^^

func A() {}  // keyword: value declaration (of a function)
^^^^
  • _type expression_ keywords receive a "keyword & type" scope; visually, they may be styled as keywords _or_ types; most current users would prefer the latter; examples:
var A struct {Field Type}         // keyword: type
      ^^^^^^

var A func(arg Type) ReturnType   // keyword: type
      ^^^^

Note that these points cover 2 out of 3 known uses of a function keyword, leaving _function expression_ unspecified. Right now, I'm not ready to suggest a scope for it.

Addendum to the previous post: added the missing concept of a function expression; edited the definition, examples, and suggestions accordingly.

This issue should be labeled "RFC".

so a generic variable.type.python would be appropriate.

Why variable? Please don't! storage is the scope that types are meant to get, and it feels very correct.

Agree with @wbond about storage.type.function keyword.declaration.function. def, fun, fn are not types. They are just special keywords to indicate a function definition/declaration. This is different from how things look in C/C++.

The third is how to mark one or more tokens representing the return type of a function. In this example, the only token is ReturnType, but it could easily be a complex expression. Not all languages allow complex type expressions, but enough do that we should consider it a core use case. This sounds like a job for a meta scope; marking a whole expression with storage or another non-meta scope seems wrong.

This is what I actually see with Erlang's Typing Language. Return types can easily be sophisticated expressions. But this is what we have meta.function.return-type for. There is no need for new scopes for return types. I scoped the single types with meta.type-call the same way as we do with meta.function-call because of the analogy of defining and using types in Erlang.

-spec function(Arg1 :: int(), Arg2 :: int() | atom()) -> {int(), int() | string() | list(int())}
%     ^^^^^^^^ meta.function.erlang
%             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^meta.function.parameters.erlang
%                                                    ^^^ meta.function.erlang
%                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ meta.function.return-type.erlang
%                                                                                   ^^^^ meta.type-call.erlang
%                                                                                       ^^^^^^^ meta.type-call.arguments.erlang
%              ^^^^ variable.parameter.erlang
%                      ^^^ storage.type.erlang
%                                     ^^^ storage.type.erlang
%                                             ^^^^ storage.type.erlang
%                                                         ^^^ storage.type.erlang
%                                                                ^^^ storage.type.erlang
%                                                                        ^^^^^^ storage.type.erlang
%                                                                                   ^^^^ storage.type.erlang

A big 👍 for @wbond's proposal. As I wrote a few years ago above, I'm a huge fan of getting keyword.declaration into the game, as it really is the ideal scope for these sorts of things. storage.type remains for backwards compatibility, and to allow for things like C#'s var (which I argue should remain storage.type and not convert over to keyword.declaration, since it is a type placeholder).

C#'s var which I argue should remain storage.type and not convert over to keyword.declaration

X( Please keep var with the same scope as func. Just because var is written in the same position as int doesn't mean it should have the same scope.

@wbond's storage.type.function keyword.declaration.function seems the most compatible with existing color schemes, because IIUC it will be styled like a storage.type.

@wbond what are the next steps? Update the official scope naming guidelines and start applying the new scope in the existing syntaxes?

Yes, I am working on updating the scope naming docs with various RFCs now. Unfortunately most of these discussions don't lead to easy additions to the docs. :-)

I do think that var in C# and auto in C++ should probably be storage.type and not storage.type keyword.declaration. This is because they are implicit/polymorphic type names, and not keywords defining a new type. They really should get the same highlighting as int and not func.

Closing as these updated guidelines will be part of the docs with the next release.

Nice to see this change finally coming through. However, the current scope guidelines have an ambiguity I'd like to clear up.

Keywords for classes, structs, interfaces, etc should use the following scopes – this list is not exhaustive.
...
storage.type.struct keyword.declaration.struct
storage.type.interface keyword.declaration.interface

The guideline implies that these keywords are _always_ declarations. Example from C:

struct A {};
^^^^^^ keyword.declaration
       ^ entity.name

But this isn't always true. Example from C:

struct A some_var = {};
^^^^^^ <namespace?>
       ^ storage.type
         ^^^^^^^^ <variable declaration>

In C, when struct doesn't declare a type, it acts as a _namespace_ of sorts. Perhaps it could be storage.modifier, but certainly not keyword.declaration.

Go never uses struct, interface and so on, for declarations. They _always_ denote an anonymous type, which _may_ be typedef-ed or aliased via the type keyword:

type A = struct {}
^^^^ keyword.declaration
         ^^^^^^ <???>

type B = interface {}
^^^^ keyword.declaration
         ^^^^^^^^^ <???>

Would appreciate thoughts on how we should scope these.

struct, class, ... are definitely keyword.declaration in all use cases. It's totally weird to think about them as a namespace. Even in variable declarations, struct is used to make clear that A is a struct type. We could perhaps argue about whether A should be storage.type or entity here.

Sounds like we have different ideas about "declarations". From my perspective, there's a big difference between: (A) a keyword/operator that _adds something to the scope_, and (B): a keyword/operator denoting some type or structure _without_ adding it to the scope. I expected declaration to be reserved for A. Scope-modifying declarations are arguably special, because they're used for the symbol index. Example from Go:

var globalVar = someValue
func globalFunc() {}
type GlobalType struct{field Type}

The last example is particularly interesting, because type is the keyword that adds something to the scope, while struct isn't. In this case, struct with its contents merely denotes an _anonymous type_. Are you sure they should be scoped the same?
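The distinction can be sketched with a toy symbol scan over the Go snippet above. This is only an illustration in Python: the regular expression and the `declared_symbols` helper are hypothetical, they are not how Sublime Text builds its symbol index, and they ignore things like grouped `var (...)` blocks and methods with receivers.

```python
import re

# Matches a top-level Go declaration keyword followed by the name it introduces.
DECL = re.compile(r"^(var|func|type)\s+([A-Za-z_]\w*)")

GO_SOURCE = """\
var globalVar = someValue
func globalFunc() {}
type GlobalType struct{field Type}
"""


def declared_symbols(source):
    """Return (keyword, name) pairs for names added to the enclosing scope."""
    symbols = []
    for line in source.splitlines():
        match = DECL.match(line)
        if match:
            symbols.append((match.group(1), match.group(2)))
    return symbols


print(declared_symbols(GO_SOURCE))
# [('var', 'globalVar'), ('func', 'globalFunc'), ('type', 'GlobalType')]
```

Every name in the result is introduced by var, func or type. struct appears in the source but never introduces a name of its own.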

Yes, we have different ideas: you are thinking too much like a compiler, while I prefer a sane compromise between semantic meaning and consistent highlighting of keywords.

(The following will sound pretentious. Sorry, couldn't find better phrasing!) Interesting point about thinking like a compiler. Writing and reading software requires a mental interpreter, or several. I've put a lot of effort into making mine as accurate as possible. You've provided an external observation, unprompted, that my syntax suggestions are geared towards helping this mental interpreter, making it easier to see code by its role in the compiler. Very interesting. I hope this isn't seen as the grounds to dismiss my arguments; while we aren't machines, we do have mental compilers which should be aided.

I'm also all for simplicity, and don't have a particularly strong opinion on this declaration stuff. Just want to make sure that the choice of declaration for keywords which don't always _declare_ new types or variables wasn't made with an erroneous assumption that they always declare. If that's a conscious compromise for simplicity's sake, that's perfectly fine. Thanks for helping clear this up.
