Packages: [C/C++] Last function parameter highlighting breaks with newline before ending paren

Created on 7 Jul 2017 · 2 Comments · Source: sublimehq/Packages

I've been using the new Mariana color scheme, which highlights function parameters, and noticed this issue. I used the ScopeHunter plugin for the following screenshots to show that the last function parameter doesn't get the variable.parameter.c++ scope when there is a newline before the closing paren.

[Screenshot: cpp_syntax_scope_error_1]

[Screenshot: cpp_syntax_scope_error_2]

Most helpful comment

I made an attempt at writing a fix for this, but this led me down a rabbit hole of trying to disambiguate variable declarations versus function invocations. Consider:

// ...
int f(x);
int main()
{
  return 0;
}

What would int f(x); mean here? It could mean:

  • A global variable named f of type int initialized with the value held by variable x, or
  • A function declaration named f that returns an int and takes an unnamed parameter of type x by value.

The current syntax def decides that f should always be a function declaration. Coincidentally, this works out nicely: if f really were a global variable, you could still find it in the global index list.

Consider now this program:

// ...
int f(x * y); //1
int g(x + y); //2
int h(x - y); //3
int k(x / y); //4
int i(x % y); //5
int j(x ^ y); //6
int main() { return 0; }

Clearly (2), (3), (4), (5) and (6) are global variables, but (1) can still mean two things:

  • A global variable named f of type int initialized with the value resulting from the expression x * y, or
  • A function declaration named f that returns an int and takes a single parameter named y whose type is a pointer to x.

So who cares, right? Just put the expressions context inside the parentheses and be done with it. Well, that's basically what happens now. But we do want to scope the identifier that follows the type of a function parameter as variable.parameter.c++, and that gets tricky: in an arbitrary expression inside a function's parameter list, how do we know which word is the parameter name? The current syntax def does it as follows. It tries to match an identifier followed by a comma or a closing parenthesis, and assigns that identifier the variable.parameter.c++ scope. Now you can see why your example fails: the word someFloat is not followed by a comma or a closing parenthesis, because the parenthesis is on the next line. Contrast this with someInt and someBool, which are followed by commas, and with the someFloat) of SomeOtherFunction.

The question then becomes: how can we assign the scope variable.parameter to the identifier without looking for a comma or a closing parenthesis?

The idea I came up with is to "ping-pong" between two states inside the function parameter list. First, consume the type, including the words const and volatile, pointers * and references &. Then switch to a "name" context that waits for an identifier and scopes it as variable.parameter. After that, it waits for a comma or a closing parenthesis, or optionally consumes an equals sign = and expects an expression (a default argument for the parameter). Here are proof-of-concept screenshots:

[Screenshot 2017-07-07 at 20.50.04]
[Screenshot 2017-07-07 at 20.50.27]

As you can see, this has the added bonus of letting us scope * and & as storage.modifier instead of keyword.operator, as it should be.

But now variables get incorrectly scoped as functions. Consider, for instance, (2) above. The parser encounters g( and scopes it as a function, so we start using our "ping-pong" algorithm. It sees x and decides that it's a type. It then encounters +, but + is not a valid identifier (nor any kind of storage modifier). By then, however, it's too late to go back: Sublime doesn't allow you to "go back a few characters and do something else". So what I currently do here is check for invalid characters and then switch to a general "expressions" context, acknowledging that we were wrong to assume it's a function. This works okay, but I need to ponder it longer, because it'll be quite a large change.

I would generally tackle this from the same angle as you did: try to see whether it's a function declaration by trying to match the first segment as a parameter declaration (which requires a type, optionally followed by a variable name). It should be possible to assume that something follows this pattern when it has a type (built-in, known, or in TitleCase) followed by any number of asterisks and an identifier (optionally followed by a function type declaration). I don't know of any other syntax that could be valid after seeing this, so it should be safe to assume we're within a function declaration.
There may be other interesting cases where the type isn't as easily recognized. In those cases you could require the asterisks to have a space after them but not before, to tell them apart from multiplication written in the common style.

Afterwards we may or may not assume that any following parameters are also declarations and not expressions.

Am I missing something here? C(++) sure is a fun language.
