Rubberduck: Floating-point literals with comma decimal separator gives parsing error

Created on 3 Jun 2019 · 8Comments · Source: rubberduck-vba/Rubberduck

Rubberduck Version:

2019-06-03 12:06:23.8281;INFO-2.4.1.4627;Rubberduck.Common.LogLevelHelper;
    Rubberduck version 2.4.1.4627 loading:
    Operating System: Microsoft Windows NT 6.2.9200.0 x64
    Host Product: Visual Basic x86
    Host Version: 6.00.9782
    Host Executable: VB6.EXE;

Description
There was already a bug with this problem fixed in #1178.
My problem is that our source code is totally mixed with
FontSize = 8,25
and
FontSize = 8.25
also in the same files. VB doesn't have any problems with it, but RD gives many parsing errors.

To Reproduce
Create a form with 2 TextBoxes, manually edit the file and change FontSize.
I think that happened a long time ago with different Windows and VB6 language
versions.

The code looks like this:

_ExtentX        =   9393
_ExtentY        =   5980
FontSize        =   8,25
TextVertical    =   1
TextHorizontal  =   0
PictureVertical =   0
PictureHorizontal=   0
Begin VB.CheckBox Check1
   Appearance      =   0  'Flat
   BackColor       =   &H00E2D3C9&
   Caption         =   "Check1"
   ForeColor       =   &H80000008&
   Height          =   225
   Left            =   705
   TabIndex        =   5
   Top             =   90
   Width           =   3750
End
Begin MSForms.TextBox Text1
   Appearance      =   0  'Flat
   BorderStyle     =   0  'None
   BeginProperty Font
      Name            =   "MS Sans Serif"
      Size            =   8.25
      Charset         =   8
      Weight          =   700
      Underline       =   0  'False
      Italic          =   0  'False
      Strikethrough   =   0  'False
   EndProperty
   Height          =   2535
End

Expected behavior
RD should parse both values correctly

Logfile

2019-06-03 12:06:50.2190;WARN-2.4.1.4627;Rubberduck.Parsing.VBA.Parsing.TokenStreamParserBase;SLL mode failed while parsing the AttributesCode version of module FRM_CHARLOCK at symbol , at L20C28. Retrying using LL.;
2019-06-03 12:06:50.2995;ERROR-2.4.1.4627;Rubberduck.Parsing.VBA.Parsing.ModuleParser;Syntax error; offending token ',' at line 20, column 28 in the AttributesCode version of module FRM_CHARLOCK.;
2019-06-03 12:06:51.0552;WARN-2.4.1.4627;Rubberduck.Parsing.VBA.Parsing.TokenStreamParserBase;SLL mode failed while parsing the AttributesCode version of module FRM_CHARGE at symbol , at L22C28. Retrying using LL.;

antlr bug difficulty-03-duck hacktoberfest localization

Source

Tragen

All 8 comments

I took the liberty to reformat this a bit and change the screenshotted code into text. This allows devs to just copy-paste it into their local environment for verification :)

Vogel612 on 3 Jun 2019

As far as I can tell, this issue arises from how the lexer defines floating point literals: https://github.com/rubberduck-vba/Rubberduck/blob/99e90943d30196036caf0ede0924fd802d5f141d/Rubberduck.Parsing/Grammar/VBALexer.g4#L255-L261

Note the explicit use of '.' as a decimal point.

Considering how "early in the parsing pipeline" this is, any changes to the rule are potentially catastrophic.
A possible patch might be something along the following lines:

-     | DECIMALLITERAL '.' DECIMALLITERAL? EXPONENT?
-     | '.' DECIMALLITERAL EXPONENT?;
+     | DECIMALLITERAL DECIMAL_SEPARATOR DECIMALLITERAL? EXPONENT?
+     | DECIMAL_SEPARATOR DECIMALLITERAL EXPONENT?;
+ fragment DECIMAL_SEPARATOR : [,.];

This would open us up to misparsing different numerical arguments in VBA code, though.
Considering an invocation like:

Foo(123,152)

We need to get an argumentList with two arguments here. The precedence chain in the current grammar dictates parsing for an expression, which falls into literalExpression -> numberLiteral. That numberLiteral rule could then parse this as a floating point literal.

That is obviously incorrect. After all, this is two interger literal arguments.

In my personal opinion a change like this is highly dangerous.
I'd be happy to accept a rigorously tested patch, but I don't think the changes necessary stand in relation to the benefit we gain here.

Vogel612 on 3 Jun 2019

👍1

I agree, we can't just parse a decimal separator interchangeably as a comma or a dot.

I think we might need a separate VBAFormLayoutLexer for this.

retailcoder on 3 Jun 2019

Shouldn't this
Foo(123,152)
not be
Foo(123, 152)
The space makes the difference.
But I see your point.

Tragen on 3 Jun 2019

That would be a pretty-printed version; however, AIUI, Foo(123,152) is grammatically valid and is documented so in MS-VBAL. The parser follows MS-VBAL, rather than the version you see in the IDE editor and thus is more lenient in that respect.

bclothier on 3 Jun 2019

The code you read in the VBE in a validated line of code isn't the code you typed in the editor: it's a translation of the compiled instruction - that's why/how the VBE "prettifies" code and "autocorrects" certain constructs; that space is an artifact of this translation, it's nowhere in the language specifications; (see argument list) - having parser logic depend on how the VBE prettifies user code is something we need to avoid whenever possible.

retailcoder on 3 Jun 2019

I understand. Thanks.
I still cannot use RD when I fix this errors as I get an OutOfMemory exception also.

Tragen on 3 Jun 2019

Since this error appears in the form section, which we do not evaluate anyway (We just have to parse it to get to the actual module.), I would like to suggest to just patch up the parser rule for the forms part with an additional alternative INTEGER COMMA INTEGER. That should work around the issue that apparently forms can be saved with different regional settings, in contrast to the actual VBA code.