Antlr4: Reports strange error when combining grammars named X.g4 and XLexer.g4

Created on 2 Aug 2015 · 10 Comments · Source: antlr/antlr4

With ANTLR 4.5.1, I followed the example on page 36 of the book, where the lexer and parser grammars are split up (resulting in "LibExpr.g4"), and split my grammar into lexical and parsing rules. I named the files "X.g4" and "XLexer.g4", following the CSharp example grammar (https://github.com/antlr/grammars-v4/tree/master/csharp). However, I got an error like:

error(113): X.g4:2:7: combined grammar X and imported lexer grammar XLexer both generate XLexer

Firstly, that error meant nothing to me and gave no indication of what the problem was. It turns out that if I name my lexer grammar anything except "XLexer", the error goes away. I presume this results from a check for duplicate grammar names (to prevent cycles?) combined with a rule that drops "Lexer" from the names of lexer grammars, so that a lexer generated directly from one isn't named "XLexerLexer.java". However, I think the error is a bug in this case, since the name shouldn't matter here. Indeed, these seem like eminently reasonable names for grammar files.


All 10 comments

Does your lexer grammar start with:

lexer grammar ....

and your parser grammar start with

parser grammar ...

Otherwise the tool will assume you have a combined grammar that will also
generate the lexer of the same name.

Jim

—
Reply to this email directly or view it on GitHub
https://github.com/antlr/antlr4/issues/966.

Following the book example, "XLexer.g4" contains a lexer grammar, but "X.g4" contains a combined grammar, because it imports "XLexer.g4". I expect the tool to generate the lexer and parser together when I run

java org.antlr.v4.Tool X.g4

The simplest grammars I could come up with to reproduce this are

XLexer.g4

lexer grammar XLexer;
Char : .;

X.g4

grammar X;
import XLexer;
file : Char* EOF;

Running the tool on "X.g4" produces _only_ the error. However, if I rename "XLexer.g4" to something else, such as "XTest.g4", and change the file contents accordingly, it produces the correct set of files, including "XParser.java" and "XLexer.java".
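A sketch of the working variant described above, assuming the only change from the repro is the lexer grammar's name (and therefore its file name) going from XLexer to XTest. Per the report above, the tool then generates XLexer.java and XParser.java from X.g4:

```antlr
// XTest.g4: same rule as the repro, but the grammar name no longer matches
// the XLexer that the combined grammar X generates implicitly
lexer grammar XTest;
Char : . ;
```

```antlr
// X.g4: combined grammar; importing XTest merges its Char rule into X
grammar X;
import XTest;
file : Char* EOF ;
```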

If I may make an observation: the book uses combined grammars for most of its examples (though they may import lexer grammars). However, combined grammars seem to have more issues: namely this one, the problems with importing grammars that use channels and modes, and the fact that channels aren't supported in combined grammars. The combined-grammar and import mechanisms were more intuitive to me, and I'd like to see support for them improved.

You cannot import a lexer with the same name as the lexer generated by your
combined grammar, so the tool is trying to tell you that you would have
two lexers with the same name, which would generate two vocab files with
the same name, etc. That's why it rejects it.

If you use a separate lexer grammar named after the parser grammar (rather
than importing it), then you just import the vocab file that the lexer
generates into the parser (via the tokenVocab option). This just keeps the
lexer and parser in separate source code files.
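A minimal sketch of that layout, reusing the one-rule repro from this thread. The parser grammar name XParser (file XParser.g4) is an assumption for illustration. The tokenVocab option makes the parser read token types from the XLexer.tokens file that ANTLR writes when it processes XLexer.g4, so the lexer grammar should be processed first, for example by listing it first: java org.antlr.v4.Tool XLexer.g4 XParser.g4

```antlr
// XLexer.g4: a plain lexer grammar; generates XLexer.java and XLexer.tokens
lexer grammar XLexer;
Char : . ;
```

```antlr
// XParser.g4 (hypothetical name): a parser grammar, not a combined one, so
// no implicit lexer is generated and nothing collides with XLexer
parser grammar XParser;
options { tokenVocab = XLexer; }   // read token types from XLexer.tokens
file : Char* EOF ;
```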

I have not seen any issues with combined grammars that are not caused by
misunderstandings or mistakes in the grammar. By all means report any
issues that you find. ANTLR does take a little experience before it all
fits together in an obvious manner.

Jim


_I still think this is a bug._ If the behaviour isn't changed, then at least the error message needs to be improved.

Why it is a bug:
In this situation you are not generating the lexer file; you are importing it. An imported grammar is merged into the grammar that imports it, so it shouldn't matter what the imported grammar is named. Consider that if I import lexer grammar Foo into my X.g4 grammar, none of the output files (including the token vocab file) will have Foo anywhere in the name. Also, if you introduce an intermediate grammar (i.e., X imports Foo and Foo imports XLexer), the error is not reported. If you were generating all imported grammars, that should be an error too, but you are not; they are just being merged together.

The reason the error message is confusing:
Two reasons. First, I didn't ask it to generate a lexer for XLexer.g4, so I have no idea what it means when it says XLexer generates something. Second, in "both generate XLexer" the name XLexer has switched meaning from its earlier use in the same sentence: first it names a grammar, then an output lexer. The second use needs a qualifier to make clear what it refers to.

About combined grammars
Here is what I am referring to:

  • "error(164): custom channels are not supported in combined grammars"
  • importing a lexer grammar with channels into a combined grammar produces invalid code #965
  • "error(120): lexical modes are only allowed in lexer grammars"
  • importing a grammar with modes into a combined grammar produces invalid code #970
  • imported grammars are not validated for file name matching the grammar name #892
  • issues with tokens section in combined grammar #338
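The two error messages in this list concern combined grammars; a standalone lexer grammar accepts custom channels and lexical modes. A minimal sketch under the separate-grammar approach, where the channel name COMMENTS, the mode name IN_STRING, the parser grammar name XParser, and all token names are hypothetical:

```antlr
// XLexer.g4: channels and modes are declared in a real lexer grammar
lexer grammar XLexer;

channels { COMMENTS }

LINE_COMMENT : '//' ~[\r\n]* -> channel(COMMENTS) ;  // kept, but off the default channel
QUOTE        : '"' -> pushMode(IN_STRING) ;
WS           : [ \t\r\n]+ -> skip ;

mode IN_STRING;
TEXT         : ~["]+ ;
END_QUOTE    : '"' -> popMode ;
```

```antlr
// XParser.g4: consumes the lexer's token types via tokenVocab, no import
parser grammar XParser;
options { tokenVocab = XLexer; }
file : str* EOF ;
str  : QUOTE TEXT? END_QUOTE ;
```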

That seems like a lot of issues and limitations. If the issues were fixed and the limitations removed, combined grammars with importing would be a really intuitive, easy way of working.

P.S. I do now understand the use of the tokenVocab option, but I didn't when I started this, because it is buried at the end of the book and not listed on the website (see #969).


I recommend only _ever_ using a lexer grammar paired with a separate parser grammar, and _never_ using import. In addition to producing the most straightforward grammars to reason about, this strategy will block the use of certain ANTLR features (specifically the use of string literals to define tokens in a parser rule) which are easy to use incorrectly, introducing difficult-to-detect bugs into your application.
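To illustrate the literal restriction mentioned above, a sketch with hypothetical token names (LPAREN, RPAREN, ID, WS) and a hypothetical parser grammar name XParser. The parser grammar refers to tokens by the names the lexer grammar defines:

```antlr
// XLexer.g4: every token the parser may use is defined here
lexer grammar XLexer;
LPAREN : '(' ;
RPAREN : ')' ;
ID     : [a-z]+ ;
WS     : [ \t\r\n]+ -> skip ;
```

```antlr
// XParser.g4
parser grammar XParser;
options { tokenVocab = XLexer; }
// Refer to tokens by name. In a combined grammar, writing '[' here would
// silently create an implicit token; in a parser grammar, a literal that
// XLexer does not define is reported as an error instead.
expr : LPAREN expr RPAREN
     | ID
     ;
```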

Note: This is my personal opinion, but it reflects the manner in which I created a number of reasonably large projects that successfully used ANTLR in some key capacity.

@sharwell thanks, I am coming to understand that. However, that is very at odds with the way The Definitive ANTLR 4 Reference presents it. New users who learn using the reference (which is really the only way since there is no documentation to speak of outside of it) are going to naturally start with combined grammars and importing. I feel like there is a disconnect there.

I usually use combined grammars but never import ;)
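For comparison, a sketch of the "combined grammar, no import" style applied to the repro from this thread: the lexer rule lives in X.g4 itself, so there is no separate XLexer grammar for the implicitly generated lexer to collide with.

```antlr
// X.g4: one combined grammar; ANTLR derives XLexer from the uppercase
// (token) rules and XParser from the lowercase (parser) rules
grammar X;
file : Char* EOF ;
Char : . ;
```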


On Tue, Aug 4, 2015 at 8:41 PM, Jeff Walker Code Ranger wrote:

> _I still think this is a bug._ If the behaviour isn't changed, then the
> error message at least needs improved.

Perhaps the error message isn't clear enough if you are just starting out
and could be improved.

> _Why it is a bug:_
> In this situation you are not generating the lexer file. You are importing
> it. An imported grammar is merged into the grammar in question. It
> shouldn't matter what that grammar is named. Consider that if I import
> lexer grammar Foo into my X.g4 grammar, none of the files output (including
> the token vocab file) will have Foo anywhere in the name. Also, if you
> introduce an intermediate grammar (i.e. X imports Foo and Foo imports
> XLexer) then the error is not reported. If you were generating all imported
> grammars then that should be an error too, but you are not. They are just
> being merged together.

Well, you are basically trying to redesign the idea. It wasn't intended to
do what you want it to do. It has to track the imports somehow, ignoring
what it actually outputs afterwards.

> _The reason the error message is confusing:_
> Two reasons. First, I didn't ask it to generate a lexer for XLexer.g4

By using grammar X you did in fact ask it to generate lexer XLexer. Hence you
cannot import another lexer grammar also called XLexer.

> That seems like a lot issues and limitations. If these were fixed and
> removed then combined grammars with importing would actually be a really
> intuitive easy way of working.

I don't think import is that intuitive myself. I always use a separate
lexer and parser grammar; then you can import lexers into your lexer
grammar. But if you try to import a lexer grammar with the same name as
the lexer grammar you are importing into, which grammar should it
import: itself?

Also, why use a combined grammar and then import the lexer? I think that
the intent (and I may be putting words into people's mouths here) was that
you might have some common lexer rules, such as SqlKeywordsLexer or
FloatingPoint, and then import them into other grammars. In practice, I
have not found that to be so practical.
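A sketch of the kind of reuse described above. FloatingPoint is the example name from the comment, but its rules here (and the WS rule in XLexer) are assumptions. Because XLexer is a lexer grammar, not a combined one, it has no implicit lexer whose name could clash with an import.

```antlr
// FloatingPoint.g4: a reusable chunk of lexer rules
lexer grammar FloatingPoint;
FLOAT             : DIGIT+ '.' DIGIT* EXPONENT? ;
fragment EXPONENT : [eE] [+\-]? DIGIT+ ;
fragment DIGIT    : [0-9] ;
```

```antlr
// XLexer.g4: imports the chunk; FloatingPoint's rules are merged in
lexer grammar XLexer;
import FloatingPoint;
WS : [ \t\r\n]+ -> skip ;
```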

For what it's worth ...
As a relative newbie to ANTLR 4, I agree completely with WalkerCodeRanger's comments.
The error message is terribly confusing.

So the recommendation is to not use import?
The book says "It's a good idea to break up very large grammars into logical chunks, just like we do with software".
I would think a good size for a 'chunk' would be about 20 rules.
