Antlr4: toString on empty CodePointCharStream causes exception.

Created on 13 Jul 2017 · 9 Comments · Source: antlr/antlr4

Run the following code snippet:

CodePointCharStream s = CharStreams.fromString("");
s.toString();

and the toString call throws:

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.checkBounds(String.java:383)
    at java.lang.String.<init>(String.java:462)
    at org.antlr.v4.runtime.CodePointCharStream$CodePoint8BitCharStream.getText(CodePointCharStream.java:160)
    at org.antlr.v4.runtime.CodePointCharStream.toString(CodePointCharStream.java:137)

The underlying cause seems to be that an Interval is created with a negative end (size - 1).
I have this fix in mind, but because I seem unable to build the software I'm reluctant to put up a pull request.

diff --git runtime-testsuite/test/org/antlr/v4/runtime/TestCodePointCharStream.java runtime-testsuite/test/org/antlr/v4/runtime/TestCodePointCharStream.java
index 25c4c09..c40c404 100644
--- runtime-testsuite/test/org/antlr/v4/runtime/TestCodePointCharStream.java
+++ runtime-testsuite/test/org/antlr/v4/runtime/TestCodePointCharStream.java
@@ -23,6 +23,7 @@ public class TestCodePointCharStream {
                CodePointCharStream s = CharStreams.fromString("");
                assertEquals(0, s.size());
                assertEquals(0, s.index());
+               assertEquals("", s.toString());
        }

        @Test
diff --git runtime/Java/src/org/antlr/v4/runtime/CodePointCharStream.java runtime/Java/src/org/antlr/v4/runtime/CodePointCharStream.java
index 491ef69..6245197 100644
--- runtime/Java/src/org/antlr/v4/runtime/CodePointCharStream.java
+++ runtime/Java/src/org/antlr/v4/runtime/CodePointCharStream.java
@@ -134,7 +134,7 @@ public abstract class CodePointCharStream implements CharStream {

        @Override
        public final String toString() {
-               return getText(Interval.of(0, size - 1));
+               return getText(Interval.of(0, Math.max(0 ,size - 1)));
        }

        // 8-bit storage for code points <= U+00FF.

All 9 comments

Interval.of(0, size - 1) is actually a valid interval that getText must be able to handle. It is the correct way to represent the entire input for all cases, including empty.

The problem is actually in the implementation of getText, which incorrectly computes the value of startIdx and len:

 @Override
 public String getText(Interval interval) {
-   int startIdx = Math.min(interval.a, size - 1);
-   int len = Math.min(interval.b - interval.a + 1, size);
+   int startIdx = Math.min(interval.a, size);
+   int len = Math.min(interval.b - interval.a + 1, size - startIdx);
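
To see why the corrected arithmetic handles the empty case, here is a minimal standalone sketch. It does not use the ANTLR classes: `ClampedTextSketch`, the plain `byte[]` backing array, and the ISO-8859-1 decoding are assumptions made for illustration. Only the two clamping lines are taken from the diff above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for an 8-bit code point stream; not the ANTLR class itself.
final class ClampedTextSketch {

    // Returns the text covered by the inclusive interval [a, b], clamped to the data.
    static String getText(byte[] data, int a, int b) {
        int size = data.length;
        // Clamp the start to size (not size - 1), so an empty buffer yields 0, not -1.
        int startIdx = Math.min(a, size);
        // Clamp the length to what is left after startIdx, not to the whole buffer.
        int len = Math.min(b - a + 1, size - startIdx);
        return new String(data, startIdx, len, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] empty = new byte[0];
        byte[] abc = "abc".getBytes(StandardCharsets.ISO_8859_1);
        // Whole-input interval (0, size - 1) on empty input: start 0, length 0.
        System.out.println("[" + getText(empty, 0, -1) + "]");
        // Whole-input interval on non-empty input.
        System.out.println(getText(abc, 0, 2));
        // Interval running past the end is cut off at the end of the data.
        System.out.println(getText(abc, 1, 10));
    }
}
```

With the original lines, the empty case computes startIdx = Math.min(0, -1) = -1, which matches the "String index out of range: -1" in the report. Clamping len to size rather than size - startIdx can also overrun the array whenever the interval starts after index 0 and extends past the end of the input.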

@sharwell : FYI: I put up a fix for this issue a few weeks ago. #1977

Hi!

I have experienced the same problem, even for non-empty input, as you can see from the index 51 below.
This is obviously a bug, and the fix above looks like the right one. Has it been committed already, and should we expect to see it as part of the 4.7.1 release? (By the way, any ETA?)

Thanks,
Sefer

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 51
at java.lang.String.checkBounds(String.java:385)
at java.lang.String.<init>(String.java:462)
at org.antlr.v4.runtime.CodePointCharStream$CodePoint8BitCharStream.getText(CodePointCharStream.java:160)
at org.antlr.v4.runtime.Lexer.notifyListeners(Lexer.java:360)
at org.antlr.v4.runtime.Lexer.nextToken(Lexer.java:144)
at linqmap.config.antlr.ConfigLexer.nextToken(ConfigLexer.java:150)
at org.antlr.v4.runtime.BufferedTokenStream.fetch(BufferedTokenStream.java:169)
at org.antlr.v4.runtime.BufferedTokenStream.sync(BufferedTokenStream.java:152)
at org.antlr.v4.runtime.BufferedTokenStream.consume(BufferedTokenStream.java:136)
at org.antlr.v4.runtime.Parser.consume(Parser.java:571)
at org.antlr.v4.runtime.Parser.match(Parser.java:203)

I hope to have 4.7.1 out the door within two months. This bug was fixed by the PR, but it did not have the proper commit message, so it was not auto-closed.

I think I got this bug too. Waiting for 4.7.1 eagerly.

Got the major work done for 4.7.1. Will start the final scan for stuff soon.

@parrt Quick question, in current 4.7.1, has the getText(Interval interval) been deprecated?
If yes, then is there any solution to the problem?

Both the documentation you point to (for ANTLRInputStream) and the accepted answer to the question you point to clearly indicate the use of CharStreams, which is not deprecated at all.

Thanks for your reply. Then my problem becomes how to retrieve the CharStream.

public String visitDeclaration1(parser.Declaration1Context ctx) {
...
...
int a = ctx.declaration().start.getStartIndex();
int b = ctx.declaration().stop.getStopIndex();
Interval interval = new Interval(a,b);
**CharStream decStr = CharStreams.fromStream(ctx.declaration()......);**
String declaration_1 = decStr.getText(interval);
...
...
}

Could you please let me know how to do it?


SOLVED! Thank you, @nielsbasjes!
