The backslash / continued line \ does not work as intended on TruffleRuby when it's used at the end of a line within a squiggly heredoc.
Take this example:
x = <<~HERE
a
b\
c
HERE
puts x
MRI removes both the newline and indentation of the next line.
➜ chruby 2.7.2
➜ ruby test.rb
"a\nbc\n"
On TruffleRuby, no newline is added, but the indentation of the next line remains.
➜ chruby truffleruby-jvm
➜ ruby test.rb
"a\nb c\n"
Presumably, what we think happens is that TruffleRuby first removes the newline before dedenting.
Here is an example if what I mean:
x = <<~HERE
a
b\
c
HERE
In this example, what happens first is TruffleRuby sees the backslash and then removes the new line between b and c, making it:
b c
After that, it dedents the string:
b c
Returning "a\nb c\n"
What is happening in MRI is the opposite.
The dedenting happens before removal of the backslash and newline:
x = <<~HERE
a
b\
c
HERE
Dedent:
b\
c
Remove newline and backslash:
bc
Returning: "a\nbc\n"
TestSyntax.rb for it called test_dedented_heredoc_continued_line. The test is currently excluded and not run on TruffleRuby. JRuby fixed this issue among several other issues in this commit.
The issue seems to be within the RubyLexer.java and more specifically the public void heredoc_dedent() function. But I don't have any context into how that works yet, which leads me to...
That makes sense to me, thanks for the analysis.
@norswap Could you look into this and try to figure out what's needed to fix it and help @wildmaples ?
Great analysis!
I had a look in the parser (RubyParser.y and the lexer (RubyLexer.java), searching for "heredoc" and "~".
A few findings:
tSTRING_BEG, which is generated by method RubyLexer.hereDocumentIdentifierlex_strterm, which is used in the main lex method (yylex) to parse the contents of the string.HeredocTerm#parseString. I expected this is where the issue will be found.I can have a deeper look in there tomorrow if needed. @wildmaples Let me know if you have any questions.
@wildmaples made good progress on this by backporting the JRuby commit to TruffleRuby, but that change is quite broad and causes another test to fail. We’re investigating what subset of that change we can apply to resolve our issue without breaking anything else.
It seems to me like the original assumption about the bug is correct.
The dedent is handled in RubyLexer#heredoc_dedent, which will in turn RubyLexer#dedent_string to dedent each string part. Normally, HeredocTerm#parseString emits one string component per line. But if the heredoc can do interpolation, the parsing of the string is delegated to StringTerm#parseStringIntoBuffer, which handles the line continuation (\\) and coalesces the two lines as one. As a result, the following indent does not get dedented.
Thanks for looking into it @norswap ! I just ran into this issue as well, so I was wondering if it was on your WIP or To-Do list, and if not I'll take a stab at it.
There is an internal PR by @norswap to fix it, it should be merged soon.
Merged in fa25299bcc25afc978e059e2e41103194862c18e
Most helpful comment
Merged in fa25299bcc25afc978e059e2e41103194862c18e