Truffleruby: Extra whitespace in squiggly heredoc with escaped newline

Created on 26 Jan 2021  Â·  7Comments  Â·  Source: oracle/truffleruby

Background

The backslash / continued line \ does not work as intended on TruffleRuby when it's used at the end of a line within a squiggly heredoc.

Take this example:

x = <<~HERE
  a
  b\
  c
HERE

puts x 

Expected behaviour

MRI removes both the newline and indentation of the next line.

➜ chruby 2.7.2
➜ ruby test.rb
"a\nbc\n"

Actual behaviour on TruffleRuby

On TruffleRuby, no newline is added, but the indentation of the next line remains.

➜ chruby truffleruby-jvm
➜ ruby test.rb
"a\nb  c\n"

What we think is happening on TruffleRuby

Presumably, what we think happens is that TruffleRuby first removes the newline before dedenting.

Here is an example if what I mean:

x = <<~HERE
     a
     b\
     c
HERE

In this example, what happens first is TruffleRuby sees the backslash and then removes the new line between b and c, making it:

     b     c

After that, it dedents the string:

b     c

Returning "a\nb c\n"

What is supposed to happen

What is happening in MRI is the opposite.

The dedenting happens before removal of the backslash and newline:

x = <<~HERE
     a
     b\
     c
HERE

Dedent:

b\
c

Remove newline and backslash:

bc

Returning: "a\nbc\n"

More information about the issue

  • There is no Ruby spec for this case, but I've created a spec on this draft PR.
  • Despite having no Ruby spec for this, there is a test on TestSyntax.rb for it called test_dedented_heredoc_continued_line. The test is currently excluded and not run on TruffleRuby.
  • There is also an old MRI issue for this behaviour.
  • JRuby fixed this issue among several other issues in this commit.

  • The issue seems to be within the RubyLexer.java and more specifically the public void heredoc_dedent() function. But I don't have any context into how that works yet, which leads me to...

Questions for the TruffleRuby team:

  • Is there any truth to what seems to be the problem (as explained above)?
  • Can you point me to the place where the issue might be?

Most helpful comment

Merged in fa25299bcc25afc978e059e2e41103194862c18e

All 7 comments

That makes sense to me, thanks for the analysis.

@norswap Could you look into this and try to figure out what's needed to fix it and help @wildmaples ?

Great analysis!
I had a look in the parser (RubyParser.y and the lexer (RubyLexer.java), searching for "heredoc" and "~".
A few findings:

  • The heredoc opener is token tSTRING_BEG, which is generated by method RubyLexer.hereDocumentIdentifier
  • This method sets the field lex_strterm, which is used in the main lex method (yylex) to parse the contents of the string.
  • This is in turn delegated to HeredocTerm#parseString. I expected this is where the issue will be found.

I can have a deeper look in there tomorrow if needed. @wildmaples Let me know if you have any questions.

@wildmaples made good progress on this by backporting the JRuby commit to TruffleRuby, but that change is quite broad and causes another test to fail. We’re investigating what subset of that change we can apply to resolve our issue without breaking anything else.

It seems to me like the original assumption about the bug is correct.

The dedent is handled in RubyLexer#heredoc_dedent, which will in turn RubyLexer#dedent_string to dedent each string part. Normally, HeredocTerm#parseString emits one string component per line. But if the heredoc can do interpolation, the parsing of the string is delegated to StringTerm#parseStringIntoBuffer, which handles the line continuation (\\) and coalesces the two lines as one. As a result, the following indent does not get dedented.

Thanks for looking into it @norswap ! I just ran into this issue as well, so I was wondering if it was on your WIP or To-Do list, and if not I'll take a stab at it.

There is an internal PR by @norswap to fix it, it should be merged soon.

Merged in fa25299bcc25afc978e059e2e41103194862c18e

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ashishbista picture ashishbista  Â·  23Comments

thbar picture thbar  Â·  20Comments

deepj picture deepj  Â·  20Comments

chrisseaton picture chrisseaton  Â·  23Comments

jjyr picture jjyr  Â·  23Comments