marked 0.3.14 hangs on the attached file, worked in 0.3.9

Created on 18 Feb 2018  路  16Comments  路  Source: markedjs/marked

Expectation

example.txt
should be converted to html. This worked with marked 0.3.9.

Result

With marked 0.3.14 marked "hangs" when given the file.

What was attempted

npm install [email protected]
node_modules/marked/bin/marked

L0 - security

Most helpful comment

I can edit projects but I'm not used to them (I've been using Trello for some time though :laughing:) so I don't have a defined workflow.

All 16 comments

I'm investigating this.

Commit a8f2d7ff6 introduced this behavior.

It is caused by this piece of markdown:

~~~
move.rs:9:9: 9:22 error: cannot move out of borrowed content
move.rs:9         &MyEnum::Y(y) => println!("{}", *y)  // <-- Error, y cannot be moved out of a reference
                  ^~~~~~~~~~~~~
move.rs:9:20: 9:21 note: attempting to move value to here
move.rs:9         &MyEnum::Y(y) => println!("{}", *y)  // <-- Error, y cannot be moved out of a reference
                             ^
move.rs:9:20: 9:21 help: to prevent the move, use `ref y` or `ref mut y` to capture value by reference
move.rs:9         &MyEnum::Y(y) => println!("{}", *y)  // <-- Error, y cannot be moved out of a reference
                             ^
~~~

This causes a hang:

> marked.InlineLexer.rules.tag
/^<!--[\s\S]*?-->|^<\/?[a-zA-Z0-9\-]+(?:"[^"]*"|'[^']*'|\s[^<'">\/]*)*?\/?>/

> marked.InlineLexer.rules.tag.exec(`<-- Error, y cannot be moved out of a reference
...                              ^`)

I don't really understand the reason underlying this.

Wow, quick turn around. My guess is that the regex backtracks and that it hangs because of this (Kind of like here: http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016). Can't fully pinpoint the issue though. More info here: https://www.regular-expressions.info/catastrophic.html

Thanks @paulkoerbitz, I too thought immediately to catastrophic backtracking but that group (?: ... ) _should not_ be affected by it because its branches are mutually exclusive.
I've worked a patch for this but I wanna be sure it doesn't break anything before posting it.
I'll post it here in a bit and any help would be welcome, since I cannot figure out _why_ that works and the current one doesn't.

@Feder1co5oave thank you for the quick replies. I can't really pinpoint it either (but it seems you know a lot more about this than me anyway ;) ) - I've worked around the issue by pinning my dependency to 0.3.9, so from my side the need for a fix is not super urgent.

@paulkoerbitz and thank you for reporting and helping. Unfortunately regexes are tricky, and there is no way to determine they run smoothly on every possible input. What I'm trying to work on is to widen our test base.

I think I found it is in fact exponential backtracking. I wrote a script to test this:

const regex = /^<!--[\s\S]*?-->|^<\/?[a-zA-Z0-9\-]+(?:"[^"]*"|'[^']*'|\s[^<'">\/]*)*?\/?>/;
var src = '<-- Error, y cannot be moved out of a reference';

for (let i = 0; i < 18; i++) {
    let begin = process.hrtime();
    regex.exec(src);
    end = process.hrtime(begin);
    console.log(`Matching '${src}' Took ${end[0] * 1000 + end[1] / 10**6} milliseconds`);
    src += ' ';
}

this prints out:

Matching '<-- Error, y cannot be moved out of a reference' Took 0.28264 milliseconds
Matching '<-- Error, y cannot be moved out of a reference ' Took 0.099293 milliseconds
Matching '<-- Error, y cannot be moved out of a reference  ' Took 0.094701 milliseconds
Matching '<-- Error, y cannot be moved out of a reference   ' Took 0.102704 milliseconds
Matching '<-- Error, y cannot be moved out of a reference    ' Took 0.156877 milliseconds
Matching '<-- Error, y cannot be moved out of a reference     ' Took 0.291632 milliseconds
Matching '<-- Error, y cannot be moved out of a reference      ' Took 0.516443 milliseconds
Matching '<-- Error, y cannot be moved out of a reference       ' Took 0.993562 milliseconds
Matching '<-- Error, y cannot be moved out of a reference        ' Took 2.872591 milliseconds
Matching '<-- Error, y cannot be moved out of a reference         ' Took 3.970916 milliseconds
Matching '<-- Error, y cannot be moved out of a reference          ' Took 8.493158 milliseconds
Matching '<-- Error, y cannot be moved out of a reference           ' Took 16.972238 milliseconds
Matching '<-- Error, y cannot be moved out of a reference            ' Took 35.138719 milliseconds
Matching '<-- Error, y cannot be moved out of a reference             ' Took 65.447452 milliseconds
Matching '<-- Error, y cannot be moved out of a reference              ' Took 140.613324 milliseconds
Matching '<-- Error, y cannot be moved out of a reference               ' Took 263.678523 milliseconds
Matching '<-- Error, y cannot be moved out of a reference                ' Took 504.905385 milliseconds
Matching '<-- Error, y cannot be moved out of a reference                 ' Took 1031.746128 milliseconds

with any additional space added, it takes roughly double the time and gets terribly slow pretty soon. So that group needs to be rewritten.

there is no way to determine they run smoothly on every possible input

This is a good starting place: https://github.com/NicolaasWeideman/RegexStaticAnalysis
Some setup required.

@davisjam Thanks! I've been looking for something like this for a while!

Two very different errors were revealed by this issue:

This code fence should not end at line 3 because ^~~~~~~~~~~~~ is not a valid closing fence (must be on a new line). So the rest of the code block is parsed as a paragraph.

~~~
move.rs:9:9: 9:22 error: cannot move out of borrowed content
move.rs:9         &MyEnum::Y(y) => println!("{}", *y)  // <-- Error, y cannot be moved out of a reference
                  ^~~~~~~~~~~~~
move.rs:9:20: 9:21 note: attempting to move value to here
move.rs:9         &MyEnum::Y(y) => println!("{}", *y)  // <-- Error, y cannot be moved out of a reference
                             ^
move.rs:9:20: 9:21 help: to prevent the move, use `ref y` or `ref mut y` to capture value by reference
move.rs:9         &MyEnum::Y(y) => println!("{}", *y)  // <-- Error, y cannot be moved out of a reference
                             ^
~~~

Then, <-- Error, y cannot be moved out of a reference causes the inline html rule to do catastrophic backtracking and marked hangs.

Created #1074 to give us a place to discuss tool recommendations and whatnot. Think this might also be a good opportunity to revisit the concept of projects to help with prioritization and tracking; otherwise we could end up with a lot of half-finished work.

@Feder1co5oave, @UziTech, and @styfle: Can you see the projects and move cards now?

https://github.com/markedjs/marked/projects/2?

I can edit projects but I'm not used to them (I've been using Trello for some time though :laughing:) so I don't have a defined workflow.

See my analysis of the html.closing regex here.

I believe this is fixed in #1083.

I tried the sample from the initial report. It hangs on master but not with the fix in #1083.

If the root cause analysis was correct (I haven't checked) then the test case I added in #1083 should suffice to cover this case. But @Feder1co5oave perhaps you still want to add this input file as a standalone test case?

Fixed in #1083

This fence issue is still not fixed: demo

Was this page helpful?
0 / 5 - 0 ratings

Related issues

bennycode picture bennycode  路  4Comments

eGavr picture eGavr  路  4Comments

raguay picture raguay  路  4Comments

chunhei2008 picture chunhei2008  路  3Comments

toc
zoe-cjf picture zoe-cjf  路  3Comments