Marked: HTML detection for sanitize is not reliable

Created on 29 Aug 2018 · 6 Comments · Source: markedjs/marked

Describe the bug
By adding backslashes before the angle brackets in the input string, the HTML detection can be bypassed, so the HTML is not sanitized even when sanitize is set to true.

To Reproduce
Save the following HTML as a .html file and open it in a browser:

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>Document</title>
</head>

<body>
    <div id="unrecognized"></div>
    <div id="recognized"></div>    
    <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
    <script>        
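        // Enable marked's built-in sanitizer.
        marked.setOptions({
            sanitize: true,
        });
        // Backslash-escaped angle brackets: not detected as HTML, so not sanitized.
        const unrecognizedHtml = "\\<h1\\>This is not recognized as html\\</h1\\>";
        document.getElementById('unrecognized').innerHTML = marked(unrecognizedHtml);

        // Plain HTML: detected and sanitized as expected.
        const recognizedHtml = "<h1>This is recognized as html</h1>";
        document.getElementById('recognized').innerHTML = marked(recognizedHtml);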
    </script>
</body>

</html>

Actual behavior
Some text is rendered as html:
image

Resulting html:
image

Expected behavior
No HTML in the input string should be rendered as HTML when sanitize is true.

All 6 comments

Sanitizing HTML is very difficult, which is why we are thinking of deprecating the sanitize option: #1232

I understand this. Is there a good workaround for this today?

I can see from the documentation that I can use an external sanitizer by setting the sanitizer option. However, looking at the source code, it seems the sanitizer function is only called when marked detects HTML, and as I have shown here, that detection is not 100% reliable.
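To illustrate the concern, here is a toy model of a renderer that only hands text to the sanitizer when an "HTML detected" rule matches. It is not marked's actual source; the names toyRender, HTML_TAG, ESCAPE and escapeHtml are all hypothetical and only meant to show how a backslash-escape rule could let angle brackets through without the sanitizer ever seeing them.

```javascript
// Toy model only: NOT marked's actual source. All names here are hypothetical.
const HTML_TAG = /^<\/?[a-zA-Z][^>]*>/; // the "HTML detected" rule
const ESCAPE = /^\\([<>\\])/;           // the backslash-escape rule

function toyRender(input, sanitizer) {
  let out = '';
  let rest = input;
  while (rest.length > 0) {
    let m;
    if ((m = ESCAPE.exec(rest))) {
      // The escaped character is emitted as-is; the sanitizer never sees it.
      out += m[1];
      rest = rest.slice(m[0].length);
    } else if ((m = HTML_TAG.exec(rest))) {
      // Only text matched by the HTML rule is passed to the sanitizer.
      out += sanitizer(m[0]);
      rest = rest.slice(m[0].length);
    } else {
      // Everything else is copied through unchanged.
      out += rest[0];
      rest = rest.slice(1);
    }
  }
  return out;
}

// Stand-in sanitizer for the toy model: escapes angle brackets.
const escapeHtml = (s) => s.replace(/</g, '&lt;').replace(/>/g, '&gt;');

const recognized = toyRender('<h1>hi</h1>', escapeHtml);
const unrecognized = toyRender('\\<h1\\>hi\\</h1\\>', escapeHtml);
console.log(recognized);   // &lt;h1&gt;hi&lt;/h1&gt;
console.log(unrecognized); // <h1>hi</h1>
```

In this model the sanitizer itself is fine; the problem is that it is gated behind a detection step that the escaped input never triggers.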

I see in #1232 that you suggest using DOMPurify.sanitize(marked(...)). But wouldn't that remove some of the HTML that marked.js itself rendered during conversion? What about HTML inside code blocks?

I believe DOMPurify is smart enough to allow HTML inside a pre. If not, that seems like an issue for DOMPurify.

I can't think of any example that wouldn't work with DOMPurify

You can try to break it here
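The suggestion from #1232, sanitizing the HTML that comes out of marked rather than the Markdown that goes in, might be sketched like this. renderSafe is a hypothetical helper, not part of marked or DOMPurify; the renderer and sanitizer are passed in as arguments so the sketch does not assume a particular version of either library.

```javascript
// Sketch: sanitize the renderer's OUTPUT instead of trying to detect HTML in the input.
// `renderSafe` is a hypothetical helper name, not an API of marked or DOMPurify.
function renderSafe(markdown, renderMarkdown, sanitizeHtml) {
  // 1. Convert Markdown to HTML with no sanitizing at this stage.
  const rawHtml = renderMarkdown(markdown);
  // 2. Sanitize the complete HTML string, so nothing depends on input detection.
  return sanitizeHtml(rawHtml);
}

// In a browser with both libraries loaded, the call from #1232 would look roughly like:
//   element.innerHTML = renderSafe(input, (md) => marked(md), (html) => DOMPurify.sanitize(html));
```

Because the sanitizer always runs on the final HTML, escaped input such as the backslash example in this issue is handled the same way as any other input.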

Thank you so much for your help.

I will try this tomorrow within our system and see if all behaves as expected, will update this issue as soon as I have done this.

Hi again, I tried DOMPurify and it worked perfectly! Thank you again for the suggestion.

I will leave it up to you whether to keep this issue open. I think it's actually pretty severe, because in our Angular application at work I even managed to render and execute script tags using this bypass.

In the raw example above I managed to render script tags, but they wouldn't execute, so I didn't mention it in my example, as that could have to do with Angular or something else.

Glad it worked for you!

The more we see these problems, the more it leads me to believe that marked should not attempt to sanitize html inside marked. This is a difficult problem space.

I'm going to close this and direct users to #1232
