Marked: Question: Markdown, storage and security

Created on 28 Mar 2018 · 16Comments · Source: markedjs/marked

Hi guys!

I realise this is a bug reporting section so I hope it's OK to post this, but I was hoping for some expert advice regarding Markdown and security.

Firstly, my situation is that I want to allow members on our website to have a section of their user profile that they can edit using the markdown format, parsed by Marked.

Importantly, the data cannot be trusted - anyone can register for an account so I need to be careful with what can/can't be uploaded and displayed.

Here's how things currently stand on my development site:

I edit the raw parsedown code in a textarea and click submit.
The following is sent by POST to the server:

DataA: marked([raw-unsanitized-data], { sanitize: true })
DataB: The original raw data

DataA is put through HTMLPurifier and then stored in a .htm file which is retrieved by http GET upon user profile load.
DataB is stored in the MySQL database as-is, as original data obviously required for when re-editing.
DataA (hopefully now sanitized) is returned and displayed on the user profile.

Hopefully that makes sense. Now, my question(s)..

Is that overkill? To the best of your knowledge is that safe enough?

In an idea world I'd like to cut out having to store the data twice and just work with the raw data client-side as the rendering seems to be blazingly fast - but from a security standpoint I don't know whether this is doable.

Main question: Would it be horribly irresponsible to just work with the raw data and just output:

marked([raw-unsanitized-data], { sanitize: true })

Possibly as an additional safeguard use DOMPurify:

DOMPurify.sanitize(marked(<raw-unsanitized-data>, { sanitize: true }))

? I can't help but feel like it probably would be, but I'd like to hear it from a real expert.

PS Further info, all site data is SSL and all cookies are secure/HttpOnly etc.

OK, if you're still here - thanks for staying awake!

But seriously, I'm no expert on this by any means so I really appreciate any insight/advice you can give.

Best Regards,

Rob

question

Source

oasiz

All 16 comments

I don't think pre-rendering the html will helping with XSS since the same user generated content will show on the page regardless of which method you take.

The only security issues I have seen in marked have been ReDoS vulnerabilities. If you want to protect against someone using up resources on your servers in a ReDoS attack, you should pre-render the markdown in a separate process, and if that process takes longer than some threshold (more than a second maybe) cancel it and show an error to the user.

I can't really say how well marked's sanitize option does or whether DOMPurify.sanitize would make the output HTML any safer.

/cc @davisjam @Feder1co5oave

UziTech on 28 Mar 2018

👍1

I'd like to cut out having to store the data twice and just work with the raw data client-side as the rendering seems to be blazingly fast - but from a security standpoint I don't know whether this is doable.

On our project we always store the raw, and render/sanitize on request only since as you stated it's super fast. Our establishing owner wanted to store the processed raw data however with sanitizers some security issue may come up (there is precedence) and you would need to regenerate the statics again. That's a hefty CPU task to continually do and could in itself generate a DoS of sorts. e.g. we don't currently cache this process.

As far as doing it twice, perhaps on client side, Userscripts can usually defeat this so you wouldn't want that security issue... so it's best to have it done server side as much as possible. e.g. CPU time is server side instead of client side (which sucks but is a fact of life in our arena).

Martii on 29 Mar 2018

Hi, thanks for the replies - much appreciated, that's some useful info.

I've noticed that Github uses server-side technology only, no JS rendering. Presumably they're storing both raw and rendered versions of all posts. That's a lot of database storage!

I'm still unsure of the best way to do this, it seems like there's no definitive answer.

Storing the raw data and rendering client-side with JS would obviously be ideal, but only if it's going to be secure.

I do wonder what other users of markedjs do with the issue of data/storage/security..

oasiz on 30 Mar 2018

@oasiz The reason why GitHub renders on the server-side is because they use Ruby to render Markdown to HTML (Kramdown I think). They literally can't render on the client (unless they find a ruby-to-wasm compiler or start using marked.js 😄)

styfle on 3 Apr 2018

In an idea world I'd like to cut out having to store the data twice and just work with the raw data client-side as the rendering seems to be blazingly fast - but from a security standpoint I don't know whether this is doable.

@oasiz Can you clarify what security concerns you have with rendering the raw markdown on the client side?

I'm guessing the scenario you're imagining is:

Attacker Sets profile to malicious content.

Victim On visiting the attacker's profile, the server returns the raw data. While rendering it something bad happens.

Is the "something bad" what you're concerned about, or am I misunderstanding?

davisjam on 4 Apr 2018

My 2 cents. You either:

save markdown content on the server, convert it there (with the time check proposed by UziTech) and serve cached (and purified) html content to the clients, OR

you only store markdown content, serve it to clients and DOMPurify.sanitize(marked(markdown, {sanitize: true}) in the browser.

Converting into html on the submitting client and then storing that on the server opens security concerns because the client-side code can be manipulated by a malicious user.

ps. Github parses markdown content into html by using a customized version of the C implementation of commonmark

Feder1co5oave on 4 Apr 2018

save markdown content on the server, convert it there (with the time check proposed by UziTech) and serve cached (and purified) html content to the clients

This approach should work fine. I would use a worker pool to render the JS with a timeout.

you only store markdown content, serve it to clients and DOMPurify.sanitize(marked(markdown, {sanitize: true}) in the browser.

I think this is a better approach. Since the markdown being rendered is untrusted, perhaps render it in a WebWorker and enforce a timeout. Hopefully this project doesn't have any more vulnerable regexes but you never know. Possible instructions here.

Converting into html on the submitting client and then storing that on the server opens security concerns because the client-side code can be manipulated by a malicious user.

+1. Under the "malicious client pre-renders the HTML" design, the user can upload whatever HTML they want to. They don't have to execute your code.

davisjam on 4 Apr 2018

Hi, appreciate the replies :-)

Is the "something bad" what you're concerned about, or am I misunderstanding?

Yes, absolutely correct.

you only store markdown content, serve it to clients and DOMPurify.sanitize(marked(markdown, {sanitize: true}) in the browser.

That's what I'd like to do if you guys feel it's secure, as it only involves storing the original data and leaving the rest client-side.

Less storage space, less server CPU. Win-win right?!

oasiz on 4 Apr 2018

That's what I'd like to do if you guys feel it's secure

I'd recommend you consider the WebWorker-based approach I suggested above. Does it seem feasible for you?

That ensures that if we slip up and let a REDOS issue through, your clients can't be attacked by a malicious profile.

davisjam on 6 Apr 2018

@davisjam Can you add a web worker example to the docs?

styfle on 6 Apr 2018

@styfle Good idea.

In the interest of laziness, @oasiz any chance that:

You'll be using WebWorkers, and
The code will be open source? (well obviously any code you're running client-side is "open source" one way or another, but...)

If so I'll wait and use you as an example. Free advertising!

davisjam on 6 Apr 2018

😄1

@davisjam Yes I will be implementing whatever is necessary in order to secure the app, so I'll be happy to help in any way I can. Always like a bit of free advertising!

Hopefully this won't be a problem, but the site I'm working on isn't currently open to the public - although I'm hoping to release a basic version in a couple of weeks or less.

I've never used WebWorkers, so looking forward to learning something new :-)

oasiz on 6 Apr 2018

@oasiz Are you still interested in adding WebWorkers to the docs?

styfle on 19 Oct 2018

I am googling how to use DOMPurify with marked and found this issue. Shouldn't DOMPurify.sanitize(marked(markdown, {sanitize: true})) be DOMPurify.sanitize(marked(markdown))? Because {sanitize: true} will convert < to <, and DOMPurify.sanitize should take a dirty HTML as the argument. Thanks.

sgon00 on 17 Jan 2019

Please ignore my last comment. I was wrong. It's right to have {sanitize: true} for this issue. The reason why I thought {sanitize: true} was not right is because my project allows other html tags such as <center>. That's why I don't have this {sanitize: true}, but it's not the situation in this issue. Sorry about that.

sgon00 on 17 Jan 2019

There is a PR to add web workers and node worker threads to the docs #1432

The PR is just waiting for worker threads to be stable in node.

UziTech on 11 Mar 2019

Was this page helpful?

0 / 5 - 0 ratings