Hi guys!
I realise this is a bug reporting section so I hope it's OK to post this, but I was hoping for some expert advice regarding Markdown and security.
Firstly, my situation is that I want to allow members on our website to have a section of their user profile that they can edit using the markdown format, parsed by Marked.
Importantly, the data cannot be trusted - anyone can register for an account so I need to be careful with what can/can't be uploaded and displayed.
Here's how things currently stand on my development site:
Hopefully that makes sense. Now, my question(s)..
Is that overkill? To the best of your knowledge is that safe enough?
In an idea world I'd like to cut out having to store the data twice and just work with the raw data client-side as the rendering seems to be blazingly fast - but from a security standpoint I don't know whether this is doable.
Main question: Would it be horribly irresponsible to just work with the raw data and just output:
marked([raw-unsanitized-data], { sanitize: true })
Possibly as an additional safeguard use DOMPurify:
DOMPurify.sanitize(marked(<raw-unsanitized-data>, { sanitize: true }))
? I can't help but feel like it probably would be, but I'd like to hear it from a real expert.
PS Further info, all site data is SSL and all cookies are secure/HttpOnly etc.
OK, if you're still here - thanks for staying awake!
But seriously, I'm no expert on this by any means so I really appreciate any insight/advice you can give.
Best Regards,
Rob
I don't think pre-rendering the html will helping with XSS since the same user generated content will show on the page regardless of which method you take.
The only security issues I have seen in marked have been ReDoS vulnerabilities. If you want to protect against someone using up resources on your servers in a ReDoS attack, you should pre-render the markdown in a separate process, and if that process takes longer than some threshold (more than a second maybe) cancel it and show an error to the user.
I can't really say how well marked's sanitize option does or whether DOMPurify.sanitize would make the output HTML any safer.
/cc @davisjam @Feder1co5oave
I'd like to cut out having to store the data twice and just work with the raw data client-side as the rendering seems to be blazingly fast - but from a security standpoint I don't know whether this is doable.
On our project we always store the raw, and render/sanitize on request only since as you stated it's super fast. Our establishing owner wanted to store the processed raw data however with sanitizers some security issue may come up (there is precedence) and you would need to regenerate the statics again. That's a hefty CPU task to continually do and could in itself generate a DoS of sorts. e.g. we don't currently cache this process.
As far as doing it twice, perhaps on client side, Userscripts can usually defeat this so you wouldn't want that security issue... so it's best to have it done server side as much as possible. e.g. CPU time is server side instead of client side (which sucks but is a fact of life in our arena).
Hi, thanks for the replies - much appreciated, that's some useful info.
I've noticed that Github uses server-side technology only, no JS rendering. Presumably they're storing both raw and rendered versions of all posts. That's a lot of database storage!
I'm still unsure of the best way to do this, it seems like there's no definitive answer.
Storing the raw data and rendering client-side with JS would obviously be ideal, but only if it's going to be secure.
I do wonder what other users of markedjs do with the issue of data/storage/security..
@oasiz The reason why GitHub renders on the server-side is because they use Ruby to render Markdown to HTML (Kramdown I think). They literally can't render on the client (unless they find a ruby-to-wasm compiler or start using marked.js 馃槃)
In an idea world I'd like to cut out having to store the data twice and just work with the raw data client-side as the rendering seems to be blazingly fast - but from a security standpoint I don't know whether this is doable.
@oasiz Can you clarify what security concerns you have with rendering the raw markdown on the client side?
I'm guessing the scenario you're imagining is:
Attacker Sets profile to malicious content.
Victim On visiting the attacker's profile, the server returns the raw data. While rendering it something bad happens.
Is the "something bad" what you're concerned about, or am I misunderstanding?
My 2 cents. You either:
DOMPurify.sanitize(marked(markdown, {sanitize: true}) in the browser.Converting into html on the submitting client and then storing that on the server opens security concerns because the client-side code can be manipulated by a malicious user.
ps. Github parses markdown content into html by using a customized version of the C implementation of commonmark
save markdown content on the server, convert it there (with the time check proposed by UziTech) and serve cached (and purified) html content to the clients
This approach should work fine. I would use a worker pool to render the JS with a timeout.
you only store markdown content, serve it to clients and
DOMPurify.sanitize(marked(markdown, {sanitize: true})in the browser.
I think this is a better approach. Since the markdown being rendered is untrusted, perhaps render it in a WebWorker and enforce a timeout. Hopefully this project doesn't have any more vulnerable regexes but you never know. Possible instructions here.
Converting into html on the submitting client and then storing that on the server opens security concerns because the client-side code can be manipulated by a malicious user.
+1. Under the "malicious client pre-renders the HTML" design, the user can upload whatever HTML they want to. They don't have to execute your code.
Hi, appreciate the replies :-)
Is the "something bad" what you're concerned about, or am I misunderstanding?
Yes, absolutely correct.
you only store markdown content, serve it to clients and DOMPurify.sanitize(marked(markdown, {sanitize: true}) in the browser.
That's what I'd like to do if you guys feel it's secure, as it only involves storing the original data and leaving the rest client-side.
Less storage space, less server CPU. Win-win right?!
That's what I'd like to do if you guys feel it's secure
I'd recommend you consider the WebWorker-based approach I suggested above. Does it seem feasible for you?
That ensures that if we slip up and let a REDOS issue through, your clients can't be attacked by a malicious profile.
@davisjam Can you add a web worker example to the docs?
@styfle Good idea.
In the interest of laziness, @oasiz any chance that:
If so I'll wait and use you as an example. Free advertising!
@davisjam Yes I will be implementing whatever is necessary in order to secure the app, so I'll be happy to help in any way I can. Always like a bit of free advertising!
Hopefully this won't be a problem, but the site I'm working on isn't currently open to the public - although I'm hoping to release a basic version in a couple of weeks or less.
I've never used WebWorkers, so looking forward to learning something new :-)
@oasiz Are you still interested in adding WebWorkers to the docs?
I am googling how to use DOMPurify with marked and found this issue. Shouldn't DOMPurify.sanitize(marked(markdown, {sanitize: true})) be DOMPurify.sanitize(marked(markdown))? Because {sanitize: true} will convert < to <, and DOMPurify.sanitize should take a dirty HTML as the argument. Thanks.
Please ignore my last comment. I was wrong. It's right to have {sanitize: true} for this issue. The reason why I thought {sanitize: true} was not right is because my project allows other html tags such as <center>. That's why I don't have this {sanitize: true}, but it's not the situation in this issue. Sorry about that.
There is a PR to add web workers and node worker threads to the docs #1432
The PR is just waiting for worker threads to be stable in node.