React: How Much XSS Vulnerability Protection is React Responsible For?

Created on 20 Mar 2015  Â·  43Comments  Â·  Source: facebook/react

There was a security hack that mentions React here: http://danlec.com/blog/xss-via-a-spoofed-react-element

Ultimately this is a server-side bug and NOT a bug in React itself. This issue is about figuring out if there is something we can do to mitigate issues when you have a JSON parsing bug or some server-side issue.

isValidElement

React is designed to work with plain objects as input, and in fact, we're even getting rid of the _isReactElement as a way to verify this. We'll allow any JSON object. IMO, there is no problem with the verification here.

All string values are sanitized before inserted into the DOM (except for CSS styles which is a known wontfix issue for IE8).

In earlier versions we used instanceof checks but that didn't work well with multiple Reacts, it makes it difficult to optimize (inline objects are much faster) and couples JSX permanently to React, which we would like to avoid.

dangerouslySetInnerHTML

One possible solution is to disable this feature and require it to be used imperatively (React.findDOMNode(ref).innerHTML = '') which makes for worse performance at insertion time.

However, I don't believe this is the only bad thing once you can insert arbitrary HTML tags. It is certainly the easiest way to gain access to XSS though. You can also insert arbitrary Web Components which could expose data. You can render form elements that can potentially pass data.

What else can we do?

Ultimately this is an issue where <div>{userData}</div> seems like a valid use case, but if your userData is compromised, it becomes dangerous.

Should React be responsible for protecting itself against arbitrary JSON as children?

Most helpful comment

I might be missing something, but why does React need to complicate inserting stuff like conditional JSX elements by forcing users to wrap them, when it's not issue of React nor of 99% of use-cases when server is written properly.

As far as we understood from original report, the "issue" was that server was accepting JSON object of any schema (wat?) and saving/sending it as-is (wat??), while most use-cases pick needed fields and, of course, validate them against type/range expectations. That's the golden rule of code security - never trust data from client as-is and perform server validation. If your server accepts any random data objects and sends them over to clients, you have much bigger problems than you think, whether you're using React or not.

/cc @alexeyraspopov @zerkms

All 43 comments

Hi, Albert from Yahoo Security team here. I was working with @mridgway on studying this issue. Just wanna share more thoughts.

Re: disabling dangeriouslySetInnerHTML, it is a good move, but it would not be sufficient. React allow inject "script" and "style" tag, or "style" attributes. All of those would allow script execution. Also, in general, we worried about UI redressing attack that attackers can create arbitrary overlay on a page that could steal all the input submitted.

I don't think "script" tags will execute because we use innerHTML to create them. Do the "style" tag and "style" attributes allow you to execute (in context) scripts in modern browsers? I think it is only older versions of IE.

Regardless, the UI redressing attack is still an equally valid concern.

I was also having the same doubt on script tag when I saw that it was supported in facebook.github.io/react/docs/tags-and-attributes.html, and then I figured out that script tag is used in server side react and it would be executable. And indeed most style attacks only works in IE8 or minor. But it still contributes to quite some amount of traffic.

I am more inclined to see _isReactElement be used as a validation mechanism. The issue here is that we can't easily distinguish if an JSON object would have execution context or data context. While it is true that server side validation could help, it only works if every time we get an JSON we have a schema to validate it. The nature of JSON, as you suggested, is plain and free form. Otherwise it will be falling back as XML validation that would discourage developers to use it.

In a nutshell, we really need a way to distinguish if a JSON object has executable context (i.e. _isReactElement). Thanks!

None... I think React should stay as light and lean as it can be. Unix philosophy fan here.

Note that JSON rendering of a virtual DOM tree is a perfectly valid use case. E.g. you can prerender a high level component into a smaller level virtual DOM tree on the server and then render the result on the client. This is also one of the ideas we had for web worker rendering where it would construct a virtual DOM that can be postMessaged and rendered on the other side.

Therefore, we also have to consider what options we have if we chose not to protect ourselves against JSON data. How much can we mitigate that scenario?

Possible solution, use different syntax for inserting strings as children:

  • <span>{=this.props.username}</span>: Coerces username to a string.
  • <a href={this.props.url}> .. </a>: Coerces url to a string (as now).
  • <div>{this.renderFooter()}</div>: If this.renderFooter() returns a string or number, warns in 0.14, error in 0.15. Explicitly wrap in a span or use {=str}.

Feels ugly but would solve the problem, wouldn't it?

Coercion doesn't catch mistakes where you miss it. We would need to make that a requirement, e.g. by wrapping strings in some placeholder that is distinct from elements: { __thisIsSupposeToBeAString: str }

That's why I suggested making it an error to pass a string to the non-coercing syntax (plain {}).

I see. That would be difficult to enforce since they can come nested arrays or non-JSX sources.

dangerouslySetInnerHTML is really useful when interfacing with existing Markdown libraries. (I know of one that generates React elements, but that isn't the point I'm trying to make.) Removing this feature over a bug that isn't even React's fault seems pretty scorched earth to me.

@sebmarkbage As schemaless storages are becoming somewhat popular, this is going to become more prevalent. At some level it is a repeat of inserting foreign HTML markup as a result of lack of escaping (React made it a goal to make it a thing of the past), but instead the vulnerability moved to React components where wrongly typed values can lead to insertion of foreign hierarchies.

I'd argue that even if we could somehow guarantee that insertion of foreign hierarchies wouldn't be _dangerous_, it would always be _very bad_. Malicious data will always wreak havoc on a system and React is just one part of it. The danger I see with React being targeted (apart from being easily exploited due to being debuggable client-side) is malicious data being rendered for other users of the same system, it's hard to guarantee that the malcious data won't be able to act freely under the assumed identity of those users.


I think the way forward is to drop implicit wrapping of primitive values. Printable values should be elements like everything else from React's perspective, they should not be implicitly wrapped like they are today. A printable value should look something like React.createValue('foobar') => {type: 'value', props: 'foobar'}, it's still serializable but it's not exploitable. JSX could have some simple syntax sugar for it too, say {=foobar}, and inline text would be automatically wrapped.

While it may seem rather draconian in a sense, I think this is the sensible way forward, HTML introduced a lot of conviences that turned out to be really inconvenient (for user interfaces), this is another one. I think the practical implication of this is not as bad as it seems; any sensible user interface backend would expose labels as <label value={...} />, buttons as <button label={...} />, etc. React.createValue would be largely exclusive to rich-text components where it actually makes sense.

I think the way forward is to drop implicit wrapping of primitive values.

I think to provide safety it would actually have to be the reverse - it should be easier to render a primitive value than to render a component. This is the approach that other front-end frameworks have converged on over time.

For instance <p>{user.name}</p> should always be a primitive render, which is known to be safe. The potentially unsafe version should look different, which causes the reader/author to look more closely, eg. <p>{!:user.name:!}</p> or something similarly wacky. This is possibly too much of a departure from the current way of doing things to be viable?

Note that JSON rendering of a virtual DOM tree is a perfectly valid use case.
In a nutshell, we really need a way to distinguish if a JSON object has executable context (i.e. _isReactElement). Thanks!
React is designed to work with plain objects as input, and in fact, we're even getting rid of the _isReactElement as a way to verify this.

These are conflicting goals, so I think React would have to make a clear choice here, the options I can think of are:

1) Decide that all data is renderable, and increase documentation warning about this
2) Decide that all data is primitive by default, and introduce new JSX syntax to render data intended to be non-primitive
3) Find a way to mark data as "safe".

For point 3, to support the JSON cases, as well as createElement, there might need to be some sort of hydrateElement which can take a plain object and mark it as safe to render eg. <p>{React.hydrateElement(someJSONobj)}</p>.

As proved by the bug which prompted the discussion, a boolean property is not enough to mark some data as executable, as I understand it, the only "safe" option would be an object reference, perhaps something like {type: "div", React: React}, which can be checked using React === React at render-time. This re-introduces the multiple-react problem discussed above, and while that could be alleviated slightly by sharing some arbitrary flag object via global (window || global).___reactMarker = {}, this doesn't really help for iframes, web workers, or multiple windows.

A bit of a brain dump, (1) seems simplest but unsafe. (2) is safest but a large change, (3) seems fiddly and error prone.

Hopefully someone else has a better idea!

Having discussed this with several people now, I'm leaning towards the "mark as trusted" solution. It doesn't have to be a reference identity. It can be a unique enough token string or number. This token has to be shared between environments in a similar way as CSRF. By default we may share it with the realm through the global.

It could also be completely disabled if you chose to.

On Mar 24, 2015, at 10:59 PM, Glen Mailer [email protected] wrote:

I think the way forward is to drop implicit wrapping of primitive values.

I think to provide safety it would actually have to be the reverse - it should be easier to render a primitive value than to render a component. This is the approach that other front-end frameworks have converged on over time.

For instance

{user.name}

should always be a primitive render, which is known to be safe. The potentially unsafe version should look different, which causes the reader/author to look more closely, eg.

{!:user.name:!}

or something similarly wacky. This is possibly too much of a departure from the current way of doing things to be viable?

Note that JSON rendering of a virtual DOM tree is a perfectly valid use case.
In a nutshell, we really need a way to distinguish if a JSON object has executable context (i.e. _isReactElement). Thanks!
React is designed to work with plain objects as input, and in fact, we're even getting rid of the _isReactElement as a way to verify this.

These are conflicting goals, so I think React would have to make a clear choice here, the options I can think of are:

1) Decide that all data is renderable, and increase documentation warning about this
2) Decide that all data is primitive by default, and introduce new JSX syntax to render data intended to be non-primitive
3) Find a way to mark data as "safe".

For point 3, to support the JSON cases, as well as createElement, there might need to be some sort of hydrateElement which can take a plain object and mark it as safe to render eg.

{React.hydrateElement(someJSONobj)}

.

As proved by the bug which prompted the discussion, a boolean property is not enough to mark some data as executable, as I understand it, the only "safe" option would be an object reference, perhaps something like {type: "div", React: React}, which can be checked using React === React at render-time. This re-introduces the multiple-react problem discussed above, and while that could be alleviated slightly by sharing some arbitrary flag object via global (window || global).___reactMarker = {}, this doesn't really help for iframes, web workers, or multiple windows.

A bit of a brain dump, (1) seems simplest but unsafe. (2) is safest but a large change, (3) seems fiddly and error prone.

Hopefully someone else has a better idea!

—
Reply to this email directly or view it on GitHub.

@glenjamin

I think to provide safety it would actually have to be the reverse - it should be easier to render a primitive value than to render a component. This is the approach that other front-end frameworks have converged on over time.

By explicitly having to put all primitive values in a value element it becomes inherently safe, primitive values cannot render outside of it and non-primitive values are stringified inside of it (or w/e).

Personally this makes a lot of sense to me, we already implicitly wrap primitive values internally, this is also how any non-markup/style-inheriting backend would work. There's no such thing as rendering inline text in iOS and most other user interface libraries, it only really makes sense for rendering rich-text content.

This is also how you'd do it if it was implemented in say C/C++ where you would want a single type for all children.

By explicitly having to put all primitive values in a value element it becomes inherently safe, primitive values cannot render outside of it and non-primitive values are stringified inside of it (or w/e).

I see what you mean now, as long as the primitive and non-primitive syntaxes are separate then this is safe. Which is subtly different from HTML escaping of strings, where most strings work correctly when escaping is forgotten.

@glenjamin

Which is subtly different from HTML escaping of strings, where most strings work correctly when escaping is forgotten.

Yeah, but then again, React.createValue() would be the equivalent of document.createTextNode() so it's already there today, it's just that HTML markup implicitly creates text nodes, just like React currently does. But that's also a source of XSS in HTML and now in React too, but React hierarchies are object-based rather string, so React is vulnerable to malicous objects instead.

@sebmarkbage So it seems weird to me to add another "feature" to work around this when it isn't an inherent issue, it's introduced by implicit wrapping of primitive values (which we borrow from HTML, it still doesn't make sense for me for the purpose of user interfaces). I'm quite sure that performance implications of this should be minimal, but it obviously does significantly affect the code in one way or another. That is no doubt a very important consideration, but React has challenged many of the wrong-doings and inconvenient conveniences of HTML so far, I think this is another one that should be.

It doesn't have to be a reference identity. It can be a unique enough token string or number. This token has to be shared between environments in a similar way as CSRF. By default we may share it with the realm through the global.

This sounds like it should work - is the idea that by default it would be randomly generated on the client. but there'd be a top-level API to read/write it?

I like the idea of trust, and React.createElement being the mechanism to declare trust in the current environment.

const trust = element => typeof element === "object" && element
  ? React.createElement(element.type, element.props, ...[].concat(element.children))
  : !element ? false
  : String(element);
// throw if not a string or trusted element
<div>{stringOrDie}</div> 

// this span is trusted because React.createElement
<div><span>hey</span></div>

// explicitly React.createElement the POJO
<div>{trust(stringOrElement)}</div> 

The simplest way to do this is to have each element have a common property on the prototype. This can be exposed globally and/or allowed to be set externally for multiple React instances and integration with other libraries. Alternatively it could be an own non-enumerable property, or R.cE could set a toJSON that excludes this property. It'll cause confusion if you end up sending this to the server, and get an error about an incorrect 'token', rather than a missing one.

Adding extra syntax for this would be unfortunate. A simple helper function exposed on React would be good, because this is an explicit opt-out to a (future) react security feature, like angular's $sce.trustAsHtml.

The case where you have a serialized element is cool, but I don't think asking for a little more explicitness will harm anyone... users, or the person maintaining the code. "Oh, hey this is rendering something we get from the server, I need to be careful with this code."

Also... please throw, don't warn or String(it).

I might be missing something, but why does React need to complicate inserting stuff like conditional JSX elements by forcing users to wrap them, when it's not issue of React nor of 99% of use-cases when server is written properly.

As far as we understood from original report, the "issue" was that server was accepting JSON object of any schema (wat?) and saving/sending it as-is (wat??), while most use-cases pick needed fields and, of course, validate them against type/range expectations. That's the golden rule of code security - never trust data from client as-is and perform server validation. If your server accepts any random data objects and sends them over to clients, you have much bigger problems than you think, whether you're using React or not.

/cc @alexeyraspopov @zerkms

@RReverser :+1:

Also, @sebmarkbage are there any links with demo code to reproduce the issue?

There is my attempt to reproduce this _dangerous_ issue. As you can see, HTML code is added to body, but script doesn't work. Client-side rendering is protected.

Ignore that difficult string concatenation, JSFiddle uses client-side version of JSXTransformer.

I might be missing something, but why does React need to complicate inserting stuff like conditional JSX elements by forcing users to wrap them, when it's not issue of React nor of 99% of use-cases when server is written properly.

@RReverser That applies to HTML as well and look at how that turned out :) React is obviously less susceptible as it relies on objects, but one missing/insufficient check somewhere (and a susceptible backend, say JSON schema) and an attacker can potentially act on behalf of all users visiting the site, that's the only reason I see why React should make an effort to prevent it (like it did with HTML markup).

@alexeyraspopov __html: "<img onload='alert()' src='/favicon.ico' />".

@brigand okay :) And what can we tell about backend developer which allows server to receive JSON without fixed schema?

UPD: this type of XSS works in _every_ JS framework/lib which allows you to render HTML.

@alexeyraspopov

Consider mustache, it provides two flavours of template iterpolation: {{}} and {{{}}}.

<p>{{name}}</p> <!-- no XSS if name contains HTML -->
<p>{{{name}}}</p> <!-- XSS if name contains HTML -->

The default is safe, and template authors can opt-in to unsafe behaviour, with the understanding that the variable used in the unsafe portion will be checked more stringently.

React also has a feature, where the unsafe version is even more awkward - so it cannot be missed:

<p>{name}</p> {// safe}
<p dangerouslySetInnerHTML={{__html: name}} /> {// unsafe}

However, because React is now translating createElement calls into literals, the "safe" version is not safe at all.

This creates a false sense of security. Good security is like an onion, it should have many layers.

So yes, developers shouldn't allow arbitrary data - but there's a reason the top entries on the OWASP top 10 are related to untrusted data passing through the system - it happens in practice.

Any reasonable steps popular libraries can take which makes it hard to makes these mistakes has a very high net gain.

@syranide I see your point. You might be right. The usability and performance issues might be deal breakers - especially when other solutions are less invasive.

I'm still trying to wrap my head around where the line can be drawn between various data types. What is is that makes a string safe when its wrapped and when it is not. What makes an object not safe and when is it safe?

I am concerned that it is still too easy to do the wrong thing even with an explicit syntax:

The difference between something like this is very subtle:

<div>{=this.props.user}</div>
<div>{this.props.user}</div>

And both are legit use cases. You would have to actively test these code branches to notice that you did something wrong when you expected a string. Everyone has these kind of edge case code branches that go into production without ever being properly tested. Would this be a false sense of security as well?

Ultimately, I don't think that validating data structures is something that React needs to necessarily concern itself with. (We have type systems for that.)

The reason I think we need to address this particular issue is because the cost (XSS) is too high. I think it is fine if this mistake causes your site to break. I even think it would be ok if we had something that allows for UI redressing in this scenario. There are plenty of other ways to get to that point.

However, it is not ok that a simple mistake like this causes full cross-site script execution. This is too high of a cost/probability ratio. The risk is too high even when the probability is low.

I think the argument for the sourceID solution is that it specifically addresses the tag creation scenario which is causes these high risk scenarios.

There are still other issues that are not solved by simple data types alone:

<a href={this.props.userProvidedURL} />
<div style={{ border: this.props.userProvidedBorder }} />

With React.dangerouslyTrustAllSources(); available there is a high chance that people will blindly allow it since they "found it on stackoverflow" or "read in an article written by that guy" which would mitigate all the "protection" brought.

As a result - people who have no idea what they do will make them "vulnerable" themselves. And people with security in mind would continue writing as protected code as before (but now using more complex tool that requires more attention).

The question: what have the product (reactjs) and the community gained from it then?

I didn't know react allowed style={{border: '1px solid black; background: red'}} or href={'javascript:alert();void 0;'}... these seem like a simple fixes that should be done.

The href is bad because it's common; the style is very bad because it's very unexpected. <div style={{foo: 'bar'}}/> => div.style.foo = 'bar' is expected, and safe escaping or error for the innerHTML/renderToString case.

If someone really needs them, they can set it manually or user __html; I doubt any real react code would be broken by these.

I'm still trying to wrap my head around where the line can be drawn between various data types. What is is that makes a string safe when its wrapped and when it is not. What makes an object not safe and when is it safe?

@sebmarkbage I think to me it kind of comes down to "Foobar", it's currently both an element and a primitive. If you instead imagine something like <MyButton label={createValue('Foobar')} />, whatever you put inside createValue becomes opaque, toString-ed and safe to render. Supplying a primitive value label would throw.

In this example, I would argue that MyButton.label should sensibly only accept primitive values, <MyButton label="Foobar" />. MyButton would internally call createValue(this.props.label). So whatever you value you pass label will be safe to render. If MyButton would instead accept elements only, then passing a non-element would throw.

I am concerned that it is still too easy to do the wrong thing even with an explicit syntax:
The difference between something like this is very subtle:
...

Yeah (regardless of syntax really). But createElement('div', {}, 'Foobar') will throw, and createValue(createElement('div')) will helpfully throw, warn or toString it to something harmless. So yes, you _can_ still mess it up, but if you run it and it looks correct then it is correct, a rendered value can no longer turn into arbitrary elements if given malicious data which it previously could.

Ultimately, I don't think that validating data structures is something that React needs to necessarily concern itself with. (We have type systems for that.)

With static type checking you would catch the above mistake without running the code, implicit wrapping prevents that as far as I see.

The usability and performance issues might be deal breakers - especially when other solutions are less invasive.

I suspect that performance won't be an issue in the end, static values can be trivially reused and pooling should make dynamic values virtually free. But, yes there are some obstacles.

Usability is a very valid point. Any frontend that doesn't have style inheritance wouldn't understand createValue and would be unaffected, a string wouldn't be renderable by itself, to render text you explicitly render <label value="Foobar" ... /> (or <label ...>Foobar</label>). So it's only frontends with style inheritance (like HTML) or rich-text components that would be affected by the introduction of createValue.

I don't have it all laid out in my head yet, but implicit behavior can be fragile/dangerous and we've seen that implicit wrapping is, workarounds will only reduce the problem, not fix it. So I think it's a matter of realizing that the implicit model is flawed in some way, can we take the explicit model and somehow make it almost as friendly as the implicit for such frontends? I suspect the answer is yes.

PS. I realize this really should be a separate discussion, it's not really about solving this XSS issue in any immediate sense. It's about whether or not the implicit model is too flawed. Less invasive solutions are far more realistic at this point.

@brigand We've already evaluated and decided against protecting against invalid styles. It is not a simple fix. When we last evaluated it, we found that:

1) It was prohibitively expensive to do this automatically for you for any given value.
2) A whitelist of safe patterns would be massively huge and we actively want to move away from whitelists in React and fallback to the underlying DOM.
3) Any blacklist we could come up with would likely be incomplete anyway.

IE's CSS expressions is another one. Various url and paint expressions. Semicolons was not the only concern, but also various complex parsing rules.

I agree that this is bad because it relies on an implementation detail of React (HTML serialization instead of element.style.foo =). Hopefully one day, the slowest browsers will be fast enough to use that technique instead.

As for javascript: in href. Perhaps that should be protected against but it is also arguable that would be React overreaching. I do agree that the cost of accidentally doing this is very high which is why we've added additional protection against innerHTML.

At the end of the day, we can't protect against everything crazy in the DOM. We target it to interop with the environment that is there. We can put higher demands on new abstractions such as React Native.

It is also highly recommended that you build your own set of high level components instead of targeting HTML directly. This is what we do at Facebook because "semantic HTML" is not enough.

@syranide I think this is a valuable conversation to have in this thread since this is a generic catch-all issue and not necessarily only related to one issue alone (see the title).

I don't really think that there is anything special about primitive values. A string is only one kind of data structure and we'd want to support many different ones. For higher level components you might have objects that represent the same kind of values. Even strings could be reimplemented with a different data structure in something like immutable.js. That may also be JSON serializable.

The key issue here is the union type. You have a set that accepts one of multiple types and you want to make sure that the user explicitly decides which one they put there. I think that ultimately that become prohibitively verbose. That is why we have union types (or overloads for that matter).

FWIW, we have a lot of components at FB that accept both strings or rich text. We need a convenient way to pass rich text through out the system too. It should also be possible to build it up in one place, componentize and use it in another. Since these rich text elements could include animation etc. it might make sense to allow them to be stateful too.

In that scenario, you would want the underlying component rendering of a label to accept a subset of components for its rendering. That's what we have a type system for (except propTypes is a dev-only system which doesn't protect against this kind of attack).

@sebmarkbage Perhaps I wasn't entirely clear (I think). It seems to me that "everything is rich-text" model of HTML is flawed; it's great for documents but it's immensely fragile for user interfaces. Documents can be rendered within a user interface, but you don't render a user interface within a document, so it doesn't make sense to me why we should be using the document model of HTML as the fundamental design language if it weren't for it being the current render target. I would argue that any mature React user interface should do its best to avoid the HTML style inheritance/document model. I.e.

<div>
  <span style={...}>Click that</span>
  <button style={...}>Click me</button>
</div>

...and not...

<div style={...}>
  Click that
  <button style={...}>Click me</button>
</div>

In a more simpler more traditional setup I would write that as:

<panel>
  <label value="Click that" style={...} />
  <button label="Click me" style={...} />
</panel>

...or if you want a more generic button implementation (whatever is in children is the label)...

<panel>
  <label value="Click that" style={...} />
  <button>
    <label value="Click me" style={...} />
  </button>
</panel>

There is no longer a universal understanding of how to render a string. Components are inherently isolated and reusable from a style perspective, primitives cannot be rendered so the this XSS issue goes out the window and even white-space rules become irrelevant. There's an inherent technical beauty in that if you will that's as unambigious as it gets (for whatever it's worth), it could be a very formal no nonsense syntax that would be hard to argue with... it could even be marketed as a new general purpose primitive type that's not limited to user interfaces (ahem, we're basically talking about an XML subset now...).

That should be our starting point in my opinion. The only thing that's stands out is rich-text components which usually finds themselves useful in most user interfaces, so they should definitely not be forgotten. But it doesn't necessarily mean that rich-text should have a dedicated syntax or perhaps not the current syntax.

React is currently targeting HTML which makes this a daunting aspect, but that's only because HTML provides all these conveniences, if we would have targeted anything else I'm quite sure React/JSX wouldn't have walked down this path without significant consideration. I'm not sure what the solution is, but HTML sets itself squarely apart from all other user interface targets and doesn't really prove itself exceptionally competent so I think it's only sane to question whether HTML is really an ideal role model for React/JSX user interfaces.

React Native's <Text> almost works like that, though you can have both strings and other Text elements as children of a Text. Other components don't accept text children like that.

@syranide Mid/Long term I think you're right that this is the kind of architecture that React and its users should be moving towards. However it is not an easy shift to do immediately.

I don't see that the trusted source solution needs to block any important features that we would've gotten, by shifting to the newer architecture. So it seems like a safe short term solution that also doesn't block anything we were planning to do anyway.

@sebmarkbage Agree and agree.

I've been thinking about the priorities here. As I see it there are three threats:

A) XSS through dangerouslySetInnerHTML, href="javascript:..." or script tags etc.
B) Clickjacking by <a href="..." />, <form /> or similar.
C) Other types of UI redressing scenarios that doesn't involve interactivity.

If we agree that React is the secondary layer to an already broken issue, and the real solution always is to fix the JSON pass-through hole. Then it follows that React is not responsible for providing the permanent solution. React is only responsible for mitigating the impact, IF this hole occurs.

Full XSS (Threat A) is clearly a too high of a cost of this bug and even if you fix it quickly the harm is already done.

Getting an embarrassing redressing message or phishing attempt through a static message (Threat C) might not be as bad. Even if they do happen, there's no guarantee that there is any significant harm done. They can then be fixed by addressing the pass-through hole, before any real harm is done.

If we agree that we only need to protect against A and possibly B, then we have a lot more possibilities to limit the scope of the problem to a few sensitive attributes.

Another thing that I would like to start thinking about is.... What if we ran React in a worker? What if we allowed a cross-origin script to communicate through the worker and render arbitrary ReactElements (which could be targeting higher level abstractions or real DOM nodes)? What kind of things would we need to protect against then?

The point of that scenario is explicitly to allow arbitrary UI elements in a subtree of the page (probably with overflow: hidden) which means explicitly allowing for Threat C and probably even B.

@sebmarkbage As far as I'm concerned those are all related to HTML and not React (except for this particular issue). So while they should be addressed, doing so without implementing a much safer abstraction of HTML (which it seems we won't) doesn't seem worthwhile and I'm sure there would still be many tricks that ReactDOM might simply not be able to guard against. I think publicly documenting and explaining all the known pitfalls would go a long long way. It seems to me that the _biggest_ issue right now is the general lack of awareness, not mistakes.

E.g. what are all the dangers of a user supplied href? javascript: is one, but it applies to 3rd-party protocols too, including urls without a protocol/domain part, so you should always block everything but say http:// and https://. You should also beware of links that point to the current domain which could possibly perform operations on behalf of that user. React simply cannot guard against this, it needs to be documented and possibly be made available as ready-made helpers.

It's also important to document that style={{backgroundColor: userColor}} is also exploitable and allows injection of arbitrary CSS. It won't stop the problem, but just having it publicly documented is a big step (seeing as it's unlikely that we can ever out-right prevent it when targeting HTML).

How about taking another approach here? Instead of wrapping createElement wrap component functions and classes. that way the output of these functions will be plain objects and in big trees of elements it will significantly reduce the function calls.

const MyComponent = (props) => <div />;

to

const MyComponent = component((props) => <div />);

component() will handle adding signatures for the generated objects.

The problem is that wrapping after render returns doesn't help you distinguish between <div>{good}</div> and <div>{evil}</div>.

When is this needed @spicyj ?

I think the original post explains the issue in detail. If this is problematic for you for some reason, please open a new issue.

@sebmarkbage

All string values are sanitized before inserted into the DOM (except for CSS styles which is a known wontfix issue for IE8).

Do you mind explaining why other, modern browsers cannot be coerced into XSS via CSS styles?

Was this page helpful?
0 / 5 - 0 ratings