React: head > meta > content escaping issue

Created on 12 Oct 2018 · 10 Comments · Source: facebook/react

Do you want to request a feature or report a bug?

I'm guessing it's a bug.

What is the current behavior?

The following markup:

<meta property="og:image" content="https://onepixel.imgix.net/60366a63-1ac8-9626-1df8-9d8d5e5e2601_1000.jpg?auto=format&q=80&mark=watermark%2Fcenter-v5.png&markalign=center%2Cmiddle&h=500&w=500&s=60ec785603e5f71fe944f76b4dacef08" />

is escaped when it is server-side rendered:

<meta property="og:image" content="https://onepixel.imgix.net/60366a63-1ac8-9626-1df8-9d8d5e5e2601_1000.jpg?auto=format&amp;q=80&amp;mark=watermark%2Fcenter-v5.png&amp;markalign=center%2Cmiddle&amp;h=500&amp;w=500&amp;s=60ec785603e5f71fe944f76b4dacef08"/>

You can reproduce the behavior like this:

const React = require("react");
const ReactDOMServer = require("react-dom/server");
const http = require("http");

// Children are passed as extra createElement arguments instead of a props array.
const doc = React.createElement(
  "html",
  null,
  React.createElement(
    "head",
    null,
    React.createElement("meta", {
      property: "og:image",
      content:
        "https://onepixel.imgix.net/60366a63-1ac8-9626-1df8-9d8d5e5e2601_1000.jpg?auto=format&q=80&mark=watermark%2Fcenter-v5.png&markalign=center%2Cmiddle&h=500&w=500&s=60ec785603e5f71fe944f76b4dacef08"
    })
  ),
  React.createElement("body", null, "og:image")
);

// Create an HTTP server that responds with the statically rendered document.
http
  .createServer(function(req, res) {
    res.writeHead(200, { "Content-Type": "text/html" });
    res.write("<!DOCTYPE html>" + ReactDOMServer.renderToStaticMarkup(doc));
    res.end();
  })
  .listen(8080); // Listen on port 8080.

Editor: https://codesandbox.io/s/my299jk7qp
Output: https://my299jk7qp.sse.codesandbox.io/

What is the expected behavior?

I would expect the content not to be escaped. It's related to https://github.com/zeit/next.js/issues/2006#issuecomment-355917446.
I'm using the og:image meta element so my pages can have nice previews within Facebook :).

(Screenshot taken 2018-10-12 at 14:15:26)

Which versions of React, and which browser / OS are affected by this issue? Did this work in previous versions of React?
16.5.2

Labels: Server Rendering, Needs Investigation


oliviertassinari


Most helpful comment

This is the change needed to get the behavior you expect:

Replace https://github.com/facebook/react/blob/ee409ea3b577f9ff37d36ccbfc642058ad783bb0/packages/react-dom/src/server/ReactPartialRenderer.js#L383

with an escape hatch:

if (tagVerbatim === 'meta' && propKey === 'content') {
  // propValue is written without any escaping, so a '"' in it would end the attribute early.
  markup = 'content="' + propValue + '"';
} else {
  markup = createMarkupForProperty(propKey, propValue);
}

This would explicitly exempt the meta tag's content attribute from being properly escaped, which wouldn't help with @oliviertassinari's case of wanting <span data-src={'&'}></span>.

A more generic solution would involve something like a dangerouslySetAttributes prop:

<span
  dangerouslySetAttributes={{__attributes: [{name: 'data-src', value: '&'}]}}
/>

This could easily lead to parsing errors and unexpected results if the text following an & forms a named character reference, e.g. &copy (without the ;).

Again, the issue was with the HTML parser Facebook was using for the Sharing Debugger, not with React. It now parses the escaped paths correctly.

jbraithwaite on 18 Aug 2020
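Decoding character references in attribute values is what a spec-compliant HTML parser does when it reads the server-rendered markup, so the escaped and unescaped forms of the URL describe the same value. A minimal sketch that decodes only the references React 16 appears to emit for the characters ", &, ', < and > (treat that exact set as an assumption); decodeReactAttribute is a hypothetical helper, and a real parser uses the full character-reference table:

```javascript
// Maps the references assumed to be emitted by React's server renderer back to characters.
const REACT_ENTITIES = {
  "&quot;": '"',
  "&amp;": "&",
  "&#x27;": "'",
  "&lt;": "<",
  "&gt;": ">",
};

// Decode in a single pass so "&amp;lt;" becomes "&lt;" rather than "<".
function decodeReactAttribute(value) {
  return value.replace(/&(?:quot|amp|#x27|lt|gt);/g, (ref) => REACT_ENTITIES[ref]);
}

const original =
  "https://onepixel.imgix.net/60366a63-1ac8-9626-1df8-9d8d5e5e2601_1000.jpg?auto=format&q=80&mark=watermark%2Fcenter-v5.png&markalign=center%2Cmiddle&h=500&w=500&s=60ec785603e5f71fe944f76b4dacef08";
const serverRendered =
  "https://onepixel.imgix.net/60366a63-1ac8-9626-1df8-9d8d5e5e2601_1000.jpg?auto=format&amp;q=80&amp;mark=watermark%2Fcenter-v5.png&amp;markalign=center%2Cmiddle&amp;h=500&amp;w=500&amp;s=60ec785603e5f71fe944f76b4dacef08";

// The decoded attribute value is exactly the URL that was passed to React.
console.log(decodeReactAttribute(serverRendered) === original); // true
```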


All 10 comments

Has this ever worked as you intend? Can you send a fix?

gaearon on 1 Nov 2018

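We are solving the problem this way:

import Entities from 'html-entities/lib/html5-entities'

const entities = new Entities()
// Matches content="..." attributes anywhere in the markup, not only on <meta> tags.
const contentRegExp = /content="([^"]+)"/g
// Decodes all character references in the value; note that a decoded &quot;
// becomes a raw " and would end the attribute early.
const handleContent = (match, content) => {
  return `content="${entities.decode(content)}"`
}

// `html` holds the server-rendered markup string.
html = html.replace(contentRegExp, handleContent)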

We spend ~1 ms per request on this path, which isn't too bad. I can take a look at it at some point.

oliviertassinari on 1 Nov 2018
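The workaround above rewrites every content attribute and decodes all character references, so a &quot; inside a value turns into a raw double quote and ends the attribute early. A narrower variant, sketched here under assumptions, touches only content attributes on <meta> tags and undoes only the &amp; escape. unescapeMetaContent is a hypothetical name; the regular expression relies on React escaping > and " inside attribute values, and the result only keeps its meaning if no "&name" sequence in the value would be read by an HTML parser as a named character reference (such as &copy).

```javascript
// Matches the content attribute of a <meta ...> tag as emitted by React's server renderer.
// React escapes ">" and '"' inside attribute values, so [^>]* and [^"]* cannot
// run past the end of the tag or of the attribute.
const META_CONTENT = /(<meta\b[^>]*?\scontent=")([^"]*)(")/g;

// Hypothetical helper: turn "&amp;" back into "&" inside <meta content="..."> only.
// Other escapes such as "&quot;" are left intact so the attribute stays well-formed.
function unescapeMetaContent(html) {
  return html.replace(
    META_CONTENT,
    (match, start, value, end) => start + value.replace(/&amp;/g, "&") + end
  );
}

const html =
  '<html><head><meta property="og:image" content="https://example.com/a.jpg?w=500&amp;h=500"/></head>' +
  '<body><a href="?a=1&amp;b=2">link</a></body></html>';

// Only the <meta> content value changes; the <a href> keeps its &amp;.
console.log(unescapeMetaContent(html));
```

Given the comment that Facebook's Sharing Debugger now parses the escaped paths correctly, this post-processing may no longer be needed.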

I have found this related issue: #6873. Digging into the implementation, the behavior comes from
https://github.com/facebook/react/blob/0005d1e3f54b79fe4707fbccc44b89e0fb4ce565/packages/react-dom/src/server/DOMMarkupOperations.js#L61
⬇️
https://github.com/facebook/react/blob/b87aabdfe1b7461e7331abb3601d9e6bb27544bc/packages/react-dom/src/server/quoteAttributeValueForBrowser.js#L17
⬇️
https://github.com/facebook/react/blob/b87aabdfe1b7461e7331abb3601d9e6bb27544bc/packages/react-dom/src/server/escapeTextForBrowser.js#L108

However, all the escaping tests I could find cover only the children use case:
https://github.com/facebook/react/blob/b87aabdfe1b7461e7331abb3601d9e6bb27544bc/packages/react-dom/src/__tests__/escapeTextForBrowser-test.js#L23-L24
I have limited knowledge of web-escaping security issues, but I don't see any potential for harm in:

const response = ReactDOMServer.renderToString(<span data-src={'&'}></span>);
expect(response).toMatch('<span data-reactroot="" data-src="&"></span>');

oliviertassinari on 14 Jan 2019
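The chain traced above ends in a helper that replaces a handful of special characters with character references, after which the attribute value is wrapped in double quotes. The following is a simplified sketch for illustration, not React's actual code: escapeAttributeValue and quoteAttributeValue are hypothetical names, and the exact replacements (for instance &#x27; for the apostrophe) are an assumption based on React 16. Applied to the og:image URL from the report, it produces the same escaped value shown in the issue:

```javascript
// Character references for the five characters escaped in this sketch.
const ESCAPES = {
  '"': "&quot;",
  "&": "&amp;",
  "'": "&#x27;",
  "<": "&lt;",
  ">": "&gt;",
};

// Replace each special character with its character reference.
function escapeAttributeValue(value) {
  return String(value).replace(/["&'<>]/g, (ch) => ESCAPES[ch]);
}

// Wrap the escaped value in double quotes, ready to follow "name=".
function quoteAttributeValue(value) {
  return '"' + escapeAttributeValue(value) + '"';
}

const ogImage =
  "https://onepixel.imgix.net/60366a63-1ac8-9626-1df8-9d8d5e5e2601_1000.jpg?auto=format&q=80&mark=watermark%2Fcenter-v5.png&markalign=center%2Cmiddle&h=500&w=500&s=60ec785603e5f71fe944f76b4dacef08";

// Every "&" in the query string becomes "&amp;".
console.log("content=" + quoteAttributeValue(ogImage));
```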

I have the same problem in the content of <style> elements:

const React = require("react");
const ReactDOMServer = require("react-dom/server");

console.log(ReactDOMServer.renderToStaticMarkup(
  <html>
    <head>
      <link
        href="https://fonts.googleapis.com/css?family=Source+Sans+Pro"
        rel="stylesheet"
      />
      <style>{`
        html {
          font-family: "Source Sans Pro", sans-serif;
        }
      `}</style>
    </head>
    <body>
      <p>Test.</p>
    </body>
  </html>
));

This outputs:

<html><head><link href="https://fonts.googleapis.com/css?family=Source+Sans+Pro" rel="stylesheet"/><style>
        html {
          font-family: &quot;Source Sans Pro&quot;, sans-serif;
        }
      </style></head><body><p>Test.</p></body></html>

By the parsing rules in the HTML spec (I'm consulting WHATWG here), the contents of the style, xmp, iframe, noembed and noframes elements (and of noscript when scripting is enabled) are parsed in the RAWTEXT tokenizer state, which treats everything as raw text, without decoding character references, until it finds a matching end tag. So &quot; reaches the CSS parser literally instead of being turned back into a double quote.

Escaping the contents of style elements _is_, however, valid (in fact, mandatory for < and &) in the XML syntax of HTML; and indeed, adding an xmlns="http://www.w3.org/1999/xhtml" attribute to the <html> element results in valid XML. But if the intention of ReactDOMServer is indeed to render XML syntax, that should be explicitly noted in the documentation, because a number of tools (such as Next.js) serve the output of these functions with content-type text/html.

andreubotella on 6 Mar 2019

@andreubotella This is a different problem; you should use dangerouslySetInnerHTML. Can an admin mark these comments as resolved?

oliviertassinari on 6 Mar 2019
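Following that advice for the <style> example, the CSS is written verbatim into an element that the parser reads in the RAWTEXT state, so quotes survive unchanged and the only sequence that can end the element early is a </style end tag. A minimal guard, sketched under that assumption (assertEmbeddableCss is a hypothetical helper):

```javascript
// Hypothetical guard: CSS placed inside <style> is parsed as RAWTEXT, so the only
// sequence that can end it early is a "</style" end tag (matched case-insensitively).
// The check is deliberately conservative and also rejects e.g. "</styles".
function assertEmbeddableCss(css) {
  if (/<\/style/i.test(css)) {
    throw new Error("CSS contains a </style sequence and cannot be embedded verbatim");
  }
  return css;
}

const css = 'html { font-family: "Source Sans Pro", sans-serif; }';

// The double quotes survive unchanged because nothing is escaped.
const styleHtml = "<style>" + assertEmbeddableCss(css) + "</style>";
console.log(styleHtml);
```

In JSX, the checked string would then be passed as <style dangerouslySetInnerHTML={{ __html: assertEmbeddableCss(css) }} />.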

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contribution.

stale[bot] on 10 Jan 2020

Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!

stale[bot] on 17 Jan 2020

This is not a bug in React. Using an entity reference for & (e.g. &amp;) is the correct behavior for XHTML documents:

In both SGML and XML, the ampersand character ("&") declares the beginning of an entity reference (e.g., &reg; for the registered trademark symbol "®"). Unfortunately, many HTML user agents have silently ignored incorrect usage of the ampersand character in HTML documents - treating ampersands that do not look like entity references as literal ampersands. XML-based user agents will not tolerate this incorrect usage, and any document that uses an ampersand incorrectly will not be "valid", and consequently will not conform to this specification. In order to ensure that documents are compatible with historical HTML user agents and XML-based user agents, ampersands used in a document that are to be treated as literal characters must be expressed themselves as an entity reference (e.g. "&amp;"). For example, when the href attribute of the a element refers to a CGI script that takes parameters, it must be expressed as http://my.site.dom/cgi-bin/myscript.pl?class=guest&amp;name=user rather than as http://my.site.dom/cgi-bin/myscript.pl?class=guest&name=user.

In the HTML spec you do not need to use a character reference for & as long as what follows it is not a string that forms a named character reference.

The example they give is:

<a href="?bill&ted">Bill and Ted</a> <!-- &ted is ok, since it's not a named character reference -->
<a href="?art&amp;copy">Art and Copy</a> <!-- the & has to be escaped, since &copy is a named character reference -->

Personally, I think React made the right call in escaping &, since that works in both XHTML and HTML5.

jbraithwaite on 17 Aug 2020
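The rule behind the Bill and Ted example can be sketched as a simplified model of how the WHATWG tokenizer treats "&" inside an attribute value. This is an illustration under assumptions: NAMED_REFERENCES is a hypothetical, deliberately partial table, the helper names are hypothetical, and this is not how React decides what to escape (React always escapes "&").

```javascript
// A tiny, partial subset of the HTML named character reference table, for illustration only.
// The real table has over 2,000 entries; legacy ones such as "copy" and "amp"
// are also recognized without the trailing ";".
const NAMED_REFERENCES = [
  "amp", "amp;", "copy", "copy;", "gt", "gt;",
  "lt", "lt;", "quot", "quot;", "reg", "reg;",
];

// Would the "&" at `index` be decoded by an HTML parser inside an attribute value?
function ampersandIsDecodedInAttribute(value, index) {
  const rest = value.slice(index + 1);
  // Numeric references such as "&#38;" or "&#x26;" are decoded.
  if (/^#(?:[0-9]|[xX][0-9a-fA-F])/.test(rest)) return true;
  // Otherwise find the longest named reference that the text starts with.
  let longest = "";
  for (const name of NAMED_REFERENCES) {
    if (rest.startsWith(name) && name.length > longest.length) longest = name;
  }
  if (longest === "") return false; // e.g. "&ted": no reference, so "&" stays literal
  if (longest.endsWith(";")) return true;
  // Historical rule for attributes: "&copy=" or "&copyx" are left as literal text.
  const next = rest.charAt(longest.length);
  return !(next === "=" || /[A-Za-z0-9]/.test(next));
}

// True if any "&" in the value must be written as "&amp;" to stay a literal ampersand.
function needsAmpersandEscaping(value) {
  for (let i = value.indexOf("&"); i !== -1; i = value.indexOf("&", i + 1)) {
    if (ampersandIsDecodedInAttribute(value, i)) return true;
  }
  return false;
}

console.log(needsAmpersandEscaping("?bill&ted")); // false
console.log(needsAmpersandEscaping("?art&copy")); // true
```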


Escaped paths don't work in meta tags; otherwise, this bug would not have been opened.

equinusocio on 18 Aug 2020
