Lit-html: Unicode characters inside style of a html template are not recognized

Created on 31 May 2018 · 15 Comments · Source: Polymer/lit-html

The following JS file returns "undefined" for the policy.

import {html} from '@polymer/lit-element/lit-element.js';

export const policy = html`
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

<head>
    <meta content="text/html; charset=UTF-8" http-equiv="content-type"></meta>
    <style type="text/css">
        .list>li:before {
            content: "\0025a0  "
        }
    </style>
    <title>Test unicode in lit-html </title>
</head>

<body>
    <ul class="list">
        <li> list item 1</li>
        <li> list item 2</li>
    </ul>
</body>

</html>`
Medium Bug

All 15 comments

I think I'll need more information here. The html template tag should never return undefined - it always returns a TemplateResult.

Does the value really change depending on the unicode escape sequence?

I haven't investigated this at all, but it could be related to how Unicode escape sequences are handled by tagged template literals.

this could be related to how unicode escape sequences are handled by tagged template literals

Correct. This happens because \0 followed by another digit is parsed as a legacy octal escape, which template literals do not allow. The correct way is to use either a hex escape (\x00) or a Unicode escape (\u0000).

This works. Escape the backslash with a second backslash ('\\').

content: "\\0025a0";
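
To see why doubling the backslash helps, here is a minimal sketch with a hypothetical tag, `inspectFirst` (not part of lit-html), that exposes the cooked and raw forms of a template's first string. It assumes an engine that implements the ES2018 template literal revision; older engines reject the first template with a syntax error.

```javascript
// Hypothetical helper tag: returns both views of the first string chunk.
const inspectFirst = (strings) => ({ cooked: strings[0], raw: strings.raw[0] });

// \00 is a legacy octal escape, so there is no cooked value at all.
const octal = inspectFirst`content: "\0025a0"`;

// A doubled backslash cooks to one literal backslash, which CSS then
// reads as its own escape for U+25A0.
const doubled = inspectFirst`content: "\\0025a0"`;

// A JavaScript Unicode escape cooks to the character itself.
const unicode = inspectFirst`content: "\u25a0"`;

console.log(octal.cooked);   // undefined
console.log(doubled.cooked); // content: "\0025a0"
console.log(unicode.cooked); // content: "■"
```

With the doubled backslash the escape is resolved by the CSS parser; with `\u25a0` it is resolved by JavaScript before the CSS is ever parsed. Both put the same glyph on the page.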

Although template tag html only uses the raw strings, older JS engines still needed to compute the cooked strings.

See Stage 1 Draft / July 26, 2016 Template Literal Revision

Remove the restriction on escape sequences.

Lifting the restriction raises the question of how to handle cooked template values that contain illegal escape sequences. Currently, cooked template values are supposed to replace escape sequences with the "Unicode code point represented by the escape sequence" but this can't happen if the escape sequence is not valid.

You can see this in:

function f(strings) {
  console.log(`cooked=${ JSON.stringify(strings) }`);
  console.log(`raw=${ JSON.stringify(strings.raw) }`);
}

function g() {
  "use strict";
  f`\01`;
}

g();

In older JS engines you'd get an error message like the one you describe but in newer ones you get:

cooked=[null]
raw=["\01"]

Although template tag html only uses the raw strings

html doesn't use raw strings. For invalid escape sequences, the item this.strings[i] is undefined. This will cause a reference error here but if you don't use interpolation, you'll get 'undefined' because of this line.
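
A minimal sketch of how a missing cooked string can turn into the text 'undefined'. The tag `joinCooked` is a hypothetical stand-in that concatenates cooked strings; it is not lit-html's actual code, only an illustration of the same failure mode, and it assumes an ES2018+ engine.

```javascript
// Hypothetical stand-in for a tag that builds markup from the cooked strings.
const joinCooked = (strings, ...values) =>
  strings.reduce(
    (out, chunk, i) => out + chunk + (i < values.length ? String(values[i]) : ''),
    ''
  );

// The only chunk contains an invalid escape, so strings[0] is undefined
// and string concatenation coerces it to the text "undefined".
const broken = joinCooked`\0025a0`;

// A valid template is unaffected.
const fine = joinCooked`<li>${'item'}</li>`;

console.log(broken); // undefined (as a string)
console.log(fine);   // <li>item</li>
```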

html doesn't use raw strings.

Wow! Thanks for pointing that out.

IMO, it really should.
I would expect

html`<style>li.inline:after { content: "\2c" }</style>`

to correspond to that HTML and not fail because \2 is octal.

IMO, it really should.

I agree.
But in the meantime we can build our own raw-string-version of html:

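// Variant of html that hands lit-html the raw (unprocessed) strings,
// so sequences such as \2c reach the template unchanged.
const raw_html = (strings, ...values) => {
    const newStrings = [...strings.raw];
    newStrings.raw = strings.raw;
    // Return the TemplateResult; without this the tag evaluates to undefined.
    return html(newStrings, ...values);
};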
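
The same idea as a self-contained sketch that runs without lit-html. Here `baseTag` is a hypothetical stand-in for `html`, and `withRawStrings` is an illustrative name, not an existing API. The `WeakMap` is an added assumption: it reuses one raw copy per call site, which should matter for any tag that caches by the identity of the strings array.

```javascript
// Hypothetical stand-in for lit-html's html: concatenates cooked strings and values.
const baseTag = (strings, ...values) =>
  strings.map((chunk, i) => chunk + (i < values.length ? values[i] : '')).join('');

// One raw copy per template call site, keyed by the original strings array.
const rawCopies = new WeakMap();

// Wrap a tag so that it receives the raw strings in place of the cooked ones.
const withRawStrings = (tag) => (strings, ...values) => {
  let rawCopy = rawCopies.get(strings);
  if (rawCopy === undefined) {
    rawCopy = [...strings.raw];
    rawCopy.raw = strings.raw;
    rawCopies.set(strings, rawCopy);
  }
  return tag(rawCopy, ...values);
};

const rawTag = withRawStrings(baseTag);

const viaCooked = baseTag`li:after { content: "\2c" }`;
const viaRaw = rawTag`li:after { content: "\2c" }`;

console.log(viaCooked); // undefined (as a string)
console.log(viaRaw);    // li:after { content: "\2c" }
```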

We tried using raw strings for Polymer 3's template strings, but hit issues with developers expecting JavaScript escape sequences to work properly. For instance, they expected this to work:

html`<pre>a\nb</pre>`

And output "a" and "b" on separate lines.
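
A short sketch of the trade-off, using a hypothetical tag `bothViews` that returns the cooked and raw forms of the first string. With cooked strings `\n` is a real line break; with raw strings it stays as a backslash followed by `n`.

```javascript
// Hypothetical helper tag: exposes both forms of the first string chunk.
const bothViews = (strings) => ({ cooked: strings[0], raw: strings.raw[0] });

const pre = bothViews`<pre>a\nb</pre>`;

// Cooked: the escape has become a newline character.
console.log(pre.cooked.includes('\n')); // true

// Raw: the escape is still two characters, backslash and "n".
console.log(pre.raw); // <pre>a\nb</pre>
```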

@justinfagnani Yeah, but it seems easier to do

html`<pre>a${'\n'}b</pre>`

to embed newlines than to have to remember to double-escape in

html`<script>alert('\n')</script>`

@mikesamuel Are you sure? That's a JavaScript string embedded in another JavaScript string. We always have to escape in that case. I think it would be more unfamiliar to developers to say that _in this case_ you don't have to double-escape.

I also hope that devs are almost never putting JavaScript inside their lit-html templates. It doesn't do much anyway.

I think reasonable engineers can disagree here on the basic point of whether lit-html templates are JavaScript strings or HTML. Given that we have other deviations from plain HTML, I tend to think of them as JavaScript strings still, that contain markup.

That's a JavaScript string embedded in another JavaScript string. We always have to escape in that case.

String.raw`<script>alert('\n')</script>`;

That's what I always do when I have to handle code in JS because you can just write code like you always do.
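
For example, a minimal sketch using only the built-in `String.raw` (the variable names are illustrative). Because `String.raw` is itself a tag, it also accepts sequences that are invalid as JavaScript escapes, such as the CSS escape from the original report, assuming an ES2018+ engine.

```javascript
// Embedded script: the \n stays as backslash + "n", so the inner
// JavaScript string literal is still valid when the markup is parsed.
const scriptMarkup = String.raw`<script>alert('\n')</script>`;

// Embedded CSS: \0025a0 is not a valid JavaScript escape, but the raw
// text is preserved for the CSS parser to interpret.
const styleMarkup = String.raw`<style>.list>li:before { content: "\0025a0  " }</style>`;

console.log(scriptMarkup); // <script>alert('\n')</script>
console.log(styleMarkup);  // <style>.list>li:before { content: "\0025a0  " }</style>
```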

Scripts inside templates aren't the only problem. The original reason for this issue was simple CSS.
I also had issues when the template contained Java code (embedded in a code tag for display).

Given that we have other deviations from plain HTML

@justinfagnani Are these documented somewhere?

I filed an issue on eslint-plugin-lit to warn on illegal escape sequences. cc @43081j

Consider supporting a special placeholder for Unicode escapes in the lit-element environment.

@justinfagnani We could probably pursue a change to spec to remove the octal reservations from https://tc39.es/ecma262/#prod-NotEscapeSequence. It's a simple enough change, and has a clear cut use case.
