TypeScript: Don't escape valid Unicode characters in strings

Created on 14 Jan 2020 · 6 Comments · Source: microsoft/TypeScript


TypeScript Version: 3.7.4

Code

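import {
  createSourceFile, ScriptTarget, transform, createPrinter, EmitHint
} from 'typescript'

const sf = createSourceFile(
  'aaa',
  'const a: string = "哈哈"',
  ScriptTarget.Latest
)
// Transformers would go here; an empty list is enough to reproduce the issue.
const result = transform(sf, [])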
const result = transform(sf, [])
const printer = createPrinter()
const printed = printer.printNode(
  EmitHint.SourceFile,
  result.transformed[0],
  sf
)
console.log(printed)

Expected behavior:
const a: string = "哈哈"

Actual behavior:
const a: string = "\u54C8\u54C8";

I am trying to use the compiler API to do some transforms, but the Printer does not seem to emit the decoded Unicode characters. How do I do this correctly?

Labels: Bug, Difficult, help wanted

All 6 comments

I am looking at the API here:

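import * as path from 'path'
import {
  createProgram, ScriptTarget, ModuleKind, JsxEmit,
  transform, createPrinter, EmitHint
} from 'typescript'

const realPath = path.resolve(__dirname, './utf8.ts')
const program = createProgram([realPath], {
  target: ScriptTarget.ES2017,
  module: ModuleKind.ES2015,
  allowJs: true,
  jsx: JsxEmit.Preserve,
})
// Get the parsed SourceFile for utf8.ts from the program.
const sf = program.getSourceFile(realPath)!
// Uncommenting the next line produces the expected, unescaped output:
// program.getTypeChecker()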
const result = transform(sf, [])
const printer = createPrinter()
const printed = printer.printNode(
  EmitHint.SourceFile,
  result.transformed[0],
  sf
)
console.log(printed)

Same here with the Program API. The file content is simply 'const a: string = "哈哈"',
but the result is: const a: string = "\u54C8\u54C8";
However, when I call program.getTypeChecker(), I get the expected output: const a: string = "哈哈".
Why does this happen?

It's not that you're doing anything wrong; our implementation just escapes any characters outside the printable ASCII range. Nowadays we might be equipped to do a little better, given that we have the set of valid Unicode identifier characters.

Is there a reason this emit is a problem for you?
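For illustration, here is a minimal sketch of the escaping rule described above. The helper name escapeNonAsciiForPrinter is hypothetical and this is not the compiler's actual source: it models only the non-ASCII step (every UTF-16 code unit above U+007F becomes \uXXXX with uppercase hex) and leaves out quote, backslash and control-character escaping. It reproduces the output from the report, where "哈哈" (U+54C8 U+54C8) becomes \u54C8\u54C8; a character outside the Basic Multilingual Plane comes out as two escaped surrogate halves.

```typescript
// Hypothetical helper, not the compiler's actual implementation: models only the
// non-ASCII escaping step. Quotes, backslashes and control characters are not handled.
function escapeNonAsciiForPrinter(value: string): string {
  let out = "";
  // Walk UTF-16 code units, so astral characters become two escaped surrogates.
  for (let i = 0; i < value.length; i++) {
    const unit = value.charCodeAt(i);
    if (unit <= 0x7f) {
      out += value.charAt(i); // ASCII is copied unchanged
    } else {
      const hex = unit.toString(16).toUpperCase();
      out += "\\u" + ("0000" + hex).slice(-4); // e.g. U+54C8 -> \u54C8
    }
  }
  return out;
}

console.log(escapeNonAsciiForPrinter("哈哈")); // prints \u54C8\u54C8
```

This escaping appears to apply only when the printer cannot copy a literal's original text from the source file, which it locates through the node's parent pointers. That would be consistent with the observation that calling program.getTypeChecker() (which binds the file and sets parent pointers) changes the output, and it suggests passing true for the setParentNodes argument of createSourceFile as something to try. Treat this as an unverified reading of the behavior; literals newly created by a transform have no original text, so they would presumably still be escaped.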


We use the transform API to process our source code, for example:

const a: string = '哈哈' => const a: string = i18n('哈哈'). That lets us find every Chinese string in our codebase and wrap it in i18n, but if TypeScript escapes every character outside the printable ASCII range, our codebase becomes unreadable.

Is there any solution that lets me keep my Chinese strings? Thanks.

I don't think we should escape these unless there's some hard necessity.

No, it was strictly ease of implementation at the time. I'm marking this as Difficult because any contribution needs very thorough test code.

Hitting same issue. Our workaround:

    let content = printer.printFile(file);
    content = unescape(content.replace(/\\u/g, "%u"));
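The workaround above relies on the legacy unescape function, which also decodes any %XX sequence that was already in the file (a string like "hello%20world" turns into "hello world"), and its /\\u/ pattern also matches the u after an escaped backslash, so a printed "C:\\users" is corrupted into "C:\%users". A slightly more careful sketch, under the same text-level approach, follows. decodeNonAsciiEscapes is a hypothetical helper, not part of the TypeScript API: it skips escaped backslashes, decodes only escapes for non-ASCII code units (joining surrogate pairs), and leaves ASCII escapes, U+2028/U+2029 and lone surrogates escaped. Like the original workaround it does not parse the output, so escapes inside comments or regular expression literals are decoded too.

```typescript
// Hypothetical helper (not part of the TypeScript API): undo the printer's \uXXXX
// escaping for non-ASCII characters in already-printed source text.
function isHighSurrogate(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdbff;
}

function isLowSurrogate(unit: number): boolean {
  return unit >= 0xdc00 && unit <= 0xdfff;
}

// Read a "\uXXXX" escape starting at index i; return its code unit, or -1 if none.
function readEscape(text: string, i: number): number {
  if (text.charAt(i) !== "\\" || text.charAt(i + 1) !== "u") return -1;
  const hex = text.slice(i + 2, i + 6);
  return /^[0-9A-Fa-f]{4}$/.test(hex) ? parseInt(hex, 16) : -1;
}

function decodeNonAsciiEscapes(text: string): string {
  let out = "";
  let i = 0;
  while (i < text.length) {
    if (text.charAt(i) !== "\\") {
      out += text.charAt(i);
      i++;
      continue;
    }
    const unit = readEscape(text, i);
    if (unit === -1) {
      // Another escape such as \\ or \n: copy backslash + next char verbatim,
      // so an escaped backslash is never mistaken for the start of \u.
      out += text.slice(i, i + 2);
      i += 2;
      continue;
    }
    if (isHighSurrogate(unit)) {
      const low = readEscape(text, i + 6);
      if (isLowSurrogate(low)) {
        out += String.fromCharCode(unit, low); // e.g. \uD83D\uDE00 -> one emoji
        i += 12;
        continue;
      }
    }
    // Keep ASCII, line/paragraph separators and lone surrogates escaped.
    const keep = unit < 0x80 || unit === 0x2028 || unit === 0x2029 ||
      isHighSurrogate(unit) || isLowSurrogate(unit);
    out += keep ? text.slice(i, i + 6) : String.fromCharCode(unit);
    i += 6;
  }
  return out;
}

console.log(decodeNonAsciiEscapes('const a: string = "\\u54C8\\u54C8";'));
```

With this helper the workaround would become content = decodeNonAsciiEscapes(printer.printFile(file)).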