Cheerio: Chinese characters are converted to unicode

Created on 26 Feb 2014 · 8 Comments · Source: cheeriojs/cheerio

before:

<a target="_blank" title="首页"></a>

after:

<a target="_blank" title="&#x9996;&#x9875;"></a>

How can I solve this problem?

All 8 comments

Why is this an issue?

@rubyless 首页 -> "&#x9996;&#x9875;" is correct.

That is how those characters are written as HTML numeric character references.
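To make the equivalence concrete, here is a minimal sketch in plain JavaScript of how non-ASCII characters map to hexadecimal numeric character references and back. The helper names `encodeNonAscii` and `decodeHexReferences` are hypothetical and are not part of cheerio; this only illustrates the mapping, not how cheerio implements it internally.

```javascript
// Hypothetical helper: write every non-ASCII code point as a
// hexadecimal numeric character reference such as &#x9996;
function encodeNonAscii(text) {
  return Array.from(text)
    .map((ch) => {
      const codePoint = ch.codePointAt(0);
      return codePoint > 0x7f
        ? `&#x${codePoint.toString(16).toUpperCase()};`
        : ch;
    })
    .join("");
}

// Hypothetical helper: turn hexadecimal numeric character references
// back into the characters they stand for.
function decodeHexReferences(text) {
  return text.replace(/&#x([0-9a-fA-F]+);/g, (match, hex) =>
    String.fromCodePoint(parseInt(hex, 16))
  );
}

console.log(encodeNonAscii("首页")); // &#x9996;&#x9875;
console.log(decodeHexReferences("&#x9996;&#x9875;")); // 首页
```

A browser renders both forms identically; the difference only matters to someone reading or editing the HTML source by hand.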

@youxiachai But I still want the output to contain the Chinese characters themselves.

@rubyless The conversion is helpful when inserting arbitrary data into a page (note: it _does not_ prevent XSS). At least you don't have to worry about character encodings.

Of course, it's helpful to preserve the original encoding while editing a document by hand. But as that shouldn't be a primary use case of cheerio, I'm closing this.

Thanks a lot

@fb55
I ran into this problem too.
When we edit Chinese text in the file by hand (or when other people read or edit the file), the entity-encoded characters make it unreadable.

So, could there be an option to decide whether Chinese characters are converted to entities?

Thank you

Cheerio now recognizes the decodeEntities flag; setting it to false should
do the trick.
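As a rough sketch of what that looks like when calling cheerio directly, the flag can be passed in the options object given to `cheerio.load`. The wrapper function `serializeKeepingCharacters` is a hypothetical name used only for illustration, and it takes the cheerio module as an argument so the snippet does not assume how cheerio is installed. Exact output may differ between cheerio versions.

```javascript
// Hypothetical wrapper: `cheerioModule` is assumed to be the value
// returned by require("cheerio").
function serializeKeepingCharacters(cheerioModule, html) {
  // decodeEntities: false asks the parser not to decode entities, so
  // non-ASCII text should not be re-encoded as &#x...; on output.
  const $ = cheerioModule.load(html, { decodeEntities: false });
  return $.html();
}
```

It would be called as `serializeKeepingCharacters(require("cheerio"), '<a target="_blank" title="首页"></a>')`, assuming cheerio is installed in the project.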

@fb55
That solved my problem:

.pipe(cheerio({
    run: function($, file) {},
    parserOptions: {
        decodeEntities: false
    }
}))

Thank you
