Cheerio: .html() returns HTML with Unicode entities when Chinese text is passed

Created on 22 Sep 2014  ·  8 Comments  ·  Source: cheeriojs/cheerio

var cheerio = require('cheerio');
var $ = cheerio.load('<title>文章抓取</title>');
console.log($('title').html());

<=0.15.0 output:

文章抓取

0.16.0-0.17.0 output:

&#x6587;&#x7AE0;&#x6293;&#x53D6;
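
The entity string above is the same four characters written as hexadecimal numeric character references. A minimal sketch that makes this visible, assuming a hypothetical helper named decodeHexEntities (it is not part of cheerio and only handles the &#x...; form):

```javascript
// Decode hexadecimal numeric character references such as &#x6587;.
// decodeHexEntities is a hypothetical helper for illustration only;
// it is not a cheerio API and ignores named or decimal entities.
function decodeHexEntities(str) {
  return str.replace(/&#x([0-9a-fA-F]+);/g, function (match, hex) {
    // Convert the hex code point back into its character.
    return String.fromCodePoint(parseInt(hex, 16));
  });
}

var encoded = '&#x6587;&#x7AE0;&#x6293;&#x53D6;';
console.log(decodeHexEntities(encoded)); // prints 文章抓取
```

So the 0.16.0-0.17.0 output is not corrupted; it is the original text with each character escaped as an entity.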

All 8 comments

However,

$('title').text()

works fine. Can you figure out what is going on? Thanks a lot.

Try passing decodeEntities: false in the options given to cheerio.load().

@fb55
Hi Felix, I've tried every combination of normalizeWhitespace, xmlMode and decodeEntities, and it still doesn't work: .html() returns entity-encoded Unicode as before, while .text() and .attr([ATTR_NAME]) return the text correctly.

More information: I've tested on Linux, Mac and PC in a plain Node.js environment, with app.js saved as UTF-8. Results by cheerio version:
0.15.0(√)
0.16.0(x)
0.17.0(x)

Everything works fine now. Weird. Sorry for the bother...

@fb55
thanks! it works!

@fb55 Thanks, worked like a charm.

@fb55 thanks
