Hello,
I've stumbled upon a little issue while parsing XML files with cheerio and I do not know whether things have been intended this way or if there is really an underlying problem here.
If you try to retrieve children from a tag containing self-closing children, you will only get the children up to the first self-closing child.
However, I noticed that some tags that are commonly self-closed, such as img, are not affected by this problem. I therefore think that a whitelist is used for those tags, but that it does not cover the other ones.
Here is a tiny script illustrating the problem (working with cheerio 0.18.0):
var cheerio = require('cheerio');
var $scxml = cheerio.load('<div><folder></folder><one /><two /><three /></div>'),
$imgxml = cheerio.load('<div><folder></folder><img /><one /><two /><three /></div>'),
$noscxml = cheerio.load('<div><folder></folder><one></one><two></two><three></three></div>');
console.log('Starting test...\n');
// With self-closing tags
console.log('With self-closing tags:');
$scxml('div').first().children().each(function() {
console.log('--' + $scxml(this)[0].name);
});
// With self-closing tags and probably whitelisted ones
console.log('\nWith img and self-closing tags:');
$imgxml('div').first().children().each(function() {
console.log('--' + $imgxml(this)[0].name);
});
// Without self-closing tags
console.log('\nWithout self-closing tags:');
$noscxml('div').first().children().each(function() {
console.log('--' + $noscxml(this)[0].name);
});
console.log('\nDone');
And here is the console output when you run this script:
Starting test...
With self-closing tags:
--folder
--one
With img and self-closing tags:
--folder
--img
--one
Without self-closing tags:
--folder
--one
--two
--three
Done
Self-closing tags are not working as expected in Cheerio.
Consider HTML that contains a self-closing tag:
var cheerio = require('cheerio'),
$ = cheerio.load('<div> <img src="a.jpg" /> </div>');
var output = $.html();
The output here is <div> <img src="a.jpg"> </div>, so the self-closing tag is broken.
If the HTML content contains some XML, the browser does not load the broken HTML.
Never mind, I found that setting xmlMode to true solves this problem.
var $ = cheerio.load('XMLSTRING', {xmlMode: true});