Cheerio: Children and self-closing tags

Created on 13 Nov 2014 · 2 Comments · Source: cheeriojs/cheerio

Hello,
I've stumbled upon a small issue while parsing XML files with cheerio, and I don't know whether this behavior is intended or whether there is a real underlying problem.

If you try to retrieve the children of a tag that contains self-closing children, you only get the children up to and including the first self-closing child.

However, I noticed that some tags that are commonly self-closed, such as img, are not affected by this problem. I therefore suspect that a whitelist of such tags is applied, and that tags outside it are not treated as self-closing.

Here is a tiny script illustrating the problem (tested with cheerio 0.18.0):

var cheerio = require('cheerio');

var $scxml = cheerio.load('<div><folder></folder><one /><two /><three /></div>'),
    $imgxml = cheerio.load('<div><folder></folder><img /><one /><two /><three /></div>'),
    $noscxml = cheerio.load('<div><folder></folder><one></one><two></two><three></three></div>');

console.log('Starting test...\n');

// With self-closing tags
console.log('With self-closing tags:');
$scxml('div').first().children().each(function() {
  console.log('--' + $scxml(this)[0].name);
});

// With self-closing tags and probably whitelisted ones
console.log('\nWith img and self-closing tags:');
$imgxml('div').first().children().each(function() {
  console.log('--' + $imgxml(this)[0].name);
});

// Without self-closing tags
console.log('\nWithout self-closing tags:');
$noscxml('div').first().children().each(function() {
  console.log('--' + $noscxml(this)[0].name);
});

console.log('\nDone');

And here is the console output when you run the script:

Starting test...

With self-closing tags:
--folder
--one

With img and self-closing tags:
--folder
--img
--one

Without self-closing tags:
--folder
--one
--two
--three

Done
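
The output above is consistent with HTML parsing rules, where only void elements such as img close themselves and the trailing slash on any other tag is ignored, so two ends up nested inside one and three inside two. Below is a toy sketch of that rule. It is not cheerio's or htmlparser2's actual code, and the names VOID_ELEMENTS, parse and childNames are made up for illustration.

```javascript
// Illustrative toy parser, NOT cheerio's or htmlparser2's implementation.
// Assumption: in HTML mode only void elements close themselves and the
// trailing "/" on any other tag is ignored; in XML mode "/>" always closes.
var VOID_ELEMENTS = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
                     'input', 'link', 'meta', 'source', 'track', 'wbr'];

function parse(markup, xmlMode) {
  var root = { name: '#root', children: [] };
  var stack = [root]; // currently open elements, innermost last
  var tagPattern = /<(\/?)([A-Za-z][\w-]*)[^>]*?(\/?)>/g;
  var match;

  while ((match = tagPattern.exec(markup)) !== null) {
    var isClosing = match[1] === '/';
    var name = match[2];
    var hasSlash = match[3] === '/';

    if (isClosing) {
      // Close the nearest open element with this name, plus anything inside it.
      for (var i = stack.length - 1; i > 0; i--) {
        if (stack[i].name === name) { stack.length = i; break; }
      }
      continue;
    }

    var node = { name: name, children: [] };
    stack[stack.length - 1].children.push(node);

    // Decide whether this tag stays open and collects the following tags.
    var selfCloses = xmlMode ? hasSlash : VOID_ELEMENTS.indexOf(name) !== -1;
    if (!selfCloses) stack.push(node);
  }
  return root;
}

// Names of the direct children of the outermost element.
function childNames(markup, xmlMode) {
  return parse(markup, xmlMode).children[0].children.map(function (node) {
    return node.name;
  });
}

var sample = '<div><folder></folder><one /><two /><three /></div>';
console.log(childNames(sample, false)); // HTML-style rule
console.log(childNames(sample, true));  // XML-style rule
```

Under the HTML-style rule the sketch reports folder and one as the only children of div, matching the first test case; under the XML-style rule it reports all four.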

All 2 comments

Self-closing tags are not working as expected in cheerio.
Consider HTML that contains a self-closing tag:

var cheerio = require('cheerio'),
    $ = cheerio.load('<div> <img src="a.jpg" /> </div>');

var output = $.html(); // serialize the loaded document

The output here is <div> <img src="a.jpg"> </div>: the self-closing slash is dropped.

If the HTML content contains some XML, the browser does not load the resulting broken markup.

Never mind, I found that setting the xmlMode option to true solves this problem.

var $ = cheerio.load('XMLSTRING', {xmlMode: true});