Cheerio: tr and td tags cannot be parsed outside a table

Created on 15 Mar 2019 · 2 Comments · Source: cheeriojs/cheerio

I can get an element's text only when the tr or td tag is wrapped in a table.

Doesn't work

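const cheerio = require('cheerio');
// The markup is a bare <td> with no surrounding <table>.
const $ = cheerio.load('<td>Hello world</td>');
// No td element is found, so this prints an empty string.
console.log($('td').text());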

Works fine

const cheerio = require('cheerio');
const $ = cheerio.load('<table><td>Hello world</td></table>');
console.log($('td').text());

Why is that?

Not a bug

Most helpful comment

One good way to get a sense for what Cheerio is doing is by using the static html method to render the document element. Here's how you could do that with your first example:

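// The original comment required '../cheerio' (a local checkout); use the installed package instead.
const cheerio = require('cheerio');
const $ = cheerio.load('<td>Hello world</td>');
// Render the whole document to see what the parser actually built.
console.log(cheerio.html($('html')));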

That returns <html><head></head><body>Hello world</body></html>.

Cheerio interprets the string passed to load as a complete HTML document. A <td> element can't be a child of the <body> element, so the HTML parser, which recovers from errors instead of rejecting the input, discards the stray <td> tags and keeps only their text inside <body>. This is the same thing your web browser will do if you load a file like that.

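Your second example works as expected because the <td> appears inside a <table>, where the parser accepts it. Strictly speaking, a <td> belongs inside a <tr>, so the parser fills in the missing <tbody> and <tr> elements around it.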

If you want to create document fragments, you can start with an empty document and then use the same approach you would use with jQuery:

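// The original comment required '../cheerio' (a local checkout); use the installed package instead.
const cheerio = require('cheerio');
// Load an empty document, then build the fragment with $() as you would in jQuery.
const $ = cheerio.load('');
const $td = $('<td>Hello world</td>');
console.log(cheerio.html($td));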

That produces the markup you're expecting.

Very well explained @jugglinmike, thank you!