Cheerio: Can't reach anything under noscript tag

Created on 5 Nov 2017 · 6 Comments · Source: cheeriojs/cheerio

I've got cheerio.load(pageContent, { decodeEntities: false }), but then when I try to match an element that's under a noscript tag it doesn't work. I thought setting decodeEntities to false would allow this. How do I match that element?

All 6 comments

Try cheerio.load(pageContent, { xmlMode: true });

I'm seeing this too. This should not have been closed.

Enabling xmlMode fixes this; otherwise parse5 treats the contents of noscript tags as plain text, so the elements inside them are never created and cannot be matched.

Right, but it's still an issue, right? It should stay open until it gets resolved.

Try to match the noscript tag itself, get the html by calling .html() and then load that html with cheerio again. That way you'll be able to match any element under the noscript tag.

This is definitely still a bug with cheerio, as cheerio is essentially a browser without JavaScript support. The noscript tag is intended to provide content for browsers that do not support JavaScript (which would include search engines, web crawlers, and web scrapers).

"xmlMode: true" has several other side effects (see documentation) which can cause most pages, and especially those in question, to fail to parse.
