Cheerio changes attributes with single quotes into double quotes.
var cheerio = require('cheerio');
// uses 'single quotes'
var $ = cheerio.load('<div attr=\'value\'></div>')
$.html();
// => <div attr="value"></div>
// has "double quotes"
Single quotes are useful for me, as I use JSON in HTML attributes for widget settings.
<div data-settings='{ "option": true }'></div>
Cheerio encodes this with HTML entities (possibly breaking JSON) as
<div data-options="{ &quot;option&quot;: true }"></div>
Setting decodeEntities: false leaves the quotes unencoded, which breaks the HTML
<div data-options="{ "option": true }"></div>
Ideally, cheerio would preserve which quote character is used. I understand this is an edge case, so I'm reporting it in case others run into it. Similar to #460
This definitely won't break JSON within browsers. IMHO this won't be fixed.
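A minimal sketch of why the encoded form still works in browsers: the HTML parser decodes character references in attribute values before a script reads them, so JSON.parse sees plain double quotes again. The decodeAttribute helper below is a hypothetical stand-in for that browser step so the example runs in plain Node; it is not part of cheerio and only handles the five XML entities.

```javascript
// Hypothetical stand-in for the decoding a browser applies to attribute values.
// Only the five XML entities are handled; this is an assumption for illustration.
function decodeAttribute(raw) {
  return raw
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&"); // decode &amp; last so "&amp;quot;" becomes "&quot;", not a quote
}

// The attribute value as it appears in the serialized markup.
const serialized = "{ &quot;option&quot;: true }";

// After decoding, the value is valid JSON again.
const settings = JSON.parse(decodeAttribute(serialized));
console.log(settings.option);
```

The complaint in this thread is therefore mostly about tools that read the serialized markup as text rather than through an HTML parser.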
@fb55 is there any chance of this being fixed?
The HTML my app has to consume is horribly written, and I have no control over fixing it. Not only does it contain raw JSON inside a div tag, but most of the href attributes on anchors only work because of single quotes, e.g. href='javascript: dostuff("asdf");'. As in this issue, the tag breaks when the single quotes are replaced with double quotes and decodeEntities: false is used.
+1, I also store JSON in HTML attributes and would like to second keeping single quoted escapes because it makes the html with embedded json much easier to manipulate and read
+1
Gentlemen, any update on this one? Do you intend to fix this or not at all? It's more than a year later and the task is still open.
@fb55, any update on this? The problem is really important; we can't store JSON in meta tags because of it.
Switching to parse5 could fix this (it's another open issue). As I said
before, not sure if this will be fixed with the current architecture as it
doesn't break anything.
– Felix
ok @fb55, thank you very much for the feedback.
I fixed this.
Steps: change
else {
  output += key + "='" + (opts.decodeEntities ? entities.encodeXML(value) : value) + "'";
}
To
else {
  if (/[^\\]\"/.test(value)) {
    output += key + "='" + (opts.decodeEntities ? entities.encodeXML(value) : value) + "'";
  } else {
    output += key + '="' + (opts.decodeEntities ? entities.encodeXML(value) : value) + '"';
  }
}
Thanks, @gaecom.
I've slightly modified your solution.
if (opts.plainQuotes && /[^\\]\"/.test(value)) {
  if (opts.decodeEntities) {
    value = entities.encodeXML(value);
    if (opts.plainQuotes) { value = value.replace(/&quot;/g, '"'); }
  }
  output += key + "='" + value + "'";
} else {
  if (opts.decodeEntities) {
    value = entities.encodeXML(value);
    if (opts.plainQuotes) { value = value.replace(/&apos;/g, "'"); }
  }
  output += key + '="' + value + '"';
}
This variant works with decodeEntities: true.
It does some redundant work by encoding and then replacing &quot; back, but that is acceptable for better readability.
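The quote-selection idea from the two comments above can be sketched as a standalone function. This is an illustration under assumptions, not the serializer's actual code: formatAttribute is a hypothetical name, plainQuotes is the option proposed in this thread rather than a documented cheerio option, and encodeXML here is a small local stand-in for entities.encodeXML so the sketch runs without dependencies.

```javascript
// Local stand-in for entities.encodeXML (assumption: only these five characters matter here).
function encodeXML(text) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;" };
  return text.replace(/[&<>"']/g, (ch) => map[ch]);
}

// Hypothetical helper: serialize one attribute, preferring single quotes when the
// value contains double quotes. `plainQuotes` is the option proposed in this thread.
function formatAttribute(key, value, opts) {
  // Single quotes are only safe when the value has no single quote of its own.
  const useSingle = Boolean(opts.plainQuotes) && value.includes('"') && !value.includes("'");
  let out = opts.decodeEntities ? encodeXML(value) : value;
  if (useSingle) {
    // Double quotes are harmless inside '...', so restore them.
    out = out.replace(/&quot;/g, '"');
    return key + "='" + out + "'";
  }
  if (opts.plainQuotes) {
    // Single quotes are harmless inside "...", so restore them.
    out = out.replace(/&apos;/g, "'");
  }
  return key + '="' + out + '"';
}

const opts = { decodeEntities: true, plainQuotes: true };
console.log(formatAttribute("data-settings", '{ "option": true }', opts));
```

One deliberate difference from the snippets above: the test uses value.includes('"') instead of /[^\\]\"/, because that regex needs a character before the quote and so misses a double quote at the very start of the value.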
I also have this issue, and the proposed solutions don't work for my use case. The HTML parsed by cheerio (via Inky) will also be parsed later by Twig, and the quotes need to remain the same. Here is an HTML sample that fails:
{{ "The content may have simple quote like « l' » or « d' », as it is the case for <a href='http://my.link'>tags</a>."|trans }}
We are having this issue while implementing the new AMP pages by Google. One of the parameters requires JSON inside a data attribute, like so:
<amp-ad rtc-config='{"urls": ["url1",...]}'>
The single quotes get converted to double quotes, which invalidates the HTML. JSON requires double quotes around strings, so swapping double and single quotes doesn't work.
Please cheerio, we need you.
SOS. Does anyone have a fix for this without modifying external node_modules?
I just had this problem and fixed it by adding decodeEntities: false to the options when loading the HTML:
this.$ = cheerio.load(myHtml, {decodeEntities: false})
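For cases where decodeEntities: false is not enough, one possible workaround that leaves node_modules untouched is to post-process the serialized string. The sketch below is only an illustration: requoteAttribute is a hypothetical helper, not a cheerio API, and it relies on a regular expression, so it assumes the output has the regular attr="..." shape with double quotes encoded as &quot;.

```javascript
// Hypothetical helper, not part of cheerio: rewrite one attribute in already
// serialized HTML from  name="...&quot;..."  to  name='..."...'.
function requoteAttribute(html, attrName) {
  // Escape the attribute name for safe use inside a regular expression.
  const name = attrName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Match  name="value"  where value contains no raw double quote.
  const pattern = new RegExp("(\\s" + name + ')="([^"]*)"', "g");
  return html.replace(pattern, (match, prefix, value) => {
    // A literal single quote would end a '...' value early, so leave those untouched.
    if (value.includes("'")) return match;
    return prefix + "='" + value.replace(/&quot;/g, '"') + "'";
  });
}

// Example input in the shape this thread describes for serialized output.
const serialized = '<amp-ad rtc-config="{&quot;urls&quot;: [&quot;url1&quot;]}"></amp-ad>';
const fixed = requoteAttribute(serialized, "rtc-config");
console.log(fixed);
```

Running the helper only on the attributes known to carry JSON keeps the rest of the markup exactly as cheerio produced it.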