I'm trying the following and getting an error:
```javascript
var htmlFiles = [];
$("resource[href*='.htm'][d2l_2p0:material_type='content']").each(function () {
  htmlFiles.push($(this).attr("href"));
});
return htmlFiles;
```
```
\path\to\node_modules\css-what\index.js:152
throw new SyntaxError("Malformed attribute selector: " + selector);
^

SyntaxError: Malformed attribute selector: d2l_2p0:material_type='content']
    at parseSelector (\path\to\node_modules\css-what\index.js:152:12)
    at parse (\path\to\node_modules\css-what\index.js:82:13)
    at compileUnsafe (\path\to\node_modules\css-select\lib\compile.js:31:14)
    at select (\path\to\node_modules\css-select\index.js:18:49)
    at CSSselect (\path\to\node_modules\css-select\index.js:41:9)
    at initialize.exports.find (\path\to\node_modules\cheerio\lib\api\traversing.js:40:21)
    at initialize.module.exports (\path\to\node_modules\cheerio\lib\cheerio.js:86:18)
    at new initialize (\path\to\node_modules\cheerio\lib\static.js:29:20)
    at initialize (\path\to\node_modules\cheerio\lib\static.js:26:14)
    at Object.scanForContentFiles [as contentFiles] (\path\to\lib\scanner\content-files.js:6:2)
```
I'm fairly certain I've used this exact selector before without any problem. I suspect the culprit is the colon (:) in the attribute name, which is common in XML because of namespace prefixes.
Please note that I instantiated the cheerio object with the following options:
```javascript
var cheerioOptions = {
  normalizeWhitespace: true,
  xmlMode: true,
  decodeEntities: false
};
```
Note that I can read the attribute's value via the `attr` method, so the following workaround does work:
```javascript
var htmlFiles = [];
$("resource[href*='.htm']").each(function () {
  if ($(this).attr("d2l_2p0:material_type") === "content") {
    htmlFiles.push($(this).attr("href"));
  }
});
return htmlFiles;
```
Hi there,
The CSS 2.1 spec has this to say:
> Attribute values must be CSS identifiers or strings.

...and:
> In CSS, identifiers (including element names, classes, and IDs in selectors) can contain only the characters [a-zA-Z0-9] and ISO 10646 characters U+00A0 and higher, plus the hyphen (-) and the underscore (_); they cannot start with a digit, two hyphens, or a hyphen followed by a digit. Identifiers can also contain escaped characters and any ISO 10646 character as a numeric code (see next item). For instance, the identifier `"B&W?"` may be written as `"B\&W\?"` or `"B\26 W\3F"`.

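Following those identifier rules, a small helper can escape an arbitrary name before it is spliced into a selector. This is a minimal sketch, assuming you want to build selectors from namespaced names at runtime; `escapeIdent` is a hypothetical name, not part of cheerio, css-what, or Node (browsers expose `CSS.escape`, but Node has no global `CSS` object). It keeps `[A-Za-z0-9_-]` and characters from U+00A0 up, hex-escapes a leading digit, and backslash-escapes everything else; it does not handle the leading-hyphen cases.

```javascript
// Hypothetical helper (not part of cheerio or css-what): escape a string so it
// can be used as a CSS identifier, e.g. a namespaced XML attribute name.
function escapeIdent(name) {
  let out = "";
  for (let i = 0; i < name.length; i++) {
    const ch = name[i];
    const code = name.charCodeAt(i);
    if (i === 0 && ch >= "0" && ch <= "9") {
      // A leading digit is written as a hex escape plus a terminating space.
      out += "\\" + code.toString(16) + " ";
    } else if (/[A-Za-z0-9_-]/.test(ch) || code >= 0xa0) {
      // Characters allowed in identifiers are kept as-is.
      out += ch;
    } else {
      // Everything else (":", "&", "?", ".", ...) gets a backslash.
      out += "\\" + ch;
    }
  }
  return out;
}

console.log(escapeIdent("d2l_2p0:material_type")); // d2l_2p0\:material_type
console.log(escapeIdent("B&W?"));                  // B\&W\?
```

With it, the selector from the question could be built as `"resource[href*='.htm'][" + escapeIdent("d2l_2p0:material_type") + "='content']"`.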
So you'll want to modify your code like so:
```diff
 var htmlFiles = [];
-$("resource[href*='.htm'][d2l_2p0:material_type='content']").each(function () {
+$("resource[href*='.htm'][d2l_2p0\\:material_type='content']").each(function () {
   htmlFiles.push($(this).attr("href"));
 });
 return htmlFiles;
```
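The doubled backslash in the fix is only JavaScript string-literal syntax: `"\\:"` produces a single backslash followed by a colon, and that single backslash is what the selector parser sees. A short illustration in plain JavaScript (no cheerio needed); `String.raw` is shown as an alternative spelling that avoids the doubling.

```javascript
// "\\" in a normal string literal is ONE backslash character.
const escaped = "resource[href*='.htm'][d2l_2p0\\:material_type='content']";

// String.raw keeps backslashes as written, so a single "\" is enough here.
const raw = String.raw`resource[href*='.htm'][d2l_2p0\:material_type='content']`;

console.log(escaped);         // resource[href*='.htm'][d2l_2p0\:material_type='content']
console.log(escaped === raw); // true
```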
Please let us know if that fixes your problem.
Assuming it does solve the problem, I'm still not sure I would classify this as "intended behavior" when running with `xmlMode: true`.
How about something like this:
```javascript
if (opts.xmlMode) {
  selector = selector.replace(/:/g, "\\:");
}
```
That's how CSS selectors work 🤷♂️
@fb55 IMO, I don't consider this issue closed.
I don't believe the suggested workaround adequately addresses the need to target namespaced XML tags and attributes. Besides the workaround being unintuitive, I don't feel it's reasonable to strictly apply CSS rules to syntax that is unique to XML.
If this tool is indeed intended for use with XML (and I can attest to its ease of use over alternatives like XPath), then special exceptions should be welcome.
CSS requires colons to be escaped. We don't plan to support any other way to select elements, and colons have a special meaning in CSS already, which makes the proposed change unusable (otherwise you couldn't use pseudo-selectors like :nth-child anymore).
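To make the objection concrete: the blanket `replace` proposed earlier also escapes the colon of a pseudo-class, so it is no longer parsed as one. The sketch below reproduces that replacement as a standalone function and, for contrast, a narrower user-side helper that escapes the colon only in namespaced attribute names written as `[prefix:name`. Both function names are hypothetical and cheerio ships neither; the second is a regex heuristic, not a selector parser (it would also touch a `[a:b` sequence inside a quoted attribute value).

```javascript
// The blanket replacement proposed above, wrapped in a function (hypothetical name).
function escapeAllColons(selector) {
  return selector.replace(/:/g, "\\:");
}

// Hypothetical alternative: only escape "prefix:name" right after "[",
// i.e. namespaced attribute names, leaving pseudo-classes untouched.
function escapeNamespacedAttrs(selector) {
  return selector.replace(/\[(\s*[A-Za-z_][\w-]*):([A-Za-z_][\w-]*)/g, "[$1\\:$2");
}

const sel = "resource[d2l_2p0:material_type='content']:nth-child(2)";

// Every colon is escaped, so ":nth-child" is no longer a pseudo-class.
console.log(escapeAllColons(sel));
// resource[d2l_2p0\:material_type='content']\:nth-child(2)

// Only the attribute name is escaped; ":nth-child(2)" is left intact.
console.log(escapeNamespacedAttrs(sel));
// resource[d2l_2p0\:material_type='content']:nth-child(2)
```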
Would you accept a PR to make a small note of this in the README? Maybe a sentence at the end of this section? https://github.com/cheeriojs/cheerio#-selector-context-root-
Sure thing :)
Cool, made a PR here: https://github.com/cheeriojs/cheerio/pull/1151