Cheerio: Malformed attribute selector

Created on 8 Aug 2017  ·  8 Comments  ·  Source: cheeriojs/cheerio

I'm trying the following and getting an error:

var htmlFiles = [];
$("resource[href*='.htm'][d2l_2p0:material_type='content']").each(function () {
    htmlFiles.push($(this).attr("href"));
});
return htmlFiles;

\path\to\node_modules\css-what\index.js:152
                                        throw new SyntaxError("Malformed attribute selector: " + selector);
                                        ^

SyntaxError: Malformed attribute selector: d2l_2p0:material_type='content']
    at parseSelector (\path\to\node_modules\css-what\index.js:152:12)
    at parse (\path\to\node_modules\css-what\index.js:82:13)
    at compileUnsafe (\path\to\node_modules\css-select\lib\compile.js:31:14)
    at select (\path\to\node_modules\css-select\index.js:18:49)
    at CSSselect (\path\to\node_modules\css-select\index.js:41:9)
    at initialize.exports.find (\path\to\node_modules\cheerio\lib\api\traversing.js:40:21)
    at initialize.module.exports (\path\to\node_modules\cheerio\lib\cheerio.js:86:18)
    at new initialize (\path\to\node_modules\cheerio\lib\static.js:29:20)
    at initialize (\path\to\node_modules\cheerio\lib\static.js:26:14)
    at Object.scanForContentFiles [as contentFiles] (\path\to\lib\scanner\content-files.js:6:2)

I'm fairly certain that I've used this exact selector before with no problem. I'm guessing the problem is the colon (:) in the selector, which is fairly common in XML.

Please note that I instantiated the cheerio object with the following options:

var cheerioOptions = {
    normalizeWhitespace: true,
    xmlMode: true,
    decodeEntities: false
};

Note that I can read the attribute's value via the attr method, so the following does work as a workaround:

var htmlFiles = [];
$("resource[href*='.htm']").each(function () {
    if($(this).attr("d2l_2p0:material_type") === "content") {
        htmlFiles.push($(this).attr("href"));
    }
});
return htmlFiles;

All 8 comments

Hi there,

The CSS 2.1 spec has this to say:

Attribute values must be CSS identifiers or strings.

...and:

In CSS, identifiers (including element names, classes, and IDs in selectors)
can contain only the characters [a-zA-Z0-9] and ISO 10646 characters U+00A0
and higher, plus the hyphen (-) and the underscore (_); they cannot start
with a digit, two hyphens, or a hyphen followed by a digit. Identifiers can
also contain escaped characters and any ISO 10646 character as a numeric code
(see next item). For instance, the identifier "B&W?" may be written as
"B\&W\?" or "B\26 W\3F".
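
As a rough illustration of the identifier rule quoted above, here is a minimal sketch of a helper that backslash-escapes any character outside the plain identifier set. The name `escapeCssIdentifier` is hypothetical and is not part of cheerio or css-what; for brevity it also ignores the rules about leading digits and hyphens.

```javascript
// Hypothetical helper (not part of cheerio or css-what): backslash-escape
// every character that the quoted rule does not allow in a bare identifier.
// Simplification: the "cannot start with a digit" rules are not handled.
function escapeCssIdentifier(name) {
    // Allowed unescaped: a-z, A-Z, 0-9, hyphen, underscore, U+00A0 and higher.
    return name.replace(/[^a-zA-Z0-9_\-\u00A0-\uFFFF]/g, function (ch) {
        return "\\" + ch;
    });
}

var attrName = escapeCssIdentifier("d2l_2p0:material_type");
var selector = "resource[href*='.htm'][" + attrName + "='content']";
console.log(selector);
// resource[href*='.htm'][d2l_2p0\:material_type='content']
```

Applied to the spec's own example, `escapeCssIdentifier("B&W?")` produces the first form shown in the quote, `B\&W\?`.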

So you'll want to modify your code like so:

 var htmlFiles = [];
-$("resource[href*='.htm'][d2l_2p0:material_type='content']").each(function () {
+$("resource[href*='.htm'][d2l_2p0\\:material_type='content']").each(function () {
    htmlFiles.push($(this).attr("href"));
 });
 return htmlFiles;

Please let us know if that fixes your problem.
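
The doubled backslash in the fix above is a JavaScript string-literal detail: the selector parser needs to receive one backslash before the colon, and inside a quoted JavaScript string a single backslash before a colon is silently dropped. A small sketch with plain strings (no cheerio required) shows the difference:

```javascript
// "\\:" in source code is two characters at runtime: a backslash and a colon.
var escaped = "[d2l_2p0\\:material_type='content']";

// "\:" is an unnecessary escape; JavaScript drops the backslash, so the
// selector parser would see the same bare colon that triggered the error.
var notEscaped = "[d2l_2p0\:material_type='content']";

console.log(escaped);      // [d2l_2p0\:material_type='content']
console.log(notEscaped);   // [d2l_2p0:material_type='content']
console.log(notEscaped === "[d2l_2p0:material_type='content']"); // true
```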

Assuming it does solve the problem, I'm still not sure that I would classify it as "intended behavior" while running with xmlMode: true.

How about something like this:

if(opts.xmlMode) {
  selector = selector.replace(/:/g,"\\:");
}

That's how CSS selectors work 🤷‍♂️

@fb55 IMO, I don't consider this issue closed.

I don't believe the suggested workaround adequately addresses the need for targeting namespaced XML tags and attributes. Besides being unintuitive, it doesn't seem reasonable to me to strictly apply CSS rules to syntax that is unique to XML.

If this tool is indeed intended for use with XML (and I can attest to its ease of use over alternatives like XPath), then special exceptions should be welcomed.

CSS requires colons to be escaped. We don't plan to support any other way to select elements, and colons have a special meaning in CSS already, which makes the proposed change unusable (otherwise you couldn't use pseudo-selectors like :nth-child anymore).
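
To make that objection concrete, here is a sketch of what the proposed blanket replacement would do to a selector that uses a pseudo-class. The function name `escapeAllColons` is hypothetical and only mirrors the snippet proposed earlier in the thread:

```javascript
// Hypothetical stand-in for the proposed xmlMode behavior: escape every colon.
function escapeAllColons(selector) {
    return selector.replace(/:/g, "\\:");
}

// Works for the namespaced attribute from the original report...
console.log(escapeAllColons("resource[d2l_2p0:material_type='content']"));
// resource[d2l_2p0\:material_type='content']

// ...but it also rewrites the pseudo-class colon, so the selector would no
// longer mean "a resource that is a first child"; it would instead look for
// a tag literally named "resource:first-child".
console.log(escapeAllColons("resource:first-child"));
// resource\:first-child
```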

Would you accept a PR to make a small note of this in the README? Maybe a sentence at the end of this section? https://github.com/cheeriojs/cheerio#-selector-context-root-

Sure thing :)
