Cheerio: Pass cheerio selectors as a variable

Created on 30 Oct 2013 · 7 Comments · Source: cheeriojs/cheerio

I couldn't get any responses on other forums, so I'll try here, although this is technically not a bug report. Is it possible to do something like this with cheerio?

This is a simplified example. I need to somehow store cheerio selectors in a variable or object and then call the request function repeatedly with different sets of selectors.

var request = require('request');
var cheerio = require('cheerio');


// None of the below approaches work
// var person = "$('li.person')";  <-- this one is passed as a string
// var person = $('li.person');  <-- this one gives error because $ is not defined

request( 'http://example.com', function(error, response, body) {
  $ = cheerio.load(body);
  var $people = person;

  $people.each(function() {
    console.log('Person: ' + $(this).text());
  });
});

All 7 comments

Hi @finspin,

You'll need to defer selecting the "people" until you've loaded the HTML. Once you've done that, you can use the $ as you would use jQuery in the browser. Here's how you can fix your example:

 var request = require('request');
 var cheerio = require('cheerio');
-
-
-// None of the below approaches work
-// var person = "$('li.person')";  <-- this one is passed as a string
-// var person = $('li.person');  <-- this one gives error because $ is not defined

 request( 'http://example.com', function(error, response, body) {
   $ = cheerio.load(body);
-  var $people = person;
+  var $people = $('li.person');

   $people.each(function() {
     console.log('Person: ' + $(this).text());
   });
 });

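...and this is unrelated, but don't forget to declare $ as a local variable with var; otherwise the assignment creates an implicit global (or throws a ReferenceError in strict mode):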

 var request = require('request');
 var cheerio = require('cheerio');

 request( 'http://example.com', function(error, response, body) {
-  $ = cheerio.load(body);
+  var $ = cheerio.load(body);
   var $people = $('li.person');

   $people.each(function() {
     console.log('Person: ' + $(this).text());
   });
 });

Hi @jugglinmike,

Thanks a lot for looking into my issue. However, I don't think I can use your solution. It's my fault; I should have been more specific with my example. I can't hard-code the selector inside the request callback. I've put together another example; it's longer, but I hope it shows my issue more clearly.

var request = require('request');
var cheerio = require('cheerio');

var sites = {
  site1: {
    name: "Site 1",
    url: "http://example1.com",
    people: "$('li.person')"
  },
  site2: {
    name: "Site 2",
    url: "http://example2.com",
    people: "$('li.human')"
  }
};

var getData = function(company) {
  async.waterfall([
    // this function retrieves some data from database for later use
    // it's not very relevant to my cheerio problem, but I'm leaving it here for clarity
    function(callback) {
      getLinks(site.name, callback);
    }, 
    function(links, callback){
      request({
        url: site.url,
        headers: headers
      },
      function(error, response, body) {
        var $ = cheerio.load(body);
        var $people = site.people; // <-- this is where things break

        $people.each(function() {
          console.log('Person: ' + $(this).text());
        });
      }
      );
    }
  ], function (error, result) {
    process.exit(0);
  });
}

for (var site in sites) {
  getData(sites[site]);
} 

@finspin The selection does not occur until you invoke the $ function with a CSS selector string. You can most directly solve your problem by making the following modification:

       var $ = cheerio.load(body);
-      var $people = site.people; // <-- this is where things break
+      var $people = $('li.person');

...but I can see that you are trying to separate the definition of the selector string from the act of selecting. In that case, your code has errors that are unrelated to Cheerio (but they can be fixed):

var request = require('request');
var cheerio = require('cheerio');
+var async = require('async');

var sites = {
  site1: {
    name: "Site 1",
    url: "http://example1.com",
-   people: "$('li.person')"
+   peopleSelector: "li.person"
  },
  site2: {
    name: "Site 2",
    url: "http://example2.com",
-   people: "$('li.human')"
+   peopleSelector: "li.human"
  }
};

var getData = function(company) {
  async.waterfall([
    // this function retrieves some data from database for later use
    // it's not very relevant to my cheerio problem, but I'm leaving it here for clarity
    function(callback) {
-     getLinks(site.name, callback);
+     getLinks(company.name, callback);
    },
    function(links, callback){
      request({
-       url: site.url,
+       url: company.url,
        headers: headers
      },
      function(error, response, body) {
+       if (error) return callback(error);
        var $ = cheerio.load(body);
-       var $people = site.people; // <-- this is where things break
+       var $people = $(company.peopleSelector);

        $people.each(function() {
          console.log('Person: ' + $(this).text());
        });
+       callback(null);
      }
      );
    }
  ], function (error, result) {
    process.exit(0);
  });
}

for (var site in sites) {
  getData(sites[site]);
} 

@jugglinmike Yes, exactly, I'm trying to separate the selectors from the act of selecting. As you pointed out, this is definitely not an issue with cheerio, so this probably isn't the right forum to discuss it; I really appreciate you sticking with me :)

The approach you described above would work with a simple selector (e.g. li.human), but is there a way to make it work with chained selections (e.g. $(this).find('a').text())?

@finspin You _might_ be able to do something quick-and-dirty with eval, but I wouldn't recommend it. Instead, I can think of two options:

1. Define a more expressive data structure that you can "map" to selections when you're ready to scrape the page, for example:

var sites = {
  site1: {
    name: "Site 1",
    url: "http://example1.com",
    selector: ["li.person", "a"]
  }
};

...this is nice because it's declarative (so you can easily save it to a database), but it will also take some extra thought: you'll have to design an expressive declarative form for traversing documents, and you'll have to intelligently iterate through it when the time comes to scrape.

2. Define a function that accepts the "loaded" Cheerio object and makes the selection as necessary:

var sites = {
  site1: {
    name: "Site 1",
    url: "http://example1.com",
    scrape: function($) {
      $('li.person').each(function() {
        console.log($(this).find('a').text());
      });
    }
  }
};

// later, when fetching the page...
request(company.url, function(error, response, body) {
  var $ = cheerio.load(body);
  company.scrape($);
});

@jugglinmike Your first solution looks interesting; I hadn't thought of that! I had tried something similar to what you described in your second solution, but couldn't make it work due to a scoping issue I couldn't resolve. You've now given me the idea of passing the loaded cheerio object. I made a first prototype with this approach and it looks like it's going to work. Once again, thanks a lot for your help, much appreciated!

I'm glad it's working out for you!
