Cheerio: Get a Javascript object using cheerio

Created on 4 Mar 2016 · 3 Comments · Source: cheeriojs/cheerio

I have scraped the HTML using Cheerio, and somewhere in that HTML there is information I need as a JSON object. Right now I loop through all the script tags, pull out the JSON, and read the value. :cry: Is there a better approach? I also tried a regex, but it failed.

Regex code:

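// Never matches: in the page the key is followed by `":`, and `$` (without the
// `m` flag) only matches at the very end of the input.
var regex = /product_price:?$/g;
// Never matches either: `\\d` means a literal backslash followed by "d".
//var regex = /product_price:\\d/g;
var results = regex.exec($.html());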

HTML

<script>
var utag_data = {"request_sessionid": "", "request_ipaddress": "xxx.xxx.x.x", 
"product_id": "9372", "customer_city": "xxxxxx", "siftscience_api_key": "77b4eca14b",       "product_price_readable": "4999.00", "user_username": "", 
"product_price": "499900", "request_method": "GET", "country_code": "IN", 
"product_platform": "PC", "product_sku": "Tom Clancys The Division - PC",   "user_is_authenticated": "False", "customer_country": "India", 
"product_name": "Tom Clancy's The Division", "currency_code": "INR",
 "request_path": "/s/in/en/pc/games/action/tom-clancys-division/"};
</script>

Cheerio code:

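var temp = '';
var json = '';

$('script').each(function(i, elem) {
  temp = $(this).text();

  // Only the script that declares utag_data is of interest.
  if (temp.indexOf("var utag_data") !== -1) {
    // Slice this script's text from the first "{" to the last "}".
    json = temp.substring(temp.indexOf("{"), temp.lastIndexOf("}") + 1);
    json = JSON.parse(json);
    return false; // stop iterating once the object is found
  }
});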
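The slicing step in the loop above can be made a little less fragile by scanning for the closing brace that actually balances the opening one, while ignoring braces inside string values. The following is only a sketch: `extractAssignedObject` is a hypothetical helper (not part of Cheerio), the sample string is a shortened stand-in for the script contents, and it still assumes the object literal happens to be valid JSON.

```javascript
// Hypothetical helper (not part of Cheerio): returns the object literal
// assigned to `varName` in `source`, or null if it cannot be found.
function extractAssignedObject(source, varName) {
  var marker = source.indexOf('var ' + varName);
  if (marker === -1) return null;
  var start = source.indexOf('{', marker);
  if (start === -1) return null;

  var depth = 0;
  var inString = false;
  for (var i = start; i < source.length; i++) {
    var ch = source[i];
    if (inString) {
      if (ch === '\\') i++;                 // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '{') {
      depth++;
    } else if (ch === '}') {
      depth--;
      if (depth === 0) {
        // Assumes the literal is also valid JSON; JSON.parse throws otherwise.
        return JSON.parse(source.slice(start, i + 1));
      }
    }
  }
  return null; // braces never balanced
}

// Shortened sample modelled on the <script> contents shown earlier.
var scriptText = 'var utag_data = {"product_id": "9372", ' +
  '"product_price": "499900", "currency_code": "INR"};';
var data = extractAssignedObject(scriptText, 'utag_data');
console.log(data.product_price);
```

Inside the `each` callback, the same helper could be called as `extractAssignedObject($(this).text(), 'utag_data')`. It only tracks double-quoted strings, so single-quoted strings, comments, or unquoted keys in the script would still break it.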

All 3 comments

Cheerio doesn't provide any tools to make your task any easier. Because you are trying to parse JavaScript code (and _not_ JSON), any approach based on simple pattern matching is likely to be brittle.

I would recommend using Cheerio to scrape the contents of the <script> tag and feeding that code to a proper JavaScript parser like Shift, Acorn, or Esprima. Those tools output an AST that you can programmatically traverse for the values you need.
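The parsers named above are third-party packages. Where adding one is not practical, a rough dependency-free alternative is to let the JavaScript engine itself read the code through Node's built-in `vm` module. This is a different technique from the AST traversal recommended here, and it executes the scraped script, so it is only a sketch for pages you trust: `vm` is not a security boundary. The sample string is a shortened stand-in for the script contents.

```javascript
var vm = require('vm');

// Shortened sample modelled on the <script> contents in the question.
var scriptText = 'var utag_data = {"product_id": "9372", ' +
  '"product_price": "499900", "currency_code": "INR"};';

// Run the code in a fresh context; a top-level `var` becomes a property
// of the context object. The timeout guards against runaway scripts.
var context = {};
vm.runInNewContext(scriptText, context, { timeout: 1000 });

// Copy through JSON to get a plain object that belongs to the current context.
var utagData = JSON.parse(JSON.stringify(context.utag_data));
console.log(utagData.product_price);
```

Unlike string slicing, this does not depend on the object literal being valid JSON, but a script that references `window`, `document`, or other browser globals would throw in the empty context.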

Thanks for the help!

Nice, thank you!
