I have scraped some HTML using Cheerio, and the HTML contains information that I need as a JSON object. Right now I loop through all the script tags, find the one that holds the data, and parse the value out of it. :cry: Is there a better approach for this? I also tried a regex, but it failed.
*_Regex code:_*
var regex = /product_price:?$/g;
//var regex = /product_price:\\d/g;
var results = regex.exec($.html());
HTML
<script>
var utag_data = {"request_sessionid": "", "request_ipaddress": "xxx.xxx.x.x",
"product_id": "9372", "customer_city": "xxxxxx", "siftscience_api_key": "77b4eca14b", "product_price_readable": "4999.00", "user_username": "",
"product_price": "499900", "request_method": "GET", "country_code": "IN",
"product_platform": "PC", "product_sku": "Tom Clancys The Division - PC", "user_is_authenticated": "False", "customer_country": "India",
"product_name": "Tom Clancy's The Division", "currency_code": "INR",
"request_path": "/s/in/en/pc/games/action/tom-clancys-division/"};
</script>
*_Cheerio code:_*
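var temp = '';
var json = '';
$('script').each(function(i, elem) {
  temp = $(this).text();
  if (temp.indexOf("var utag_data") !== -1) {
    // Slice the script text itself (not the still-empty `json` variable),
    // from the first "{" to the last "}".
    json = temp.substring(temp.indexOf("{"), temp.lastIndexOf("}") + 1);
    json = JSON.parse(json);
    return false; // stop iterating once the matching script is found
  }
});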
Cheerio doesn't provide any tools to make your task any easier. Because you are trying to parse JavaScript code (and _not_ JSON), any approach based on simple pattern matching is likely to be brittle.
I would recommend using Cheerio to scrape the contents of the <script> tag and feeding that code to a proper JavaScript parser like Shift, Acorn, or Esprima. Those tools output an AST that you can programmatically traverse for the values you need.
Thanks for the help!
Nice, thank you!