Pandoc: Convert a webpage with dynamically generated content from javascript

Created on 28 Aug 2020 · 6 Comments · Source: jgm/pandoc

Is it possible to convert a webpage that has some content generated by JavaScript (e.g. tables) to another file format using pandoc?
When I try, pandoc retrieves only the static HTML, and the content generated by JavaScript is missing.

Thanks

All 6 comments

While pandoc does not process dynamic content, you should be able to save the page locally as rendered and feed that to pandoc:

Chrome and Firefox will save a copy of the page as it's currently displayed (i.e. using the current DOM) with Ctrl+S / Save Page As... if you choose "Web page, complete".
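
Once the rendered page is saved, the conversion itself is an ordinary pandoc call. Here is a minimal sketch of driving it from Python, assuming pandoc is on the PATH. The file names `page.html` and `page.md` and the helper names are only placeholders for illustration.

```python
import subprocess
from pathlib import Path
from typing import List


def pandoc_command(saved_html: Path, output: Path) -> List[str]:
    """Build the pandoc invocation for a locally saved HTML page."""
    # -f html names the input format; pandoc infers the output format
    # from the extension of the -o target (e.g. .md, .docx, .rst).
    return ["pandoc", "-f", "html", str(saved_html), "-o", str(output)]


def convert_saved_page(saved_html: Path, output: Path) -> None:
    """Run pandoc on the saved page; raises CalledProcessError on failure."""
    subprocess.run(pandoc_command(saved_html, output), check=True)


# Example call (placeholder file names, not run here):
# convert_saved_page(Path("page.html"), Path("page.md"))
```

This only covers the conversion step. The page still has to be saved by the browser first.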

Thanks for the reply.
What I mean is, couldn't pandoc do that automatically? I have thousands of links, and it is not possible to save each page manually.

Thanks

This would require a JavaScript engine to be included in pandoc, which is not the case. I'm sure you could automate JS execution and HTML downloading with something like Selenium.

@SignificantCell2 Parsing dynamically generated content is not a job for a document conversion tool. Pandoc is not a web crawler! There are lots of tools that are crawlers and can be automated to fetch, generate and render dynamic content into static pages, but you need to use a tool designed for that job and use it to save static HTML pages with relevant resources in the right places. Then use Pandoc to convert those.
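
For a long list of links, the render-then-convert split could be automated roughly as follows. This is only a sketch, assuming Selenium 4 with headless Firefox and geckodriver installed, and pandoc on the PATH. The helper names and the file-naming scheme are made up for illustration. Pages that load content asynchronously may also need an explicit wait before the DOM is read.

```python
import subprocess
from pathlib import Path
from urllib.parse import urlparse


def output_stem(url, index):
    """Derive a file-system-safe name from a URL (hypothetical scheme)."""
    parsed = urlparse(url)
    raw = (parsed.netloc + parsed.path).strip("/") or "page"
    safe = "".join(c if c.isalnum() else "_" for c in raw)
    return f"{index:05d}_{safe}"


def render_html(driver, url):
    """Load the URL in the browser and return the page source it reports."""
    driver.get(url)
    # In the major browsers this reflects the DOM after scripts have run,
    # but late-loading content may need a WebDriverWait before this line.
    return driver.page_source


def convert_all(urls, out_dir, driver, to_ext="md"):
    """Save each rendered page as static HTML, then hand it to pandoc."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, url in enumerate(urls):
        stem = output_stem(url, index)
        html_path = out_dir / f"{stem}.html"
        html_path.write_text(render_html(driver, url), encoding="utf-8")
        target = out_dir / f"{stem}.{to_ext}"
        subprocess.run(
            ["pandoc", "-f", "html", str(html_path), "-o", str(target)],
            check=True,
        )
        written.append(target)
    return written


def make_headless_firefox():
    """Create a headless Firefox driver (imports Selenium only when called)."""
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")
    return webdriver.Firefox(options=options)


# Example call (not run here):
# driver = make_headless_firefox()
# try:
#     convert_all(["https://example.com/"], "out", driver)
# finally:
#     driver.quit()
```

Passing the driver in as an argument keeps the browser choice separate from the conversion loop, so another Selenium driver could be swapped in without touching the pandoc step.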

@alerque: You are right, I could use a browser in headless mode to do that, or maybe the Selenium package for Python. But I am just perplexed. Lately I have started to see your posts everywhere on Unix SE and other SE websites, etc. I visited your website but found that it no longer exists, and I have seen some tweets of yours on Twitter. And surprisingly, after 2 days you reply to a closed issue of mine on GitHub. Please don't tell me you can read minds remotely 🤔

I don't think you have anything to worry about. Given that I effectively quit participating on SE sites last year in protest over their handling of volunteer moderators, yet you are only now bumping into my posts over there, I would have to not only read minds but also predict the future in order to be the one following you. I don't even know who you are on any of those other platforms. I suspect you've just started wandering into topical areas where I've had an ongoing interest for a while.
