Is it possible to convert a webpage that has some content generated by JavaScript (e.g. tables) to another file format using pandoc?
When I try, pandoc retrieves only the static HTML, and the content generated by JavaScript is missing.
Thanks
Pandoc does not process dynamic content, but you should be able to save the page locally as rendered and feed that to pandoc.
Chrome and Firefox will save a copy of the page as it is currently displayed (i.e. using the current DOM) with
Ctrl+S / Save Page As... if you choose "Web page, complete".
Thanks for the reply
I mean, couldn't pandoc do that automatically? I have thousands of links, and it is not possible to save each page manually.
Thanks
This would require a JavaScript engine to be included in pandoc, which is not the case. I'm sure you could automate JS execution and HTML downloading with something like Selenium.
@SignificantCell2 Parsing dynamically generated content is not a job for a document conversion tool. Pandoc is not a web crawler! There are lots of tools that are crawlers and can be automated to fetch, generate and render dynamic content into static pages, but you need to use a tool designed for that job and use it to save static HTML pages with relevant resources in the right places. Then use Pandoc to convert those.
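For what it's worth, here is a rough sketch of how that two-step workflow could be scripted for a long list of links. It is only an illustration under some assumptions: a Chromium or Chrome binary is on the PATH under the name `chromium` (the name varies by system), and its `--headless --dump-dom` mode is good enough for your pages. Pages that load data late may need a real automation tool such as Selenium with explicit waits. The function names and the `converted` output directory are made up for this example.

```python
import subprocess
from pathlib import Path


def render_command(url, browser="chromium"):
    # --headless and --dump-dom are Chromium/Chrome flags: load the page,
    # run its scripts, and print the resulting DOM as HTML on stdout.
    # The binary name "chromium" is an assumption; adjust for your system.
    return [browser, "--headless", "--dump-dom", url]


def pandoc_command(output_path, to_format="markdown"):
    # With no input file given, pandoc reads the HTML from stdin.
    return ["pandoc", "--from", "html", "--to", to_format,
            "--output", str(output_path)]


def convert(url, output_path, browser="chromium"):
    # Step 1: let the browser render the page and capture the final DOM.
    rendered = subprocess.run(
        render_command(url, browser),
        capture_output=True, text=True, check=True,
    ).stdout
    # Step 2: hand the rendered, now static, HTML to pandoc.
    subprocess.run(pandoc_command(output_path),
                   input=rendered, text=True, check=True)


def convert_all(urls, out_dir="converted"):
    # Hypothetical batch helper: one numbered Markdown file per URL.
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for index, url in enumerate(urls):
        convert(url, out / f"page_{index:05d}.md")
```

Calling `convert_all(["https://example.com/a", "https://example.com/b"])` would then write `converted/page_00000.md` and `converted/page_00001.md`, provided both the browser and pandoc are installed.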
@alerque: You are right, I could use a browser in headless mode to do that, or maybe the selenium package for Python. But I am just perplexed. Lately I have started to see your posts everywhere on Unix SE and other SE websites. I visited your website but found that it no longer exists, and I have seen some of your tweets on Twitter. And surprisingly, after 2 days you reply to a closed issue of mine on GitHub. Please don't tell me you can read minds remotely 🤔
I don't think you have anything to worry about. Given that I effectively quit participating on SE sites last year in protest over their handling of volunteer moderators, yet you are only now bumping into my posts over there, I would have to not only read minds but also predict the future in order to be the one following you. I don't even know who you are on any of those other platforms. I suspect you've just started wandering into topical areas where I've had an ongoing interest for a while.