Dvc.org: docs: publish as ebook/plain html/pdf or other formats

Created on 18 Apr 2020 · 24 Comments · Source: iterative/dvc.org

As discussed on Discord, would the team please consider generating ebook versions of the documentation as an additional artifact of the site build?

I've tested manual conversion with pandoc, which looks promising, but the output needs tweaking and the process is clumsy. For example, I had to parse sidebar.json by hand to get the articles in the correct order.
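To illustrate the ordering step mentioned above, here is a minimal sketch of flattening a nested sidebar into an ordered page list. It assumes each sidebar entry is either a plain slug string or an object with a "slug" key and an optional "children" list; that shape and the sample slugs are assumptions for illustration, not taken from the real sidebar.json.

```python
import json
from typing import Iterator


def flatten_sidebar(entries: list, prefix: str = "") -> Iterator[str]:
    """Yield page paths depth-first, in the order the sidebar lists them."""
    for entry in entries:
        if isinstance(entry, str):
            # A bare string is a leaf page.
            slug, children = entry, []
        else:
            # An object may carry nested children (assumed key names).
            slug, children = entry["slug"], entry.get("children", [])
        path = f"{prefix}/{slug}" if prefix else slug
        yield path
        yield from flatten_sidebar(children, path)


# Hypothetical sample data standing in for sidebar.json.
sample = json.loads("""
[
  "install",
  {"slug": "start", "children": [
    "data-versioning",
    {"slug": "pipelines", "children": ["stages"]}
  ]},
  "command-reference"
]
""")

ordered_pages = list(flatten_sidebar(sample))
print(ordered_pages)
```

The resulting list could then be mapped to Markdown file paths and passed to pandoc in that order, since pandoc accepts multiple input files and concatenates them.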

Whereas this might get closer to the right thing:
https://www.gatsbyjs.org/packages/gatsby-plugin-ebook/#gatsby-plugin-ebook

Thanks all. Stay safe.

UPDATE: Jump to https://github.com/iterative/dvc.org/issues/1167#issuecomment-621604958

2.0 release doc-content feature-request priority-p1

All 24 comments

This need has recently resurfaced as a relatively easy way to start keeping an archive of docs versions that match different major DVC releases. Either a PDF ebook or a simple standalone static HTML copy of dvc.org/doc would be ideal, if that's something we can achieve easily with Gatsby.

Thoughts @shcheklein @fabiosantoscode @iAdramelk ? Cc @dmpetrov and @rogermparent

Thanks!

With the Models PR separating Doc nodes from others, something like this should be pretty painless to implement as long as there's a way to generate the required formats in Node.
A simple HTML output is obviously the easiest route because it can just be done as another page, but for non-HTML formats we can take the same approach gatsby-plugin-sitemap does and output a file from within the onPostBuild hook using data sourced from GraphQL.

There are also different ways such a page could be formatted: we could keep the sidebar, use a more page-friendly form of index, or skip the index altogether. I can also see the need for some slight schema changes to make every page accessible in sidebar order, but that wouldn't be a big deal for me to implement.

I'm going to look into gatsby-plugin-ebook to see if it suits our needs; it probably provides an easy way to use the onPostBuild approach.

Since the website is already a set of static files, keeping an archive of HTML shouldn't be too hard to accomplish.
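As a rough sketch of that idea, the snippet below zips a build directory into a versioned archive using only the Python standard library. The helper name `archive_build`, the "docs-1.0" archive name, and the throwaway "public" directory with one sample page are all hypothetical stand-ins for a real site build.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def archive_build(build_dir: Path, out_base: Path) -> Path:
    """Zip everything under build_dir and return the archive path."""
    # make_archive appends ".zip" to the base name it is given.
    return Path(shutil.make_archive(str(out_base), "zip", root_dir=build_dir))


# Demo against a throwaway directory that stands in for the built site.
with tempfile.TemporaryDirectory() as tmp:
    build = Path(tmp) / "public"
    (build / "doc").mkdir(parents=True)
    (build / "doc" / "index.html").write_text("<h1>Docs</h1>")

    # Write the archive next to the build dir, not inside it.
    archive = archive_build(build, Path(tmp) / "docs-1.0")

    with zipfile.ZipFile(archive) as zf:
        # Ignore directory entries; keep only files.
        archived_names = sorted(n for n in zf.namelist() if not n.endswith("/"))

print(archive.name, archived_names)
```

Writing the archive outside the build directory avoids the archive trying to include itself.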

Remember that most of what's good for EPUB is also good for PDF and print. A lot of it also applies to AMP, so we can deal with all of those at once if need be.

Of course, there's still the matter of joining all the pages together, which depends on how EPUB works (I don't know anything about it!). But if we use a print-to-PDF tool, we can control page breaks with CSS and handle the rest (removing the sidebar and top bar) with print CSS.

Thanks for the answers guys, sounds promising! But I'm wondering how to keep this as simple as possible. We have all the content in Markdown, so in theory this should not be a tough problem. Let's not even force it to be done via Gatsby if that's too involved.

A simple HTML output is obviously the easiest route

Let's focus on this format for now. What I'm imagining is:

  • A special build process that produces an archive e.g. a ZIP or TAR file.
  • The archive contains a directory that basically matches the content/docs/ dir tree, but with .html files instead of .md.
  • The index.html file has the content of https://dvc.org/doc (docs home), and so on. You just open this or any file from file explorer to browse the docs archive (i.e. file:// protocol in browser).
  • All web pages use the same layout (and basic CSS) as the actual site, including the navigation sidebar, yes — it can be repeated in every single HTML or as an