Latest pdf version is 4 years old?
Hrm, I'm not sure how that is generated or where it's located. Do you have a link to that?
I think it's here. There's a closed bug that talks about auto-updating the PDF. Doesn't seem to have been implemented?
Originally I came across the PDF on the main website.
https://tldr.sh/
https://tldr.sh/assets/tldr-book.pdf
We should probably merge #4993 before looking at this. Anyway, I'm unsure about how auto-updating the pdf version would actually work. @owenvoke, could GitHub Actions do it?
In the meantime, you can build a local copy yourself. The instructions are here: https://github.com/tldr-pages/tldr/tree/master/scripts/pdf
All you need is a Python 3 installation with pip. I'd attach a copy I've built while testing #4993, but GitHub has a 10MiB file limit and it's 43MiB O.o
Yes, we could pretty easily do this with a GitHub Action. I'd be happy to have a look at this later.
@sbrl, I've created a PR that does this in the CI. https://github.com/tldr-pages/tldr/pull/4996
> it's 43MiB O.o
Something is very wrong there. The current one on the website is only 481 KB
If you have ghostscript try passing it through
gs -sDEVICE=pdfwrite -dDetectDuplicateImages=true -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -dPrinted=false -dPDFSETTINGS=/default -sOutputFile=output.pdf input.pdf
Well, it's gone from 274 pages to 2000+, and also includes styling on the pages which might change it. But that definitely does seem a lot larger.
I took a look at the sample PDF you made on #4996. Running it through the above ghostscript command doesn't do a lot. Taking a look at it with poppler's 'pdffonts', there seem to be _quite_ a lot of embedded fonts: 6336 to be exact. :open_mouth:
I'm not a PDF expert but shouldn't one embedded font do the entire PDF? Ok, maybe two or three more if you want different styles for code or headings but >6000 has to be wrong. The document is simple plain text. There don't seem to be any custom glyphs required.
Edit: I forgot to say the clickable index like in the current out-of-date PDF is pretty crucial to the usability of it. It's not in the demo PDF.
Yes, one font should do... I wonder if it's embedding the font for every page or something...
And hmm, I feel like the ToC would be pretty massive, but might be nice to include. 👍🏻 I've not seen anyone use the PDF before.
> I wonder if it's embedding the font for every page or something...
That'd be my guess too.
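One way to check that guess is to group the `pdffonts` listing by font name and see how often each font repeats. Below is a minimal Python sketch, not a tested diagnostic for this particular PDF. It assumes `pdffonts` prints two header lines followed by one line per font, that font names contain no spaces, and that subsetted fonts carry the usual six-capital-letter `ABCDEF+` prefix. The helper names `count_fonts` and `fonts_in_pdf` are made up for illustration.

```python
import re
import subprocess
import sys
from collections import Counter

# Subsetted fonts are conventionally named "ABCDEF+RealName".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def count_fonts(pdffonts_output: str) -> Counter:
    """Count font entries per base font name in `pdffonts` output."""
    # Assumption: the first two lines are the column header and a dashed rule.
    lines = pdffonts_output.splitlines()[2:]
    # Assumption: the font name is the first whitespace-delimited token.
    names = (line.split()[0] for line in lines if line.strip())
    return Counter(SUBSET_TAG.sub("", name) for name in names)


def fonts_in_pdf(path: str) -> Counter:
    """Run poppler's `pdffonts` on a PDF and count its font entries."""
    result = subprocess.run(
        ["pdffonts", path], capture_output=True, text=True, check=True
    )
    return count_fonts(result.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    counts = fonts_in_pdf(sys.argv[1])
    print(f"{sum(counts.values())} font entries, {len(counts)} distinct base fonts")
    for name, n in counts.most_common(10):
        print(f"{n:6d}  {name}")
```

If a handful of base fonts account for thousands of entries, that would be consistent with the same font being embedded again and again rather than once for the whole document.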
> I feel like the ToC would be pretty massive
You're probably right.
What about three columns per page for the index and maybe a slightly smaller font (just for the index)?
Or what about a few primary clickable headings on the first index page: Common, OSX, Linux, SunOS, etc.
Click on one of them and they take you to another clickable sub-index at the start of each section within the PDF?
'Common' would still be pretty big I guess. Along with the triple column layout it might help somewhat?
Another thing I noticed: the original PDF didn't do just one command per page. It just skipped a line and started the next command on the same page. You'd probably cut off a third(?) of the pages if each command just followed on from the previous one.
...just some thoughts.
> I've not seen anyone use the PDF before.
Don't tell me I'm the only one! God, I'm sorry for putting you to all this trouble if I am. If you think the PDF isn't used feel free to leave it. We can close this issue. I guess it would be best to remove the old PDF from the main web site too.
No problem at all, just because I haven't seen anyone use it doesn't mean people don't. It's definitely worth updating it if it exists. And it's not too much effort. 👍🏻 Just need to sort out the size issue. I'm not sure who implemented the PDF scripts originally, but they might know what Weasyprint is doing.
No worries, @older-pack!
Hrm, I wonder if we can do something about the size of that PDF. If we could do some debugging we may be able to open an issue against Weasyprint to improve encoding efficiency.
I found an open bug from 2017 about embedding fonts. It kind of implies fonts might be a work-in-progress in Weasyprint.
This is just some brainstorming, but the original PDF was made with a LaTeX-based script, I think. What about reverting to that script? It managed fonts and size just fine. It had a nice, simple, clean layout.
If it ain't broke...
Passing it through 'pdftocairo' reduces it from 43MB to 36MB
pdftocairo -pdf input.pdf output.pdf
but hyperlinks aren't clickable anymore.
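To compare tools like this on equal terms, a small wrapper can run the command and report the saving as a percentage. This is only a sketch: `percent_saved` and `run_and_compare` are hypothetical helper names, and it assumes the command writes its output to the `dst` path you pass in.

```python
import os
import subprocess
from typing import Sequence


def percent_saved(before: int, after: int) -> float:
    """Return how much smaller `after` is than `before`, in percent."""
    return 100.0 * (before - after) / before


def run_and_compare(command: Sequence[str], src: str, dst: str) -> float:
    """Run a recompression command, then compare the two file sizes."""
    subprocess.run(list(command), check=True)
    return percent_saved(os.path.getsize(src), os.path.getsize(dst))
```

For example, `run_and_compare(["pdftocairo", "-pdf", "input.pdf", "output.pdf"], "input.pdf", "output.pdf")` wraps the command above. Going from 43MB to 36MB works out to a saving of roughly 16%, so the bulk of the file is untouched.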
The problem with a LaTeX-based script is that it requires complex dependencies to run. The aim of the Weasyprint-based solution was to simplify the overall process and make it easier to maintain.