Tiddlywiki5: Please GZip $:/core, maybe?

Created on 15 Sep 2019 · 34 comments · Source: Jermolene/TiddlyWiki5

Yesterday I figured out that if we gzip the core tiddler and base64 encode it, it is 450 KB instead of 2 MB. I kind of think that's a massive improvement. Since it doesn't make sense to edit the HTML version of $:/core anyway (it is never supposed to change so the user has a fallback), can we gzip it and base64 encode it?
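A quick way to reproduce this kind of measurement in Node.js is to gzip a plugin's JSON text and base64 encode the result. This is a minimal sketch, assuming the plugin JSON has already been extracted to a string. `compressionReport` is a hypothetical helper, and the sample below is synthetic, not the real $:/core, so actual numbers will differ.

```javascript
const zlib = require("zlib");

// Hypothetical helper: sizes in bytes of a text as-is, gzipped, and
// gzipped + base64 encoded (the form that could be embedded in HTML).
function compressionReport(text) {
  const raw = Buffer.byteLength(text, "utf8");
  const gzipped = zlib.gzipSync(text, { level: 9 });
  const base64 = gzipped.toString("base64");
  return { raw, gzipped: gzipped.length, base64: base64.length };
}

// Synthetic, repetitive plugin-like JSON used only for illustration.
const sample = JSON.stringify({
  tiddlers: Object.fromEntries(
    Array.from({ length: 200 }, (_, i) => [
      `$:/core/sample/${i}`,
      { title: `$:/core/sample/${i}`, type: "application/javascript", text: `/* module */\nexports.x = ${i};\n` }
    ])
  )
});

console.log(compressionReport(sample));
```

Base64 always produces 4 characters per 3 input bytes, so the embedded form is about a third larger than the raw gzip output.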

While internet downloads are normally compressed (at least in theory), many wikis are shared around via email, and cutting the size down like this is obviously good. Since gzip tries to make efficient use of repetition, having very specific formatting rules which can be applied to all files would probably be a good idea as well, but since this is already partly the case, we're probably already halfway there.

pmario is always correcting my VS Code formatting, which I'm glad for, but I don't know what he uses to do it.

This is much better than webpack-bundling the JavaScript modules, which gives 650 KB (160 KB gzipped) but only covers the JavaScript code, not the rest of the tiddler content, so it doesn't help much.

The discussion that prompted this is here: https://groups.google.com/d/msg/tiddlywikidev/7nTAuwRG7h4/2RcMcJjWAwAJ

I personally think we should use a library module the way sjcl.js does, but I guess that can vary. Can a library like that still be required from another tiddler? Or maybe we just need to give it module-type:library so it gets loaded as a module and can be required by other tiddlers.

All 34 comments

I think this should probably be opt-in and only allowed for plugins (plugins, themes, and languages), so a plugin could specify compress:yes as a field in plugin.info (the fields of the plugin tiddler), and when it gets exported as an HTML tiddler the template would act accordingly. I'm not sure how that would work for import, but it should use the existing framework.
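A minimal Node.js sketch of what an opt-in export/import step could look like. The `compress: "yes"` field is the one proposed above; the `encoding` field that marks a packed payload and the `compressAll` option (standing in for a global "compress everything" setting) are hypothetical names invented for this example, not part of TiddlyWiki.

```javascript
const zlib = require("zlib");

// Hypothetical marker value recorded on packed plugin tiddlers.
const ENCODING = "gzip+base64";

// Pack a plugin tiddler's fields for export. Only plugins that opt in with
// compress: "yes" (or every plugin, if compressAll is set) are compressed.
function packPlugin(fields, { compressAll = false } = {}) {
  if (fields.compress !== "yes" && !compressAll) {
    return { ...fields };
  }
  const payload = zlib.gzipSync(fields.text).toString("base64");
  return { ...fields, text: payload, encoding: ENCODING };
}

// Reverse packPlugin when the tiddler is loaded again.
function unpackPlugin(fields) {
  if (fields.encoding !== ENCODING) {
    return { ...fields };
  }
  const text = zlib.gunzipSync(Buffer.from(fields.text, "base64")).toString("utf8");
  const unpacked = { ...fields, text };
  delete unpacked.encoding;
  return unpacked;
}
```

Keeping the opt-in check inside the export step means plugins without the field pass through untouched.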

As you say, if you serve TW from a properly configured server, gzip compression is usually activated anyway, so the whole content is already gzipped in transit and there is no reason to gzip the core separately.

Creating and sending ZIP files is supported at the OS level. E.g. on Windows you can:

  • right-click the file
  • SendTo: Compressed Zipped folder
  • select the created file
  • right-click
  • SendTo: email recipient

I think a 2-click (for compression) or 5-click (for sending an e-mail) workflow should be good enough if you use e-mail and want to save bandwidth.

The main advantage of a plain-text core is readability. It would make maintenance a lot harder if it were compressed!

We could use a compressor that also creates a source map file, but that also goes against the "self-contained" principle, and it makes maintenance a lot harder.


Tobi Beer uses compressed plugins, and that's a nightmare. If users have a problem with those plugins I personally don't try to help, because I know it will take a hell of a lot of time to create a debuggable version.

> pmario is always correcting my VS Code formatting, which I'm glad for, but I don't know what he uses to do it.

I'm using brackets.io, which uses codemirror.js as its code editor framework.

As far as I know, Jeremy uses Sublime Text.

I'm using ESLint to show coding problems. I also searched the web for a style guide that is the same as, or similar to, the TW code style, i.e. one that agrees on what's good and what's bad.

The closest I could find was the Airbnb JavaScript style guide. If you follow the rules described there, I think you will be very close to the existing TW style.

There are some very important exceptions:

  • TW core always uses tabs for indentation (this makes a difference in file size!).
  • Parameter and argument lists have no whitespace, in both function definitions and calls.

For example, boot.js has:

/*
Check if an object has a property
*/
$tw.utils.hop = function(object,property) {
    return object ? Object.prototype.hasOwnProperty.call(object,property) : false;
};

instead of:

$tw.utils.hop = function (object, property) {
    return object ? Object.prototype.hasOwnProperty.call( object, property) : false;
};

This is "not so good" for readability, but it saves 4 bytes in just these 2 lines of code.

The Brackets editor uses a tab width of 4 spaces, which works well with the TW code.
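The byte savings are easy to check by measuring both styles. This sketch compares the boot.js style against the spaced style with 4-space indentation, so it also counts the tab saving: the total is 7 bytes, of which 4 come from the parameter and argument lists as stated above and 3 from the indentation.

```javascript
// TiddlyWiki core style: tab indentation, no spaces in parameter/argument lists.
const compact =
  "$tw.utils.hop = function(object,property) {\n" +
  "\treturn object ? Object.prototype.hasOwnProperty.call(object,property) : false;\n" +
  "};\n";

// Spaced style with 4-space indentation, as in the "instead of" example.
const spaced =
  "$tw.utils.hop = function (object, property) {\n" +
  "    return object ? Object.prototype.hasOwnProperty.call( object, property) : false;\n" +
  "};\n";

// 2 bytes on the first line (space after "function", space after the comma),
// 2 bytes in the call (space after "(", space after the comma),
// plus 3 bytes because 4 spaces replace 1 tab.
const saved = Buffer.byteLength(spaced, "utf8") - Buffer.byteLength(compact, "utf8");
console.log(saved); // 7
```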

Just to be clear: I'm not against a smaller empty.html, IF we can create an "unbreakable" workflow that is easy to maintain!

There has been a discussion about "externalizing" the core, i.e. splitting core and content into e.g. tiddlywiki.js and empty.html.

This may be an option for server-based wiki distributions, because tiddlywiki.js can be nicely cached by the browser.

BUT it would create a lot of problems for "single-file" users: downloading a wiki like this would require combining the 2 components again.

Saving, on the other hand, should only save the content.
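For the download case, the two components could be recombined by inlining the external script. A minimal Node.js sketch, assuming (hypothetically) that the split page references the core with exactly `<script src="tiddlywiki.js"></script>`; a real implementation would have to match whatever markup the split format actually uses.

```javascript
// Hypothetical markup used by a split "externalized core" page.
const EXTERNAL_CORE_TAG = '<script src="tiddlywiki.js"></script>';

// Produce a single-file wiki by inlining the core script into the page.
function inlineCore(html, coreSource) {
  if (!html.includes(EXTERNAL_CORE_TAG)) {
    throw new Error("External core reference not found");
  }
  // Escape "</script" (any case, case preserved) so the inlined code
  // cannot close the <script> element early.
  const safeSource = coreSource.replace(/<\/(script)/gi, "<\\/$1");
  // A replacer function keeps "$" sequences in the source from being
  // treated as replacement patterns.
  return html.replace(EXTERNAL_CORE_TAG, () => `<script>${safeSource}</script>`);
}
```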

TiddlyWiki2 is less than 500KB. Maybe TiddlyWiki5 includes too much in the core?

Hi @Arlen22 it's an interesting suggestion that hasn't come up before: to support an alternative plugin format that compresses the payload. As you note, it's more effective than just minifying the JavaScript modules, and it doesn't affect the readability of code in browser developer tools. (The core module is already unreadable via a conventional view source operation because it is HTML encoded JSON).

However, I do see a few potential issues:

  • There's no built-in string compression/decompression in the browser. We could include another library as we include SJCL but it would need to be of a similar quality (e.g. a decent test suite), and support a standard compression format to ensure interoperability with other tools. I have yet to find a JS-only compression/decompression library that meets those criteria
  • The new requirement to decompress a plugin before processing it places a significant burden on third party tools that work with plugins, and makes quick hacks a lot harder to make. I wonder if this isn't one of the cases where hackability trumps potential minor performance wins

An alternative option with even better performance and interoperability is to GZip the entire file, which is easy for users to do (e.g. prior to emailing) and easy for servers to do.
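Gzipping the whole file needs nothing beyond Node.js's built-in zlib module. A minimal sketch; the `.gz` naming is only a convention, and the recipient would presumably need to decompress the file before opening it in a browser.

```javascript
const fs = require("fs");
const zlib = require("zlib");

// Write a gzipped copy next to the original, e.g. mywiki.html -> mywiki.html.gz.
function gzipFile(path) {
  const outPath = path + ".gz";
  fs.writeFileSync(outPath, zlib.gzipSync(fs.readFileSync(path), { level: 9 }));
  return outPath;
}

// Restore the original file from its .gz copy.
function gunzipFile(gzPath, outPath) {
  fs.writeFileSync(outPath, zlib.gunzipSync(fs.readFileSync(gzPath)));
  return outPath;
}
```

Compared with embedding base64 inside the HTML, this avoids the extra third that base64 adds.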

> (The core module is already unreadable via a conventional view source operation because it is HTML encoded JSON).

In FF I want to have these possibilities. It even shows the same directory structure as the core.

[Screenshot: ff-debug-view]

> In FF I want to have these possibilities. It even shows the same directory structure as the core.

Right, that wouldn't change under @Arlen22's proposal. It would be the same JSON data comprising each plugin, just decompressed before evaluation.

> We could include another library as we include SJCL but it would need to be of a similar quality

Hmm, I think we would be good to go without the SJCL library, since the native Web Crypto API is widely supported in the latest browsers. I'm not 100% sure, but it would be worth a shot to see which functions we need and whether they are available natively.

> Hmm, I think we would be good to go without the SJCL library, since the native Web Crypto API is widely supported in the latest browsers. I'm not 100% sure, but it would be worth a shot to see which functions we need and whether they are available natively.

Perhaps that should be discussed in a separate ticket?

> Yesterday I figured out that if we gzip the core tiddler and base64 encode it, it is 450 KB instead of 2 MB. I kind of think that's a massive improvement. Since it doesn't make sense to edit the HTML version of $:/core anyway (it is never supposed to change so the user has a fallback), can we gzip it and base64 encode it?

Hmm, I probably misread this.

see: https://stuk.github.io/jszip/documentation/api_jszip.html

Hi @pmario that's a great library (and is already integrated with TW as a plugin), but it does a lot more than what we need here, and is 81KB.

> ... but it does a lot more than what we need here, and is 81KB.

I know, but it would be nice if "Advanced Search: Filter" could export/save several tiddlers in .tid format as a ZIP file. See a recent discussion in the Google Group (GG).

IMO an 81 KB overhead is not that bad if we get a 1.4 MB reduction _and_ new core functionality in return.

IMO if "zipping" is globally available, it could also improve the "encrypted.html" file in 2 ways.

  • Make it smaller (maybe), AND
  • more importantly, improve encryption quality, since a zipped file already has much higher entropy than plain text content.
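Whatever the effect on encryption quality (this sketch does not try to measure that), the order matters for size: ciphertext looks random and does not compress, so compression only helps if it happens before encryption. The Node.js sketch below uses AES-256-GCM from the built-in crypto module purely as an illustration; it is not the scheme TiddlyWiki's encrypted files use, which goes through SJCL.

```javascript
const crypto = require("crypto");
const zlib = require("zlib");

// Encrypt with AES-256-GCM; output is iv (12 bytes) + auth tag (16 bytes) + ciphertext.
function encrypt(plain, key) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plain), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]);
}

// Reverse encrypt(); throws if the data was tampered with.
function decrypt(data, key) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, data.subarray(0, 12));
  decipher.setAuthTag(data.subarray(12, 28));
  return Buffer.concat([decipher.update(data.subarray(28)), decipher.final()]);
}

const key = crypto.randomBytes(32);
const plain = Buffer.from('{"title":"Example","text":"Hello"}\n'.repeat(2000), "utf8");

// Compressing first shrinks the data; compressing ciphertext gains nothing.
const compressThenEncrypt = encrypt(zlib.gzipSync(plain), key);
const encryptThenCompress = zlib.gzipSync(encrypt(plain, key));

console.log(plain.length, compressThenEncrypt.length, encryptThenCompress.length);
```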

Hi @pmario I quite like your argument in favour of moving JSZip into the core, but I don't know enough to refute the argument that increasing the entropy of the file will improve encryption quality. But none of that addresses my second point above: that this reduces hackability.

After some people commented on this on the Google Group, I wrote this reply, which I will also copy here.

This idea has nothing to do with empty.html itself, it really applies to all single-file wikis. All wikis could benefit from this because the core tiddler is exactly the same in all wikis for each version, no matter which other plugins are added. The core is just another plugin, that's all. [Everyone involved with this issue already knows this -- it was just part of the email.]

Unzipping should be very cheap in terms of CPU usage, and it should be linear, or better, because it just expands back to the original file content. The zipping is where the time goes, but even that isn't much: for TiddlyWiki 5.1.20 it takes 70 ms, which is shorter than the minimum resolution of the human brain (~100 ms), so it would feel instantaneous for one or two plugins. We could also handle all gzipped plugins in one operation instead of one per plugin, which would probably save some extra space.
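The idea that one combined gzip stream saves space over one stream per plugin can be checked directly: each gzip stream carries its own header and trailer, and a combined stream can reuse repeated strings (field names, title prefixes) across plugin boundaries, within deflate's 32 KB window. A Node.js sketch with made-up plugins; real savings will depend on the actual plugins.

```javascript
const zlib = require("zlib");

// Bytes needed to gzip each plugin's JSON separately vs. one JSON array of all of them.
function compareBundling(plugins) {
  const separate = plugins.reduce(
    (sum, plugin) => sum + zlib.gzipSync(JSON.stringify(plugin)).length,
    0
  );
  const together = zlib.gzipSync(JSON.stringify(plugins)).length;
  return { separate, together };
}

// Synthetic plugins that share field names and title prefixes.
function makePlugin(name) {
  const tiddlers = {};
  for (let i = 0; i < 20; i++) {
    const title = `$:/plugins/demo/${name}/tiddler-${i}`;
    tiddlers[title] = { title, type: "text/vnd.tiddlywiki", text: `Shared boilerplate text ${i}` };
  }
  return { tiddlers };
}

console.log(compareBundling(["alpha", "beta", "gamma"].map(makePlugin)));
```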

We could also add a config tiddler to the wiki which would allow the user to gzip all installed plugins to save even more space regardless of whether the plugin has the compress:yes field set, since nothing about the gzip process would in any way affect the plugins.

We should also add a "gunzip" flag (like the safemode flag), which would decompress it in place and then immediately re-download the file so it could be edited by hand in an emergency. The file would then be loaded like usual, but skipping the gunzip step, and then next time it would save with the contents gzipped again. This would be sort of like a safemode flag that would allow someone to edit the compressed plugins by hand if necessary.

Ok, it is not possible for a file loaded from a file URL to access itself.

The other option would be to use the source file and require the uploaded content to be identical. The user would open the page with the flag and upload the source file to it. The page would then use its own mechanisms to check that the gzipped content in the source file matches the gzipped content of the page, unzip it, replace the zipped content in the source file with the unzipped content, and offer the result as a new download. It sounds complicated but it's not: you just have to make sure the gzipped content in both is identical. The main reason for this two-step process is to make sure nothing else breaks.
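The verification step could look roughly like this Node.js sketch. The element ids `compressedArea` and `expandedArea`, and storing the payload as base64 inside a `<script type="text/plain">` element, are hypothetical; TiddlyWiki has no such markup. The key step is refusing to proceed unless the uploaded source carries exactly the same compressed payload as the running page.

```javascript
const zlib = require("zlib");

// Hypothetical markup holding the gzipped, base64-encoded plugins.
const COMPRESSED_AREA = /<script id="compressedArea" type="text\/plain">([A-Za-z0-9+\/=]*)<\/script>/;

// Given the uploaded source file and the payload embedded in the running page,
// return a new HTML file with the payload expanded to plain JSON.
function expandCompressedArea(sourceHtml, pagePayload) {
  const match = COMPRESSED_AREA.exec(sourceHtml);
  if (!match) {
    throw new Error("No compressed area found in the uploaded file");
  }
  if (match[1] !== pagePayload) {
    throw new Error("Uploaded file does not match this page; refusing to expand");
  }
  const json = zlib.gunzipSync(Buffer.from(match[1], "base64")).toString("utf8");
  JSON.parse(json); // fail early if the payload is not valid JSON
  // Escape "<" so the JSON cannot close the <script> element early.
  const safeJson = json.replace(/</g, "\\u003c");
  return sourceHtml.replace(
    COMPRESSED_AREA,
    () => `<script id="expandedArea" type="application/json">${safeJson}</script>`
  );
}
```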

This should be available whether all gzipped plugins are one JSON object or each is in its own operation. But it would be better to have them all together because the result will be smaller, and the end user will never notice the difference under the hood anyway; only plugin authors would, if they load a plugin that is actually broken in the plugin itself and not because of an overridden shadow tiddler.

But seriously, does anyone ever manage to package a broken plugin that they don't have the source files for? This is literally the only case where this flag is ever necessary. I guess since you can edit the plugin tiddler JSON object itself it is totally possible, but that is the only case I can think of where you would actually need to use the gunzip flag.

> The new requirement to decompress a plugin before processing it places a significant burden on third party tools that work with plugins, and makes quick hacks a lot harder to make. I wonder if this isn't one of the cases where hackability trumps potential minor performance wins.

Node.js includes a gzip library (zlib) in core, so any Node tool can do this in one line of code. Something like this untested code:

import * as zlib from "zlib";
// Parse to an array of fields objects (I like TypeScript)
let comped: ($tw.Tiddler["fields"])[] = JSON.parse(zlib.gunzipSync(Buffer.from(compressedArea, "base64")).toString("utf8"));

> But seriously, does anyone ever manage to package a broken plugin that they don't have the source files for? This is literally the only case where this flag is ever necessary. I guess since you can edit the plugin tiddler JSON object itself it is totally possible, but that is the only case I can think of where you would actually need to use the gunzip flag.

I guess if you load a plugin that completely breaks your wiki you would need to do this, but we can easily include comments on how to do this in the source code in case the user tries to edit the file by hand instead of just using safe mode.

I'm thinking it should be placed in the DOM tiddler parser $tw.modules.define("$:/boot/tiddlerdeserializer/dom","tiddlerdeserializer",{...}) and probably check for it before calling extractTextTiddlers.

https://github.com/nodeca/pako looks like a pretty good library, and only weighs 44 KB. It would add itself to window.pako.

And for base64 encoding we can use https://github.com/beatgammit/base64-js which is a mere 2KB. For decoding, it uses Uint8Array if available, otherwise it uses a regular array.

@Arlen22, pako looks good.

> ... But none of that addresses my second point above: that this reduces hackability.

Hmm, if the compressed core contained the $:/core plugin, there should be no difference.

  • $:/core is a JSON file that creates a lot of shadow tiddlers.
  • Those shadow tiddlers can be cloned and converted to tiddlers.

    • Modified core modules will NOT be saved back to the compressed core.

  • At startup the tiddlers would overwrite the shadow tiddlers.

If the core is uncompressed at boot-time, prior to initializing it, there should be no difference for the system.
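The point that decompressing at boot makes no difference to the rest of the system can be illustrated with a toy model: the decompressed plugin only supplies shadow tiddlers, and ordinary tiddlers with the same title take precedence. A Node.js sketch; `loadShadows` and `makeStore` are hypothetical helpers, far simpler than TiddlyWiki's real wiki store, and assume the payload is the gzipped JSON text of a plugin (an object with a `tiddlers` map).

```javascript
const zlib = require("zlib");

// Decompress a base64 gzip payload holding plugin JSON ({ tiddlers: { title: fields } })
// into a Map of shadow tiddlers.
function loadShadows(payloadBase64) {
  const json = zlib.gunzipSync(Buffer.from(payloadBase64, "base64")).toString("utf8");
  const { tiddlers } = JSON.parse(json);
  return new Map(Object.entries(tiddlers).map(([title, fields]) => [title, { ...fields, title }]));
}

// Real tiddlers override shadows with the same title; edits never touch the payload.
function makeStore(shadows, tiddlers) {
  const store = new Map(tiddlers.map(fields => [fields.title, fields]));
  return {
    getTiddler(title) {
      return store.get(title) || shadows.get(title);
    },
    isOverriddenShadow(title) {
      return shadows.has(title) && store.has(title);
    }
  };
}
```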

> ... Ok, it is not possible for a file loaded from a file URL to access itself.

IMO there is no need to reload anything. The $:/core.zip is already part of the single-file wiki, and if someone changes a shadow tiddler, it will be converted to a system tiddler, just as happens now.

> I'm thinking it should be placed in the DOM tiddler parser $tw.modules.define("$:/boot/tiddlerdeserializer/dom","tiddlerdeserializer",{...}) and probably check for it before calling extractTextTiddlers.

Honestly, the preferred method would be to do this in loadBrowserTiddlers just before loading the store area. The reason for this is that I think TiddlyWiki relies on JavaScript object properties being in insertion order, but I could be wrong. Is there anything that specifically causes one plugin to be initialized before another?

Hi @Arlen22

It sounds like you are thinking in terms of gzipping the entire store area. I interpreted your original request to be to do the gzipping at the plugin level. I am not in favour of gzipping the entire store area because it removes a key user benefit: the ability to do a view source and verify that the expected content is visible.

> Is there anything that specifically causes one plugin to be initialized before another?

We use a string "plugin-priority" field, falling back to sorting by title. See https://github.com/Jermolene/TiddlyWiki5/blob/master/boot/boot.js#L1325-L1339
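Compressed plugins would still have to be unpacked in this order after decompression. The sketch below is modelled on the description above (a numeric "plugin-priority" stored as a string, falling back to the title); how plugins with and without a priority compare, and the title tie-break for equal priorities, are assumptions here, and the linked boot.js lines are authoritative.

```javascript
// Order plugins for unpacking: numeric "plugin-priority" (stored as a string)
// first, then title. Plugins with a priority sorting before those without is
// an assumption in this sketch.
function sortPlugins(plugins) {
  return plugins.slice().sort((a, b) => {
    const hasA = "plugin-priority" in a;
    const hasB = "plugin-priority" in b;
    if (hasA && hasB) {
      const diff = Number(a["plugin-priority"]) - Number(b["plugin-priority"]);
      if (diff !== 0) return diff;
    } else if (hasA || hasB) {
      return hasA ? -1 : 1;
    }
    if (a.title < b.title) return -1;
    return a.title > b.title ? 1 : 0;
  });
}
```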

> And for base64 encoding we can use https://github.com/beatgammit/base64-js which is a mere 2KB

We already have a base64 decode library in the core plugin (https://github.com/Jermolene/TiddlyWiki5/tree/master/core/modules/utils/base64-utf8), but of course at present it isn't available during the boot process.

> https://github.com/nodeca/pako looks like a pretty good library, and only weighs 44 KB. It would add itself to window.pako.

Pako is actually one of the underlying libraries in JSZip.

But my main issues are:

  • Compromising hackability by requiring anybody reading plugins to be able to unzip (people writing plugins would presumably be able to do so in the un-gzipped format)
  • So much development effort and testing for such a meagre return. And what is proposed here really is an enormous amount of work and has a huge footprint: it means bringing another library into the boot process, for example. Meanwhile, most users won't notice the difference, and those that do care can already use external tools to compress TW5 files

You can read plugins when you load the TiddlyWiki in the browser, just not using view source. I don't mean all tiddlers or even all plugins, but it would save more space to put all opted-in plugins into one gzip file vs gzipping each of them separately, especially for larger plugins like CodeMirror.

As far as which library to use, I want to use the one that is the most consistent and simple, and not add a ton of other options. But maybe we should go with something a little larger that supports several formats. I could see either one having benefits.

We need to use a base64 library that converts to/from a byte array, which the base64-js library does. Including Buffer-type string transformations from there is not directly related, I think.
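Because a gzip payload is binary, the decompressor needs a byte array rather than a string. As a dependency-free alternative to base64-js (swapped in here for illustration, not what was proposed), the conversion can be done with `atob`/`btoa`, which exist in browsers and as globals in Node.js 16+. `atob` returns a "binary string" with one character per byte, so each character code is copied into a `Uint8Array`.

```javascript
// Base64 text -> bytes, using the atob global.
function base64ToBytes(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Bytes -> base64 text, using the btoa global.
function bytesToBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}
```

For very large payloads, a library that decodes directly into a `Uint8Array` avoids building the intermediate string.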

I still think including the gunzip safety flag is important, even if most people would just use the safe mode flag instead. I don't expect the gzip implementation itself to have problems, but even so, it should only be used with packaged plugins, not the entire store, just so there are no extra problems or edge cases that the gzip library somehow doesn't know how to handle.

I realize it's a lot of work, and the more I investigate the idea, the more I realize just how much work it actually is. Maybe it's something that should be kept in mind when we rewrite the boot loader. Making the boot loader async and task-based would go a long way, especially if external code in the page could insert itself into the boot process at any desired point in order to modify the boot behavior.
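A task-based boot loader of the kind described could be as simple as a named, ordered list of (possibly async) steps that external code can hook into. The `BootSequence` class and the task names in this sketch are hypothetical and do not correspond to TiddlyWiki's actual boot steps.

```javascript
// Hypothetical task-based boot sequence.
class BootSequence {
  constructor() {
    this.tasks = [];
  }

  add(name, fn) {
    this.tasks.push({ name, fn });
    return this;
  }

  // Let external code hook in relative to a named task.
  insertBefore(target, name, fn) {
    return this.insertAt(target, 0, name, fn);
  }

  insertAfter(target, name, fn) {
    return this.insertAt(target, 1, name, fn);
  }

  insertAt(target, offset, name, fn) {
    const index = this.tasks.findIndex(task => task.name === target);
    if (index === -1) {
      throw new Error(`Unknown boot task: ${target}`);
    }
    this.tasks.splice(index + offset, 0, { name, fn });
    return this;
  }

  names() {
    return this.tasks.map(task => task.name);
  }

  // Run tasks in order; each may be async, and the next one waits for it.
  async run(context) {
    for (const task of this.tasks) {
      await task.fn(context);
    }
    return context;
  }
}
```

With this shape, a gunzip step would be added with `insertBefore("unpackPlugins", "gunzipPlugins", fn)` (again, hypothetical names) rather than by editing the boot code itself.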

Maybe TiddlyWiki development needs to focus on other things for now as there have been a lot of features and bug-fixes lately and the current push may not be finished yet. This is a more serious and invasive surgery and may need some more planning yet. I'll continue to keep it in mind.

That's just my thoughts at a rather late hour of the night.

Thanks @Arlen22. I do have on my back burner the need to upgrade the store area format used in the standalone HTML format so that we can represent field values containing new lines. When that comes up we should perhaps revisit this idea.

Here, in the hope it might provide some inspiration to hack on, is an interesting example of compressing/decompressing web content (in the browser and on the command line):
https://github.com/alcor/itty-bitty

Not sure how it happens, but every time I read the TiddlyWiki feed of issues I end up discovering some new crazy ideas.
This thread is very interesting, and what @dubiouscript linked is an awesome idea too.
I like the idea of opt-in compression, especially for attachments, which are a lot of bloat and a common user requirement.
