Bat: Relax Downloads policy for new syntaxes?

Created on 5 Oct 2020 · 8 Comments · Source: sharkdp/bat

As @sharkdp suggested.

> If you like, you could open a new ticket to discuss the 10k downloads policy. I'm pretty sure it's not ideal, but I wanted to have something to limit the amount of syntaxes that we need to maintain. bat startup speed is also a big concern.

question


All 8 comments

Startup speed is still a large concern, but I'm open to the idea of aggregating other download metrics together to get the 10k minimum.

For example, that would allow the following to meet the criteria:

3000 downloads from JetBrains plugins
7000 downloads from Sublime

It would have to be considered on a case by case basis, however. I wouldn't consider npm to be a valid metric, since something like a templating language could be a dependency of a larger and more popular framework, yet go unused by most projects installing the framework.
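As a rough illustration of the aggregation idea, the sketch below sums download counts across accepted sources and compares the total to the 10k minimum. The source names and the excluded set are assumptions made for the example, not bat's actual policy code, and the real decision would still be made case by case.

```python
# Hypothetical sketch of an aggregated-downloads check; not part of bat.
MIN_DOWNLOADS = 10_000

# Assumption: sources whose counts are not treated as evidence of real usage.
EXCLUDED_SOURCES = {"npm"}


def meets_download_policy(downloads_by_source, minimum=MIN_DOWNLOADS):
    """Return True if downloads from accepted sources add up to the minimum."""
    total = sum(
        count
        for source, count in downloads_by_source.items()
        if source.lower() not in EXCLUDED_SOURCES
    )
    return total >= minimum


if __name__ == "__main__":
    # The example from the comment above: 3000 + 7000 reaches the threshold.
    print(meets_download_policy({"JetBrains": 3000, "Sublime": 7000}))
```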

We could also think about other (additional) policies like: the syntax can be added if GitHub also supports it.

> Startup speed is still a large concern

We should probably measure the actual slowdown when adding more and more syntaxes. Shouldn't be too hard with some scripting and the available set of submodules.
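One way such a measurement could be scripted is sketched below: a small harness that runs a command several times and reports the median wall-clock time. The default command is a placeholder so the script runs anywhere; which bat invocation and which syntax sets to compare are left as assumptions for whoever runs it.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=10):
    """Run cmd `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Placeholder command; pass a bat invocation on the command line to
    # time a build that was compiled against a given set of syntaxes.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print(f"median: {time_command(cmd) * 1000:.1f} ms")
```

Repeating this for builds with progressively more syntaxes would give a rough curve of startup time against syntax count.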

@eth-p what's affecting the startup speed so much? Is it the single syntaxes.bin where they are all merged in?
And would it make sense to just split that into java.bin, python.bin... and load on demand?

We did some digging in #951, and the majority of the startup time is spent deserializing the syntaxes.

> And would it make sense to just split that into java.bin, python.bin... and load on demand?

It's always possible, but it would likely require a non-trivial amount of refactoring and possibly even a few upstream changes. Most of the syntax detection (e.g. by file name or by first content line) is handled by syntect, which relies on the information in the .sublime-syntax files that get serialized into syntaxes.bin.

Splitting the syntaxes.bin file up would likely require us to build a mapping of that information, manually resolve which asset set it belongs to, and then deserialize that set. To complicate it further, we would probably have to figure this all out before starting any of the printing... which means we need to add a syntax resolution stage to bat's setup (and that might involve opening all the files passed as arguments).
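To make the shape of that extra resolution stage concrete, here is a minimal sketch of a lazily loaded set of syntax assets. Everything in it is hypothetical: the class, the extension index, and the per-language file names are illustrative, and it does not use syntect or reflect how bat stores its assets. It only covers detection by file extension; first-line detection would additionally require reading each file before printing, as noted above.

```python
import os


class LazySyntaxSets:
    """Resolve a file to an asset set via a small index, loading sets on demand."""

    def __init__(self, extension_index, load_set):
        # extension_index maps a file extension to an asset set name,
        # e.g. {"py": "python.bin"}; these names are hypothetical.
        self._index = extension_index
        # load_set is a callable that deserializes one asset set by name.
        self._load_set = load_set
        self._cache = {}

    def resolve(self, path):
        """Return the asset set name for path, or None if the extension is unknown."""
        ext = os.path.splitext(path)[1].lstrip(".").lower()
        return self._index.get(ext)

    def syntaxes_for(self, path):
        """Load (at most once) and return the asset set needed for path."""
        name = self.resolve(path)
        if name is None:
            return None
        if name not in self._cache:
            self._cache[name] = self._load_set(name)
        return self._cache[name]


if __name__ == "__main__":
    def fake_loader(name):
        # Stand-in for the expensive deserialization step.
        print(f"deserializing {name}")
        return {"set": name}

    sets = LazySyntaxSets({"py": "python.bin", "java": "java.bin"}, fake_loader)
    sets.syntaxes_for("main.py")
    sets.syntaxes_for("util.py")  # served from the cache, no second load
```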

Basically, the main thing stopping someone from adding it is the significant amount of time, effort, upstream collaboration, and regression testing it would take. The silver lining is that if we did it that way, we could also easily add syntax detection support for nix-shell (#684).

Probably not worth it then, since the current solution is already good enough. I expect that the download limit and the current set of embedded syntaxes will no longer be seen as a problem once the process of adding them gets a bit simpler.

> We could also think about other (additional) policies like: the syntax can be added if GitHub also supports it.

:+1:

Another possible criterion: a language is popular on Repology.

Regarding nix-shell syntax detection support, @eth-p, I was thinking it could potentially be solved with a new syntax definition specifically for nix-shell files which would read the second line shebang and push into the relevant syntax, similar to how the Markdown syntax definition works with embedded code fence blocks.

To keep maintenance simple (we would otherwise need a hand-written rule for each "interpreter" we want highlighting support for, and would want newly added syntax definitions to be picked up easily), I think we could auto-generate it with a script. The script would look through each .sublime-syntax file, check whether it has a first_line_match that matches a shebang line, and generate the relevant match rule to handle the nix-shell shebang line and set which language the rest of the file is highlighted with.
(I suspect we would probably want to maintain or automate that mapping anyway, whichever solution we choose.)
I'm not yet sure how it would work with custom syntaxes the user has installed/configured - ideally it would cater for those too...
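The mapping part of this idea can be sketched outside of any syntax definition. The example below is not the generated .sublime-syntax rule itself. It only shows the lookup such a script would need: read the interpreter from the `#! nix-shell -i ...` line, build an ordinary shebang for it, and test that against each syntax's first_line_match. The patterns in the usage example are made up for illustration, and Python's `re` is a stand-in for the regex engine syntect uses, so patterns that do not compile are skipped.

```python
import re

# Assumption: a nix-shell script starts with a shebang that invokes nix-shell,
# followed by "#! nix-shell ..." lines where -i names the real interpreter.
FIRST_LINE = re.compile(r"^#!.*\bnix-shell\b")
INTERPRETER = re.compile(r"^#!\s*nix-shell\b.*?\s-i\s+(\S+)")


def nix_shell_interpreter(lines):
    """Return the interpreter given with -i, or None if this is not a nix-shell script."""
    if not lines or not FIRST_LINE.match(lines[0]):
        return None
    for line in lines[1:]:
        if not line.startswith("#!"):
            break  # the shebang header has ended
        match = INTERPRETER.match(line)
        if match:
            return match.group(1)
    return None


def syntax_for_nix_shell(lines, first_line_matches):
    """Pick a syntax name by testing a synthetic shebang against each pattern.

    first_line_matches maps a syntax name to its first_line_match pattern.
    """
    interpreter = nix_shell_interpreter(lines)
    if interpreter is None:
        return None
    synthetic = f"#!/usr/bin/env {interpreter}"
    for name, pattern in first_line_matches.items():
        try:
            if re.search(pattern, synthetic):
                return name
        except re.error:
            continue  # pattern uses syntax that Python's re does not support
    return None


if __name__ == "__main__":
    # Illustrative patterns only; real syntax definitions will differ.
    patterns = {
        "Python": r"^#!\s*/.*\bpython(\d(\.\d)?)?\b",
        "Bash": r"^#!.*\b(bash|sh|zsh)\b",
    }
    script = ["#! /usr/bin/env nix-shell", "#! nix-shell -i python3 -p python3"]
    print(syntax_for_nix_shell(script, patterns))
```

Because the lookup works from whatever first_line_match patterns it is given, the same approach could in principle cover user-installed syntaxes as well, provided their definitions are included in the mapping.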

@keith-hall Great idea! We might even be able to do that as part of the asset compilation, actually. I'm not as familiar with syntect as you are, but if there's a way to extract the first line mappings and syntax name, we can generate the syntax yaml on the fly as part of bat cache and load it into the final syntax set before serializing it.
