Gitea: Server-side syntax highlighting

Created on 3 Aug 2019 · 18 comments · Source: go-gitea/gitea

Syntax highlighting is currently performed client-side, which has a few drawbacks:

  • We need to ship a 700kB library to the client which hurts performance
  • There is a flash of unhighlighted code on every page load
  • highlight.js detects the language from the code itself rather than from the filename, which produces a lot of false highlights (we could probably fix this by mapping more file extensions, though).

It may be worth investigating integrating https://github.com/alecthomas/chroma as a server-side highlighter. It supports language detection based on filename out of the box.
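For illustration, filename-based detection of the kind Chroma offers can be approximated by matching the file's base name against per-language glob patterns. The sketch below uses only the Go standard library; the `lexerRule` type, the `rules` table and `matchByFilename` are hypothetical stand-ins, not Chroma's API or its lexer data.

```go
package main

import (
	"fmt"
	"path"
)

// lexerRule is a hypothetical, heavily trimmed stand-in for a lexer's
// filename patterns; real highlighters register many more patterns.
type lexerRule struct {
	Name  string
	Globs []string
}

// rules is an illustrative table, not Chroma's actual registry.
var rules = []lexerRule{
	{Name: "Docker", Globs: []string{"Dockerfile", "*.docker"}},
	{Name: "Go", Globs: []string{"*.go"}},
	{Name: "YAML", Globs: []string{"*.yaml", "*.yml"}},
	{Name: "Makefile", Globs: []string{"Makefile", "*.mk"}},
}

// matchByFilename returns the first language whose glob matches the base
// name of the file, or "" if none does. Repository paths use "/", so the
// slash-only path package is used instead of path/filepath.
func matchByFilename(filename string) string {
	base := path.Base(filename)
	for _, r := range rules {
		for _, g := range r.Globs {
			if ok, err := path.Match(g, base); err == nil && ok {
				return r.Name
			}
		}
	}
	return ""
}

func main() {
	for _, f := range []string{"cmd/main.go", "Dockerfile", "ci/.drone.yml", "README"} {
		fmt.Printf("%s -> %q\n", f, matchByFilename(f))
	}
}
```

Exact names such as `Dockerfile` match without any extension, and because `path.Match` treats `*` as any run of non-slash characters, a dotfile like `.drone.yml` still matches `*.yml`.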

Labels: kind/enhancement, reviewed/confirmed


I would like to have this, but we need to check how it performs, especially on large files and diffs.
For language detection we could use src-d/enry, which I'm also using for language stats.


Chroma has filename/extension detection built in, and I would prefer to keep detection and highlighting bundled together: if we use another module, the two will eventually get out of sync and we'd have to start remapping languages a lot.

I don't think content-based detection is actually necessary; GitHub doesn't do it either.

I believe GitHub does that using Linguist, the same library enry is based on.

> Github does that using linguist imho

I don't think GitHub does any content-based detection. Check out this file, it is YAML but not highlighted.

> how that performs

I tested with the SQLite amalgamation, which takes around 20s to render in the Chroma playground. I'd say it's somewhat comparable to our current rendering, which takes around the same time in total; syntax highlighting alone is a bit faster (10s), but the page flashes empty content multiple times for some reason.

> > Github does that using linguist imho
>
> I don't think GitHub does any content-based detection. Check out this file, it is YAML but not highlighted.

I think it does: https://github.com/go-gitea/gitea/blob/801843b0115e29ba2304fa6a5bea1ae169a58e02/contrib/init/debian/gitea

YAML is just hard to detect by content, as it has no specific keywords to look for, only structure.

Interesting. I think that is some rudimentary detection based on the shebang for shell scripts. Certainly a good thing to try on extensionless files.
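A rough sketch of shebang-based detection for extensionless files, assuming the interpreter named on the `#!` line is enough to pick a language. `interpreterFromShebang`, `languageFromShebang` and the `shebangLanguages` table are hypothetical names for illustration, not part of Chroma or enry.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// shebangLanguages is a tiny, hypothetical interpreter-to-language table.
var shebangLanguages = map[string]string{
	"sh": "Shell", "bash": "Shell", "dash": "Shell", "zsh": "Shell",
	"python": "Python", "python3": "Python",
	"perl": "Perl",
}

// interpreterFromShebang returns the interpreter named on a "#!" first line,
// e.g. "sh" for "#!/bin/sh" or "python3" for "#!/usr/bin/env python3".
// It returns "" when the content does not start with a shebang.
func interpreterFromShebang(content string) string {
	line, _, _ := strings.Cut(content, "\n")
	if !strings.HasPrefix(line, "#!") {
		return ""
	}
	// Fields also drops a trailing "\r" and tolerates "#! /bin/sh".
	fields := strings.Fields(strings.TrimPrefix(line, "#!"))
	if len(fields) == 0 {
		return ""
	}
	prog := path.Base(fields[0])
	if prog != "env" {
		return prog
	}
	// "#!/usr/bin/env python3": the interpreter is the first argument that is
	// neither an env option (e.g. "-S") nor a VAR=value assignment.
	for _, f := range fields[1:] {
		if !strings.HasPrefix(f, "-") && !strings.Contains(f, "=") {
			return f
		}
	}
	return ""
}

// languageFromShebang maps the interpreter to a language name, or "".
func languageFromShebang(content string) string {
	return shebangLanguages[interpreterFromShebang(content)]
}

func main() {
	for _, src := range []string{"#!/bin/sh\nexit 0\n", "#!/usr/bin/env python3\nprint(1)\n", "name: gitea\n"} {
		fmt.Printf("%q\n", languageFromShebang(src))
	}
}
```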

I think generally we want to try detecting the language in this order:

  1. Detect based on file name
  2. Detect based on file extension
  3. Detect based on content

Chroma looks to combine steps 1 and 2 into one function.
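The order above can be sketched as a chain that returns the first hit, with the cheap name-based checks running before anything looks at the file content. The lookup tables and the one-line content check are hypothetical placeholders, not Chroma's or enry's data.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Hypothetical lookup tables; a real implementation would use a library's data.
var byFilename = map[string]string{
	"Dockerfile": "Docker",
	"Makefile":   "Makefile",
}

var byExtension = map[string]string{
	".go":   "Go",
	".yml":  "YAML",
	".yaml": "YAML",
	".sh":   "Shell",
}

// byContent is a deliberately crude content check (shell shebangs only).
func byContent(content string) string {
	if strings.HasPrefix(content, "#!/bin/sh") || strings.HasPrefix(content, "#!/bin/bash") {
		return "Shell"
	}
	return ""
}

// detect tries 1. exact file name, 2. extension, 3. content, and returns the
// first hit, or "" if nothing matched.
func detect(filename, content string) string {
	base := path.Base(filename)
	if lang, ok := byFilename[base]; ok {
		return lang
	}
	if lang, ok := byExtension[strings.ToLower(path.Ext(base))]; ok {
		return lang
	}
	return byContent(content)
}

func main() {
	fmt.Println(detect("build/Dockerfile", ""))
	fmt.Println(detect("main.GO", "package main"))
	fmt.Println(detect("scripts/deploy", "#!/bin/sh\n"))
}
```

Putting the name-based steps first means most files are classified without reading their content at all.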

src-d/enry does language detection based on:

var DefaultStrategies = []Strategy{
    GetLanguagesByModeline,
    GetLanguagesByFilename,
    GetLanguagesByShebang,
    GetLanguagesByExtension,
    GetLanguagesByContent,
    GetLanguagesByClassifier,
}
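A simplified sketch of how a strategy list like this can be chained, assuming each strategy narrows down the candidates that the previous ones left ambiguous (for example, a `.h` file could be C, C++ or Objective-C). The `Strategy` signature is modeled on the shape of enry's type, but `getLanguage`, `byExtension`, `byContent` and `exampleStrategies` are hypothetical simplifications, not enry's code.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Strategy receives the candidates left by earlier strategies and returns
// the languages it thinks apply.
type Strategy func(filename string, content []byte, candidates []string) []string

// byExtension returns every language registered for the file extension.
func byExtension(filename string, _ []byte, _ []string) []string {
	switch strings.ToLower(path.Ext(filename)) {
	case ".go":
		return []string{"Go"}
	case ".h":
		return []string{"C", "C++", "Objective-C"}
	}
	return nil
}

// byContent keeps only the candidates that a crude keyword check supports.
func byContent(_ string, content []byte, candidates []string) []string {
	text := string(content)
	var out []string
	for _, c := range candidates {
		switch {
		case c == "Objective-C" && strings.Contains(text, "@interface"):
			out = append(out, c)
		case c == "C++" && strings.Contains(text, "namespace "):
			out = append(out, c)
		}
	}
	return out
}

var exampleStrategies = []Strategy{byExtension, byContent}

// getLanguage stops as soon as a strategy yields exactly one language;
// otherwise several results become the candidates for the next strategy.
func getLanguage(filename string, content []byte, strategies []Strategy) string {
	var candidates []string
	for _, s := range strategies {
		langs := s(filename, content, candidates)
		if len(langs) == 1 {
			return langs[0]
		}
		if len(langs) > 1 {
			candidates = langs
		}
	}
	if len(candidates) > 0 {
		return candidates[0] // still ambiguous: fall back to the first guess
	}
	return ""
}

func main() {
	fmt.Println(getLanguage("main.go", nil, exampleStrategies))
	fmt.Println(getLanguage("foo.h", []byte("namespace foo {}"), exampleStrategies))
	fmt.Println(getLanguage("bar.h", []byte("int bar(void);"), exampleStrategies))
}
```

Falling back to the first candidate when the chain stays ambiguous is an arbitrary choice made for this sketch.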

From my experience with various syntax highlighters (Pygments, Chroma, Vim, Linguist (old tmLanguage grammars), Rouge), I would really recommend taking a look at Rouge, which is used by GitLab and has a really good, extensible highlighting interface that also accepts rather complex statements.

@ukos-git it's in Ruby so we can't really use it

@ukos-git that's a Ruby library, but the Gitea backend is written in Go.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.

One more benefit server-side syntax highlighting will bring is the ability to highlight diffs. This is not possible client-side because highlight.js cannot highlight arbitrary code fragments.
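The fragment problem can be illustrated with a toy highlighter that only tracks `/* ... */` comments: a hunk that starts inside a comment is misclassified when highlighted on its own, but comes out right if the whole file is classified first and the hunk's lines are sliced out afterwards. This is an assumed approach for illustration, not necessarily what Gitea ends up doing; `classifyLines`, `highlightHunk` and `sample` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// sample is a tiny C-like file whose middle lines sit inside a block comment.
var sample = []string{
	"int a = 1;",
	"/*",
	"int b = 2;",
	"*/",
	"int c = 3;",
}

// classifyLines is a toy highlighter: it labels each line "comment" or "code"
// by tracking /* ... */ state from the first line it is given. It assumes at
// most one opening and one closing marker per line.
func classifyLines(lines []string) []string {
	kinds := make([]string, len(lines))
	inComment := false
	for i, line := range lines {
		opens := !inComment && strings.Contains(line, "/*")
		if inComment || opens {
			kinds[i] = "comment"
		} else {
			kinds[i] = "code"
		}
		if opens {
			inComment = true
		}
		if inComment && strings.Contains(line, "*/") {
			inComment = false
		}
	}
	return kinds
}

// highlightHunk classifies the whole file, then returns the labels for the
// hunk lines file[start:end] (end exclusive), so comment state carries over.
func highlightHunk(file []string, start, end int) []string {
	return classifyLines(file)[start:end]
}

func main() {
	fmt.Println("hunk alone:", classifyLines(sample[2:4]))
	fmt.Println("in context:", highlightHunk(sample, 2, 4))
}
```

Highlighted alone, the hunk comes out as `[code code]`; in the context of the whole file it is `[comment comment]`.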

I noticed some missing syntax highlighting and started updating the now-outdated highlight.js, because it is listed in https://github.com/go-gitea/gitea/blob/master/docs/content/page/index.en-us.md.
Then I found out that this doesn't help, because it still doesn't support the language (I could have checked the highlight.js changelog up front...), and also that it will be dropped "soon", reasonably enough, in favor of server-side highlighting.

This issue has no milestone attached; is there a more detailed plan for if and when this will be implemented?
Would it be reasonable to add a deprecation note to https://github.com/go-gitea/gitea/blob/master/docs/content/page/index.en-us.md now, referencing this issue?

Chroma does "work" with my initial issue btw...

One note on the "performance" point mentioned above: a poorly performing client (an old computer or tablet) will obviously render big sources worse with highlight.js, and I'd say it is easier to upgrade one place (the server) than all the clients accessing Gitea.

Considering this goes back to 2017 and how important syntax highlighting is for a project like this, is it possible to set a milestone?

I actually started working on this yesterday; I'll try to have a PR soon.

There's a goldmark plugin that does Chroma highlighting. Maybe it can be integrated with standalone Chroma for the file views.

It's easier to use Chroma directly for standalone files and diffs, but that Markdown plugin is working well for Markdown code blocks and fixes some current issues (bad auto-detection, etc.).

I got normal files working easily, and removed a lot of our messier code around that. Diffs are almost done; I'm trying to clean them up a bit while I'm in there too.
