Vimtex: [Feature] Full syntax support in vimtex

Created on 18 Sep 2020 · 19 Comments · Source: lervag/vimtex

This action was discussed in #1781. Here I summarize the tasks and actions for moving forward. There are two phases.

Phase 1

The changes in phase 1 should not require any significant changes to the current vimtex syntax code. During development, the option g:vimtex_syntax_alpha will be defined as a temporary option that users can enable (set to 1) to test the WIP code.

  • [x] Check if it is possible to use vim-tex-syntax as a starting point.

    • Is there a licensing problem?

    • Will it require significant changes to current vimtex syntax scripts?

  • [x] Use current version of the TeX syntax scripts (or vim-tex-syntax) as a starting point. That is, fork previous work and adopt it into vimtex.
  • [x] Clean up code, make things consistent with vimtex styles.
  • [x] Possibly remove legacy stuff.

Phase 2

This phase brings breaking changes. The g:vimtex_syntax_alpha option will be removed, and development should happen in a separate branch with a PR.

  • [ ] Update configuration -> vimtex configuration schemes. E.g. let g:vimtex_syntax_config = {'conceal': ...}.
  • [x] Ensure syntax script is loaded _after_ filetype scripts (cf. #1809).
  • [x] Move current vimtex syntax scripts from being after/syntax/... to being loaded properly on startup.
  • [ ] More clean up and improvements to code.
  • [ ] Optimize things and try to improve efficiency as much as possible.

All 19 comments

I really think you should leave the syntax support to a language server such as texlab and a suitable language client. All vimtex then needs to do is document how to integrate with the language server, i.e., which features to turn off when LSP is detected. LSP syntax support is based on the semantics of each particular language, which complicated/slow regexes can't match!

I partly agree, in that I also see the benefits of relying on the tree-sitter paradigm for syntax. However, I plan to keep support for Vim, and tree-sitter is currently only available for neovim (nightly). Thus I don't think that this is currently a possible option.

Note that

  1. Semantic highlighting is not yet part of the LSP specification, and texlab doesn't support it (yet). And even if it did, you'd need a client that supports it (which the neovim built-in client doesn't, until it becomes part of the spec -- and even then it might not in favor of tree-sitter).
  2. Tree-sitter requires a LaTeX parser, first and foremost, and none (that are maintained and sufficiently robust) exist yet. So you'd have to write one from scratch. I've looked into it, and it is not a trivial thing since LaTeX is not a fully specified grammar -- as the name implies, you need to define a syntax _tree_ top-down rather than just a bunch of individual elements via regular expressions.

So even as an LSP/tree-sitter enthusiast, I agree that (better) vim syntax files are needed even in the long run.

@Konfekt You seem to have some experience with https://github.com/gi1242/vim-tex-syntax -- can you share some experiences?

Alas, I have little experience to share beyond adding regular expressions to conceal strings.

Since vimtex plans to continue to support Vim, keeping compatibility with the options (listed in :help tex.vim, for example g:tex_conceal, as well as g:vimtex_syntax_config = {'conceal': ...}) of the built-in TeX syntax scripts (apparently dropped by vim-tex-syntax) would ease adaption.

> Since vimtex plans to continue to support Vim, keeping compatibility with the options (listed in :help tex.vim, for example g:tex_conceal, as well as g:vimtex_syntax_config = {'conceal': ...}) of the built-in TeX syntax scripts (apparently dropped by vim-tex-syntax) would ease adaption.

I'm curious: Why would that ease adaption? I mean, I agree we might want g:tex_conceal to specify which parts to conceal (although perhaps we could conceal all or nothing?). But I would assume g:tex_fast is unnecessary. Dropping it would simplify a lot, and I think it would not be a problem.

I'm pushing a first iteration of this now. I looked at vim-tex-syntax, but it did not look viable, since the syntax tests crashed when I based my work on it. Instead, I've forked Dr. Chip's syntax plugin (version 119). I made the following modifications:

  • General clean up and improvement to syntax.
  • Removed support for four options that I don't think we need:

    • g:tex_fast

    • g:tex_no_math

    • g:tex_no_error

    • g:tex_nospell

I've added an option, g:vimtex_syntax_alpha, which enables the new syntax plugin when set.
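For anyone who wants to test the WIP code, enabling it should be a one-line change in the vimrc (or init.vim). A minimal sketch, using only the option name and value described in this thread; the option is temporary and is expected to disappear in phase 2:

```vim
" Opt in to the work-in-progress syntax plugin (temporary, phase 1 only).
let g:vimtex_syntax_alpha = 1
```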

Questions:

  • As part of phase 1 I want to migrate the options to vimtex "style". As suggested by @Konfekt, I can keep respecting the old variants, in the sense that I'll use them if they exist.
  • Since this is pre-alpha, I don't want to add docs until things are more ready. When we reach phase 2, I will branch things off, and the docs will be written at that stage. Does this sound OK?
  • I want to remove the g:tex_fold_enabled option and the syntax based folding. Opinions?
  • Can I remove g:tex_comment_nospell? I think it makes sense to remove it.
  • Can I remove g:tex_verbspell? I think it makes sense to remove it.
  • If anyone has some competence in vimscript and syntax files: It seems to me that the texMatcherNM is _not_ useful at all and can be simply deleted. Am I wrong? (Is NM short for no math, perhaps?)
  • Finally, does anyone have a good idea for how to test the performance of the syntax script?
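The first question above (respect the old option if it exists) could look roughly like this in the syntax script. This is a sketch under assumptions: `g:vimtex_syntax_conceal` is a hypothetical new-style name, not an existing vimtex option, while `g:tex_conceal`, its flag letters, and the default `'abdmgs'` come from the built-in tex.vim (see `:help g:tex_conceal`):

```vim
" Hypothetical sketch: read a new vimtex-style option, but fall back to the
" legacy g:tex_conceal when the user has set it. The name
" g:vimtex_syntax_conceal is illustrative, not an existing vimtex option.
let s:conceal = get(g:, 'vimtex_syntax_conceal',
      \ get(g:, 'tex_conceal', 'abdmgs'))

" Each flag letter enables one class of conceal; 'm' covers math symbols.
if s:conceal =~# 'm'
  " ... math-symbol conceal rules would be defined here ...
endif
```

Using nested `get()` calls keeps the precedence explicit: the new option wins, then the legacy one, then the built-in default.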

Really excited about this 👍

  • I've never used any of the syntax options at all, and they do not sound relevant to me (which is just a single data point, of course).

  • You can profile the syntax using syntime on, <c-l>, syntime report

  • You might have seen (and put aside for now) https://github.com/lervag/vimtex/issues/1800#issuecomment-698430177 regarding possible changes to make the syntax (or at least highlight) groups more consistent; in particular, texMathMatcher[NM] looks like a fall-back group?
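The profiling recipe from the second bullet, written out as Ex commands that can be typed at the command line while a large `.tex` file is open. `:syntime` is only available when Vim is built with the `+profile` feature:

```vim
" Start collecting timing data for every syntax pattern that is tried.
syntime on
" Force a full redraw, which is what <C-L> does in Normal mode.
redraw!
" Show the collected timings, sorted by total time.
syntime report
```

Only patterns that are actually tried during redraws get timed, so scrolling through a representative part of the document before `:syntime report` (and running `:syntime clear` between runs) should give a more useful picture.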

Apologies in advance for again taking a slightly pessimistic view of the effort vs. near-future redundancy tradeoff.

Tree-sitter is coming to both vim and neovim by late 2021 or so. Afterwards, there may be someone who shows interest in writing a tree-sitter syntax for latex. Is it worth the effort to rewrite the syntax files now?

@krishnakumarg1984 As I wrote, a latex parser is not trivial (I looked into it), and in any case there's also vim (this is the first time I've heard of any plans for tree-sitter in vanilla vim).

> I'm curious: Why would that ease adaption? I mean, I agree we might want g:tex_conceal to specify which parts to conceal (although perhaps we could conceal all or nothing?). But I would assume g:tex_fast is unnecessary. Dropping it would simplify a lot, and I think it would not be a problem.

True, I was also mainly thinking of keeping g:tex_isk and g:tex_conceal.
The no_spell option can be removed; verbspell, however, could be useful.

> I want to remove the g:tex_fold_enabled option and the syntax based folding. Opinions?

Remove the syntax-based folding. For compatibility (if g:vimtex_syntax_alpha is set), enable expression-based folding if g:tex_fold_enabled = 1.

> It seems to me that the texMatcherNM is not useful at all and can be simply deleted. Am I wrong? (Is NM short for no math, perhaps?)

Judging from all occurrences of texMatcherNM, it seems redundant, as, for example, in the double declaration of syn cluster texPreambleMatchGroup:

```vim
" /usr/share/vim/vim80/syntax/tex.vim (lines 170-173)
syn cluster texPreambleMatchGroup   contains=texAccent,texBadMath,texComment,texDefCmd,texDelimiter,texDocType,texInput,texLength,texLigature,texMatcherNM,texNewCmd,texNewEnv,texOnlyMath,texParen,texRefZone,texSection,texSpecialChar,texStatement,texString,texTitle,texTypeSize,texTypeStyle,texZone,texInputFile,texOption,texMathZoneZ
syn cluster texRefGroup         contains=texMatcher,texComment,texDelimiter
if !exists("g:tex_no_math")
 syn cluster texPreambleMatchGroup  contains=texAccent,texBadMath,texComment,texDefCmd,texDelimiter,texDocType,texInput,texLength,texLigature,texMatcherNM,texNewCmd,texNewEnv,texOnlyMath,texParen,texRefZone,texSection,texSpecialChar,texStatement,texString,texTitle,texTypeSize,texTypeStyle,texZone,texInputFile,texOption,texMathZoneZ
```

@clason

> Really excited about this +1

:)

> * I've never used any of the syntax options at all, and they do not sound relevant to me (which is just a single data point, of course).

One data point is more than zero. In the end, everything boils down to subjective opinions on what is "good enough", and I value your opinions as a competent user!

> * You can profile the syntax using `syntime on`,  `<c-l>`, `syntime report`

Ah, thanks, that's useful!

> * You might have seen (and put aside for now) [#1800 (comment)](https://github.com/lervag/vimtex/issues/1800#issuecomment-698430177) regarding possible changes to make the syntax (or at least highlight) groups more consistent; in particular, `texMathMatcher[NM]` looks like a fall-back group?

Yes, I've noticed #1800, but I've not actively joined that discussion yet. I agree that it overlaps, and I also agree with you that we should make things more consistent. I aim to go in that direction, but I want to do it stepwise, in a way that does not interfere with people's workflows.

@krishnakumarg1984

> Apologies in advance for again taking a slightly pessimistic view of the effort vs. near-future redundancy tradeoff.

> Tree-sitter is coming to both vim and neovim by late 2021 or so. Afterwards, there may be someone who shows interest in writing a tree-sitter syntax for latex. Is it worth the effort to rewrite the syntax files now?

Thanks for the opinion. I think @clason has a good point in that this is probably not as "trivial" as we would hope. Further, even if tree-sitter arrives as early as 2021, it would still take years before it reaches "most users". People are still using Ubuntu 16.04, which does not even ship Vim 8! I like being an early adopter, but at the same time, I think vimtex should remain a little bit conservative in this regard.

@Konfekt

> True, I was also mainly thinking of keeping g:tex_isk and g:tex_conceal.
> The no_spell option can be removed; verbspell, however, could be useful.

I'm not sure I agree that verbspell is useful. I'm pretty sure almost no one uses it (or is even aware it exists). But I agree on both g:tex_isk and g:tex_conceal, and I will keep compatibility with these options, and perhaps with a couple of other minor options too.

> Remove the syntax-based folding. For compatibility (if g:vimtex_syntax_alpha is set), enable expression-based folding if g:tex_fold_enabled = 1.

No, I don't want that type of compatibility, because it would be confusing. If you do :help g:tex_fold_enabled, it states "syntax folding", so if vimtex applied expression folding instead, it could lead to unwanted issues.

> Judging from all occurrences of texMatcherNM, it seems redundant, as, for example, in the double declaration of syn cluster texPreambleMatchGroup: ...

Agreed. I'll probably remove it.

> No, I don't want that type of compatibility, because it would be confusing. If you do :help g:tex_fold_enabled, it states "syntax folding", so if vimtex applied expression folding instead, it could lead to unwanted issues.

Agreed, a sensible stand.

@krishnakumarg1984

> Tree-sitter is coming to both vim

Oh, are you aware of an active effort to integrate tree-sitter with vanilla vim?

RFC: Syntax primitives
In the (hopefully near) future, I'll make quite large changes to the syntax definitions. In that regard, it would be very useful to have some discussion about what would be a natural set of _primitives_. @clason made some very relevant and good remarks in a comment on #1800 that partly relate to this. I'm thinking something along the lines of:

  • texComment
  • texCite
  • texSection
  • texEnv (environments, inconsistently matched now as e.g. texStatement and/or texBeginEnd)
  • texCmd (commands, currently named texStatement)
  • texDelimiter

There will probably be more primitives. But some current "primitives" might be unnecessary, such as texAccent and texLigature?

What do you think? A well thought out definition of the primitives would be very useful in the planned restructuring.

Sounds good, but I'd like to propose distinguishing between the command, the argument, and (where applicable) optional arguments, so that begin, {theorem} (with or without the {}), and [title] would be different primitives. Similarly, section and {A section} would have different highlight groups.

One point of consistency to look for is whether the \ or the {/} are part of the command highlight group, the name highlight group, or a separate highlight group (which is not consistent now, see #1800).

I don't see any points in, e.g., texAccent or texLigature, either; they might be needed/used internally? (Which is fine, as long as a smaller, consistent, "documented API" is available.)

I've now simplified things quite extensively. Things are still inconsistent, and right now the \section family of commands is not specially highlighted. But texDocZone and similar groups are also gone. This makes it much easier to work with the syntax, because we don't need to ensure that new rules are contained in the various clusters.

I'll continue this type of iteration for some more time, but then I think I want to enter phase 2. This phase requires a branch, because I will fully merge the vimtex specific additions with the new main syntax rules. I'll release a minor version or two before this new branch is merged, and then a major release after the branch has been merged.

The major changes to syntax names and grouping cannot really start before phase 2, because the "vimtex hacks" I've put on top of the old syntax file "rely" on the old group names and similar.


Some more thoughts on the basic ideas, just to throw out my current thinking.

I want to remove some of the old specific groups like texAccent and so on (except where they're used for conceals). I want to add a naming scheme, e.g. texCmdAccent, texCmdStyle, i.e. tex{main}{sub}, where {main} refers to the main class of syntax element and {sub} refers to a subclass. Similarly for texEnv.... Environments will generally _not_ create syntax regions, except when necessary (e.g. math environments, tikz, and so on).
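A rough sketch of how the tex{main}{sub} scheme might look in a syntax file. The names texCmdAccent and texCmdStyle come from the paragraph above; the patterns, the generic texCmd group, and the highlight links are illustrative assumptions, not vimtex's actual rules. The idea is that each subclass links to its main class by default, so a colorscheme can style all commands at once or target a single subclass:

```vim
" Hypothetical sketch of the tex{main}{sub} naming scheme.
" Patterns and links are illustrative, not vimtex's actual rules.

" Main class: any control word (backslash followed by letters).
syntax match texCmd /\\\a\+/

" Subclasses. When two matches start at the same position, the one
" defined last wins, so these are defined after texCmd.
syntax match texCmdStyle /\\\%(textbf\|textit\|emph\)\>/
syntax match texCmdAccent /\\[`'^"~]/

" Each subclass falls back to its main class unless a colorscheme
" defines it explicitly.
highlight default link texCmd       Statement
highlight default link texCmdStyle  texCmd
highlight default link texCmdAccent texCmd
```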

I also want to add support for the distinction specified by @clason. This must typically be added for each specific command (e.g. \usepackage takes an optional option list [...] and a package list arg {...}).
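The command/argument distinction maps naturally onto Vim's `nextgroup` mechanism. A minimal sketch for `\usepackage[...]{...}`, assuming hypothetical group names (texCmdPackage, texPackageOpt, texPackageArg, texDelim) that are not part of vimtex:

```vim
" Hypothetical sketch: distinct groups for the command name, its optional
" [...] argument and its mandatory {...} argument.
syntax match texCmdPackage /\\usepackage\>/
      \ nextgroup=texPackageOpt,texPackageArg skipwhite skipnl

" Optional option list; after it, look for the mandatory package list.
syntax region texPackageOpt matchgroup=texDelim
      \ start=/\[/ end=/\]/ contained
      \ nextgroup=texPackageArg skipwhite skipnl

" Mandatory package list.
syntax region texPackageArg matchgroup=texDelim
      \ start=/{/ end=/}/ contained

highlight default link texCmdPackage Statement
highlight default link texPackageOpt Special
highlight default link texPackageArg Identifier
highlight default link texDelim      Delimiter
```

Because the argument groups are `contained`, they only match right after `\usepackage`, so this chain has to be set up per command. With `matchgroup=texDelim`, the brackets and braces also get a group of their own, which is one way to settle where `{`/`}` belong.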

Ok, I've made one more simplification, but I'm hitting the wall where changing things requires that I also change the existing vimtex syntax additions. My plan is therefore to enter phase 2 now. This means the following:

  • I'll remove the g:vimtex_syntax_alpha option and the master branch will "revert" to the old style.
  • I'll start a new branch in which I'll merge everything and make things consistent.
  • When the branch has matured so that the "old" syntax script works smoothly with the vimtex additions, I'll let everyone know so they can start to test things. And then I will open the discussion about how we should start to change things for the better, e.g. along the lines specified by @clason.

I'll give this a couple of days before I start it in case someone has some useful thoughts and comments.

Btw: I plan to make a release _before_ merging the "phase 2" branch. Further, I think merging "phase 2" should be a major version release. As I'm not very used to strict versioning, I would also appreciate comments on these things. For instance, does it make sense to make two releases in such "rapid" succession? It would be something like: v1.6 includes everything before the merge, then v2.0 is v1.6 + syntax.

I've opened a branch and PR: #1834. It is obviously still WIP. I'd like the discussion to continue there.
