neovim 🚀 - Semantic Highlighting, Folding

Though possibly very handy, I really think this will be a job for the new high-powered plugin API. Let's not bloat the core with AST management code. With a new vim community approved package manager, it should then be easy to see which plugin provides the best support.

aktau on 14 May 2014

Right. I understand, but the core wouldn't have to deal with AST management code. It wouldn't even know what an AST is. It would only have to provide an additional syntax-defining option, namely the ability to specify syntax groups based on _position_ in the document. For example (and this is not a good example, but it was the first one I could think of):
var foo = value
funcall (bar, foo)
If I want to add the first foo to a syntax group called Identifiers and the second to a group called Parameters, it would be hard. Currently with Vim you can only define syntax groups through keywords, a regex match, or a region (based on regexes too). See http://vimdoc.sourceforge.net/htmldoc/syntax.html#:syn-define. This particular example can be specified with regexes, but it would probably be buggy and slow to customize every little detail that I want with this approach.
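To make the position-based idea concrete, here is a minimal sketch in Python. It is not Vim or Neovim code: it runs Python's own `ast` module over a Python rendering of the two-line example, and the function name `classify_names` and the tuple layout are made up for illustration. The only point is that an external parser can tell the two occurrences of `foo` apart and report a group per position.

```python
import ast

# A Python rendering of the two-line example above (an assumption for this sketch).
SOURCE = "foo = value\nfuncall(bar, foo)\n"


def classify_names(source):
    """Return sorted (line, column, length, group) tuples for names, by syntactic role.

    Lines and columns are 1-based; columns are byte-based because
    ast reports col_offset as a UTF-8 byte offset.
    """
    highlights = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            # Names on the left of an assignment go to the "Identifiers" group.
            for target in node.targets:
                if isinstance(target, ast.Name):
                    highlights.append(
                        (target.lineno, target.col_offset + 1, len(target.id), "Identifiers")
                    )
        elif isinstance(node, ast.Call):
            # Names passed as call arguments go to the "Parameters" group.
            for arg in node.args:
                if isinstance(arg, ast.Name):
                    highlights.append(
                        (arg.lineno, arg.col_offset + 1, len(arg.id), "Parameters")
                    )
    return sorted(highlights)


if __name__ == "__main__":
    for item in classify_names(SOURCE):
        print(item)
```

The same identifier text ends up in two different groups purely because of where it sits in the tree, which is the information a regex-only definition struggles to express.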
Thanks for the prompt reply

stellarhoof on 14 May 2014

Someone suggested the use of LPeg to specify syntax highlighters on the
mailing list.

I would like to see that and I think it has a place in the core even though
there is a lot of stuff to be done/improved before we can get to that.

philix on 15 May 2014

Namely the ability to specify syntax groups based on position in the document

:+1:. That would go well with the new plugin system: get the buffer text, feed the language parser and update the syntax rules based on positions. This could be done on cursorhold autocommands and since everything is running in another process, it wouldn't block the UI.

Someone suggested the use of LPeg to specify syntax highlighters on the
mailing list.

I would like to see that and I think it has a place in the core even though
there is a lot of stuff to be done/improved before we can get to that.

We could make use of lpeg for highlighting with the scheme suggested by @azure-satellite (a 'lpeg highlighter' plugin). Some high-level languages expose their own parsers as libraries, so lpeg wouldn't be needed in those cases.

tarruda on 15 May 2014

I completely agree that the current syntax highlighting ability in Vim is light years _behind_ most "modern" editors. If neovim could do AST syntax highlighting, and do it in a separate process no less, that would be incredible. I'm currently struggling with large JS files being extremely laggy while it's trying to syntax highlight.

Two things I'd like to bring up for discussion:

First, I'd love for us to keep in mind that there should be an easy way for _other_ plugins to register syntax highlighting requests as well, so that something like Syntastic could layer highlighting on top of the base layer.

Secondly, I'm not sure how this would work with parsing the documents, but with the current Vim syntax, it's really silly that embedded languages inside of another document type must be duplicated. Here are a couple of examples:

<div class="my-html-tag">
  <%= my_erb_variable %>
</div>

class MyCoffeeScriptClass
  hi = `function() {
    return [document.title, "Hello JavaScript"].join(": ");
  }`

So in the first example, the base filetype is HTML, and then ERB is layered into the HTML. Inside the ERB tags, it's just Ruby. So it seems silly that, if there's a perfectly good syntax highlighter for HTML and Ruby, the ERB highlighter wouldn't just be:

Highlight this file as HTML
Look at everything between the <%= %>
Highlight everything in there as Ruby

Similarly, for CoffeeScript:

Highlight this file as CoffeeScript
Look at everything between the ``'s
Highlight everything in there as JavaScript
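The three-step recipe can be sketched as a small Python function. This is only an illustration of the layering idea, not how Vim or any existing plugin does it; `split_layers` and its parameters are hypothetical names, and real ERB or CoffeeScript handling would need to deal with nesting and escaping that this ignores.

```python
import re


def split_layers(text, opener, closer, base, embedded):
    """Split text into (language, chunk) pairs.

    Everything between opener and closer is assigned to the embedded
    language; everything else, including the delimiters themselves,
    stays with the base language.
    """
    pattern = re.compile(re.escape(opener) + r"(.*?)" + re.escape(closer), re.DOTALL)
    spans = []
    pos = 0
    for match in pattern.finditer(text):
        inner_start, inner_end = match.span(1)
        if inner_start > pos:
            spans.append((base, text[pos:inner_start]))
        spans.append((embedded, text[inner_start:inner_end]))
        pos = inner_end
    if pos < len(text):
        spans.append((base, text[pos:]))
    return spans


if __name__ == "__main__":
    erb = '<div>\n  <%= name %>\n</div>'
    for language, chunk in split_layers(erb, "<%=", "%>", "html", "ruby"):
        print(language, repr(chunk))
```

Each chunk could then be handed to whichever highlighter already exists for that language, so the ERB or CoffeeScript definition would only have to describe the delimiters.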

Just my two cents speaking as someone who doesn't know a single thing about the internals of the Vim codebase. But in my opinion autocomplete, better plugin support (via a real programming language) and syntax highlighting are the top three places where I think Vim has the most catching up to do with "modern" editors.

jfelchner on 6 Jul 2014

@jfelchner note that adding some sort of highlighting on top of the base is already done by plugins like vim-easytags. Granted, it's absolutely dog-slow in its vimscript implementation, and just doable in the python version. This isn't really semantic though, it's based on what ctags has generated before it.

aktau on 6 Jul 2014

If I might necro this discussion:

it's really silly that embedded languages inside of another document type must be duplicated.

@jfelchner It is not necessarily so, even right now. For example, to implement your first example, one can currently use

syn include @RUBY syntax/ruby.vim
syn region embeddedRuby start=/<%=/ end=/%>/ contains=@RUBY

(there's some more nuance to it, but these are the basics). The real problems are 1) the performance one can expect of this is not that great, and 2) there can be conflicts with misbehaved syntax definitions (embedding scale, for example, was not that great in this sense, and although it has been fixed, there is a chance other syntaxes can raise problems too).

Anyway, I was reading on parsing today, and found this article that discusses some approaches to the problem of embedded syntaxes, and it suggests PEG grammars have some limitations in regards to left side recursion that might make them not so general as initially conceived. Probably for most cases this won't matter, but I thought it was interesting to note in case there's interest to implement this approach for syntax highlighting in neovim.

fmoralesc on 21 Aug 2014

@fmoralesc That was an insightful article.

Although different communities have different approaches to the practicalities of parsing - C programmers reach for lex / yacc; functional programmers to parser combinators;

My observation has led me to believe that most language implementors just write their own parsers: bison/yacc is largely ignored except for prototypes or academic scenarios. I guess this is even less composable. Python is notoriously difficult to parse, if I recall.

simply squashing two grammars together is rarely useful in practice. Typically, grammars have a single start rule; one therefore needs to choose which of the two grammars has the start rule. More messy is the fact that the chances of the two grammars referencing each other is slight; in practice, one needs to specify a third tranche of data - often referred to, perhaps slightly misleadingly, as glue

I wonder if this is part of the enthusiasm behind homoiconic languages (lisp): composition of grammars and DSLs.

As for text editor highlighting implementation: I think the regex approach by Vim and Emacs has more merit than detractors admit. And, if you have syntax highlighting groups as Vim does, a semant can highlight tokens by position. So what does the editor have to do with it? Editors should not be in the business of semantic _anything_. But they should provide facilities for decorating tokens, and if this can be achieved by an editor-agnostic middleware like YCM: even better.

justinmk on 21 Aug 2014

Oh, yes.

I appreciate vim's approach to highlighting. Most people don't appreciate all vim's regexp engine can do. The matchaddpos() API is really neat and will surely be used a lot if people start writing out-of-process highlighters (I haven't seen one in the wild yet, I've experimented a bit but not too much). While this can be done currently with the clientserver architecture in vanilla vim or taking the client-server approach YCM currently has, it will be (and is) better with first-class support for async, as in neovim.

I agree the editor core should not be in the business of dealing with semantics in regards to highlighting (that is better handled by extensions), but I believe there are some points where it could be smarter. For example, many times syntax regions or matches describe what could be handled as text objects, and it could be neat to bridge the gap between those. (Somewhat related: I believe everyone would agree isk and paragraphs are a mess to _use_.) At vim-pandoc we already have stuff like syntax assisted folding and syntax assisted text objects. Now, defining text objects for syntactic/semantic elements should be handled by extensions, but the editor might provide simpler ways to do so than what's currently available. Syntax assisted folds and text objects are currently a hack; I wish it was not so.

Now, the things I believe (neo)vim's core highlighting capabilities really need (from the perspective of consumers of the highlighting API) are:

1) Support for transparent regions (I mean matches in general), to simulate "subclassing". For example, if I have a match that needs to be always in bold text, but it can appear in text which is both red and blue, I can't reuse the color from a covering region, I must define it again. In this sense, containedin and contains are underused. So you can see what I mean, try this:

syn region testR1 start=/{{/ end=/}}/ contains=testM
syn region testR2 start=/xx/ end=/xx/ contains=testM
syn match testM /this should be \(red\|blue\) and bold/ contained

hi! testR1 guifg=#ff0000 guibg=bg
hi! testR2 guifg=#0000ff guibg=bg
hi! testM guifg=NONE guibg=NONE gui=bold

finish

xx this should be blue and bold xx

{{ this should be red and bold }}

which is faulty.

Currently, to do it correctly you have to use

syn region testR1 start=/{{/ end=/}}/ contains=testM1
syn region testR2 start=/xx/ end=/xx/ contains=testM2
syn match testM1 /this should be red and bold/ contained
syn match testM2 /this should be blue and bold/ contained

hi! testR1 guifg=#ff0000 guibg=bg
hi! testR2 guifg=#0000ff guibg=bg
hi! testM1 guifg=#ff0000 guibg=NONE gui=bold
hi! testM2 guifg=#0000ff guibg=NONE gui=bold

which works, but is more cumbersome.
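What the requested "subclassing" would amount to can be sketched in a few lines of Python. This models the desired behaviour only; it is not what Vim does today, and `resolve`, the `GROUPS` table and the `None`-means-inherit convention are assumptions made for the example.

```python
# Hypothetical attribute tables mirroring the highlight groups above.
# None stands for "not set here, inherit from the enclosing group".
GROUPS = {
    "testR1": {"fg": "#ff0000"},
    "testR2": {"fg": "#0000ff"},
    "testM": {"fg": None, "style": "bold"},
}


def resolve(stack):
    """Merge attributes from the outermost group to the innermost one.

    Inner groups override outer ones, except where the inner value is
    None, in which case the outer value shows through.
    """
    effective = {}
    for name in stack:
        for key, value in GROUPS[name].items():
            if value is not None:
                effective[key] = value
    return effective


if __name__ == "__main__":
    print(resolve(["testR1", "testM"]))  # testM nested in the red region
    print(resolve(["testR2", "testM"]))  # testM nested in the blue region
```

With this kind of resolution a single `testM` definition would be enough for both regions, instead of one copy per colour.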

2) Make conceals sub-classable. Many different things can be concealed within the same view, but vim doesn't support making any distinctions, and it always uses the Conceal group. Since Conceal is a default group, plugins can't override it (well, they can, but they shouldn't if they want to behave well with the user's configuration), and since most colorschemes don't support Conceal, 9 out of 10 times the result of using conceals is awful.

I've been digging around vim's code to see what changes are needed for this stuff, but I believe it might be difficult to handle. Anyway, it might be worth taking into consideration.

fmoralesc on 21 Aug 2014

Purely as an FYI, one of the examples I intend to provide along with the neovim Go package is a syntax highlighter that uses Go's excellent go/parser package. This thread, and the one about matchaddpos() on Google Groups, will certainly help guide that.

myitcv on 21 Aug 2014

And, if you have syntax highlighting groups as Vim does, a semant can highlight
tokens by position
https://groups.google.com/d/msg/vim_dev/WX10Jx8Paeo/o-TNHCZlOZIJ. So what
does the editor have to do with it? Editors should not be in the business
of semantic _anything_. But they should provide facilities for decorating
tokens, and if this can be achieved by an editor-agnostic middleware like
YCM: even better.

YCM is huge and I almost always disable it (probably for my lack of care in
configuring it). We should probably provide a "Lua LPEG Plugin" bundled
with Neovim so that we can have better syntax highlighting and folding out
of the box.

I tried doing some complicated highlighting with Vim syntax before and I
couldn't translate my understanding of the language CFG to the syntax
commands. I was never sure if something was impossible or if I was trying
to be more sophisticated with the highlighting of the language than it's
necessary.

philix on 21 Aug 2014

We should probably provide a "Lua LPEG Plugin" bundled with Neovim so that we can have better syntax highlighting and folding out of the box.

How would that work, though? Unless it worked at the level nvim/syntax.c currently does, or translated PEG grammars into vim's syntax language (which would be awesome, _if_ possible, but then again, if we had such translators there wouldn't be any issue in the first place), I don't see it as a very efficient solution. Worse yet: because of vim's regexp idiosyncrasies, the PEG parser would have to handle pretty weird stuff anyway (if it aimed at being as featureful as the current syntax code). For example, what would the equivalent for this be?

syn match FirstLine /\%1l.*$/

How does one say 'beginning of line', 'beginning of file', 'end of file' in a PEG grammar? (Genuinely curious, I haven't used PEG grammars).

I was never sure if something was impossible or if I was trying to be more sophisticated with the highlighting of the language than it's necessary.

I've also suffered from this. In the end, I realized most of the stuff I was trying to do was actually context sensitive in a way vim couldn't be expected to handle in any performant way (for example, 'match all strings delimited by if there exists somewhere in the file a line that matches (^[match]: .*)'). It's not practical to read the whole file for every match. For my issues, external highlighters would work wonders (I wished file change deltas could be sent from neovim to external highlighters, though, so such processes wouldn't require the file to be written to disk before processing). Anyway, yes, going from a proper grammar to vim syntax rules is quite the daunting task sometimes. It's best to approach it by brute force, especially because one has to consider that the parsing vim needs to do has pretty tight time constraints. If it takes 1ms to match a region, and the match is tried 1000 times, that is a full second and all is lost anyway.

fmoralesc on 21 Aug 2014

Also, @myitcv, that sounds awesome.

fmoralesc on 21 Aug 2014

Also, @myitcv, that sounds awesome.

I'm also very much looking forward to that. I believe Go can provide the speed, ease-of-use and portability necessary for an explosion of cool plugins for neovim. As one can notice, I'm quite enamored by it.

aktau on 21 Aug 2014

translated PEG grammars into vim's syntax language (which would be awesome, if possible, but then again, if we had such translators there wouldn't be any issue in the first place)

Not possible, PEG grammars are way more powerful than regular expressions. For example, it's probably not possible to express recursive constructs using regexes (but maybe vim regexes are more powerful, not sure about that).

How does one say 'beginning of line', 'beginning of file', 'end of file' in a PEG grammar? (Genuinely curious, I haven't used PEG grammars).

This kind of thing is normally done at API level, not at grammar level. When you use a parser generated from a grammar, it always needs to match exactly from the beginning of the input string. For example, if you want a parser to allow one or more spaces at the beginning, then this needs to be explicitly specified in the grammar.
What happens with most regular expression engines is that they will keep scanning until the string matches (unless a ^ is specified in the regular expression, which would require a match at the beginning).

Matching end of file (or end of input string) can be done by checking if the reduced start symbol ends at the last character.
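The difference between the two matching styles can be shown with Python's `re` module standing in for both: `re.match` is anchored at the start of the input the way a generated parser is, while `re.search` keeps scanning. The helper names below are made up for this sketch, and a regex is of course only an analogy for a real grammar.

```python
import re


def parses_from_start(pattern, text):
    """Parser-style: the match must begin at the first character."""
    return re.match(pattern, text) is not None


def found_anywhere(pattern, text):
    """Regex-engine style: scan forward until some position matches."""
    return re.search(pattern, text) is not None


def consumes_all(pattern, text):
    """'End of input' check: the match starts at 0 and ends at the last character."""
    match = re.match(pattern, text)
    return match is not None and match.end() == len(text)


if __name__ == "__main__":
    print(parses_from_start(r"[a-z]+", "  foo"))    # leading spaces are not in the "grammar"
    print(found_anywhere(r"[a-z]+", "  foo"))       # scanning skips over them
    print(parses_from_start(r" *[a-z]+", "  foo"))  # spaces allowed explicitly
    print(consumes_all(r"[a-z]+", "foo!"))          # trailing input left over
```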

tarruda on 21 Aug 2014

maybe vim regexes are more powerful, not sure about that

Yeah, vim regexes don't support recursion, but _syntax rules_ do (regions containing matches, matches contained in other matches, etc.) or at least they can handle many cases where one would want to describe a recursive structure.

  syn region parRegion matchgroup=Delimiter start=/{/ end=/}/ contains=parRegion

  {   1  {  2 }  1 }

There are doubts PEG grammars really support left side recursion (that was the gist of the article I linked before). I'm not sure how much of a merely academic concern that is, though.

On the second point: there's a bunch of vim regex idioms that don't seem compatible with PEG grammars: \%^, \%$, \%V, \%#, \%'m, \%1l, \%1c, \%1v. My point was that this would be an issue if someone tried to replace the current capabilities with LPEG. As I pointed out, extending it might be an option.

LPEG is attractive because one can compose the grammar, so it would be easier to handle multiple embedded languages in a buffer. I'm not so sure this approach would work. I'll refer you to the article again, section "Fine-grained composition", for more on that:

There can be subtle conflicts between the grammars, in the sense that the combined language might not give the result that was expected. Consider combining two grammars that have different keywords. Scannerless parsing allows us to combine the two grammars, but we may wish to ensure that the combined languages do not allow users to use keywords in the other language as identifiers. There is no easy way to express this in normal CFGs. The SDF2 paper referenced earlier allows "reject" productions as a solution to this; unfortunately this then makes SDF2 grammars "mildly context sensitive". As far as I know, the precise consequences of this haven't been explored, but it does mean that at least some of the body of CFG theory won't be applicable; it's enough to make one a little nervous, at the very least (notwithstanding the excellent work that has been created using the SDF2 formalism by Eelco Visser and others).

A recent, albeit relatively unknown, alternative are boolean grammars. These are a generalization of CFGs that include conjunction and negation, which, at first glance, are exactly the constructs needed to make grammar composition practical (allowing one to say things like "identifiers are any sequence of ASCII characters except SELECT"). Boolean grammars, to me at least, seem to have a lot of promise, and Alexander Okhotin is making an heroic effort on them. However, there hasn't yet been any practical use of them that I know of, so wrapping ones head around the practicalities is far from trivial. There are also several open questions about boolean grammars, some of which, until they are answered one way or the other, may preclude wide-scale uptake.

(Ironically, then, vim's contains=ALLBUT idiom might be extended to make the current system be more powerful than usual CFGs to handle this issue.) As to PEG:

Are PEGs the answer to our problems? Alas - at least as things stand - I now doubt it. First, PEGs are rather inexpressive: like LL and LR parsing, PEGs are often frustrating to use in practise. This is, principally, because they don't support left recursion; Alex Warth proposed an approach which adds left recursion but I discovered what appear to be problems with it, though I should note that there is not yet a general consensus on this (and I am collaborating with a colleague to try and reach an understanding of precisely what left recursion in PEGs should mean). Second, while PEGs are always unambiguous, depending on the glue one uses during composition, the ordered choice operator may cause strings that were previously accepted in the individual languages not to be accepted in the combined language - which, to put it mildly, is unlikely to be the desired behaviour.
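The ordered-choice problem from the quote is easy to reproduce with a toy PEG-style recogniser. The sketch below is a hand-rolled set of combinators in Python (`lit`, `seq`, `choice` and `accepts` are illustrative names, not LPeg's API), and the two "languages" are deliberately tiny: one accepts `in;`, the other `into;`.

```python
def lit(expected):
    """Match a literal string at the current position."""
    def parse(text, pos):
        return pos + len(expected) if text.startswith(expected, pos) else None
    return parse


def seq(*parsers):
    """Match each parser in turn; fail if any of them fails."""
    def parse(text, pos):
        for parser in parsers:
            pos = parser(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*parsers):
    """PEG ordered choice: commit to the first alternative that matches."""
    def parse(text, pos):
        for parser in parsers:
            end = parser(text, pos)
            if end is not None:
                return end
        return None
    return parse


def accepts(parser, text):
    """True if the parser consumes the whole input."""
    return parser(text, 0) == len(text)


# Two tiny "languages", each fine on its own.
lang_a = seq(lit("in"), lit(";"))
lang_b = seq(lit("into"), lit(";"))

# Gluing their keywords together with ordered choice: "in" wins on "into;"
# and the following ";" then fails against "t", with no retry of "into".
glued = seq(choice(lit("in"), lit("into")), lit(";"))
reordered = seq(choice(lit("into"), lit("in")), lit(";"))

if __name__ == "__main__":
    for name, parser in [("lang_b", lang_b), ("glued", glued), ("reordered", reordered)]:
        print(name, accepts(parser, "into;"))
```

Swapping the order of the alternatives fixes this particular case, which is exactly the kind of glue-dependent behaviour the article warns about.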

fmoralesc on 21 Aug 2014

Now, I should add I'm _not_ against adding LPeg as an option for defining syntax highlighting, I'm just curious about how it would really compare to what's available now. I'm looking at Scintillua's lexers and it doesn't look that bad. Their lexer module, actually, is a nice reference on LPeg's use for this purpose.

fmoralesc on 21 Aug 2014

It would certainly be interesting to benchmark how a Go-plugin based syntax highlighting (using go/parser) performs vs the existing approach when the former is complete. I will certainly require some help/guidance with benchmarking the latter when I get to this point.

I wished file change deltas could be sent from neovim to external highlighters, though, so such processes wouldn't require the file to be written to disk before processing

I agree this needs to be answered, not only as part of this thread but as part of the MSGPACK API more broadly. Is there a thread where such requirements are being discussed?

This thread also picks up the topic of how YCM fits into the picture, or not as the case may be. In particular my latest post. My comments in that thread should absolutely not be interpreted as criticism of YCM. Rather I'm advocating an approach where Neovim plugins are very language specific. I fear that trying to make them language agnostic simply moves the problem of defining a suitably generic API from Neovim to the plugin, YCM or otherwise... at which point we're no further forward.

That said performance has to be one of the determining factors in all of this, so if some sort of hybrid approach performs better then we should at least consider that.

Continuing the Go example a bit further. Subject to how this performs, I'm roughly envisaging the following for a plugin, itself written in Go, that supports editing of Go files:

syntax highlighting via go/parser
indeed anything syntax-based (e.g. folds) should be exposed via commands that ultimately take advantage of the go/parser part of the plugin. Code structure plugins like Tagbar can and should be driven from this too
completion via some integration of gocode; reuse the go/parser part of the plugin here
integration of Go oracle - again, reuse the go/parser part of the plugin here. Go oracle will be exposed by a number of language-specific commands, e.g. pointsto. This particular integration is a great example of where trying to fit Go into a language agnostic API makes no sense to my mind
integration of the go toolset, e.g. gofmt, godoc, godef (some overlap here with oracle), test, govet, etc. Again, where possible reusing the go/parser part of the plugin

That's probably an incomplete list, a rather hurried brain dump at this point.

I'll repeat again, I'm still getting up to speed on all of this so please correct me where appropriate. If it turns out that integration of these types of functions via YCM is more performant for Go, then I'm all for considering how to work with it. Indeed, if the Go implementation is less performant, then it certainly gives a very good opportunity for feeding back to the Go core team.

However, I remain pretty confident that if architected properly, a Neovim plugin written in Go to support the editing of Go files:

is going to work
will be more performant
has the clear benefit of writing the plugin in the language being used for development
has the neat side effect of encapsulating all of the language specific elements within the plugin itself

myitcv on 21 Aug 2014

(I just remember we touched upon this on an old issue re: GUIs (esp] and on the mailing list.)

I believe the issue that should be tackled first is to define what responsibilities should reside where in these scenarios. In regards to highlighting: should neovim's core simply push redraw events to UIs and communicate file changes to the highlighters? How? Should it also be able to process the files and add the matches (like it does now)? If so, how to define syntaxes? Should it allow for vim's syntax dsl only, lpeg only, both? Should core highlighting reside in-process, or run as a different process? What of neovim's runtime? Just thinking aloud.

Out of process highlighters are interesting to think about, but we have to consider if they are really needed. For example: what kind of thing does the current highlighting for go not support? Why do you (@myitcv) think it's necessary?

fmoralesc on 21 Aug 2014

I believe the issue that should be tackled first is to define what responsibilities should reside where in these scenarios

I suppose my thinking out loud is really geared towards the same goal. Even if the Go-based plugin I outlined did not handle syntax highlighting, a parse step is still a prerequisite for every other feature I listed (albeit perhaps not requiring the same sort of response time). Hence if the plugin is going to this effort in any case, why bother going to the effort of parsing/matching elsewhere? Don't you duplicate effort?

Of course this ignores the overhead of the MSGPACK API calls and any shortcuts the main process can take.

Thoughts?

myitcv on 22 Aug 2014

As a very basic starting point for the aforementioned Go plugin, please see neovim-go. At the time of writing, this performs a very brute force syntax highlight of the keyword func in Go code edited in a Neovim instance using neovim-go. Clearly no Go syntax file is required. All syntax highlighting is externally driven by the plugin using the go/parser approach I outlined above.

This starting point is _very_ raw. Indeed I have highlighted a number of areas for improvement on one of the wiki pages.

For this approach to work better, one of the key patches required from Vim is 7.4.330, matchaddpos(), as referred to in previous posts. On that note, is there a list somewhere of the patches that Neovim is missing?

Feedback very much welcomed and appreciated.

myitcv on 22 Aug 2014

I was thinking I was going to hack this thing myself, as a proof of concept ;) Thanks a lot.

Would anyone mind if I worked on integrating 7.4.330 into neovim?

fmoralesc on 22 Aug 2014

integrating 7.4.330 into neovim?

Please feel free. Although it makes non-trivial changes to eval.c it's good for us to keep up with that so that we have a baseline for when we switch to the VimL translator.

justinmk on 22 Aug 2014

is there a list somewhere of the patches that Neovim is missing?

http://neovim.org/doc/reports/vimpatch/

Also note that this is linked from the wiki page: https://github.com/neovim/neovim/wiki/Merging-patches-from-upstream-vim

justinmk on 22 Aug 2014

I've started work on the patches at PR #1107. The created executable is unresponsive, though, I'm trying to figure out why. Help would be appreciated.

fmoralesc on 23 Aug 2014

As a brief update on this thread: given that #1107 has been merged, matchaddpos is available in neovim.

neogo is the updated plugin I mentioned above. It is a Go MSGPACK plugin written using the neovim package to read the contents of the buffer being edited, parse using go/parser into an AST, then spit out byte offset-based syntax highlighting commands using matchaddpos and matchdelete.
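The last step of that pipeline, turning parser byte offsets into something matchaddpos() can consume, can be sketched as follows. This is an illustration in Python rather than neogo's actual Go code; `byte_offset_to_pos` and `matchaddpos_commands` are hypothetical helpers. It relies on matchaddpos() taking `[lnum, col, len]` items with 1-based lines and byte columns, and on its documented limit of 8 positions per call.

```python
def byte_offset_to_pos(buffer_bytes, offset, length):
    """Convert a 0-based byte offset into a [lnum, col, len] item.

    lnum and col are 1-based and col counts bytes within the line,
    which is the shape matchaddpos() expects for a position.
    """
    line_start = buffer_bytes.rfind(b"\n", 0, offset) + 1
    lnum = buffer_bytes.count(b"\n", 0, offset) + 1
    col = offset - line_start + 1
    return [lnum, col, length]


def matchaddpos_commands(group, positions, chunk=8):
    """Build Vim commands, splitting positions into chunks of at most 8 per call."""
    return [
        "call matchaddpos('{}', {})".format(group, positions[i:i + chunk])
        for i in range(0, len(positions), chunk)
    ]


if __name__ == "__main__":
    source = b"package main\n\nfunc main() {}\n"
    func_offset = source.index(b"func")  # stand-in for an offset reported by a parser
    pos = byte_offset_to_pos(source, func_offset, len(b"func"))
    for command in matchaddpos_commands("Keyword", [pos]):
        print(command)
```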

Thanks to @fmoralesc for #1107

myitcv on 3 Oct 2015

Great demo, @myitcv! It's also very nice to see that you've been making progress on the go host.

fmoralesc on 3 Oct 2015

@myitcv That's awesome. How does it feel on large files? If the user edits in the middle of some function, does the parser have enough state to tell the go code "only matchdelete() these nodes, and matchaddpos() these other specific nodes", or is it rebuilding the entire buffer on each keypress?

justinmk on 3 Oct 2015

@justinmk the current implementation is extremely inefficient for two reasons:

The conversation in #1114
neogo spits out matchaddpos/matchdelete commands for the entire buffer. This could easily be optimised to only spit out commands for the current view port (i.e. the user is currently looking at the contents from byte offset X to Y, hence only spit out highlighting commands that affect that region)

Even with a solution from #1114, the go/parser package will ultimately be reparsing the entire buffer each time. The package is extremely fast (takes ~35ms to parse a 20k file) so I don't expect this to be an issue most of the time, but still, this is another area where things could be improved.

myitcv on 3 Oct 2015

@myitcv On your second point, I guess you could _cache_ the matchaddpos commands for highlighting outside of the viewport. You would probably need to delete/update all the matchaddpos commands for matches after it (because inserts could make them invalid), though, and that might be complicated.

fmoralesc on 3 Oct 2015

@fmoralesc this is what I was thinking:

maintain an AST of the current buffer contents - easy
when the contents of the buffer update, update the AST - easy (but not efficient, for a couple of reasons as discussed above)
only output matchaddpos commands for the current viewport (calculated from the current AST, which gives byte offsets of each token)
maintain a set of current matchaddpos commands sent to neovim
when the viewport changes, call matchaddpos/matchdelete as required based on the current matchaddpos commands
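The bookkeeping in the last three steps boils down to a set difference. The sketch below is in Python for brevity and assumes highlights are kept as `(group, start, end)` byte ranges and the viewport as a half-open byte range; `visible` and `plan_update` are hypothetical names, and how the viewport is obtained from neovim is left open.

```python
def visible(highlights, view_start, view_end):
    """Keep only highlights that overlap the half-open byte range [view_start, view_end)."""
    return {h for h in highlights if h[1] < view_end and h[2] > view_start}


def plan_update(sent, wanted):
    """Return (to_delete, to_add) needed to move neovim from 'sent' to 'wanted'."""
    return sent - wanted, wanted - sent


if __name__ == "__main__":
    # Highlights computed from the AST for the whole buffer.
    all_highlights = {
        ("Keyword", 0, 4),
        ("Keyword", 100, 104),
        ("String", 200, 210),
    }
    sent = visible(all_highlights, 0, 150)       # initial viewport
    wanted = visible(all_highlights, 90, 300)    # after scrolling down
    to_delete, to_add = plan_update(sent, wanted)
    print("matchdelete for:", sorted(to_delete))
    print("matchaddpos for:", sorted(to_add))
```

Only the highlights that enter or leave the viewport generate traffic; the one that stays visible is left alone.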

This all assumes I can ask neovim "what is the current view port in terms of byte offsets"... can I do that?

myitcv on 3 Oct 2015

"what is the current view port in terms of byte offsets"... can I do that?

The vimscript winsaveview() function combined with winheight('%') and winwidth('%') can give you the viewport, though of course in the future we will look to add first-class properties to the API.

justinmk on 3 Oct 2015

@justinmk That means we would need to notify viewport changes. Not sure how that would work along the UI infrastructure and @tarruda's proposed smart UI mechanism.

fmoralesc on 3 Oct 2015

Just to be clear, by "first class properties" I do not mean properties that are "bound" (I think that would be a bad idea). I just mean API calls that do not require vimscript strings.

justinmk on 3 Oct 2015

@myitcv If you like, you could try #1817, which intends to have better semantics and performance than matchaddpos for these kinds of usages (e.g. it should scale better for lots of added highlights, and allows batch deletion of line ranges of highlights). Though realistically it won't be finished/merged in the timeframe of 0.1, I think it's eligible for 0.2.
The documentation is currently missing, but hopefully https://github.com/neovim/neovim/pull/1817#issuecomment-72335766 explains the basic API, as does the functional test.

bfredl on 3 Oct 2015

@bfredl - certainly will add it to the list. The optimisations I've described above will, however, probably get us 90% of the way there.

myitcv on 3 Oct 2015

True, the point of #1817 is to make it _simpler_ for highlighting plugins to give a smooth and performant experience, especially when 1) switching between buffers in a window and 2) deleting or inserting lines before, at the beginning of, or in the middle of the current viewport (which would make matchaddpos items misaligned immediately).

bfredl on 3 Oct 2015

@bfredl

but matches tied to a specific buffer and not the current window.

This shows how little I have tested my prototype! Sounds eminently sensible to be tying these highlights to buffers.

One thought on the API (apologies if I have missed discussion elsewhere):

I think the API should be a byte-offset based approach. Parsers will in all likelihood work with byte offsets under the covers, as does neovim (unless I am mistaken). And so this will be the most efficient means of generating and using syntax highlighting commands. At the moment I'm translating between byte offsets and line/col references, but only in order to call matchaddpos.
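The translation mentioned here can be sketched in a few lines of Python. The helper names are hypothetical; the sketch assumes lines are separated by "\n" and produces 1-based line numbers and 1-based byte columns, which is the form matchaddpos positions take.

```python
from bisect import bisect_right


def line_starts(data: bytes):
    """Byte offset at which each line begins."""
    starts = [0]
    for i, b in enumerate(data):
        if b == 0x0A:  # "\n"
            starts.append(i + 1)
    return starts


def offset_to_pos(starts, offset):
    """Map a 0-based byte offset to a 1-based (line, byte column) pair."""
    idx = bisect_right(starts, offset) - 1
    return idx + 1, offset - starts[idx] + 1


source = b"var foo = value\nfuncall (bar, foo)\n"
starts = line_starts(source)
first_foo = offset_to_pos(starts, 4)    # (1, 5)
second_foo = offset_to_pos(starts, 30)  # (2, 15)
```

The table of line starts is built once per parse, so each lookup is a binary search rather than a scan of the buffer.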

It should respond to removing/inserting lines before, and deleting highlighting lines (though not undoing deletes)

Can you explain this a bit more? Surely we can't make any guarantees about a highlight still being valid when a line gets inserted/deleted? Only the parser can know that.

myitcv on 4 Oct 2015

AFAIK neovim mostly works with line/col numbers (and there is lots of "stuff that happens" when lines are renumbered: marks, signs, folds, jump items etc. are updated, but less so when shifting chars within a line).
But won't most parsers emit line/col pairs as well, I mean for human-readable syntax errors anyway? (As a data point, the clang cindex API has full support for both.) In the long run it might be good to gain a byte-based buffer API, but I think it's out of scope for #1817.

Surely we can't make any guarantees about a highlight still being valid when a line gets inserted/deleted? Only the parser can know that.

True, the question is what happens in the split second before the parser has had time to rerun. Instead of immediately clearing all highlights below, just letting the old highlights follow along will probably give a smoother transition. (Consider the case where a blank line is deleted or inserted: the user won't even notice that an update was needed.)
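To illustrate what "follow along" could mean, here is a small Python sketch. It is not the implementation in #1817; `shift_highlights` is a hypothetical helper that assumes highlights are stored as (line, col, length) tuples.

```python
def shift_highlights(highlights, first, delta):
    """Adjust (line, col, length) highlights after a line-wise edit.

    delta > 0: `delta` lines were inserted before line `first`.
    delta < 0: lines `first` .. `first - delta - 1` were deleted.
    """
    shifted = []
    for line, col, length in highlights:
        if line < first:
            # Above the edit: untouched.
            shifted.append((line, col, length))
        elif delta < 0 and line < first - delta:
            # On a deleted line: dropped.
            continue
        else:
            # Below the edit: moves with its line.
            shifted.append((line + delta, col, length))
    return shifted


highlights = [(1, 1, 3), (3, 5, 3), (4, 1, 2), (10, 2, 4)]
after_insert = shift_highlights(highlights, first=3, delta=1)
after_delete = shift_highlights(highlights, first=3, delta=-2)
```

The shifted highlights may still be stale, but they stay attached to the same text until the parser delivers fresh ones.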

bfredl on 4 Oct 2015

AFAIK neovim mostly works with line/col numbers

I am mistaken therefore :+1:

But won't most parsers emit line/col pairs as well

Yes they will. But I think the minimum representation needed in a parser is the byte offset; the line, col etc is always derivable. Perhaps I was trying to prematurely optimise....

Either way, the goal behind my comment was to have the most efficient implementation for what is a very sensitive part of the code base, especially when it comes to large files (as @justinmk alluded to earlier). If however neovim currently works by line/col references, then this efficiency goal is somewhat subsumed by the complexity of the changes that would be required for v0.1.

So practically speaking, if neovim works with line/col references then it probably makes sense for v0.1 of the API you're proposing to also be line/col based, I agree.

myitcv on 4 Oct 2015

@justinmk - incidentally, for me to start using winsaveview() I need #1250 to have landed (just commented there because I've seen the milestone slip to 0.2)

myitcv on 4 Oct 2015

How is #1250 related? Isn't winsaveview() just a dict of numerical values?

bfredl on 4 Oct 2015

It's a dict, yes; and dicts have string keys (can they be anything else? Not sure). But right now the string keys are encoded as MSGPACK BIN.

To be fair, in the meantime I could use multiple calls to eval winsaveview()['leftcol'] etc.

But as I mention in #1250, the proposal we have agreed upon is a breaking change for the writers of plugin hosts so should be in 0.1

myitcv on 4 Oct 2015

Can't you just assume BIN keys are valid ASCII/UTF-8 in the meantime? Or maybe it is a limitation of the msgpack library you're using? Anyway, #1250 in 0.1 would be good, but is it really breaking? I mean, a plugin host could just handle both incoming BIN and STR correctly already, and would not be affected when neovim switches to STR by default.
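In a dynamically typed host, accepting both encodings can be a few lines. The Python sketch below assumes a msgpack decoder that surfaces BIN as `bytes` and STR as `str`; `normalize_keys` is a hypothetical helper, not part of any client library.

```python
def normalize_keys(value, encoding="utf-8"):
    """Recursively turn bytes dict keys into str, leaving other values alone."""
    if isinstance(value, dict):
        return {
            (k.decode(encoding) if isinstance(k, bytes) else k):
                normalize_keys(v, encoding)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [normalize_keys(v, encoding) for v in value]
    return value


# Keys arrive as bytes today and would arrive as str after the switch;
# both end up as str.
view = normalize_keys({b"topline": 10, "leftcol": 0})
```

This does not settle the question for statically typed decoders, where the expected type has to be declared up front.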

bfredl on 4 Oct 2015

I wouldn't describe it as a limitation :)

There would be relatively significant changes needed to the library, yes. And the same would be the case for all other typed language MSGPACK libraries. So my point is really that this can be more easily fixed at source.

And yes it is breaking because the encoding of the API String is part of the API definition.

myitcv on 4 Oct 2015

And the same would be the case for all other typed language MSGPACK libraries.

Not true. You could just allow a static type to match both STR and BIN, as the Haskell msgpack library allows for instance (by the same mechanism that, say, matches a msgpack int when a Float is expected), and as nvim-hs does, if I understand it correctly. But sorry for the nitpick; I agree that #1250 sooner rather than later would make life easier for new hosts :)

bfredl on 4 Oct 2015

@bfredl - sorry, yes. That was a rather broad sweeping statement that is easily proved wrong :+1:

myitcv on 4 Oct 2015

Vanilla vim works quite well with https://github.com/jeaye/color_coded if we exclude the lack of an asynchronous API. I'm certainly interested in supporting neovim in color_coded and I'd welcome any help.

jeaye on 11 Oct 2015

@jeaye I used color_coded a while back to test drive #1817, though it is presently only an ugly hack that doesn't integrate well with the rest of the codebase and uses the python-client/cffi as a "bridge" to communicate with neovim. What I intended to do next was to use msgpack.cpp to communicate with neovim directly, without any bridge, but unfortunately I haven't had time to work on it yet. FWIW my fork is here

bfredl on 11 Oct 2015

@bfredl That's very exciting to see! color_coded has changed a lot since then, unfortunately, but it's excellent that you were able to test drive it. I'm still very new to neovim and its asynchronous API, but neovim support has been on my mind for a while.

A big issue with the marriage of vim and libclang, right now, is that every libclang plugin wrangles the AST manually. This means that, if I have 5 plugins using libclang to do _something_, my code will be compiled at least 5 times. I've spoken with @oblitum, a YCM collaborator, about this. Before much work goes into bringing more libclang-based plugins (like color_coded) to neovim, I think this is something that should be thoroughly considered.

jeaye on 11 Oct 2015

FYI: http://thread.gmane.org/gmane.comp.compilers.clang.devel/21780

purpleKarrot on 5 Nov 2015

@purpleKarrot Look(ed) promising, but the mails are from 2012. Was any substantial code produced from that? https://github.com/crange/crange is from ~2014 but it's not the same (not a server but a static db, apparently).

teto on 5 Nov 2015

@jeaye maybe it would help to define an API that is language agnostic and covers most use cases:

type analysis, semantic coloring, and maybe context-sensitive doc display
positional completion
error and warning information (short and long messages, positions and related positions for warnings and errors)
other use cases?

Then plugins using this API could be language agnostic and use backends for any language.

torpak on 18 Nov 2015

tree-sitter (see also video) looks like a viable option.

C lib with minimal dependencies
more momentum (and more grammars) than scintillua just by association with Atom editor
advanced features like incremental parsing and error recovery (not sure if scintillua has those)

justinmk on 23 Dec 2017

👍27

@justinmk Looks very promising. It looks like Christmas came early.

stellarhoof on 23 Dec 2017

👍2

Discussion above is mostly outdated.
Closing this as duplicate of https://github.com/neovim/neovim/issues/1767.

justinmk on 23 Mar 2018

We need this badly, f.e. here's why: https://github.com/posva/vim-vue/issues/95

trusktr on 18 Apr 2018

👍7
