Vimtex: Startup speed improvements by disabling optional features?

Created on 23 Mar 2020 · 16 Comments · Source: lervag/vimtex

Love vimtex! Thanks for building/maintaining it!

I have noticed that vimtex is quite slow on startup ~2 seconds. When profiling, I see that most of it comes from calls to vimtex#parser#general#parse() (perhaps due to calls to gather_sources?) which for my project results in some 23k calls to a seemingly expensive read function (section of the profiler output pasted below, entire log here:
vim.log)

I'm wondering if there is a way for me to turn off some optional features to reduce this load on startup time. Is it searching for and parsing a whole bunch of files I don't need for most work? (such as aux, out, and log files?)

FUNCTION  433()
    Defined: ~/.vim/bundle/vimtex/autoload/vimtex/parser/general.vim line 41
Called 33 times
Total time:   1.990302
 Self time:   1.977864

count  total (s)   self (s)
   33              0.000645   if !filereadable(a:file) || index(self.prev_parsed, a:file) >= 0
    1              0.000001     return []
   32              0.000025   endif
   32              0.000107   call add(self.prev_parsed, a:file)

   32              0.000070   let l:parsed = []
   32              0.000050   let l:lnum = 0
23165              0.045789   for l:line in readfile(a:file)
23134              0.039972     let l:lnum += 1

23134              0.031205     if self.finished
                                  break
23134              0.012942     endif

23134              0.068357     if has_key(self, 're_stop') && l:line =~# self.re_stop
    1              0.000003       let self.finished = 1
    1              0.000001       break
23133              0.012778     endif

23133              0.025432     if self.detailed
21946              0.083893       call add(l:parsed, [a:file, l:lnum, l:line])
 1187              0.000898     else
 1187              0.003237       call add(l:parsed, l:line)
23133              0.017361     endif

23133              0.292964     if l:line =~# self.input_re
   27   0.006763   0.000405       let l:file = self.input_parser(l:line, a:file, self.input_re)
   27              0.000191       call extend(l:parsed, self.parse(l:file))
   27              0.000040       continue
23106              0.013738     endif
23138              0.021502   endfor

   32              0.000055   return l:parsed

All 16 comments

> Love vimtex! Thanks for building/maintaining it!

Thanks! <3

> I have noticed that vimtex is quite slow on startup ~2 seconds. When profiling, I see that most of it comes from calls to vimtex#parser#general#parse() (perhaps due to calls to gather_sources?) which for my project results in some 23k calls to a seemingly expensive read function (section of the profiler output pasted below, entire log here:

I try to avoid this, sorry. It is fast on small to medium documents, I think. And for me, it is also quite fast on large documents. So, to investigate the slowness in your case means I need to reproduce it. The part you've detected is used to parse the tex file to collect a list of "input" LaTeX files.

> I'm wondering if there is a way for me to turn off some optional features to reduce this load on startup time. Is it searching for and parsing a whole bunch of files I don't need for most work? (such as aux, out, and log files?)

Yes, but nothing that I think is relevant for this particular case.


Would you be willing to share your manuscript with me? If you did, I would be able to investigate this directly. Still, I understand if you don't want to share, and a good alternative is therefore for you to make a new project where you can reproduce this slowness that you can share.

It's the source for a book I'm publishing for profit, so unfortunately I cannot share the source as is.

I will see if I can reproduce it, but in the meantime could you describe a bit about what sorts of files the plugin is looking for? I have about 315 files total in my repository (including about 100 MiB of high-res images), but only 30 are tex files (each with associated aux, and with a menagerie of fls/out/toc/fdb_latexmk etc files). In total there might be 100 tex-related files, so I don't understand in theory how it can be calling parse 23k times. Maybe there's a log statement I could insert into the vimscript to tell me which file it's loading? (I am not an expert at vimscript so you'd have to give me a hand)

> It's the source for a book I'm publishing for profit, so unfortunately I cannot share the source as is.

I see, and no problem.

> I will see if I can reproduce it, but in the meantime could you describe a bit about what sorts of files the plugin is looking for? I have about 315 files total in my repository (including about 100 MiB of high-res images), but only 30 are tex files (each with associated aux, and with a menagerie of fls/out/toc/fdb_latexmk etc files). In total there might be 100 tex-related files, so I don't understand in theory how it can be calling parse 23k times. Maybe there's a log statement I could insert into the vimscript to tell me which file it's loading? (I am not an expert at vimscript so you'd have to give me a hand)

First, nothing is parsed 23k times. The number 23k is rather something close to the number of lines that are parsed. The parser works by recursively iterating through the main manuscript files.
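
To illustrate why the call count tracks the number of lines rather than the number of files, here is a minimal Python sketch of a recursive parser of this shape. It is not vimtex's Vimscript implementation: the `INPUT_RE` pattern, the `parse` signature, and the rule that input paths are resolved against the main file's directory are simplifying assumptions made for the example.

```python
import re
from pathlib import Path

# Assumption: a simplified pattern for \input{...} and \include{...}.
# A real parser must also handle comments, \subfile, optional spacing, etc.
INPUT_RE = re.compile(r"\\(?:input|include)\s*\{([^}]+)\}")


def parse(path, root=None, seen=None):
    """Return an ordered list of (file, line number, line) tuples.

    Every line of every reachable file is visited once, so the work grows
    with the total number of lines, not with the number of files.
    """
    path = Path(path).resolve()
    root = path.parent if root is None else root
    seen = set() if seen is None else seen

    # Skip unreadable files and files that were already parsed.
    if not path.is_file() or path in seen:
        return []
    seen.add(path)

    parsed = []
    lines = path.read_text(encoding="utf-8").splitlines()
    for lnum, line in enumerate(lines, start=1):
        parsed.append((str(path), lnum, line))
        match = INPUT_RE.search(line)
        if match:
            name = match.group(1).strip()
            if not name.endswith(".tex"):
                name += ".tex"
            # Recurse into the included file; its lines follow the input line.
            parsed.extend(parse(root / name, root, seen))
    return parsed
```

In this sketch the regular expression test runs once per line, which corresponds to the most expensive line of the profile in the original report (the `input_re` match).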

Is the startup equally slow when your project is "clean" (i.e. after removing all compiled and auxiliary files)? I believe it should be.

One way to reproduce this might be to find an equally large book in the public domain. I downloaded the PGF and TikZ manual source code, and with this I can reproduce a startup time on the order of 1 second. I'll look into it!

@lervag From my admittedly anecdotal experience (which mirrors that of @jkun), the performance of vimtex is related not so much directly to lines of code as to the number of labels. (My personal "stress test" has about 2400 newlabels in the combined aux files.) To reproduce this, it might be sufficient to programmatically spam empty (but labelled) sections, theorems, and equations?
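
A stress test along these lines could be generated with a short script. The following Python sketch is only one possible way to do it: the function name `make_stress_document` and the label scheme `sec:N` are made up for the example, and it produces labelled sections only (theorems and equations would work the same way).

```python
def make_stress_document(n_labels=2400):
    """Build LaTeX source containing n_labels empty but labelled sections."""
    lines = [r"\documentclass{article}", r"\begin{document}"]
    for i in range(1, n_labels + 1):
        # Doubled braces are literal braces inside an f-string.
        lines.append(rf"\section{{Section {i}}}\label{{sec:{i}}}")
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"
```

Writing the returned string to a `.tex` file and compiling it once should give an aux file with roughly as many `\newlabel` entries as requested, which can then be opened in Vim to time the startup.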

I've pushed an update that I think should improve things at least somewhat. Can you please test?

In my tests, things improved from about 4.7 seconds load time to 3.7 seconds. Not quite enough, but at least noticeable.

Yes, that's pretty much my observation, too (roughly same numbers). Every bit helps :) (And 25% _is_ significant!)

Thanks! Would you by any chance know a simple way to parse a LaTeX project for the list of relevant source files? This is what is creating the delay here, and if I found a faster way of doing it, it could solve this issue.

Hmm, not in VimL (it may be possible to parallelize using vimgrep&, but this may be neovim only?)

An option would be to shell this out to external tools, such as https://tex.stackexchange.com/questions/24542/ (no, that's even slower as it actually processes the source)

(For me it would be fine if it is fast only for a compiled project -- with .fls files etc. lying around -- since that's the default, compilation being similarly slow... Or you could use the caching mechanism? If I remember right, the list of source files is already cached for completion?)

Thanks! I've already looked into things like pdflatex -recorder and had the same conclusion.

I think caching might be a good idea, but it does require some work due to the unstructured nature of LaTeX source files. That is: The main file may include a single file multiple times. Also, the output of the parser should be a list of [file, line number, line] that is correctly ordered. To apply proper caching, I would need to create a more complex data structure. Not impossible, but also not trivial.

Using .fls files is probably simpler. I'll continue to look into this, in any case. I think perhaps caching may be a good idea, because it would improve more things. But: I don't want to cache the latex source to file, so it would not be persistent (e.g. security concerns, etc). Thus it would not really help startup.

I think a good way forward is:

  1. Use .fls file if it is available, and
  2. Add nonpersistent caching for the TeX parser to avoid repeated parsing of the document.
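
As a rough illustration of point 2, the following Python sketch keeps file contents in memory only and re-reads a file when its modification time changes. It is an assumption about what such a cache could look like, not a description of the cache that vimtex later added; the class name `LineCache` and its `reads` counter are invented for the example.

```python
import os


class LineCache:
    """In-memory (nonpersistent) cache of file lines, invalidated by mtime."""

    def __init__(self):
        self._entries = {}  # path -> (mtime in ns, list of lines)
        self.reads = 0      # number of actual disk reads, for inspection

    def lines(self, path):
        """Return the lines of path, reading from disk only when needed."""
        mtime = os.stat(path).st_mtime_ns
        entry = self._entries.get(path)
        if entry is None or entry[0] != mtime:
            with open(path, encoding="utf-8") as handle:
                entry = (mtime, handle.read().splitlines())
            self._entries[path] = entry
            self.reads += 1
        return entry[1]
```

Caching per file rather than per document sidesteps the ordering problem mentioned above: the ordered [file, line number, line] list can still be assembled on every parse, with a file that is included several times served from the same cache entry. Because nothing is written to disk, the first parse after starting Vim still pays the full cost.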

Sounds like a good idea (especially point 1); nonpersistent caching could be helpful for the table of contents functionality. Otherwise, it's obvious that parsing can never be O(1) (in the size of the document), so big projects _will_ be slow(er than typical articles).

Although I wonder if implementing the parser (at least the hot spots) in Lua would lead to significantly better performance :P

In any case, this was already a noticeable improvement, so thanks!

I've added nonpersistent caching now. A lot of work and clean up, but I think it is quite good. However, first: it does not really solve this issue, and the speedup is only about 30-40%, because there is still a lot of work necessary even with caching.

Looking into the .fls route as well now to improve the start up.

Amazing! ❤️

New structure gave some immediate benefits: Now the initial load time should be even better. The first tests clocked in at 4.7 seconds, now I'm down to about 2 seconds. The project has 130 tex files and about 100k lines of latex code. I'm satisfied with that and I think it should be very noticeable.

Ok, that's seriously impressive. I get the same numbers for my example (<2s), and that's now in the range where I wouldn't really notice the delay.

(And even the TOC is near instant on subsequent calls, where it used to take ~7s every time 🎉 )

Great, happy to hear it!

Regarding .fls parsing: I think it might be somewhat more challenging. The .fls file includes a lot more details, and it seems difficult to differentiate between what is part of the "manuscript" and what is a package/class. I think you understand what I mean if you inspect any .fls file generated from any given project, really.

The syntax is of the form INPUT path-to-file, and I could filter by .tex files of course. But it still includes a lot of noise. If anyone has any good ideas, I'll be glad to hear them!
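
One possible filter is sketched below in Python. The heuristic it uses (keep only `INPUT` entries that end in `.tex` and live below the project root) is an assumption made for illustration and is not necessarily what #1643 implements; the function name `manuscript_files` is likewise hypothetical.

```python
from pathlib import Path


def manuscript_files(fls_text, root):
    """Return project-local .tex files listed as INPUT in .fls content.

    Heuristic (assumption): anything outside the project root, such as
    files from the TeX distribution, is treated as package/class noise.
    """
    root = Path(root).resolve()
    found = []
    for line in fls_text.splitlines():
        kind, _, name = line.partition(" ")
        if kind != "INPUT" or not name.endswith(".tex"):
            continue
        # Relative entries are resolved against the project root;
        # absolute entries are left as they are.
        path = (root / name).resolve()
        if root in path.parents and path not in found:
            found.append(path)
    return found
```

This still misclassifies some cases, for example a locally installed package kept inside the project directory, which is the kind of noise described above.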

I've tried to add the .fls parsing as well, and it seems to work for a couple of simple examples. But I don't know if my "heuristics" are right. Please see #1643; I would be very happy to get feedback on this.
