nvim --version:
NVIM v0.2.2
Build type: RelWithDebInfo
LuaJIT 2.0.4
Compilation: /usr/bin/x86_64-linux-gnu-gcc -g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -DDISABLE_LOG -Wdate-time -D_FORTIFY_SOURCE=2 -Wconversion -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=1 -DNVIM_UNIBI_HAS_VAR_FROM -O2 -g -DMIN_LOG_LEVEL=3 -Og -g -Wall -Wextra -pedantic -Wno-unused-parameter -Wstrict-prototypes -std=gnu99 -Wvla -fstack-protector-strong -fdiagnostics-color=auto -DINCLUDE_GENERATED_DECLARATIONS -D_GNU_SOURCE -I/build/neovim-HJnLxJ/neovim-0.2.2/build/config -I/build/neovim-HJnLxJ/neovim-0.2.2/src -I/usr/include -I/usr/include -I/usr/include -I/usr/include -I/usr/include -I/usr/include -I/usr/include -I/build/neovim-HJnLxJ/neovim-0.2.2/build/src/nvim/auto -I/build/neovim-HJnLxJ/neovim-0.2.2/build/include
Compiled by [email protected]
Vim (version: ) behaves differently? Haven't checked. Issue seems to be new with 0.2.2.
$TERM: screen-256color

Just start neovim with the following config cutdown.
nvim -u nvim-leak.vim
nvim-leak.vim
call plug#begin('~/.vim/plugged')
Plug 'vim-airline/vim-airline'
Plug 'enricobacis/vim-airline-clock'
Plug 'airblade/vim-gitgutter'
set updatetime=250
let g:gitgutter_realtime = 1
let g:gitgutter_eager = 1
call plug#end()
Leaks memory. About 1MB a second.
If either vim-airline-clock or vim-gitgutter is removed, memory is stable. With both together, memory increases uncontrollably.
Expected behavior: not leaking memory.
Are you sure this isn't a gitgutter bug? Timers need to be stopped manually if they are repeating.
If it's not reproducible in some older version of Nvim, bisecting to the problem commit will help a lot.
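Re: "Timers need to be stopped manually if they are repeating" above, a minimal sketch of what that means (the names here are illustrative, not from either plugin):
" A {'repeat': -1} timer fires forever; Nvim never stops it on its own.
function! MyTick(timer) abort
  " per-tick work goes here
endfunction
let g:my_timer = timer_start(250, 'MyTick', {'repeat': -1})
" The owner must eventually clean it up:
call timer_stop(g:my_timer)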
gitgutter by itself works ok. vim-airline-clock by itself works ok. Together something happens.
I had this setup working well in the past. Stepped away for a few months, updated to neovim 0.2.2 and :PlugUpdate to grab the latest plugins, that's when I noticed this issue.
vim-airline-clock hasn't been updated in a year. But both neovim and gitgutter have been, since Nov 2017. The combination of plugins causing a problem make me lean towards a change in neovim. But I don't know enough about how plugins interoperate to be sure.
Ah, I thought this sounded familiar. This was reported before (in chat, I think), regarding the same plugin, vim-airline-clock. https://github.com/enricobacis/vim-airline-clock/issues/3
Does this bring the memory back down?
:call timer_stopall()
:call garbagecollect()
:call timer_stopall()
immediately brings the memory back down, with no further increase.
That suggests the leak is in the plugin. Try adding some logging statements in the plugin or vim-airline itself, to see if timer_start() is being called many times. Or maybe its callback is doing something weird.
But the problem only happens if both of these plugins are active at the same time. They behave normally when they're the only ones active.
Won't both plugins claim to be working properly when they work in isolation?
Without steps to reproduce anything is possible.
It's easily reproducible.
nvim -u nvim-leak.vim
10,000 lines of plugin code doesn't point to a case that can be isolated as a Nvim bug. If timer_stopall frees the memory, there doesn't seem to be a low-level leak. Maybe Nvim isn't stopping timers (after they timeout) for some reason, but so far there's no evidence of that. Only evidence that some big plugins are creating lots of timers.
So someone needs to add logging to these plugins to see how often they create timers, and how often they stop them.
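One low-tech way to gather that data (the log path and placement are illustrative) is to paste a line like this next to each timer_start()/timer_stop() call inside the plugins:
" Appends a timestamped line every time the surrounding code runs:
call writefile(['timer_start at ' . strftime('%T')], '/tmp/timers.log', 'a')
" Then watch it with: tail -f /tmp/timers.log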
It would appear that vim-airline-clock does its timer logic here: https://github.com/enricobacis/vim-airline-clock/blob/95d452ec07b12d884b508ba8be686c0f913df6ac/autoload/airline/extensions/clock.vim#L36
It creates a single timer that calls a function every tick, forever. vim-gitgutter doesn't have "timer_start" anywhere in the codebase. Yet the two don't play nice together, for some reason.
There are plenty of other plugins that also use timer_start that appear to behave normally. I just didn't include them in the cutdown.
Could this be an interaction between these three? Seems like the 3 of them put together (across all plugins) cause problems.
What is the call to kill a single timer? I don't think this is an issue with an explosion of timers.
Rather it appears that one plugin creates a global {'repeat':-1} timer that calls a function in another plugin, which holds references to yet another plugin. Each tick of the global timer holds on to more and more memory through this chain of events.
Does a global {'repeat':-1} timer clean up resources generated in a prior tick? Killing all timers allows for a GC. How do I kill just 1 timer to prove this theory?
Rather it appears that one plugin creates a global {'repeat':-1} timer that calls a function in another plugin, which holds references to yet another plugin.
Something like that.
How do I kill just 1 timer to prove this theory?
Get the timer id. You need to modify the plugin code.
Does a global {'repeat':-1} timer clean up resources generated in a prior tick?
If the timer callback is a closure (or partial in general) it could unexpectedly hold on to resources (that the closure could use in principle, even if it actually doesn't). Otherwise it shouldn't.
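To illustrate the kind of capture meant here, a hypothetical sketch:
" The lambda closes over this function's scope, so `bigdata` stays
" reachable for as long as the timer exists, even though the callback
" never touches it.
function! s:StartPinnedTimer() abort
  let bigdata = range(1000000)
  return timer_start(250, {timer -> 0}, {'repeat': -1})
endfunction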
It could be an idea to test with g:gitgutter_realtime set to zero, as CursorHold/CursorHoldI function as a very limited timer of sorts.
Does a global {'repeat':-1} timer clean up resources generated in a prior tick?
If the timer callback is a closure (or partial in general) it could unexpectedly hold on to resources (that the closure could use in principle, even if it actually doesn't). Otherwise it shouldn't.
Even if the closure did close over other values, that garbage should be collected when the execution context used within the timer drops out of scope. I would assume that a new execution context is used for each invocation of the closure when the {'repeat':-1} option is used. Otherwise, the timer itself could pin application memory, which is never the intent of a timer.
function! airline#extensions#clock#timerfn(timer)
  call airline#update_statusline()
endfunction
let g:airline#extensions#clock#timer = timer_start(
      \ g:airline#extensions#clock#updatetime,
      \ 'airline#extensions#clock#timerfn', {'repeat': -1})
This seems pretty straightforward, and this plugin has been unchanged for a while. But I'm not a vim or vim plugin author, so my abilities here are limited. We have determined that if timers are killed memory is released. So either a timer is holding onto memory (which it shouldn't), or an extraordinary number of timers are being created (or both). Because the identical code works without an additional plugin loaded, I think it's unlikely to be a leak caused by too many timers. Rather, a combination of airline#update_statusline() generating way too much garbage and the neovim timer not letting go of it. This same ecosystem of plugins and neovim worked under neovim 0.2.0 and possibly 0.2.1.
It would be great if someone with experience on the neovim team could take a few minutes and do some of this triage for me, and either identify the issue with timers holding plugin memory, or help point the finger to a particular plugin, so I can chase that author down.
Absent that, could someone point me to documentation on the timer_start() call, the call to kill an individual timer, and whichever call logs to somewhere.
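For reference, the calls in question (the function names are real; see :help timer_start. The rest of this sketch is illustrative):
" timer_start({ms}, {callback} [, {opts}]) returns a timer id.
let s:id = timer_start(1000, 'MyTick', {'repeat': -1})
call timer_pause(s:id, 1)  " suspend the timer without freeing it
call timer_stop(s:id)      " kill just this one timer
call timer_stopall()       " kill every timer
" For ad-hoc logging, :echomsg writes to the :messages history.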
It could be an idea to test with g:gitgutter_realtime set to zero, as CursorHold/CursorHoldI function as a very limited timer of sorts.
g:gitgutter_enabled = 0 still has this problem. g:gitgutter_realtime = 0 also has no positive impact.
Because the identical code works without an additional plugin loaded, I think it's unlikely to be a leak caused by too many timers.
Also, if the code were creating new timers, it would keep creating them after the old timers were killed. But once killed, no new memory growth happened. Unless a timer within gitgutter was creating new timers; that doesn't appear to be the case, because gitgutter doesn't make a single timer_start call.
I would assume that a new execution context is used for each invocation of the closure when the {'repeat':-1} option is used.
The only such "context" I could think of is the ordinary function invocation context, when the function returns it is gone. If no closure is used, which isn't in this case, the timer _should_ be perfectly equivalent to call airline#extensions#clock#timerfn(number) invoked at the regular interval (say by external RPC).
I would try both timer_stop() and timer_pause(), which should be equivalent (but maybe aren't, if the timer implementation indeed has a bug).
call timer_stop(g:airline#extensions#clock#timer) does indeed free the memory and stop the growth.
I don't know the proper way to call timer_pause()
call timer_pause(g:airline#extensions#clock#timer, 5000) stops the growth, but does not free the memory
That is correct (it only checks if the 2nd arg is non-zero, so it works) and very interesting. So keeping the timer in the table seems to pin some resources. Beyond the non-existent closure, the timer struct only holds on to libuv resources. But I would expect a uv handle leak to be on the order of KB per second, not megabytes.
Taking a harder look... I removed gitgutter from my cutdown vimrc and there does appear to be a very slow leak, even without gitgutter. I didn't even notice at first. I also removed gitgutter from my large vimrc and the growth is still there. So gitgutter itself was only one possible contributor, not the only contributor.
Leads me to assume that much more than a uv handle is actually being pinned. And that airline without any extensions is pretty light.
Trying to reproduce with the script. I don't see any leak just from time passing, but easily 0.5-1MB for a mode-switch pair i<esc>. But I also get this with _only_ airline and neither airline-clock nor gitgutter (completely commented out from the script). I guess the underlying bug is in either airline or nvim itself, and the airline-clock+gitgutter combination just makes the airline update more frequently.
Time for some heap analysis...
Unfortunately that was probably only ASAN's own allocations. Using a logging profiler (heaptrack) instead, I can't reproduce any leak at all, even with the original script.
What commands should I throw at my install? I can repro pretty easily here.
Also, what branch are you on? I'm having problems building master locally, so still on whatever the Ubuntu 16 maintainers are calling 0.2.2-1.
I finally got HEAD to build locally. Still same problem. Built with included (auto-dl'd) deps. How can I generate/submit a heap dump?
Vim (version: ) behaves differently? Haven't checked. Issue seems to be new with 0.2.2.
It should be checked in Vim.
Sigh. My setup is for neovim. Plugins break horribly when I launch regular vim. There's probably a way to fix this, but I'm short on time atm. Now that I have neovim building locally, I'm going to try prior versions 0.2.1 and 0.2.0 and see if I can get the issue to go away that way. I'm pretty sure this was introduced recently.
v0.2.0 - no problem!
v0.2.1 - leak
v0.2.2 - leak
In addition to leaking memory, there's also some moderate CPU usage (~15%) in the problem versions.
0% while idle and timers not running (all versions). If that helps anyone triage anything.
0% while timers running, no memory growth, v0.2.0
v0.2.0 - no problem!
v0.2.1 - leak
Can you bisect between these two tags to see when it was introduced?
It should be checked in Vim.
In the plugin issues linked earlier, there was mention of it being seen in both editors.
Can you bisect between these two tags to see when it was introduced?
I can try. But there are a zillion commits per release. If someone could point me to particularly interesting commits, that would be nice. Otherwise, I can try my binary-search fu.
I hear there's a git-bisect command but I've never used it. Anyone have a quick primer or is Google my friend?
git bisect start v0.2.1 v0.2.0
git bisect bad will mark the current commit as bad and jump to the next commit. Otherwise use git bisect good to mark it good and jump to the next commit. Use git bisect skip to move to another commit without marking the current one as good/bad.

Here are the bisect results:
```06ec15121a4da439684cfa1909d920c3f22f2894 is the first bad commit
commit 06ec15121a4da439684cfa1909d920c3f22f2894
Author: James McCoy <jamessan@jamessan.com>
Date: Sat Apr 8 22:50:50 2017 -0400
Bump Vim version to 8.0 and applied patches
:040000 040000 d0cac841eacf7453526514093906e63d96779af2 ef9a0c2276831237aaf6461feb4e78ee23689d61 M scripts
:040000 040000 e8f707704c364560202c31d78ec3d507549eccc1 2e6001ec9b87c7f21ddab32efc233a582c01b3d3 M src
```
Direct link: https://github.com/neovim/neovim/commit/06ec15121a4da439684cfa1909d920c3f22f2894
That is just version related stuff, though it is possible the plugin could behave differently on a version/patch check.
The plugin guys say it isn't possible for them to get worse per tick. If they have a bug, things should be bad during a tick, then get cleaned up, then get bad again. The only way for things to get worse per tick of a single timer would be a container issue.
Does neovim change logic based upon vim patch or version? Maybe the patch-specific logic within the plugin exposes some issue within neovim. Worst of both worlds. Where is my :crystal_ball: ?
Does neovim change logic based upon vim patch or version? Maybe the patch-specific logic within the plugin exposes some issue within neovim
Sounds like a reach. Are you sure the parent of 06ec15121a4da439684cfa1909d920c3f22f2894 doesn't have the issue?
Does neovim change logic based upon vim patch or version?
No. Either the plugin changed the behavior, or the bisect got troubled by indeterminism (we could easily imagine a codebase state where the bug _sometimes_ triggers).
Sounds like a reach. Are you sure the parent of 06ec151 doesn't have the issue?
Lemme double check that...parent is dabd9ef44bb30b403e5a13ab1b6534cef6d8d8d0
Nope...issue isn't in dabd9ef44bb30b403e5a13ab1b6534cef6d8d8d0
and is in 06ec15121a4da439684cfa1909d920c3f22f2894
Search the plugin(s) for checks like version >= 800 and has('patch-8...').
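One way to run that search from inside the editor (the plugin path is illustrative):
:vimgrep /v:version\|has('patch-/ ~/.vim/plugged/**/*.vim
:copen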
Looking...
There is switching, but mostly on v:version >= 800 && has("job") / elseif has("nvim"), and that should drop into the nvim branch since has("job") == 0.
Try reverting the changes to version.h and rebuild Nvim.
Issue goes away with reverted version.h. Seems to be the version >= 800 test in the plugin and its associated logic. Going to retest the version.h hack against master.
But has('job') is false in neovim, so the v:version check shouldn't matter... Are there any <= 704 or < 800 checks?
There are additional 704 related checks. Lemme try and enumerate them.
Can confirm that if master pretends to be 704, the issue goes away.
Maybe remove the check in vim-airline-clock and redo the bisect? I suppose you could test all the way back to #4624 (https://github.com/neovim/neovim/pull/4624/commits/61e8adb25e6f37c3c9df0dd8f4fea5e23b62bacd) unless something else is missing.
I think the real issue is within the core vim-airline (and interaction with nvim) and not the clock, but I'll try that too.
Maybe remove the check in vim-airline-clock
Ooh, didn't notice that. That makes sense, since it doesn't set up the timer if v:version < 800.
There's a bunch of version based logic in vim-airline. I'm trying to find stuff that would be added/subtracted between 704 and 800.
I would first start with changing vim-airline-clock. That's most likely to help.
Ugh. Wait. The problem could be in prior versions too; we just never start our problem timer if version < 800. We've already established that we don't have a problem without this timer.
Yea...cross post.
Hacked up the clock. The problem is present in neovim all the way back to at least 0.1.7 (I had trouble building versions earlier than v0.1.7).
Seems there's been a problem for a while; it was only highlighted recently because the clock's version check started exposing it.
Now the "problem" could also be airline doing something that leaks if the timer pins memory across ticks. I couldn't follow the airline logic. But there did appear to be events and other things being pushed to common queue/stack/whatever, possibly with references to files and io buffers. I suppose it's possible that if neovim doesn't use a fresh thread for each invocation of the tick that the plugin is unintentionally generating things that are never cleaned up, since this wasn't an execution path the author originally intended.
But that's a wild guess. The airline guys argue that it's still neovim pinning within their timer in this case, no matter what they generate and fail to clean up within a single tick. So they might have a problem, but it wouldn't be a problem if the timer allowed the memory to be freed.
Sigh.
The airline guys argue that it's still neovim pinning within their timer in this case
Even though there was talk in one of the issues that it could reproduce in Vim?
The airline guys argue that it's still neovim pinning within their timer
Please stop spreading FUD. To be clear, airline does not use any timer. Any problem that comes up from using a timer in either Vim or Neovim must be solved by vim/neovim itself. In any case memory shouldn't leak, and this is not a thing that can be fixed by airline.
That's what I said. Their == neovim. Maybe it could have been "its timer". Sorry for the confusion.
The airline guys argue that it's still neovim pinning within their timer
Again, the neovim timer uses no "threads" or any sort of context that is supposed to be able to own or "pin" memory. Timer invocations are ordinary VimL function calls if a closure is not used.
I don't understand. Something associated with whatever represents an individual timer is preventing memory allocated as a result of that timer's execution from being reclaimed. Maybe there isn't an intentional timer context, but there appears to be an unintentional one.
A plugin-level hack could be to stop the currently running timer every time the timer ticks and then reregister itself under a new ID as part of the tick logic, decrementing the iteration count as it goes.
That might free this non-existent context, as well as prove it somehow exists. I'll try and hack the clock and see how that goes.
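A sketch of that re-registration idea (hypothetical; the function and variable names are illustrative):
" Each tick schedules a fresh one-shot timer instead of relying on a
" single {'repeat': -1} timer, so no timer object survives across ticks.
function! s:Tick(timer) abort
  call airline#update_statusline()
  let s:clock_timer = timer_start(250, function('s:Tick'))
endfunction
let s:clock_timer = timer_start(250, function('s:Tick'))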
Here we go...
function! airline#extensions#clock#timerfn(timer)
  call airline#update_statusline()
endfunction
let g:airline#extensions#clock#timer = timer_start(
      \ 1000,
      \ 'airline#extensions#clock#timerfn', {'repeat': -1})
" let g:airline#extensions#clock#timer = timer_start(
"       \ g:airline#extensions#clock#updatetime,
"       \ 'airline#extensions#clock#timerfn', {'repeat': -1})
The variable g:airline#extensions#clock#updatetime was 250. If updatetime is 250 I see the leak. If it's 1000 I don't. What happens within the timing system if the function doesn't complete before the next tick should have been scheduled?
https://github.com/neovim/neovim/issues/6829 gives an overview of the differences. Sounds like there may be some more detail in 2b. that needs to be looked at.
It is possible that a timer callback gets "chained" immediately after the previous one like in that issue, but I wouldn't expect that with the 250ms airline timer, and certainly not consistently.
However this code in os_inchar looks a bit suspicious
if ((result = inbuf_poll((int)p_ut)) == kInputNone) {
if (trigger_cursorhold() && !typebuf_changed(tb_change_cnt)) {
create_cursorhold_event();
} else {
before_blocking();
result = inbuf_poll(-1);
}
}
Here before_blocking() invokes GC. It is not the _only_ entry point for GC, but it suggests that under _some_ conditions, having events arrive more often than 'updatetime' inhibits GC. Probably GC should be invoked already before the first inbuf_poll, possibly taking the time first and later subtracting it from p_ut (or not bothering, as timing is not exact anyway).
Would be a good time to refactor CursorHold to use a libuv timer instead of the os_inchar special-case.
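In the meantime, if GC starvation is what's happening, a quick plugin-side probe (hypothetical sketch) is to force a collection from the timer callback itself and see whether memory stays flat:
" If the automatic GC in before_blocking() is starved by events arriving
" faster than 'updatetime', an explicit collection per tick should keep
" memory flat.
function! s:TickWithGC(timer) abort
  call airline#update_statusline()
  call garbagecollect()
endfunction
let s:probe = timer_start(250, function('s:TickWithGC'), {'repeat': -1})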
Ah, I thought this sounded familiar. This was reported before (in chat, I think), regarding the same plugin, vim-airline-clock.
Right, it was me who reported it on IRC; I still have the screenshots from that time. There was no gitgutter involved; I got it with plain vim-airline and vim-airline-clock alone.
It takes many hours to leak 16GB, then the OOM killer kicks in and it's a nightmare.
Related: https://github.com/vim/vim/pull/3961
Can someone port that patch to Neovim?
vim-airline@53aa65 seems to fix the issue (or slow it way down) for the above nvim-leak.vim. I had gitgutter@1e60667 and airline-clock@a752ae8, then bisected to that commit in vim-airline.
Looks like the change eliminates that 'opts' local. Would be interested to hear how that plays into the memory leak. The memory isn't reclaimed once the variable goes out of scope?
I assume the merge of vim-patch 8.1.1007 still needs to be addressed.
Here's a minimal repro, derived from the test cases in vim-patch 8.1.1007. For me, sourcing the snippet below would cause memory to jump ~20 MB each time.
func! s:f(...)
  " All these cause issues
  let x = a:000
  " let x = a:
  " let x = l:
  " Maybe some more...?
  " This one doesn't
  " let x = g:
endfunc
for i in range(10000)
  call s:f(0)
endfor
I attempted porting that patch, but its tests use Vim's job API, and they didn't expose the bug when I finished, so I must not have done that great a job of it. I think it might just be easier to add the coverage in Lua; the snippet below did work for me.
-- Assumes Nvim's functional test framework (busted specs with the
-- standard helpers module):
local helpers = require('test.functional.helpers')(after_each)
local eval, source = helpers.eval, helpers.source

local getmem = function()
  local pid = eval('getpid()')
  local handle = io.popen('ps -o rss= '..pid)  -- resident set size, in KB
  return tonumber(handle:read('*a'))
end

local start_mem = getmem()
source([[
  func! s:f(...)
    let x = a:000
  endfunc
  for i in range(10000)
    call s:f(0)
  endfor
]])
local memory_increase = getmem() - start_mem
assert.is_true(memory_increase < 1000)
I did a survey of existing leak tests in the suite, and it wasn't quite clear to me what strategy they were using to detect leaks. I assume they rely on valgrind? For example, this one:
test/functional/api/window_spec.lua|75| 18: it('does not leak memory when using invalid window ID with invalid pos',
So I was wondering whether using ps (and some Windows equivalent) is what would be desired, or whether we can rely on valgrind to complain, or use some other strategy to reliably cover the bug.