Renovate: Potential Memory Leak

Created on 1 Apr 2018 · 12 Comments · Source: renovatebot/renovate

This is a:

  • [X] Bug report (non-security related)
  • [ ] Feature request
  • [ ] I'm not sure which of those it is

I'm using:

  • [ ] The Renovate GitHub App
  • [ ] Self-hosted GitHub
  • [X] Self-hosted GitLab
  • [ ] Self-hosted VSTS

Please describe the issue:

Just flagging this while I'm still investigating. For just over a week now, my on-prem instance of Renovate (using the official container) has been getting victimised by the oom-killer.

It has gone from a clean end-to-end run in just under two hours to being terminated around the 40-minute mark.

Last night, just to try it, I doubled the available RAM from 4 GB to 8 GB and the same issue occurred, so it would definitely appear there is something strange going on.

Labels: help wanted, priority-3-normal, bug

All 12 comments

@adam-moss Out of interest, what's the use-case for your "two hours" run? Is that because of the number of repositories to be serviced, or partly because you are spreading them out a little (somehow?) to avoid rate limiting?

Are you able to measure RAM use on the machine and confirm that it increases approximately linearly over time?
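For reference, one generic way to confirm a leak from inside a Node process (a minimal sketch, not anything Renovate ships):

setInterval(() => {
  // Log resident set size and heap usage every 30 seconds; a steady,
  // roughly linear climb across the run would point to a leak.
  const { rss, heapUsed } = process.memoryUsage();
  console.log(`rss=${(rss / 1048576).toFixed(0)} MB, heapUsed=${(heapUsed / 1048576).toFixed(0)} MB`);
}, 30000);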

Purely the number of repos. I have autodiscover set to true so consumers can just add the bot account to their groups/repos rather than having to set it up individually themselves. As it is mostly Node stuff, and a lot of it is... not up to date, it takes quite some time even with a max of 3 PRs per hour per repo 😂
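For reference, the relevant bits of a self-hosted config along those lines might look like this (a sketch: autodiscover and prHourlyLimit are real Renovate options, but the endpoint and values here are illustrative):

// config.js for a self-hosted Renovate bot (illustrative values)
module.exports = {
  platform: 'gitlab',
  endpoint: 'https://gitlab.example.com/api/v4/', // assumed GitLab URL
  autodiscover: true, // service every repo the bot account can see
  prHourlyLimit: 3,   // cap new PRs per repo per hour
};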

Yep, I'll dig out the Nagios charts for it and ping them over to you 👍

Hmmm, going to close this: on a clean machine running Renovate only, memory isn't spiking, but Renovate is still exiting 1 (or being exited 1) for some reason. The hunt continues 😭

Is there anything more detailed than LOG_LEVEL=debug? Absolutely nothing is showing that suggests why this thing is failing.

LOG_LEVEL=trace

Ok, will give that a shot when the current run goes 💥

thanks 😃

Right... 🤦‍♂️ moment I think!

Does

global.logger[x] = (p1, p2) => {
    if (x === 'error') {
        global.renovateError = true;
    }
    // ... the original logger call for level x follows
};

mean that, if the logger captures an event of type error, the process will exit 1 when it ends?

In other words, Renovate has actually completed successfully but some of the updates have failed for some reason (in this case ERROR: Error updating branch: internal-error).

Yes, that's right: if an error was encountered at all, then we exit with a non-zero exit code at the end. Hopefully you should never see Renovate crashing out, thanks to our use of try/catch per repository.
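In outline, the mechanism is something like this (a sketch of the behaviour described above, assuming a logger object with one method per level; not the exact source):

// Wrap each logger method; any call at the 'error' level flips a
// process-wide flag (as in the snippet quoted earlier).
for (const level of ['trace', 'debug', 'info', 'warn', 'error']) {
  const original = logger[level].bind(logger);
  global.logger[level] = (...args) => {
    if (level === 'error') {
      global.renovateError = true;
    }
    return original(...args);
  };
}

// At the very end of the run: every repository has been processed
// (errors are caught per repo), but the exit code still reports them.
process.exit(global.renovateError ? 1 : 0);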

Haaa right, ok, that makes sense. So in CI it is actually, technically, a false positive: that exit 1 causes the job to fail, but really the run completed with errors rather than crashing. Interesting. Will need to have a ponder (and also work out :wtf: that particular dev team has done to cause that particular issue).

Thanks for the assist dude, greatly appreciated.

Mind if I submit a PR to log an info message at the end, akin to "Renovate completed, failures elsewhere", to make this easier to diagnose in future?

No problem. If you think it should be configurable (e.g. exit with 0 no matter what application errors happen) then I'm happy to accept a PR. BTW the person who requested it explicitly wanted jobs to fail on CI if they had errors in the app, so they could be alerted and fix it.
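Something like this, perhaps (a sketch: the message wording and the exitOnError option name are hypothetical, not existing Renovate config):

// Hypothetical end-of-run summary; exitOnError is an assumed option
// name, not a real Renovate setting.
if (global.renovateError) {
  logger.info('Renovate run completed, but errors were logged - see above for details');
  if (config.exitOnError !== false) {
    process.exit(1); // current behaviour: fail CI jobs on application errors
  }
}
process.exit(0);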

One more thing: in general, I only use logger.error() if it's something that "should never happen". Sometimes, if I decide it's not something I can fix, then I lower it to warn. There are still a few cases where errors happen that I'm still deciding how to handle, e.g. when the platform API returns 401 errors for no good reason.

No, I agree what you're doing is correct - it should exit 1. It's just that I had the default log_level of info set, and GitLab CI automatically cuts off output after 4 MB by default, so that error line was totally buried in the noise and doubly hidden. I've changed the log_level to warn now and it has popped right out, along with a whole bunch of stuff relating to isBranchStale not being implemented, which I really must get written up for you.

I think what is triggering it in my instance is https://github.com/renovateapp/renovate/blob/259312bb977c3cb87daf79cd679625aeb72b17e7/lib/workers/branch/lock-files.js#L434 but I'm going to run individually against the repo concerned to verify.

Now that that's cleared up I'm going to test that other PR for you 👍
