What Renovate type are you using?
Launched with:
RENOVATE_AUTODISCOVER="true" \
RENOVATE_PLATFORM=gitlab \
RENOVATE_ENDPOINT="https://**edited**/api/v4/" \
RENOVATE_TOKEN="**edited**" \
renovate --log-level=debug
Describe the bug
We are observing renovate bot failing to process all auto-discovered projects in Gitlab when more than 100 projects are accessible. We suspect the pagination code is not properly iterating the pages.
Edited original report: the issue is that renovate exits with 0 status and fails to process further repositories on a specific repository with nuget dependencies.
The new debug messages available in v20.2.3 helped pinpoint that auto-discovery was working properly, but still not all repositories were processed.
Did you see anything helpful in debug logs?
See logs below in https://github.com/renovatebot/renovate/issues/6299#issuecomment-633124141
Original incorrect report (to keep issue history)
We have around 160 projects in which renovate is added as a member. The debug log shows all of them being auto-discovered:
DEBUG: Autodiscovering GitLab repositories
DEBUG: Discovered 165 project(s)
DEBUG: Discovered 165 repositories
But then after renovate finishes, we don't get updated MRs in all projects. Grepping for Repository started in the log file shows 65 entries, which is fishy taking into account that the renovate code will query in chunks of 100 elements:
const url = `projects?membership=true&per_page=100&with_merge_requests_enabled=true`;
This grep is consistent: adding an extra project to renovate bot, will then show a total of 166 projects, but only 66 Repository started entries.
We also made raw curl REST queries to gitlab to rule out a server issue, but the queries seem to work properly and pagination headers look ok:
curl -v --request GET --header "PRIVATE-TOKEN: **edited**" "https://**edited**/api/v4/projects?membership=true&per_page=100&with_merge_requests_enabled=true"
...
X-Next-Page: 2
X-Page: 1
X-Per-Page: 100
X-Prev-Page:
X-Total: 165
X-Total-Pages: 2
and for the second page:
curl -v --request GET --header "PRIVATE-TOKEN: **edited**" "https://**edited**/api/v4/projects?membership=true&per_page=100&with_merge_requests_enabled=true&page=2"
...
X-Next-Page:
X-Page: 2
X-Per-Page: 100
X-Prev-Page: 1
X-Total: 165
X-Total-Pages: 2
This can hopefully be reproduced easily by reducing the per_page from 100 to something low and then testing. i.e. it's hopefully not necessary to have 100+ projects in order to reproduce.
Cc @zharinov as I think you are refactoring GitLab API code right now
I'll take a look
I tried to reproduce it in debugger and pagination looks okay. Can this be somehow related to prCommitsPerRunLimit or other Renovate limits?
@rarkins @zharinov Thanks for the prompt response!
A couple of related comments:
Repository started in the logs to identify the processed repos correct? I'm definitely available for further debuggingmin_access_level to further limit the size of queries. Reporter seems to be the lowest user type that is able to create MRs.Is our approach of grepping Repository started in the logs to identify the processed repos correct? I'm definitely available for further debugging
I guess it's correct. Now I'm about to create >100 gitlab projects via CLI and debug it further with more realistic example.
Can this be somehow related to prCommitsPerRunLimit or other Renovate limits?
@zharinov We are not setting any limits and the default seems to be disabled, but in any case I've just made a run with it explicitly deactivated, same result:
DEBUG: Combined config
"config": {
"hostRules": [],
"dryRun": true,
"platform": "gitlab",
"endpoint": "https://**edited**/api/v4/",
"token": "***********",
"autodiscover": true,
"prCommitsPerRunLimit": 0,
"logLevel": "debug"
}
DEBUG: Adding trailing slash to endpoint
DEBUG: Using platform gitAuthor: Code Renovate Bot <code-renovate@localhost>
DEBUG: Using baseDir: /tmp/renovate
DEBUG: Using cacheDir: /tmp/renovate/cache
DEBUG: Initializing Renovate internal cache into /tmp/renovate/cache/renovate/renovate-cache-v1
DEBUG: Autodiscovering GitLab repositories
DEBUG: Discovered 165 project(s)
DEBUG: Discovered 165 repositories
DEBUG: Limits.init enter method
DEBUG: Limits.init prCommitsPerRunLimit processing
DEBUG: Limits.init prCommitsPerRunLimit variable is not set. Ignoring prCommitsPerRunLimit
we might want to use as extra filter
min_access_levelto further limit the size of queries.Reporterseems to be the lowest user type that is able to create MRs.
Renovate needs to be a Developer level in order to create branches.
- As you're mentioning improvements to the Gitlab API code (I'm assuming this is #6275, great job @zharinov), I'd like to point out that the offsets pagination is limited to 50k entries from next 13.0 onwards, and that keyset pagination will be the only option in that case, the docs also mention it should be more efficient. Not that I think we'll reach that limit soon, we actually have more projects than that limit but I don't expect all users activating the bot.
Please raise a separate issue to discuss that if you anticipate needing it. I suspect that you'll probably hit other usability or scalability problems before you hit 50k active repositories :)
Please raise a separate issue to discuss that if you anticipate needing it. I suspect that you'll probably hit other usability or scalability problems before you hit 50k active repositories :)
We are already over that repository limit significantly 😅 But renovate-bot onboarding is opt-in, so I don't think the bot will reach that checkup on our end soon.
But using the keyset pagination is probably a better future-proof option anyway, it's the recommended approach in 13.0+ and it's also mentioned it should be more efficient, since it doesn't depend on the number of entries.
I've done a code review and can't see anything wrong. This is one of logged lines you mentioned: https://github.com/renovatebot/renovate/blob/53ca91301f6a36c0c87e7d6b06b1bd8c9ec1b283/lib/workers/global/autodiscover.ts#L40
The results of that pretty much go direct into a simple for loop here: https://github.com/renovatebot/renovate/blob/53ca91301f6a36c0c87e7d6b06b1bd8c9ec1b283/lib/workers/global/index.ts#L71
You may need to do some debugging in your local copy to confirm e.g.
we might want to use as extra filter
min_access_levelto further limit the size of queries.Reporterseems to be the lowest user type that is able to create MRs.Renovate needs to be a
Developerlevel in order to create branches.
You are of course right, so most probably that is the appropriate query parameter value 👍
You may need to do some debugging in your local copy to confirm e.g.
- Is it the first X repositories or last X repositories?
- What logs (debug level) are printed after the Xth repository is completed?
Is it possible to enable some logging at e.g. package level that shows me the actual queries and responses obtained from gitlab at the API level? So I can see all traffic against the server.
I could do some network sniffing but it could be much more convenient to see a log file 🙇
We don't log at that level currently. You'd expect it to be pretty massive. I think what I suggested is a better first step and shouldn't be much effort.
I added a couple of things in [email protected] just now:
@rarkins Thanks for that 💯 As soon as the image is available on docker hub I'll give it a run.
@rarkins I've just tested the latest version and found out that the issue is not pagination, the 100 vs 65 was just by chance and misled me. The new debug messages help identify that all projects are auto-discovered properly.
I failed to notice that this was missing:
DEBUG: Renovate exiting successfully
The problem is that renovate is failing on a specific repository, and exiting with 0 status and no error. If I filter that repository, the run goes through all projects.
What I see in the logs just before exiting, tls problem when accesing an internal artifactory trying to access a nuget package:
DEBUG: nuget registry failure: Unknown error (repository=the/failing/repo)
"err": {
"name": "RequestError",
"code": "ECONNRESET",
"host": "**edited**",
"hostname": "**edited**",
"method": "GET",
"path": "/artifactory/api/nuget/v3/scp-stable-nuget/FindPackagesById()?id=%27Microsoft.Build.Tasks.Core%27&$select=Version,IsLatestVersion,ProjectUrl",
"protocol": "https:",
"url": "https://**edited**/artifactory/api/nuget/v3/scp-stable-nuget/FindPackagesById()?id=%27Microsoft.Build.Tasks.Core%27&$select=Version,IsLatestVersion,ProjectUrl",
"gotOptions": {
"path": "/artifactory/api/nuget/v3/scp-stable-nuget/FindPackagesById()?id=%27Microsoft.Build.Tasks.Core%27&$select=Version,IsLatestVersion,ProjectUrl",
"protocol": "https:",
"slashes": true,
"auth": null,
"host": "**edited**",
"port": null,
"hostname": "**edited**",
"hash": null,
"search": "?id=%27Microsoft.Build.Tasks.Core%27&$select=Version,IsLatestVersion,ProjectUrl",
"pathname": "/artifactory/api/nuget/v3/scp-stable-nuget/FindPackagesById()",
"href": "https://**edited**/artifactory/api/nuget/v3/scp-stable-nuget/FindPackagesById()?id=%27Microsoft.Build.Tasks.Core%27&$select=Version,IsLatestVersion,ProjectUrl",
"headers": {
"user-agent": "https://github.com/renovatebot/renovate",
"accept-encoding": "gzip, deflate"
},
"hooks": {
"beforeError": [],
"init": [],
"beforeRequest": [],
"beforeRedirect": [null],
"beforeRetry": [],
"afterResponse": []
},
"retry": {"methods": {}, "statusCodes": {}, "errorCodes": {}},
"decompress": true,
"throwHttpErrors": true,
"followRedirect": true,
"stream": false,
"form": false,
"json": false,
"cache": false,
"useElectronNet": false,
"method": "GET",
"hostType": "nuget",
"gotTimeout": {"request": 60000}
},
"message": "Client network socket disconnected before secure TLS connection was established",
"stack": "RequestError: Client network socket disconnected before secure TLS connection was established\n at ClientRequest.<anonymous> (/usr/src/app/node_modules/got/source/request-as-event-emitter.js:178:14)\n
at Object.onceWrapper (events.js:417:26)\n at ClientRequest.emit (events.js:322:22)\n at ClientRequest.EventEmitter.emit (domain.js:482:12)\n at ClientRequest.origin.emit (/usr/src/app/node_modules/@szmarczak/
http-timer/source/index.js:37:11)\n at TLSSocket.socketErrorListener (_http_client.js:426:9)\n at TLSSocket.emit (events.js:322:22)\n at TLSSocket.EventEmitter.emit (domain.js:482:12)\n at emitErrorNT (internal
/streams/destroy.js:92:8)\n at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)\n at processTicksAndRejections (internal/process/task_queues.js:84:21)"
},
"pkgName": "Microsoft.Build.Tasks.Core",
"feedUrl": "https://**edited**/artifactory/api/nuget/v3/scp-stable-nuget"
The failing url is actually a host that is not reachable due to proxy settings (something I'll fix), curl to it returns:
curl -v https://**edited**/artifactory/api/nuget/v3/scp-stable-nuget
* Trying **edited**...
* TCP_NODELAY set
* Connected to **internal-proxy** (**edited-ip**) port **edited-port** (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to **edited**:443
> CONNECT **edited**:443 HTTP/1.1
> Host: **edited**:443
> User-Agent: curl/7.58.0
> Proxy-Connection: Keep-Alive
>
< HTTP/1.0 502 Bad Gateway
< Server: **edited**
< Content-Type: text/html
< Connection: close
<
* Received HTTP code 502 from proxy after CONNECT
* CONNECT phase completed!
* Closing connection 0
curl: (56) Received HTTP code 502 from proxy after CONNECT
So I'm wondering if there's an issue with the proxy-handling code. I'll update the title of the issue accordingly.
Well probably need the full debug for that repo in order to get the stack trace and work out what’s going wrong. Datasource lookups shouldn’t cause exits.
@rarkins I'll check internally if we can submit this as a confidential issue 👍 Thanks again for the prompt response.
If you look in the source code, you'll see that nuget registry failure: Unknown error is a gracefully logged message that does not in itself trigger any exit or crash. The log and behavior seem correct too - it's an unknown error. Therefore it's quite possible that it's another red herring and not related to what you're ultimately trying to fix.
@rarkins You are right, I've just fixed the proxy issues and this repo still makes renovate exit on 20.2.3, no idea why, there's no errors in the logs anymore that I can see. The log before exit:
DEBUG: Found 17 package file(s) (repository=this/repo/fails)
INFO: Dependency extraction complete (repository=this/repo/fails)
"baseBranch": "master",
"stats": {
"managers": {
"gitlabci": {"fileCount": 1, "depCount": 1},
"gitlabci-include": {"fileCount": 1, "depCount": 1},
"nuget": {"fileCount": 15, "depCount": 22}
},
"total": {"fileCount": 17, "depCount": 24}
}
DEBUG: getLabels(https://mcr.microsoft.com, dotnet/core/sdk, latest) (repository=this/repo/fails)
DEBUG: getManifestResponse(https://mcr.microsoft.com, dotnet/core/sdk, latest) (repository=this/repo/fails)
It just stops there, exiting with 0. I'll send you a full trace if I get the ok internally.
I think you might want to start looking into out of memory errors for example. It looks like it might be getting killed rather than exiting/failing if there’s no log or stack trace immediately prior. If there’s any easy way to greatly dial up the resources to quickly prove or disprove it, it could be a good use of time. It perhaps run it outside GitLab CI to see if it finishes in the same location? BTW it’s only possibly the last few lines that are of interest, but I’d been assuming there would be some error message.
@rarkins If it's being killed because of memory, then it must happen at the node level, because I made the last run directly on my localhost from the npm package only checking this failing repo, I wanted to rule out any docker issues 😕 :
$ npm list renovate -g
/Users/diego/.asdf/installs/nodejs/12.10.0/.npm/lib
└── [email protected]
$ renovate --version
20.2.3
$ RENOVATE_AUTODISCOVER="true" \
RENOVATE_PLATFORM=gitlab \
RENOVATE_ENDPOINT="https://**edited**/api/v4/"\
RENOVATE_DRY_RUN=true \
RENOVATE_AUTODISCOVER_FILTER='this/repo/fails' \
RENOVATE_TOKEN="**edited**" \
renovate --log-level=debug
only checking this failing repo
Can you clarify what you mean by this? e.g. do you mean you ran Renovate against just one repo and it still failed?
BTW if you're running it locally using node, then setting NODE_OPTIONS=--max_old_space_size=4096 into your env (make sure to quote or escape it..) should hopefully increase memory allocated substantially.
Can you clarify what you mean by this? e.g. do you mean you ran Renovate against just one repo and it still failed?
Yes, if you check the command above I'm setting a filter so only this repo is loaded. The logs show only this repo is being checked, still dies.
Interesting. Same result if you disable autodiscover and just list the repo in the CLI?
Auto-discovery has no effect, I can just put the repo as a parameter and the output is the same.
I've just ried to no avail (all localhost macOS, node 12.16.3):
NODE_OPTIONS="--max_old_space_size=4096"--trace-uncaught --trace-exit --trace-warningsIs the last line always DEBUG: getManifestResponse(https://mcr.microsoft.com, dotnet/core/sdk, latest) (repository=this/repo/fails) ?
This is during the "lookup" phase where most work is done in parallel, so the actual code or package causing it is possibly unrelated to the root cause.
Can you possibly fork or copy the repo and start deleting package files until the problem stops? i.e. to try to isolate if it's a specific package. It doesn't seem like there's too many.
It's very strange that you get no exit message or trace of any sort!
@rarkins Yes, it's really weird. I still haven't checked the source code, but I can give you something to narrow this down a bit in case it rings a bell, thanks to an internal tip: version 19.236.4 does work properly. So this is something introduced recently.
Here's the diff: https://github.com/renovatebot/renovate/compare/19.236.4...19.239.11
Unfortunately a bit too much to manually code inspect. If you can narrow it down or even work out the exact version it breaks on, then that would be extremely helpful.
Did a bit of binary search and identified the version where the issue was introduced: 19.239.3
This must be the change (it is compatible, since the failing project is dotnet): https://github.com/renovatebot/renovate/pull/6242
Incidentally, while testing the different versions I have noticed that tagged versions are missing from docker hub, something looks not ok with the docker CI pipeline:
https://hub.docker.com/r/renovate/renovate/tags?page=1&name=19.239
I had to test with 19.239.2-slim because the default version is missing. A quick look shows as missing at least:
19.237.019.239.219.239.419.239.10Also some slim versions tags were updated a couple of hours ago according to docker hub :confused:
Yes, we had some issues with the images, so we rebuild them. But for your issue the slim image should work. We also rebuild the last x docket images when the base image or nodejs or yarn ... Changed
We also rebuild the last x docket images when the base image or nodejs or yarn ... Changed
I forgot what you state in https://renovate.whitesourcesoftware.com/blog/overcoming-dockers-mutable-image-tags/. Does that mean that you only mutate the -slim versions to reflect upstream changes in the base image?
Sorry I don't want to start a parallel discussion in this issue, maybe I should open a new one.
Thanks @dlouzan, amazingly quick as always! For us it also seems like Renovate aborts while processing .NET repos.
@dlouzan all images should retain the renovate version specified, but we had a timeframe where multiple images where overwritten with the newest renovate version.
Ok, since the issue seems to be related to dotnet / nuget, we should be able to reproduce with vanilla renovate cli.
Can you create a sample repo to reproduce?
I'll need some time for that; I'm definitely not a windows guy and the project is internal, so I can't share a reference.
@fgreinacher Are you able to support with this quicker? My only approach right now is to go through the project I have listed in our shared Siemens issue and start removing packages until I find the one causing the crash. Maybe you have a smaller project that can be shared?
@fgreinacher Are you able to support with this quicker? My only approach right now is to go through the project I have listed in our shared Siemens issue and start removing packages until I find the one causing the crash. Maybe you have a smaller project that can be shared?
Sure, I'll have a look tomorrow!
I've refactored the code that was changed in #6242 to be simpler and more defensive, however I don't see what was obviously wrong with the original changes. Hopefully the pending release will resolve things for you - please let me know.
@rarkins I've just checked and unfortunately the change did not solve the problem, so I think you can reopen :-( This will need some deeper analysis, hopefully @fgreinacher can provide a small repo project that reproduces the problem
It's possible then that the results of this fetch are triggering the problem later, and not the fetch itself.
After looking a bit at the repos where I saw it failing I noticed that all of them have a NuGet.config pointing to local directories (in addition to remote repos):
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<packageSources>
<clear />
<add key="some-feed" value="https://some-feed />
<add key="local-packages" value="packages" />
</packageSources>
</configuration>
My current theory is the following:
I'm continuing investigation!
Small update: I'm now able to reproduce it locally, but only when using our corporate proxy :O
Latest observations:
packages/FindPackagesById()?id=%27Microsoft.Build.Tasks.Core%27&$select=Version,IsLatestVersion,ProjectUrl (where packages is the value from the config)node inspect without success. My current assumption is that somewhere in Renovate or its dependencies some low-level node request hook taking care of the proxy protocol is crashing in a very bad way. I also set some breakpoints in global-agent but did not notice anything obvious.
@fgreinacher Great job Florian! What I find really puzzling is that a crash that terminates the node process does not show a stack trace, or exits with something different than zero
If we add "test if it's a valid URL" filtering at the datasource/nuget getReleases stage, should that stop it? I was thinking we should add it at the datasource/index (so it's common for all datasources) although I'm not 100% sure yet if we do in fact have valid URLs in all scenarios for all datasources.
Yes, that should fix the issue 👍 I would still like to understand why it breaks that badly 🤔 Enojoying the sun a bit now, will continue tonight or tomorrow.
We use @gajus's global-agent for proxying. I believe it patches Node's http's routines to achieve its aim so possibly the crash is such that Node exits without a stack trace.
:tada: This issue has been resolved in version 20.8.3 :tada:
The release is available on:
20.8.3Your semantic-release bot :package::rocket:
@rarkins @viceice The "full" Docker image is not available yet. Is it normal that it takes that long?
Yes, these days it can be:
Here's the full image PR: https://github.com/renovatebot/docker-renovate-full/commit/457d09c93fd10d23a8c5b918bdd882b8c6af9c8e
Wow, that's a lot of steps. Any reason to not publish the Docker images from a single repo (maybe even this one)?
Yes, we prefer to keep this repository for publishing the npm image and then other repos for building the various distributions on top. I don't think the full Renovate image will be used by too many end users soon, because it is both large and inflexible. Better to build your own, or use the slim image with mapped Docker sock
The patch fixed our issues, thanks all!
Most helpful comment
After looking a bit at the repos where I saw it failing I noticed that all of them have a NuGet.config pointing to local directories (in addition to remote repos):
My current theory is the following:
I'm continuing investigation!