/kind bug
/area prow
This seems to be an issue with GitHub and not with tide because we've been getting reports of this across OpenShift, Kubernetes and Jetstack tide deployments.
It's happening across multiple repos regardless of what CI or merge criteria they use.
Reasons tide gives for not merging range for "not mergeable" to "some-job failed" even when all status contexts show green.
example PR's for which this is happening:
eg: is:open is:pr repo:kubernetes/community label:approved label:"cncf-cla: yes" doesn't return https://github.com/kubernetes/community/pull/2502 which has all of those labels
I think we have this too, for service-catalog, from the collapsed "merge requirements" section which has a link for all the github searches. giant github link should catch:
another example
is:open is:pr repo:kubernetes-sigs/federation-v2 label:lgtm label:approved shows no PR'sHere's one that @krzyzacy found.
This search contains merged and closed PRs even though is:open is specified:
https://github.com/pulls?q=is%3Apr+assignee%3Akrzyzacy+archived%3Afalse+is%3Aopen
That shows that the results can include incorrect results in addition to just missing valid results. Thats a lot more concerning because that could allow Tide to consider a PR mergeable when it isn't. We haven't seen that happen yet, but if it does we should consider turning off Tide until this is resolved.
Here is the original example from the Jetstack org.
This query:
https://github.com/jetstack/cert-manager/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Aapproved+label%3Algtm
should contain both of these PRs:
https://github.com/jetstack/cert-manager/pull/807
https://github.com/jetstack/cert-manager/pull/805
That shows that the results can include incorrect results in addition to just missing valid results. Thats a lot more concerning because that could allow Tide to consider a PR mergeable when it isn't. We haven't seen that happen yet, but if it does we should consider turning off Tide until this is resolved.
:+1:, if we're seeing PRs incorrectly included we should probably shut down tide. It's worrying though that we have no communication from github that this is known, so we won't know when it is fixed for sure either...
https://status.github.com/messages has nothing for this still, have we gotten any response to support emails?
No response to my support request, but I only sent it 45 mins ago.
@munnerz sent one this morning and also contacted directly by email.
@munnerz since CCed us on his thread from this morning, they are looking into it, but there's not anything public to track. We could probably shut down tide for now and follow up on @spiffxp's email informing everyone until things settle.
Has anyone reached out to the CNCF Help Desk. They might have an existing relationship with Github.
:wave: GitHub search engineer here. I've dug through our code and identified the reason that these issues are not showing up in the search results. I'm able to reproduce the bug with three lines of ruby, and the correct team is looking at how to resolve the issue.
A fix will be forthcoming "soon" (sorry for the nebulous software engineer answer there). I'll drop a note on this issue when things are resolved.
Thanks @TwP!
Thank you @TwP !
Note for our internal followup: the commenter (@fejta-bot) is being disabled as it wasn't able to search correctly and was spamming at least one issue.
@TwP will the status.github.com page be updated to reflect this regression in the search API? Other consumers should be made aware.
First of all I want to give a big "thank you" to @spiffxp for including the example searches in this issue. That made the process of tracking down the root cause much much quicker :bowing_man:
The root issue here was the code that transforms markdown into plain text format. We use the CommonMarker ruby gem to strip markdown syntax from comments, issues bodies, pull request bodies, etc. while constructnig documents to send to the search index. There was a regression in the CommonMarker library addressed here https://github.com/gjtorikian/commonmarker/pull/74. The regression triggered an assertion in the C-level code of the gem which then caused the Ruby interpreter to exit. Any issues or pull requests that included strikethrough text would trigger this assertion.
We have rolled out the patched CommonMarker code across our infrastructure, and we have started repair jobs that will reconcile the search indices with the database. While those jobs are running, if you run into further errant search results just add a reaction :heart: or a comment to the item that is missing from your search results. That action will trigger an indexing event for the item. You'll need to wait a few minutes, too, for caches to clear before trying your query again to see the item in your results.
Again, thank you for the excellent observations and write up here in this issue.
On a personal note, thank you for all the work on Kubernetes. We are big fans at GitHub :octocat::tada:
I bumped the PRs listed in this thread and they were discovered by our merge automation 馃帀
Thanks for your help @TwP!
if you run into further errant search results just add a reaction 鉂わ笍 or a comment to the item that is missing from your search results. That action will trigger an indexing event for the item.
the real reason for reactions surfaces at last ;-)
@TwP Thank you for your help and the detailed explanation!
@TwP - how can I get some k8s swag to you for being super helpful?
@parispittman thank you for the kind offer. I am all set on swag at the moment. Helping out teams is the fun part of the job, and it was a enjoyable little bug to hunt down.
Most helpful comment
First of all I want to give a big "thank you" to @spiffxp for including the example searches in this issue. That made the process of tracking down the root cause much much quicker :bowing_man:
The root issue here was the code that transforms markdown into plain text format. We use the CommonMarker ruby gem to strip markdown syntax from comments, issues bodies, pull request bodies, etc. while constructnig documents to send to the search index. There was a regression in the CommonMarker library addressed here https://github.com/gjtorikian/commonmarker/pull/74. The regression triggered an assertion in the C-level code of the gem which then caused the Ruby interpreter to exit. Any issues or pull requests that included
strikethroughtext would trigger this assertion.We have rolled out the patched CommonMarker code across our infrastructure, and we have started repair jobs that will reconcile the search indices with the database. While those jobs are running, if you run into further errant search results just add a reaction :heart: or a comment to the item that is missing from your search results. That action will trigger an indexing event for the item. You'll need to wait a few minutes, too, for caches to clear before trying your query again to see the item in your results.
Again, thank you for the excellent observations and write up here in this issue.
On a personal note, thank you for all the work on Kubernetes. We are big fans at GitHub :octocat::tada: