Describe the bug
The issue title might be a bit misleading; we're still trying to understand what's going on.
Last week, an application of ours started returning broken HTTP responses (more on this below) under load on OpenShift (this matters, as the limited concurrency there may be part of the issue).
while true; do curl -XGET -i -Hx-rh-identity:`cat rhid` localhost:8080/api/notifications/v1.0/notifications/defaults; done;
The application is using Mutiny, Reactor, and R2DBC. It does an R2DBC query and the result is produced in a reactor epoll thread (Thread[reactor-tcp-epoll-7,5,executor]).
The HTTP endpoint is returning a Uni<List<Endpoint>>.
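For context, the resource roughly looks like this (a minimal sketch; the class and repository names are hypothetical, only the path from the curl command and the Uni<List<Endpoint>> return type come from the report):

```java
import java.util.List;

import javax.inject.Inject;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

import io.smallrye.mutiny.Uni;

@Path("/api/notifications/v1.0/notifications/defaults")
public class DefaultsResource {

    // Hypothetical reactive repository backed by R2DBC; the real application
    // lives in RedHatInsights/notifications-backend.
    @Inject
    EndpointRepository endpointRepository;

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public Uni<List<Endpoint>> getDefaults() {
        // The query result is emitted on a reactor-tcp-epoll thread; RESTEasy
        // then resumes the suspended request to serialize and write the response.
        return endpointRepository.findDefaults();
    }

    // Stubs so the sketch is self-contained; Endpoint is the application's entity.
    interface EndpointRepository {
        Uni<List<Endpoint>> findDefaults();
    }

    static class Endpoint {
        // fields omitted
    }
}
```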
Under load, it sometimes returns:
HTTP/1.1 200 OK
Content-Length: 0
and sometimes:
HTTP/1.1 200 OK
Content-Length: 706
Content-Type: application/json
[{...}][{...}]
So 2 concatenated responses.
We have verified that the item provided by the Uni is correct, so the issue is around the HTTP response write and flush.
As the endpoint is async, the request is suspended and then resumed. It seems that the resumption may happen on the wrong (Vert.x) context.
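If the resumption can indeed land on the wrong Vert.x context, one defensive pattern (shown here purely to illustrate the suspicion, not as what Quarkus/RESTEasy actually does) is to capture the request's context before the async boundary and hop back onto it before writing:

```java
import io.smallrye.mutiny.Uni;
import io.vertx.core.Context;
import io.vertx.core.Vertx;

public class ContextHop {

    // Capture the Vert.x context owning the HTTP connection before the async
    // R2DBC call, and re-dispatch the emission onto it so the response is
    // written from the context (and event loop) that owns the connection.
    public static <T> Uni<T> resumeOnCallerContext(Uni<T> upstream) {
        Context captured = Vertx.currentContext();
        if (captured == null) {
            // Not on a Vert.x thread: nothing to re-dispatch to.
            return upstream;
        }
        return upstream.emitOn(runnable -> captured.runOnContext(v -> runnable.run()));
    }
}
```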
To Reproduce
Unfortunately, I was not able to reproduce it locally.
It only happens on OpenShift. I believe it's because of the limited parallelism (and therefore the small number of event loops), which means the same event loop may be reused by multiple concurrent requests.
/cc @geoand
@stuartwdouglas any idea?
The issue happens in https://github.com/RedHatInsights/notifications-backend
So this is very odd. I think it must be happening in the RESTEasy layer, but I don't really see how.
The reason I think this is the Content-Length header. It is set in io.quarkus.resteasy.runtime.standalone.VertxHttpResponse#prepareWrite. If the Content-Length actually matches the response body, that implies that both lists were written to the output stream.
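In other words (a simplified illustration of that reasoning, not the actual Quarkus code): if the Content-Length is computed from the buffered body right before the write, then a length that matches the doubled body means both JSON arrays were already in the buffer when the response was prepared.

```java
import io.vertx.core.buffer.Buffer;
import io.vertx.core.http.HttpHeaders;
import io.vertx.core.http.HttpServerResponse;

public class PrepareWriteSketch {

    // Simplified prepare-then-write: the Content-Length header is derived from
    // whatever is sitting in the buffered body at this point, so a value of 706
    // matching "[{...}][{...}]" implies both payloads were buffered together.
    static void prepareAndWrite(HttpServerResponse response, Buffer bufferedBody) {
        response.putHeader(HttpHeaders.CONTENT_LENGTH, String.valueOf(bufferedBody.length()));
        response.end(bufferedBody);
    }
}
```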
I have managed to reproduce this at https://github.com/RedHatInsights/notifications-backend/commit/627a9243e6e486c8abc58e92efee7e4c603da359
Still not sure about the root cause, though.
This is caused by the custom authentication layer: when a response is received from the MP Rest Client, processing resumes on a worker thread that already has some RESTEasy ThreadLocal state on it. Somehow this leftover state breaks the RESTEasy server once server-side processing starts. I have no idea how or why, but when I clear the ThreadLocal state I can no longer reproduce this.
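Roughly what the workaround amounts to (a sketch, assuming RESTEasy 4.x where org.jboss.resteasy.core.ResteasyContext.clearContextData() is available; the exact hook point in the custom authentication layer is hypothetical):

```java
import org.jboss.resteasy.core.ResteasyContext;

public class AuthCompletionHandler {

    // Called when the MP Rest Client response used for authentication arrives,
    // possibly on a worker thread that still carries RESTEasy ThreadLocal state
    // from the client call. Clear it before server-side processing starts.
    void onAuthResponse(Runnable continueServerProcessing) {
        ResteasyContext.clearContextData();
        continueServerProcessing.run();
    }
}
```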
Wow!
Another case of the client interfering with the server that I am sure @FroMage will appreciate :)
Didn't I tell you about this? ;)
Actually it's worse: it's a mix of two issues that plague RESTEasy's context:
Can this cause problems for apps that write JAX-RS responses from MP Rest Client callbacks? If so, I think this is a CVE, as you could send a user's data to the wrong people. Even as it currently stands I think this issue is likely a CVE, although not something we really provide out of the box, so maybe a low-priority one. @asoldano do you have any idea why RESTEasy could end up aggregating the data from two different requests into the same response when the client is in use, and more importantly, is there any chance it could happen for more normal client use that does not sit in the authentication layer before server-side processing starts?
Well, I'm not sure; I think it depends on what state is present in the context. The state mandated by the JAX-RS spec is very little: "Except for Configuration and Providers, which are injectable in both client and server-side providers, all the other types are server-side only." So it could affect marshalling providers, but not really marshalling _state_.
I assume what's causing this issue is _internal_ RESTEasy state in the context, as opposed to JAX-RS contextual objects. So it depends on what state is responsible. Perhaps the VertxHttpResponse? Except the dispatch call will set it even if one is already present, so it's hard to think it's that.