Describe the bug
The issue title might be a bit misleading; we're still trying to understand what's going on.
Last week, an application of ours started returning broken HTTP responses (more on this below) under load on OpenShift (this matters, as the limited concurrency there may be part of the issue).
while true; do curl -XGET -i -Hx-rh-identity:`cat rhid` localhost:8080/api/notifications/v1.0/notifications/defaults; done;
The application is using Mutiny, Reactor, and R2DBC. It does an R2DBC query and the result is produced in a reactor epoll thread (Thread[reactor-tcp-epoll-7,5,executor]).
The HTTP endpoint is returning a Uni<List<Endpoint>>.
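For context, the resource roughly looks like this (a minimal sketch; the class and repository names are hypothetical, only the path from the curl command and the Uni<List<Endpoint>> return type come from the report):

```java
import java.util.List;

import javax.inject.Inject;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

import io.smallrye.mutiny.Uni;

@Path("/api/notifications/v1.0/notifications/defaults")
public class DefaultsResource {

    // Hypothetical reactive repository backed by R2DBC; the real application
    // lives in RedHatInsights/notifications-backend.
    @Inject
    EndpointRepository endpointRepository;

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public Uni<List<Endpoint>> getDefaults() {
        // The query result is emitted on a reactor-tcp-epoll thread; RESTEasy
        // then resumes the suspended request to serialize and write the response.
        return endpointRepository.findDefaults();
    }

    // Stubs so the sketch is self-contained; Endpoint is the application's entity.
    interface EndpointRepository {
        Uni<List<Endpoint>> findDefaults();
    }

    static class Endpoint {
        // fields omitted
    }
}
```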
Under load, it sometimes returns:
HTTP/1.1 200 OK
Content-Length: 0
and sometimes:
HTTP/1.1 200 OK
Content-Length: 706
Content-Type: application/json
[{...}][{...}]
So 2 concatenated responses.
We have verified that the item provided by the Uni is correct, so the issue is around the HTTP response write and flush.
As the endpoint is async, the request is suspended and then resumed. It seems that the resumption may happen on the wrong (Vert.x) context.
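If the resumption can indeed land on the wrong Vert.x context, one defensive pattern (shown here purely to illustrate the suspicion, not as what Quarkus/RESTEasy actually does) is to capture the request's context before the async boundary and hop back onto it before writing:

```java
import io.smallrye.mutiny.Uni;
import io.vertx.core.Context;
import io.vertx.core.Vertx;

public class ContextHop {

    // Capture the Vert.x context owning the HTTP connection before the async
    // R2DBC call, and re-dispatch the emission onto it so the response is
    // written from the context (and event loop) that owns the connection.
    public static <T> Uni<T> resumeOnCallerContext(Uni<T> upstream) {
        Context captured = Vertx.currentContext();
        if (captured == null) {
            // Not on a Vert.x thread: nothing to re-dispatch to.
            return upstream;
        }
        return upstream.emitOn(runnable -> captured.runOnContext(v -> runnable.run()));
    }
}
```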
To Reproduce
Unfortunately, I was not able to reproduce it locally.
It only happens on OpenShift. I believe it's because of the limited parallelism (and therefore the small number of event loops), which means the same event loop may be reused by multiple concurrent requests.
/cc @geoand
@stuartwdouglas any idea?
The issue happens in https://github.com/RedHatInsights/notifications-backend
So this is very odd. I think it must be happening in the RESTEasy layer, but I don't really see how.
The reason I think this is the Content-Length header. It is set in io.quarkus.resteasy.runtime.standalone.VertxHttpResponse#prepareWrite. If the Content-Length actually matches the response body, that implies that both lists were written to the output stream.
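In other words (a simplified illustration of that reasoning, not the actual Quarkus code): if the Content-Length is computed from the buffered body right before the write, then a length that matches the doubled body means both JSON arrays were already in the buffer when the response was prepared.

```java
import io.vertx.core.buffer.Buffer;
import io.vertx.core.http.HttpHeaders;
import io.vertx.core.http.HttpServerResponse;

public class PrepareWriteSketch {

    // Simplified prepare-then-write: the Content-Length header is derived from
    // whatever is sitting in the buffered body at this point, so a value of 706
    // matching "[{...}][{...}]" implies both payloads were buffered together.
    static void prepareAndWrite(HttpServerResponse response, Buffer bufferedBody) {
        response.putHeader(HttpHeaders.CONTENT_LENGTH, String.valueOf(bufferedBody.length()));
        response.end(bufferedBody);
    }
}
```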
I have managed to reproduce this at https://github.com/RedHatInsights/notifications-backend/commit/627a9243e6e486c8abc58e92efee7e4c603da359
Still not sure about the root cause, though.
This is caused by the custom authentication layer: when a response is received from the MP Rest Client, processing resumes on a worker thread that already has some RESTEasy ThreadLocal state on it. Somehow this leftover state breaks the RESTEasy server once server-side processing starts. I have no idea how or why, but when I clear the ThreadLocal state I can no longer reproduce this.
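Roughly what the workaround amounts to (a sketch, assuming RESTEasy 4.x where org.jboss.resteasy.core.ResteasyContext.clearContextData() is available; the exact hook point in the custom authentication layer is hypothetical):

```java
import org.jboss.resteasy.core.ResteasyContext;

public class AuthCompletionHandler {

    // Called when the MP Rest Client response used for authentication arrives,
    // possibly on a worker thread that still carries RESTEasy ThreadLocal state
    // from the client call. Clear it before server-side processing starts.
    void onAuthResponse(Runnable continueServerProcessing) {
        ResteasyContext.clearContextData();
        continueServerProcessing.run();
    }
}
```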
Wow!
Another case of the client interfering with the server that I am sure @FroMage will appreciate :)
Didn't I tell you about this? ;)
Actually it's worse: it's a mix of two issues that plague RESTEasy's context:
Can this cause problems for apps that write JAX-RS responses from MP Rest Client callbacks? If so, I think this is a CVE, as you could send a user's data to the wrong people. Even as it currently stands I think this issue is likely a CVE, although not something we really provide out of the box, so maybe a low-priority one. @asoldano do you have any idea why RESTEasy could end up aggregating the data from two different requests into the same response when the client is in use, and more importantly, is there any chance it could happen for more normal client use that does not sit in the authentication layer before server-side processing starts?
Well, I'm not sure; I think it depends on what state is present in the context. The state mandated by the JAX-RS spec is very little: "Except for Configuration and Providers, which are injectable in both client and server-side providers, all the other types are server-side only." So it could affect marshalling providers, but not really marshalling _state_.
I assume what's causing this issue is _internal_ RESTEasy state in the context, as opposed to JAX-RS contextual objects. So it depends on what state is responsible. Perhaps the VertxHttpResponse? Except the dispatch call will set it even if one is already present, so it's hard to think it's that.