Quarkus: OIDC CodeFlowTest has become unstable

Created on 19 May 2020  路  19Comments  路  Source: quarkusio/quarkus

areinfra-automation areoidc kinbug

All 19 comments

@geoand @gsmet @n1hility Hi, will I disable 2 tests (testAccessTokenInjection, etc) in the test suite which have failed recently ?

Sounds good to me

9455 disabled the 2 tests, lets see if it isolates the problem

I've spent all the day today trying to catch the cause of the problem. I don't think it happens inside the quarkus-oidc code but it is very hard to trace, running 500 tests in a row, JVM and native

500 runs, wow!!!

Hi All, so I've continued tracing and I think I've reached the point where I have no idea what else can be done to trace it. Here is what is happening when a successful testAccessTokenInjectionTest is run:

// Get a static resource index.html, OIDC CodeAuthenticationMechanism checks it
(1)    2020-05-20 13:27:19,281 ERROR [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-7) Auth:http://localhost:8081/index.html

// PathMatchingHttpSecurityPolicy checks (2) and denies (3) an acess to index.html
(2) 2020-05-20 13:27:19,291 ERROR [io.qua.ver.htt.run.sec.PathMatchingHttpSecurityPolicy] (vert.x-eventloop-thread-7) p1:http://localhost:8081/index.html
(3) 2020-05-20 13:27:19,294 ERROR [io.qua.ver.htt.run.sec.PathMatchingHttpSecurityPolicy] (vert.x-eventloop-thread-7) p2:http://localhost:8081/index.html

// This is the actual test, here we are already at the Keycloak Login page, because invisibly to the user  CodeAuthenticationMechanism has redirect the user to authenticate 
(4) 2020-05-20 13:27:20,180 ERROR [io.qua.it.key.CodeFlowTest] (main) Login***

// This is a user returning back from Keycloak, part of the code flow, note URI is http://localhost:8081/web-app per the code flow configuration, but since the user originally requested index.html, then, by default, we restore that path, see the next step

(5) 2020-05-20 13:27:20,561 ERROR [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-7) Auth:http://localhost:8081/web-app?state=3042c883-58d6-40cf-9d3d-cc93dece5e23___%2Findex.html&session_state=02582d1c-c40d-4a2c-b5f1-b7616e449c34&code=afe72ae1-36f8-4b24-bcc4-5087e48726cf.02582d1c-c40d-4a2c-b5f1-b7616e449c34.693b9131-8da6-4d72-a81c-43ef231582e8

// This is CodeAuthenticationMechanism again, note now we are accessing index.html, as in step 1

(6) 2020-05-20 13:27:20,566 ERROR [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-7) Auth:http://localhost:8081/index.html?pathChecked=true&state=3042c883-58d6-40cf-9d3d-cc93dece5e23___%2Findex.html&session_state=02582d1c-c40d-4a2c-b5f1-b7616e449c34&code=afe72ae1-36f8-4b24-bcc4-5087e48726cf.02582d1c-c40d-4a2c-b5f1-b7616e449c34.693b9131-8da6-4d72-a81c-43ef231582e8

// This is where we have redirected the user again to drop the redundant query parameters - at this stage the user already has a session authentication cookie, so CodeAuthenticationMechanism just lets it through 
(7) 2020-05-20 13:27:20,618 ERROR [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-7) Auth:http://localhost:8081/index.html

// PathMatchingHttpSecurityPolicy checks it, all is good
(8) 2020-05-20 13:27:20,624 ERROR [io.qua.ver.htt.run.sec.PathMatchingHttpSecurityPolicy] (vert.x-eventloop-thread-7) p1:http://localhost:8081/index.html

// the test continues, after `index.html` has been successfully retrieved, we check the JAX-RS endpoint
// since the session auth cookie is still there,  CodeAuthenticationMechanism is happy
(9) 2020-05-20 13:27:20,660 ERROR [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-7) Auth:http://localhost:8081/web-app/access

// PathMatchingHttpSecurityPolicy is also happy
(10) 2020-05-20 13:27:20,662 ERROR [io.qua.ver.htt.run.sec.PathMatchingHttpSecurityPolicy] (vert.x-eventloop-thread-7) p1:http://localhost:8081/web-app/access

(11) End of the test

So I run it in a long loop, and I have say 450 of the above blocks logged in exactly the same sequence.
And then, out of the blue, the following is reported:

[ERROR] testAccessTokenInjection[219]  Time elapsed: 0.51 s  <<< ERROR!
com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException: 401 Unauthorized for http://localhost:8081/index.html
    at com.gargoylesoftware.htmlunit.WebClient.throwFailingHttpStatusCodeExceptionIfNecessary(WebClient.java:595)
    at com.gargoylesoftware.htmlunit.WebClient.loadDownloadedResponses(WebClient.java:2277)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doProcessPostponedActions(JavaScriptEngine.java:914)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.processPostponedActions(JavaScriptEngine.java:1002)
    at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:1047)
    at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:989)
    at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:897)
    at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:878)
    at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:859)
    at io.quarkus.it.keycloak.CodeFlowTest.testAccessTokenInjection(CodeFlowTest.java:452)

with none of my error log messages displayed. For example, I'd N+ of those 11 steps logged, and then boom, it looks like as if some stale 401 came back. I can't trace where this error came from. neither PathMatchingHttpSecurityPolicy nor CodeAuthenticationMechanism is apparently called.

@stuartwdouglas do you have some hints ?

@stuartwdouglas Intercepting a start/end of request at the vertx filter level gives some light, but more tracing is needed

Something spooky is going on. So I was able to trace in a vertx filter a request for index.html returning 401 and then I rerun, test failed, but the vertx filter did not even show a request for index.html, this is strange.

Here is a tracing sequence at the CORS filter level with the successful test following by 401:

(1) Get index.html, get 302 redirect to KC
 2020-05-20 15:38:49,934 ERROR [io.qua.ver.htt.run.cor.CORSFilter] (vert.x-eventloop-thread-12) Begin: http://localhost:8081/index.html
2020-05-20 15:38:49,934 ERROR [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-12) Auth: http://localhost:8081/index.html
2020-05-20 15:38:49,934 ERROR [io.qua.ver.htt.run.cor.CORSFilter] (vert.x-eventloop-thread-12) End: http://localhost:8081/index.html 302

(2) User has authenticated, and returning back from KC, it will end with 302 to restore the original path

2020-05-20 15:38:50,439 ERROR [io.qua.ver.htt.run.cor.CORSFilter] (vert.x-eventloop-thread-12) Begin: http://localhost:8081/web-app?state=8a40a335-568b-4d57-9fab-48ab468deab7___%2Findex.html&session_state=4e98e01e-09ee-4815-af0a-44930fb6e4fe&code=9e3e11f0-5b97-4786-8797-83067fdc7e15.4e98e01e-09ee-4815-af0a-44930fb6e4fe.c61d7508-b25b-401e-bcc0-e1cd226c0b9e
2020-05-20 15:38:50,439 ERROR [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-12) Auth:http://localhost:8081/web-app?state=8a40a335-568b-4d57-9fab-48ab468deab7___%2Findex.html&session_state=4e98e01e-09ee-4815-af0a-44930fb6e4fe&code=9e3e11f0-5b97-4786-8797-83067fdc7e15.4e98e01e-09ee-4815-af0a-44930fb6e4fe.c61d7508-b25b-401e-bcc0-e1cd226c0b9e
2020-05-20 15:38:50,439 ERROR [io.qua.ver.htt.run.cor.CORSFilter] (vert.x-eventloop-thread-12) End: http://localhost:8081/web-app?state=8a40a335-568b-4d57-9fab-48ab468deab7___%2Findex.html&session_state=4e98e01e-09ee-4815-af0a-44930fb6e4fe&code=9e3e11f0-5b97-4786-8797-83067fdc7e15.4e98e01e-09ee-4815-af0a-44930fb6e4fe.c61d7508-b25b-401e-bcc0-e1cd226c0b9e 302

(3) index.html restored, another 302 to drop the query parameters

2020-05-20 15:38:50,439 ERROR [io.qua.ver.htt.run.cor.CORSFilter] (vert.x-eventloop-thread-12) Begin: http://localhost:8081/index.html?pathChecked=true&state=8a40a335-568b-4d57-9fab-48ab468deab7___%2Findex.html&session_state=4e98e01e-09ee-4815-af0a-44930fb6e4fe&code=9e3e11f0-5b97-4786-8797-83067fdc7e15.4e98e01e-09ee-4815-af0a-44930fb6e4fe.c61d7508-b25b-401e-bcc0-e1cd226c0b9e
2020-05-20 15:38:50,439 ERROR [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-12) Auth:http://localhost:8081/index.html?pathChecked=true&state=8a40a335-568b-4d57-9fab-48ab468deab7___%2Findex.html&session_state=4e98e01e-09ee-4815-af0a-44930fb6e4fe&code=9e3e11f0-5b97-4786-8797-83067fdc7e15.4e98e01e-09ee-4815-af0a-44930fb6e4fe.c61d7508-b25b-401e-bcc0-e1cd226c0b9e
2020-05-20 15:38:50,446 ERROR [io.qua.ver.htt.run.cor.CORSFilter] (vert.x-eventloop-thread-12) End: http://localhost:8081/index.html?pathChecked=true&state=8a40a335-568b-4d57-9fab-48ab468deab7___%2Findex.html&session_state=4e98e01e-09ee-4815-af0a-44930fb6e4fe&code=9e3e11f0-5b97-4786-8797-83067fdc7e15.4e98e01e-09ee-4815-af0a-44930fb6e4fe.c61d7508-b25b-401e-bcc0-e1cd226c0b9e 302

(4) Finally we are getting index.html with 200

2020-05-20 15:38:50,447 ERROR [io.qua.ver.htt.run.cor.CORSFilter] (vert.x-eventloop-thread-12) Begin: http://localhost:8081/index.html
2020-05-20 15:38:50,447 ERROR [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-12) Auth:http://localhost:8081/index.html
2020-05-20 15:38:50,448 ERROR [io.qua.ver.htt.run.cor.CORSFilter] (vert.x-eventloop-thread-12) End: http://localhost:8081/index.html 200

(5) Test continues, checking JAX-RS endpoint
2020-05-20 15:38:50,453 ERROR [io.qua.ver.htt.run.cor.CORSFilter] (vert.x-eventloop-thread-12) Begin: http://localhost:8081/web-app/access
2020-05-20 15:38:50,453 ERROR [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-12) Auth:http://localhost:8081/web-app/access
2020-05-20 15:38:50,454 ERROR [io.qua.ver.htt.run.cor.CORSFilter] (executor-thread-1) End: http://localhost:8081/web-app/access 200

(6) all is good, and now:
[ERROR] Tests run: 515, Failures: 0, Errors: 1, Skipped: 14, Time elapsed: 255.705 s <<< FAILURE! - in io.quarkus.it.keycloak.CodeFlowInGraalITCase
[ERROR] testAccessTokenInjection[259]  Time elapsed: 0.487 s  <<< ERROR!
com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException: 401 Unauthorized for http://localhost:8081/index.html

Where it came from ? This is something with the Vertx ending the request, may be not

OK, I've directed output to the text file and this 401 is somewhere earlier as it is all async:

2020-05-20 17:15:17,468 ERROR [io.qua.ver.htt.run.cor.CORSFilter] (vert.x-eventloop-thread-2) Begin: http://localhost:8081/index.html
2020-05-20 17:15:17,468 ERROR [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-2) Auth:http://localhost:8081/index.html
2020-05-20 17:15:17,477 ERROR [io.qua.ver.htt.run.cor.CORSFilter] (vert.x-eventloop-thread-2) End: http://localhost:8081/index.html 401

So it is some progress I guess

In the vert.x PR today there was a random test with no relation to security that failed with a 403. I wonder if there is a race somewhere in our security layer.

Hi @stuartwdouglas, All, I think I've pinpointed it, it is a favourite one of @Ladicek, sometimes the tokens are apparently issued at too early due to some clock skew, so I get a Vertx Inactive token - we don't log it yet though :-)

I would expect that there would be no clock skew in these tests, as everything runs on a single machine?

It can be just a matter of > vs >= check etc, I'm running up to 1000 tests in one go (500 in JVM, followed by native), Inactve Token means Vertx check comparing iat or exp has failed, HtmlUnit is involved, around 5 HTTP requests in a single test, so something is getting out of sync

@Ladicek You are right, it is something else. Something is going wrong inside Vertx code when verifying the token and then it tries to introspect and gets an inactve token message, this introspect should not be even happening...sigh, back to tracing...

This is weird now:

unnable.java:30)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.security.SignatureException: Signature length not correct: got 25 but was expecting 256
        at sun.security.rsa.RSASignature.engineVerify(RSASignature.java:211)
        at java.security.Signature$Delegate.engineVerify(Signature.java:1394)
        at java.security.Signature.verify(Signature.java:771)
        at io.vertx.ext.jwt.JWK.verify(JWK.java:412)
        ... 85 more

My next theory is that the cookie content is corrupted somehow and the logs suggests it happens internally somewhere in transit

This is what I'm getting:

(1)
2020-05-21 16:18:00,989 ERROR [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-6) ID Token length:943
(2)
2020-05-21 16:18:00,990 ERROR [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-6) Cookie length:2584
....
(3)
2020-05-21 16:18:00,990 DEBUG [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-6) Final redirect URI: http://localhost:8081/index.html

(4)
2020-05-21 16:18:00,992 ERROR [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-6) Session Token length:2584

(5)
2020-05-21 16:18:00,992 ERROR [io.qua.oid.run.CodeAuthenticationMechanism] (vert.x-eventloop-thread-6) ID Token length 1:676

with
(1,2) : https://github.com/quarkusio/quarkus/blob/master/extensions/oidc/runtime/src/main/java/io/quarkus/oidc/runtime/CodeAuthenticationMechanism.java#L311
This is where the session token is created.
(1): result.opaqueIdToken().length()
(2):cookieValue.length()

(4,5) : https://github.com/quarkusio/quarkus/blob/master/extensions/oidc/runtime/src/main/java/io/quarkus/oidc/runtime/CodeAuthenticationMechanism.java#L89 - this is where the session cookie is checked
(4): sessionCookie.getValue().length() - this is equal to (2)
(5) tokens[0].length() - this must match (1) which it does in all the cases but when the test fails, the length differ.

Digging more into it

I think I've definitely pinpointed it

Was this page helpful?
0 / 5 - 0 ratings

Related issues

MossabTN picture MossabTN  路  3Comments

rodrigofujioka picture rodrigofujioka  路  3Comments

gsmet picture gsmet  路  3Comments

hantsy picture hantsy  路  3Comments

gastaldi picture gastaldi  路  3Comments