Quarkus: Pool AdminClient in KafkaStreams health check

Created on 12 Jun 2020 · 3Comments · Source: quarkusio/quarkus

Describe the bug

Lots of INFO logs are generated by default due to KafkaStreams health check (500MB of logs in a few hours with 20 pods & 10s check frequency):

INFO  <[org.apa.kaf.cli.adm.AdminClientConfig]> AdminClientConfig values: 
    client.dns.lookup = default
    client.id = 
    connections.max.idle.ms = 300000
    (...)
INFO  <[org.apa.kaf.com.uti.AppInfoParser]> version: 2.4.1

The logging is kind of hardcoded into Kafka when using Admin#create, but a workaround is to set:

quarkus.log.category."org.apache.kafka.clients.admin.AdminClientConfig".level=WARN
quarkus.log.category."org.apache.kafka.common.utils.AppInfoParser".level=WARN

Though creating many admin clients also raises the question of connection usage on the broker.
And we're seeing some AdminClient#close actually taking a long time (not investigated):

io.vertx.core.impl.BlockedThread - WARNING <[io.ver.cor.imp.BlockedThreadChecker] TID#17> [PFX: ] Thread Thread[vert.x-worker-thread-5,5,main]=Thread[vert.x-worker-thread-5,5,main] has been blocked for 69273 ms, time limit is 60000 ms: io.vertx.core.VertxException: Thread blocked
    at [email protected]/java.lang.Object.wait(Native Method)
    at [email protected]/java.lang.Thread.join(Thread.java:1305)
    at [email protected]/java.lang.Thread.join(Thread.java:1379)
    at app//org.apache.kafka.clients.admin.KafkaAdminClient.close(KafkaAdminClient.java:541)
    at app//org.apache.kafka.clients.admin.Admin.close(Admin.java:96)
    at app//org.apache.kafka.clients.admin.Admin.close(Admin.java:79)
    at app//io.quarkus.kafka.streams.runtime.KafkaStreamsTopologyManager.getMissingTopics(KafkaStreamsTopologyManager.java:175)
    at app//io.quarkus.kafka.streams.runtime.KafkaStreamsTopologyManager_ClientProxy.getMissingTopics(KafkaStreamsTopologyManager_ClientProxy.zig:339)
    at app//io.quarkus.kafka.streams.runtime.health.KafkaStreamsTopicsHealthCheck.call(KafkaStreamsTopicsHealthCheck.java:50)
    at app//io.quarkus.kafka.streams.runtime.health.KafkaStreamsTopicsHealthCheck_ClientProxy.call(KafkaStreamsTopicsHealthCheck_ClientProxy.zig:117)
    at app//io.smallrye.health.SmallRyeHealthReporter.jsonObject(SmallRyeHealthReporter.java:185)
    at app//io.smallrye.health.SmallRyeHealthReporter.fillCheck(SmallRyeHealthReporter.java:172)
    at app//io.smallrye.health.SmallRyeHealthReporter.processChecks(SmallRyeHealthReporter.java:160)
    at app//io.smallrye.health.SmallRyeHealthReporter.getHealth(SmallRyeHealthReporter.java:138)
    at app//io.smallrye.health.SmallRyeHealthReporter.getReadiness(SmallRyeHealthReporter.java:105)
    at app//io.smallrye.health.SmallRyeHealthReporter_ClientProxy.getReadiness(SmallRyeHealthReporter_ClientProxy.zig:104)
    at app//io.quarkus.smallrye.health.runtime.SmallRyeReadinessHandler.handle(SmallRyeReadinessHandler.java:42)

Expected behavior

AdminClients should be pooled.
Based on a recent answer to https://stackoverflow.com/questions/57038823/is-adminclient-in-package-kafka-clients-thread-safe , it would appear AdminClient is actually thread-safe (I haven't seen it reflected on the doc yet ?), so we could share the instance.
I'm wondering if other frameworks already took that assumption.

Actual behavior

New AdminClient created every 10s.

Environment (please complete the following information):

Output of java -version: 11.0.5
Quarkus version or git rev: 1.4.1.Final

arekafka-streams kinbug

Source

rquinio

Most helpful comment

I haven't tried with the quickstrat, but the log would happen when polling health /ready endpoint.

There's 1 admin client instance created each time: https://github.com/quarkusio/quarkus/blob/f80bb060c1af9995609358c02877c817af6ff5ea/extensions/kafka-streams/runtime/src/main/java/io/quarkus/kafka/streams/runtime/KafkaStreamsTopologyManager.java#L29

I'll open a PR with the proposed change.

rquinio on 29 Jul 2020

👍3

All 3 comments

/cc @gunnarmorling

quarkusbot on 12 Jun 2020

I was not able to reproduce this on the Quickstart example with the latest Quarkus version.
Can you please re-check your application with Quarkus 1.6.1.Final?

I think that 849fe83e813b0763c3f39c3565b8715aeaf551ec may have inadvertently fixed this as well (cc @gsmet)