Quarkus: Expose reconnection properties for reactive datasources

Created on 2 Aug 2020 · 23 Comments · Source: quarkusio/quarkus

Description
The reactive datasource drivers from Vert.x support reconnection:
https://vertx.io/docs/vertx-mysql-postgresql-client/java/#_configuring_reconnections
However, this configuration seems unreachable (or at least undocumented) through standard Quarkus configuration methods.

Providing the ability to define reconnect properties would be hugely useful in order to gracefully handle database failovers and general network blips.

kind/enhancement

Source

kostacasa

All 23 comments

cc @aguibert

geoand on 3 Aug 2020

/cc @tsegismont

gsmet on 3 Aug 2020

Hi @kostacasa, the documentation you've linked is actually for the old MySQL / PostgreSQL clients from Vertx, which are currently deprecated in Vertx and are not the same thing as the Vertx SQL Client we have in Quarkus.

You can find the documentation for those drivers here:
https://vertx.io/docs/#data_access

Fortunately there are equivalents to these options in the current Vertx clients, which would look something like:

        PgConnectOptions connectOptions = (PgConnectOptions) new PgConnectOptions()
                .setPort(PORT)
                .setHost(HOST)
                // etc...
                .setReconnectAttempts(5)
                .setReconnectInterval(50); // ms

So I think we could expose these properties in Quarkus config under quarkus.datasource.reactive.* which is where we put the shared Vertx SQL client pool config settings like this.

aguibert on 3 Aug 2020

Actually, scratch my previous comment. The reconnect properties come from Vertx core, which is already configurable in Quarkus with these properties:

quarkus.vertx.eventbus.reconnect-attempts=5
quarkus.vertx.eventbus.reconnect-interval=50

See https://quarkus.io/guides/all-config#quarkus-vertx-core_quarkus.vertx.eventbus.reconnect-attempts

aguibert on 3 Aug 2020

This is fantastic, thank you for the pointer! I didn't realize these options would affect the datasource, perhaps it might be worthwhile mentioning it in the reactive datasource documentation.

kostacasa on 3 Aug 2020

Actually, scratch my previous comment. The reconnect properties come from Vertx core, which is already configurable in Quarkus with these properties:
quarkus.vertx.eventbus.reconnect-attempts=5
quarkus.vertx.eventbus.reconnect-interval=50
See https://quarkus.io/guides/all-config#quarkus-vertx-core_quarkus.vertx.eventbus.reconnect-attempts

@aguibert, we were under the impression that reconnection would take place whenever a network problem arises, no matter if it's the first time we're connecting to a database or later, with a successfully established connection. But that doesn't seem to be the case.

We did a simple test, setting the properties like this:

quarkus.vertx.eventbus.reconnect-attempts=10
quarkus.vertx.eventbus.reconnect-interval=1000

We then directed our reactive datasource to connect to a port that nothing is listening at:
quarkus.datasource.reactive.url=postgresql://localhost:54321/dev

We were expecting the connecting process would take at least 10 seconds to finish, but we got an exception much sooner (~2s), indicating that there were no 10 retries with one second pause between each of them.

Furthermore, we noticed the same behavior no matter what values the properties were set to, indicating that they don't make any difference for our PostgreSQL reactive datasource.

Are we doing something wrong? Is there a way for reactive datasources to cope with networking glitches, no matter what kind?

abutic on 13 Aug 2020

Hi @kostacasa @abutic, I am a Vert.x core team member and maintainer of the Reactive SQL clients extension in Quarkus.

As @aguibert said, the vertx-mysql-postgresql-client is deprecated in Vert.x 3 and it is not the client that Quarkus integrates.

Quarkus integrates the Vert.x Reactive Clients for PostgreSQL, MySQL and DB2.

These three clients have connection options that extend io.vertx.core.net.NetClientOptions (given they all use the Vert.x TCP client).

NetClientOptions has reconnectAttempts and reconnectInterval fields, but these are TCP connection options. They control how many times the TCP client tries to establish a connection if it fails because of a ConnectException (or FileNotFoundException for domain sockets).

The quarkus.vertx.eventbus.reconnect-attempts and quarkus.vertx.eventbus.reconnect-interval are not related to the SQL clients, but to the Vert.x event bus. Setting them has no impact on the SQL clients behavior.
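
To illustrate what TCP-level reconnection means, here is a minimal sketch in plain Java sockets. It is not the Vert.x implementation; `ConnectRetry` and `connectWithRetry` are hypothetical names, and the sketch assumes that "reconnect attempts" means retries after the initial attempt. It only retries while establishing the connection, and only on `ConnectException`, so it would not help with an already established connection that is later reset.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectRetry {

    /** Number of attempts made by the most recent call (kept for illustration only). */
    static int attemptsMade = 0;

    /**
     * Tries to open a TCP connection, retrying only when the connection is refused.
     * Assumption: one initial attempt plus up to reconnectAttempts retries.
     */
    static Socket connectWithRetry(String host, int port, int reconnectAttempts, long reconnectIntervalMs)
            throws IOException, InterruptedException {
        attemptsMade = 0;
        ConnectException last = null;
        int retries = Math.max(0, reconnectAttempts);
        for (int i = 0; i <= retries; i++) {
            if (i > 0) {
                Thread.sleep(reconnectIntervalMs); // pause between attempts
            }
            attemptsMade++;
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(host, port), 1000);
                return socket; // connected: the caller owns and closes the socket
            } catch (ConnectException e) {
                socket.close();
                last = e; // connection refused: eligible for another attempt
            } catch (IOException e) {
                socket.close();
                throw e; // any other I/O failure is not retried in this sketch
            }
        }
        throw last;
    }
}
```

With `reconnectAttempts=10` and `reconnectIntervalMs=1000`, a connection to a port nobody listens on would take at least 10 seconds to fail under this sketch, which is the behaviour the test earlier in the thread was looking for.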

tsegismont on 13 Aug 2020

Please reopen the issue if you need the reconnect-attempts and reconnect-interval options to be exposed at the reactive datasource config level.

tsegismont on 13 Aug 2020

Thanks @tsegismont, this is useful context.

Rather than constantly playing catch-up between the properties exposed in Quarkus application.properties and the capabilities that the underlying drivers support, perhaps a better approach would be to expose the ability to configure the PgPool/PgConnectOptions programmatically at instantiation time, similar to how the Jackson ObjectMapper can be configured.

@aguibert any thoughts on this?

kostacasa on 13 Aug 2020

@tsegismont, thanks for getting back to us!

We don't use the deprecated vertx-mysql-postgresql-client, so no worries there.

We use Vert.x Reactive Client for PostgreSQL and we'd like to be able to:

1. try reconnecting to a database if the first attempt fails
2. control the intervals between reconnection attempts
3. set a client connection idle timeout, i.e. make a client disconnect from a database if the connection has not been used for a specified amount of time
4. retry any DB call if it fails because of an IOException or any of its subclasses

If I understand you right, we'll be able to control the first two issues once reconnect-attempts and reconnect-interval options get exposed at the reactive datasource config level? So if a DB, for any reason, is not capable of establishing a connection, our reactive client would retry as instructed?

What about 3. and 4.? Is there any way to get that kind of behavior?

abutic on 13 Aug 2020

for (3) you can solve that with quarkus.vertx.eventbus.idle-timeout

For (4), as @tsegismont and I mentioned previously, there are TCP-level options to retry on TCP-level failures that may occur. However, there are no options for higher-level failures than that. If your application gets a DB-level I/O failure, I think it would be dangerous to have a global setting to automatically retry everything that fails. That could easily result in unintended behavior and/or data integrity issues.

Instead, I would suggest application-level retries using the Mutiny API, for example:

// Run a query, retry at most 3 times if it fails
Uni<RowSet<Row>> rowSet = client.query("SELECT id, name FROM fruits ORDER BY name ASC")
                            .execute()
                            .onFailure().retry().atMost(3);

For more details see: https://smallrye.io/smallrye-mutiny/#_how_do_i_recover_from_failure
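
The snippet above retries on any failure. A minimal sketch of the narrower behaviour asked for in (4), retrying only when the failure is an `IOException`, is shown below using plain `CompletableFuture` so that it is self-contained. `IoRetry` and `retryOnIo` are hypothetical names and are not part of Quarkus, Vert.x, or Mutiny. Mutiny's `onFailure` also accepts a failure type, which should allow the same restriction; check the Mutiny documentation for details.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CompletionStage;
import java.util.function.Supplier;

public class IoRetry {

    /** Runs the call and retries it up to maxRetries times, but only on IOException. */
    static <T> CompletableFuture<T> retryOnIo(Supplier<CompletionStage<T>> call, int maxRetries) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(call, maxRetries, result);
        return result;
    }

    private static <T> void attempt(Supplier<CompletionStage<T>> call, int retriesLeft,
                                    CompletableFuture<T> result) {
        CompletionStage<T> stage;
        try {
            stage = call.get();
        } catch (RuntimeException e) {
            result.completeExceptionally(e); // a synchronous failure is not retried
            return;
        }
        stage.whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
                return;
            }
            // Unwrap the wrapper that CompletableFuture adds around failures.
            Throwable cause = (error instanceof CompletionException && error.getCause() != null)
                    ? error.getCause() : error;
            if (cause instanceof IOException && retriesLeft > 0) {
                attempt(call, retriesLeft - 1, result); // communication error: try again
            } else {
                result.completeExceptionally(cause); // anything else is reported as-is
            }
        });
    }
}
```

Even with this restriction, retrying a statement that is not idempotent can still apply it twice if the failure happened after the database executed it, so the data-integrity caveat above still applies.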

aguibert on 13 Aug 2020

@aguibert So, is it right to say that java.io.IOException: Connection reset by peer is a result of the TCP-level failure you're mentioning? And could having NetClientOptions' reconnectAttempts and reconnectInterval properties set save us from that kind of trouble with the reactive PostgreSQL driver? Provided, of course, that a retry was successful.

As for (4), you're right, we don't want to retry upon any error, but upon communication errors only. It sounds like having the aforementioned properties set will provide what we want. Right?

abutic on 13 Aug 2020

Yeah, I think we should expose these properties in the Quarkus config.

gsmet on 13 Aug 2020

for (3) you can solve that with quarkus.vertx.eventbus.idle-timeout

@aguibert and @tsegismont, this is my test scenario:

1. set the quarkus.vertx.eventbus.idle-timeout property value to 5000 and start the app with mvn quarkus:dev
2. trigger a request that results in connecting to the database and executing a query
3. leave the app idle

Based on your reply, I expected that the reactive client would disconnect from the DB after 5 seconds of inactivity, but that wasn't the case: the connection still stands and is visible on the PostgreSQL side using a query that selects it from the pg_stat_activity view. The connection disappears immediately after stopping the app.

Am I missing something?

abutic on 13 Aug 2020

@gsmet which properties do you think we need to expose? I believe all of the properties being asked for here are already exposed. Specifically:

quarkus.vertx.eventbus.idle-timeout
quarkus.vertx.eventbus.reconnect-attempts
quarkus.vertx.eventbus.reconnect-interval

aguibert on 14 Aug 2020

@abutic what happens if you set quarkus.vertx.eventbus.idle-timeout=5? Looking at the code, I think there is an inconsistency between the doc (which says the value is in millis) and the code, which appears to read the value as seconds.
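
To make the suspected unit mismatch concrete, here is a small sketch of the arithmetic, assuming the value really is read as seconds as suggested above. `IdleTimeoutUnits` and its methods are hypothetical names used only for illustration.

```java
import java.time.Duration;

public class IdleTimeoutUnits {

    /** Interprets the configured number as milliseconds, as the documentation describes. */
    static Duration asMillis(long configured) {
        return Duration.ofMillis(configured);
    }

    /** Interprets the same number as seconds, as the code is suspected to do. */
    static Duration asSeconds(long configured) {
        return Duration.ofSeconds(configured);
    }

    public static void main(String[] args) {
        long configured = 5000;
        System.out.println(asMillis(configured));  // PT5S
        System.out.println(asSeconds(configured)); // PT1H23M20S
    }
}
```

Under that assumption, a configured value of 5000 would mean roughly 83 minutes rather than 5 seconds, which is why the follow-up test uses the value 5.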

aguibert on 14 Aug 2020

@abutic what happens if you set quarkus.vertx.eventbus.idle-timeout=5? Looking at the code, I think there is an inconsistency between the doc (which says the value is in millis) and the code, which appears to read the value as seconds.

@aguibert, thanks for discovering the seconds/milliseconds inconsistency.

Unfortunately, no difference - DB connection still stands.

First, I execute
SELECT * FROM pg_stat_activity where application_name = 'vertx-pg-client'
and get no results.

Then I make the app connect to the DB using the reactive driver and fetch some results by executing an app-specific query. Issuing the aforementioned select statement again now retrieves a single row, related to the newly established reactive driver DB connection, as expected.

Issuing it again after 5 seconds (and repeating the process a couple of times) shows the connection is still there, i.e. quarkus.vertx.eventbus.idle-timeout=5 did not have the expected effect. So I still see no way of setting a reactive client connection idle timeout, i.e. making a reactive client disconnect from a database if the connection has not been used for a specified amount of time.

Stopping the client application clears that connection right away, i.e. executing
SELECT * FROM pg_stat_activity where application_name = 'vertx-pg-client'
returns no rows after the application has been stopped.

abutic on 16 Aug 2020

@aguibert I was talking about these ones:

Please reopen the issue if you need the reconnect-attempts and reconnect-interval options to be exposed at the reactive datasource config level.

gsmet on 17 Aug 2020

@gsmet and @aguibert, a little more context: we use Quarkus on Google Cloud Run, connecting to an Aiven PostgreSQL database.

After using the app and leaving it idle for a period of ~1 day, we get a java.io.IOException: Connection reset by peer when we try to use the existing reactive client DB connection again. It seems the client is not aware that the connection got closed for some reason.

In contrast, the client is aware that the connection got closed if we close it "by hand" on PostgreSQL by issuing the

select pg_terminate_backend(pid)
FROM pg_stat_activity where pid = ?

statement. In this case the app creates a new connection and uses it with no error reported.

abutic on 17 Aug 2020

@aguibert about https://github.com/quarkusio/quarkus/issues/11149#issuecomment-674145521 :

This is wrong: these properties only apply to event bus connections and have no impact on SQL clients.

@gsmet @aguibert I'll work on adding the idle-timeout, reconnect-attempts and reconnect-interval props to the reactive datasource configuration.

What about 3. and 4.? Is there any way to get that kind of behavior?
@abutic it's not supported and I would recommend following @aguibert's advice: use Mutiny's onFailure API.

tsegismont on 17 Aug 2020

What about 3. and 4.? Is there any way to get that kind of behavior?

@abutic it's not supported and I would recommend following @aguibert's advice: use Mutiny's onFailure API.

@tsegismont, I hope you meant (4) wasn't supported. Adding the idle-timeout property to the reactive datasource configuration should enable (3):