Description
The reactive datasource drivers from Vert.x support reconnection:
https://vertx.io/docs/vertx-mysql-postgresql-client/java/#_configuring_reconnections
However, this configuration seems unreachable (or at least undocumented) using standard Quarkus configuration methods.
Providing the ability to define reconnect properties would be hugely useful in order to gracefully handle database failovers and general network blips.
cc @aguibert
/cc @tsegismont
hi @kostacasa, the documentation you've linked is actually for the old MySQL / PostgreSQL clients from Vertx which are currently deprecated in Vertx and not the same thing as the Vertx SQL Client we have in Quarkus.
You can find the documentation for those drivers here:
https://vertx.io/docs/#data_access
Fortunately there are equivalents to these options in the current Vertx clients, which would look something like:
PgConnectOptions connectOptions = (PgConnectOptions) new PgConnectOptions()
    .setPort(PORT)
    .setHost(HOST)
    // etc...
    .setReconnectAttempts(5)
    .setReconnectInterval(50); // ms
So I think we could expose these properties in Quarkus config under quarkus.datasource.reactive.* which is where we put the shared Vertx SQL client pool config settings like this.
Actually, scratch my previous comment. The reconnect properties come from Vertx core, which is already configurable in Quarkus with these properties:
quarkus.vertx.eventbus.reconnect-attempts=5
quarkus.vertx.eventbus.reconnect-interval=50
See https://quarkus.io/guides/all-config#quarkus-vertx-core_quarkus.vertx.eventbus.reconnect-attempts
This is fantastic, thank you for the pointer! I didn't realize these options would affect the datasource, perhaps it might be worthwhile mentioning it in the reactive datasource documentation.
@aguibert, we were under the impression that reconnection would take place whenever a network problem arises, no matter if it's the first time we're connecting to a database or later, with a successfully established connection. But that doesn't seem to be the case.
We did a simple test, setting the properties like this:
quarkus.vertx.eventbus.reconnect-attempts=10
quarkus.vertx.eventbus.reconnect-interval=1000
We then directed our reactive datasource to connect to a port that nothing is listening at:
quarkus.datasource.reactive.url=postgresql://localhost:54321/dev
We expected the connection attempt to take at least 10 seconds to fail, but we got an exception much sooner (~2s), indicating that there were not 10 retries with a one-second pause between them.
Furthermore, we noticed the same behavior no matter what values the properties were set to, indicating that they don't make any difference for our PostgreSQL reactive datasource.
Are we doing something wrong? Is there a way for reactive datasources to cope with networking glitches, no matter what kind?
Hi @kostacasa @abutic, I am a Vert.x core team member and maintainer of the Reactive SQL clients extension in Quarkus.
As @aguibert said, the vertx-mysql-postgresql-client is deprecated in Vert.x 3 and it is not the client that Quarkus integrates.
Quarkus integrates the Vert.x Reactive Clients for PostgreSQL, MySQL and DB2.
These three clients have connection options that extend io.vertx.core.net.NetClientOptions (given they all use the Vert.x TCP client).
NetClientOptions has reconnectAttempts and reconnectInterval fields, but these are TCP connection options. They control how many times the TCP client tries to establish a connection if it fails because of a ConnectException (or FileNotFoundException for domain sockets).
The quarkus.vertx.eventbus.reconnect-attempts and quarkus.vertx.eventbus.reconnect-interval are not related to the SQL clients, but to the Vert.x event bus. Setting them has no impact on the SQL clients behavior.
Please reopen the issue if you need the reconnect-attempts and reconnect-interval options to be exposed at the reactive datasource config level.
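To make the TCP-level semantics described above concrete, here is a minimal sketch using only the JDK. It is not the Vert.x implementation: the class name, method name and the 1000 ms connect timeout are hypothetical, and it assumes that reconnectAttempts counts the extra tries made after the first failed one. Only a ConnectException triggers another attempt; any other I/O failure is rethrown immediately.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical illustration only; this is not how Vert.x implements it.
class TcpReconnectSketch {

    // Number of attempts made by the most recent connect() call, kept for inspection.
    static int lastAttemptCount = 0;

    // Opens a TCP connection, retrying after a ConnectException.
    // Assumption: reconnectAttempts is the number of EXTRA tries after the first one.
    static Socket connect(String host, int port, int reconnectAttempts, long reconnectIntervalMs)
            throws IOException, InterruptedException {
        if (reconnectAttempts < 0) {
            throw new IllegalArgumentException("reconnectAttempts must be >= 0");
        }
        lastAttemptCount = 0;
        ConnectException lastFailure = null;
        for (int attempt = 0; attempt <= reconnectAttempts; attempt++) {
            if (attempt > 0) {
                Thread.sleep(reconnectIntervalMs); // pause between attempts
            }
            lastAttemptCount++;
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(host, port), 1000);
                return socket; // connection established
            } catch (ConnectException e) {
                socket.close();
                lastFailure = e; // nothing listening: eligible for another attempt
            } catch (IOException e) {
                socket.close();
                throw e; // other I/O failures are not retried
            }
        }
        throw lastFailure;
    }

    public static void main(String[] args) throws Exception {
        // Port 54321 is the unused port from the test described earlier in the thread.
        try (Socket s = connect("localhost", 54321, 3, 200)) {
            System.out.println("connected after " + lastAttemptCount + " attempt(s)");
        } catch (ConnectException e) {
            System.out.println("gave up after " + lastAttemptCount + " attempt(s): " + e.getMessage());
        }
    }
}
```

Note that this kind of retry only covers establishing a connection. It does nothing for a connection that was established and later dropped.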
Thanks @tsegismont, this is useful context.
Rather than constantly playing catch-up between the properties Quarkus exposes in application.properties and the capabilities that the underlying drivers support, perhaps a better approach would be to expose the ability to configure the PgPool/PgConnectOptions programmatically at instantiation time, similar to how the Jackson ObjectMapper can be configured.
@aguibert any thoughts on this?
@tsegismont, thanks for getting back to us!
We don't use the deprecated vertx-mysql-postgresql-client, so no worries there.
We use the Vert.x Reactive Client for PostgreSQL and we'd like to be able to:
If I understand you right, we'll be able to control the first two issues once reconnect-attempts and reconnect-interval options get exposed at the reactive datasource config level? So if a DB, for any reason, is not capable of establishing a connection, our reactive client would retry as instructed?
What about 3. and 4.? Is there any way to get that kind of behavior?
for (3) you can solve that with quarkus.vertx.eventbus.idle-timeout
For (4), as @tsegismont and I mentioned previously, there are TCP-level options to retry on TCP-level failures that may occur. However, there are no options for failures at a higher level than that. If your application gets a DB-level I/O failure, I think it would be dangerous to have a global setting that automatically retries everything that fails. That could easily result in unintended behavior and/or data integrity issues.
Instead, I would suggest application-level retries using the Mutiny API, for example:
// Run a query, retry at most 3 times if it fails
Uni<RowSet<Row>> rowSet = client.query("SELECT id, name FROM fruits ORDER BY name ASC")
.execute()
.onFailure().retry().atMost(3);
For more details see: https://smallrye.io/smallrye-mutiny/#_how_do_i_recover_from_failure
@aguibert So, is it right to say that java.io.IOException: Connection reset by peer is a result of the TCP-level failure you're mentioning? And would having the NetClientOptions reconnectAttempts and reconnectInterval properties set save us from that kind of trouble with the reactive PostgreSQL driver? Provided, of course, that a retry is successful.
As for (4), you're right, we don't want to retry upon any error, but upon communication errors only. It sounds like having the aforementioned properties set will provide what we want. Right?
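The distinction drawn here, retrying on communication errors but not on every failure, can be sketched at the application level with plain JDK code. The helper below is hypothetical (it is not part of Mutiny, Vert.x or Quarkus) and assumes that a communication error surfaces as a java.io.IOException. Anything else propagates on the first occurrence.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not part of any library mentioned in this thread.
class IoRetrySketch {

    // Runs the action and retries it up to maxRetries times, but only after an IOException.
    static <T> T retryOnIo(Callable<T> action, int maxRetries) throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        IOException lastFailure = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return action.call();
            } catch (IOException e) {
                lastFailure = e; // communication error: eligible for another attempt
            }
            // Any other exception type is not caught and propagates immediately.
        }
        throw lastFailure;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = retryOnIo(() -> {
            if (++calls[0] < 3) {
                throw new IOException("Connection reset by peer");
            }
            return "ok";
        }, 3);
        System.out.println(result + " after " + calls[0] + " call(s)");
    }
}
```

With Mutiny, the same filter can, as far as I know, be expressed by passing the exception class to onFailure before calling retry(). Either way, retrying a statement that is not idempotent can still be unsafe, which is the data integrity concern raised above.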
Yeah, I think we should expose these properties in the Quarkus config.
@aguibert and @tsegismont, this is my test scenario:
Set the quarkus.vertx.eventbus.idle-timeout property value to 5000 and start an app with mvn quarkus:dev
Based on your reply, I expected that the reactive client would disconnect from the DB after 5 seconds of inactivity, but that wasn't the case: the connection still stands and is visible on the postgres side using a query that selects it from the pg_stat_activity view. The connection immediately disappears after stopping the app.
Am I missing something?
@gsmet which properties do you think we need to expose? I believe all of the properties being asked for here are already exposed. Specifically:
@abutic what happens if you set quarkus.vertx.eventbus.idle-timeout=5? Looking at the code, I think we have an inconsistency between the doc (which says the value is in millis) and the code, which appears to read the value as seconds.
@aguibert, thanks for discovering seconds/milliseconds inconsistency.
Unfortunately, no difference - DB connection still stands.
First, I execute
SELECT * FROM pg_stat_activity where application_name = 'vertx-pg-client'
and get no results.
Then I make the app connect to the DB using the reactive driver and get some results by executing an app-specific query. Issuing the aforementioned select statement again now retrieves a single row, related to the newly established reactive driver DB connection, as expected.
Issuing it again after 5 seconds (and repeating the process a couple of times) shows the connection is still there, i.e. quarkus.vertx.eventbus.idle-timeout=5 did not have the expected effect. So I still see no way of setting a reactive client connection idle timeout, i.e. making a reactive client disconnect from a database if the connection has not been used for a specified amount of time.
Stopping the client application clears that connection right away, i.e. executing
SELECT * FROM pg_stat_activity where application_name = 'vertx-pg-client'
returns no rows after the application has been stopped.
@aguibert I was talking about these ones:
Please reopen the issue if you need the reconnect-attempts and reconnect-interval options to be exposed at the reactive datasource config level.
@gsmet and @aguibert, a little more context: we use Quarkus on Google Cloud Run, connecting to Aiven PostgreSQL database.
After using the app and leaving it idle for a period of ~1 day, we get a java.io.IOException: Connection reset by peer when we try to use the existing reactive client DB connection again. It seems the client is not aware that the connection got closed for some reason.
In contrast, the client is aware that the connection got closed if we close it "by hand" on PostgreSQL, issuing a
select pg_terminate_backend(pid)
FROM pg_stat_activity where pid = ?
statement. An app creates a new connection and uses it with no error reported in this case.
@aguibert about https://github.com/quarkusio/quarkus/issues/11149#issuecomment-674145521 :
This is wrong, these properties only apply to event bus connection and have no impact on SQL clients.
@gsmet @aguibert I'll work on adding the idle-timeout, reconnect-attempts and reconnect-interval props to the reactive datasource configuration.
What about 3. and 4.? Is there any way to get that kind of behavior?
@abutic it's not supported and I would recommend following @aguibert's advice: use Mutiny's onFailure API.
@tsegismont, I hope you meant (4) wasn't supported. Adding the idle-timeout property to the reactive datasource configuration should enable (3):
- set client connection idle timeout, i.e. make a client disconnect from a database if the connection has not been used for specified amount of time
Am I right? Thanks!
@abutic right, I meant (4)
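For reference, the behavior asked for in (3) boils down to tracking when a connection was last used and closing it once it has been unused for longer than a configured duration. The class below is a hypothetical, JDK-only sketch of that bookkeeping, not the Vert.x or Quarkus implementation. Taking the timeout as a java.time.Duration is a deliberate choice here, since it sidesteps the seconds/milliseconds ambiguity discussed earlier.

```java
import java.time.Duration;

// Hypothetical helper for illustration; not part of Vert.x or Quarkus.
class IdleTracker {
    private final long idleTimeoutNanos;
    private long lastUsedNanos;

    // nowNanos is passed in explicitly so the logic can be exercised without real waiting.
    IdleTracker(Duration idleTimeout, long nowNanos) {
        this.idleTimeoutNanos = idleTimeout.toNanos();
        this.lastUsedNanos = nowNanos;
    }

    // Call whenever the connection is used for a query.
    void markUsed(long nowNanos) {
        this.lastUsedNanos = nowNanos;
    }

    // True once the connection has gone unused for at least the idle timeout.
    boolean isIdleTooLong(long nowNanos) {
        return nowNanos - lastUsedNanos >= idleTimeoutNanos;
    }

    public static void main(String[] args) {
        IdleTracker tracker = new IdleTracker(Duration.ofSeconds(5), System.nanoTime());
        System.out.println("idle too long right after creation: "
                + tracker.isIdleTooLong(System.nanoTime()));
    }
}
```

A pool would typically run such a check periodically and close any connection for which it returns true, so that the next query opens a fresh connection instead of hitting a stale one.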
@gsmet can you please remove the triage/invalid label? Thank you