Pg-promise: Recover from idle_in_transaction_session_timeout

Created on 6 Dec 2019  路  26Comments  路  Source: vitaly-t/pg-promise

Expected behavior

After an idle_in_transaction_session_timeout, the used connection will be working properly when used again from the pool.

Actual behavior

After a statement timeout in transaction, the used connection will be "broken" and new queries will not work. The error-message is Client has encountered a connection error and is not queryable

Steps to reproduce

I've added a test-case to my project fork

Environment

  • Version of pg-promise: 10.2.1
  • OS type (Linux/Windows/Mac): Windows 10
  • Version of Node.js: 12.12.0

Related

Notes

I thought that adding another check to isConnectivityError for code 25P03 (see postgres error codes) should work, but it does not. In this case the err parameter has no code property, but a message string.
A nasty workaround for now is to check this message string:

    if (!code && err.message && typeof err.message === 'string') {
        if (err.message.includes('is not queryable')) return true;
    }

bug dependency issue fixed

All 26 comments

This is from invalid transaction usage, as the author indicated so himself.

I don't see why this issue is closed: IMHO it is clear that pg-postgres is currently not handling the idle_in_transaction_session_timeout error correctly: i.e. it must be treated as a connectivity error and the "broken" connection must not be reused. I hope we agree on that.

So maybe you can give me some thints on how to get the error-code (and not just the error-text) in the isConnectivityError, I'll try to come up with a PR.

I thought the issue was sorted.

Just so, this library does not handle idle_in_transaction_session_timeout. It is a lower level for node-postgres perhaps.

Well, the specific issue in our app is fixed.

But I think it's just too easy that such an issue can slip into the application code again in the future. And then it would be great to have the idle_in_transaction_session_timeout as a safety-net so that the app can recover automatically. Then the app would still be usable until we find the issue in the code and install the new version.

I'll check node-postgres and their issue-list when I find some time.

This is an unusual issue, and is the first time I ever hear about it.

And if you can detect it for a specific transaction code, you can force the physical connection to be released, using the following synthetic approach:

async function specialTransaction() {
    const obj = await db.connect(); // manually allocating the connection
    let idleTxIssueFound = false;
    let txData;
    try {
            txData = await obj.tx(async t => {
                 // execute the transaction, and detect the idle issue, if possible
                 idleTxIssueFound = true; // if found an issue
            });
    } catch(e) {
      } finally() {
          // passing true into method `done` will force physical release of the connection
           obj.done(idleTxIssueFound);
           return txData;
        }
} 

This is more of a proof of concept though.

After checking some code and issues of node-postgres, I think my initial assumption was wrong. We will never receive error-code 25P03 from the database. Instead we only get the error-string from node-postgres::

query.handleError(new Error('Client has encountered a connection error and is not queryable'), this.connection)

I think whenever this error happens (for whatever reason - not only when idle_in_transaction_session_timeout is used), pg-promise must remove this connection from the pool because it is broken.

Since node-postgres only returns an error string, and no error-code or error-class that can be checken, the only way to handle this case is to check the text of the error-message.

There are many related issues in node-postgres:

and in node-pg-pool:

It's a tough one to tackle. I mean, there is not good fix from this end, by the sound of it.

I can also make the test work without checking the error-string and use client._queryable instead

e.g. change line 59 of function poolConnect from
client.release(kill || client.$connectionError);
to :
client.release(kill || client.$connectionError || !client._queryable);

That is for sure better, but maybe the underscore in the variable name _queryable indicates that this variable is private and should not be used?

This does look like a hack-fix, i.e. should work in the current implementation of the client, but may get broken later, as we use a private variable.

That's what I think too: maybe the node-postgres guys will make this pubilc: see node-postgres #2026

I think a more appropriate request would be to make the pool automatically release non-queryiable clients. Asking to make a private variable into public isn't a good idea in this case, it's just too low-level for what you are trying to achieve with this.

I will think a little more about it, and perhaps will add that hack-fix, for now.

@tmtron Before I make a change, since I cannot reproduce the issue on my side, can you check something for me, please...

When this issue occurs, does the query throw an error? Because if it does, then the best place to amend the code is within isConnectivityError function. And if the query does not throw an error, then I will have to amend the connect method.

since I cannot reproduce the issue on my side

Really? So the reproduction test-case works on your side?
That's strange: the test case can reproduce the issue on my dev-PC and on GitLab CI.

When this issue occurs, does the query throw an error? Because if it does, then the best place to amend the code is within isConnectivityError function.

The first query that is run after the timeout will throw the error string Client has encountered a connection error and is not queryable.
As mentioned in the original post, we could handle this in the isConnectivityError function, but since we only get a string it's quite nasty.

The current version of pg-promise will not recover from this and also the 2nd query in the test will throw the same error-message (because the broken connection, which is not queryable anymore, is reused).

@tmtron Sorry for the late response. I can indeed see the issue from the test you provided.

Here I simplified it, enough to still see the issue.

I'm not sure about the proper fix yet. Will get back to it.

For now, I have created a branch tx-idle-timeout to experiment with this.

@tmtron If you have a suggestion, please do follow up ;)

@vitaly-t Thanks for looking into this. Unfortunately I have no idea why the broken connection is not removed and reused in the next query.
Just note, that this is not urgent for me, since we can live for now with the quick-hacky fix using client._queryable as explained in this comment above.

@tmtron I think I have solved it with this commit, to force killing every connection that's non-queryable.

Can you give it a go to confirm the fix (it is now in the master branch)?

Wow, noticed just now your comment above, as we published almost at the same time :) So it looks like in the end we came to the same conclusion :smile:

I will make a new release shortly.

Released the fix in v10.3.5.

@vitaly-t I've just updated to 10.3.5 and it works for me. All our tests are okay.
I think using the "private" _queryable property of postgres-node is okay for now, since we have a test-case that will fail, if this property will ever change.
And we will keep an eye on node-postgres #2026
Thanks again!

Hi, I upgraded to 10.3.5, and now I did not see error message Client has encountered a connection error and is not queryable any more, however, it starts to throw error: Connection terminated unexpectedly. I'm pretty new to pg-promise. Could anyone help on this issue? Should I open it as a new issue? Thank you very much.

The stacktrace I have in my logs looks like

Error: Connection terminated unexpectedly
    at u.<anonymous> (/var/task/webpack:/node_modules/pg/lib/client.js:252:9)
    at Object.onceWrapper (events.js:299:28)
    at u.emit (events.js:210:5)
    at u.emit (domain.js:476:20)
    at Socket.<anonymous> (/var/task/webpack:/node_modules/pg/lib/connection.js:76:10)
    at Socket.emit (events.js:210:5)
    at Socket.EventEmitter.emit (domain.js:476:20)
    at TCP.<anonymous> (net.js:658:12)

Environmen

  • Version of pg-promise: 10.3.5
  • OS type (Linux/Windows/Mac): Linux
  • Version of Node.js: 12.12.24

@zhanghuancs That sounds like an issue related to keeping a connection open, which might something to do with your hosting environment. This is not related to the original issue here.

@tmtron The underlying driver just took in the recommended change for dumping non-queryable connections.

This means the functionality added here will be removed in a new official release that starts using that branch. That probably will be version 11.

@zhanghuancs That sounds like an issue related to keeping a connection open, which might something to do with your hosting environment. This is not related to the original issue here.

Hi, @vitaly-t Thank you for the response. I'm using AWS Lambda and follow the instruction here to reuse database connection using AWS Lambda. This might be the issue related to keeping a connection open as you said. Is there a way in pgpromise that I can get the open connection and kill this specific connection? Besides, is it possible that this issue is related to this thread, and it's quoted that the connection errors (and query errors, from my findings) should be passed to client.release. If I catch this error, and how can I release the connection? In my API, I just use .task to chain .oneOrNone and .any queries, very similar to the [transaction example] (https://github.com/vitaly-t/pg-promise/blob/master/examples/transaction.js), but I did not execute .finally(db.$pool.end); in the end, as I do not want to exit the app.

Any help would be appreciated! Thanks.

@vitaly-t Hi, I think there is a regression in this item. I'm importing a big amount of data with a complex query from one remote server to another using this method: https://github.com/vitaly-t/pg-promise/wiki/Data-Imports#massive-inserts with, v10.4.4 I'm getting this error: Client has encountered a connection error and is not queryable . When I rollback to the version that you fixed v10.3.5, the query executes without issues

@Gustav0ar Could you, please, re-try, using the latest Beta version?

I am currently updating it, before v10.5.0 release.

@Gustav0ar Could you, please, re-try, using the latest Beta version?

I am currently updating it, before v10.5.0 release.

@vitaly-t Apparently it is working fine with the beta version. I'll update if I find any issues

Was this page helpful?
0 / 5 - 0 ratings

Related issues

calibermind picture calibermind  路  3Comments

cmelone picture cmelone  路  3Comments

dzaman picture dzaman  路  3Comments

cortopy picture cortopy  路  5Comments

ghost picture ghost  路  3Comments