Node-postgres: Fails to error on queries inside a transaction after a lost connection

Created on 15 Sep 2017  Â·  16Comments  Â·  Source: brianc/node-postgres

When we execute a lengthy operation inside a transaction, and the connection is suddenly lost during that time, then any query we execute on the client after that become stuck, i.e. they report nether success no error:

'use strict';

const Pool = require('pg-pool');

const cn = {
    database: 'newone',
    port: 5433,
    user: 'postgres'
};

const p = new Pool(cn);

p.on('error', error => {
    // do nothing
});

p.connect((err, client, release) => {
    if (err) {
        console.log('Failed to connect:', err);
        return;
    }
    const test = async function () {
        try {
            await client.query('BEGIN');
            await client.query('SELECT pg_sleep(10)'); // connection breaks during this one
            await client.query('COMMIT');
            console.log('success!');
        } catch (e) {
            console.log('about to rollback...'); // this is reported
            await client.query('ROLLBACK'); // this one is stuck, never reporting an error, if the error was due to lost connection
            console.log('rollback finished'); // this is never reported, if we are here because of lost connection
        } finally {
            release();
            p.end();
        }
    };

    test();
});

I've tried both pg-pool directly and via node-postgres, versions 6.x and 7.x - all the same.

Expected Behavior

Executing any query against client at that point should immediately report an error that informs us of the lost connectivity.

bug

Most helpful comment

UPDATE

This issue appears to be even more severe, reported against solutions running on a thin WiFi connection, i.e. the issue keeps happening even with small transactions that execute longer due to the slow network.

@charmander @brianc Guys, any chance to see a progress for this some day? :smile: This is a P1 bug :wink:

All 16 comments

Related (and closed for some reason?): #632, #718

One of those old bugs was kind-of more specific, even though the issue was a broad one, and the other one had too much code, not easy to see what the issue is.

My example is much easier to read and debug, and it is 100% reproducible, and quite generic, i.e. it is not that the ROLLBACK fails at that point, it is any query.

Hopefully, it will help us find and fix the issue this time 🤞

@charmander seems like #1322 is yet another dupe. This is one devious bug! 😈

That’s the one I was looking for! Thanks. Good thing it’s open.

Nice! Thanks for reporting this - I'll try to look at this this weekend.

@brianc any luck? Do you need help with it? 😉

Two week-ends out, still nothing?

Sorry for the delay: I broke a bone & haven't been @ the computer much, and when I am at the computer I have to work so I can pay my bills. One thing that would help is adding to your example above some code to manually kill the connection. Right now to simulate a connection drop I have to kill postgres, which doesn't work if I want to run the test in CI or automatically. Would it be possible to create a fully self-contained script that demonstrates the error without requiring manual intervention to kill the connection? That would be super helpful.

One thing that would help is adding to your example above some code to manually kill the connection

To temporarily kill all connections to your test database, execute the following SQL in pgAdmin:

SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname='my-database-name';

You must provide the correct database name in the query, and make sure to execute it from a connection to a different database, or else you will be killing the current pgAdmin connection, which often causes it to crash the UI.


Get well soon! 🤧

gotcha! Thanks! It's a slow road but I'm gettin' better 😄

@brianc Are you still on sick? It's been almost 2 month since the last commit.

(pressed close instead of comment by mistake, as per freaking usual)

UPDATE

This issue appears to be even more severe, reported against solutions running on a thin WiFi connection, i.e. the issue keeps happening even with small transactions that execute longer due to the slow network.

@charmander @brianc Guys, any chance to see a progress for this some day? :smile: This is a P1 bug :wink:

While there is no fix for this, for now I am using the following work-around:

try {
      await client.query('BEGIN');
      await client.query('SELECT pg_sleep(10)'); // connection breaks during this one
      await client.query('COMMIT');
     } catch (e) {
        if(!isConnectivityError(e)) {
          await client.query('ROLLBACK');
       }
    }

And here's code from [pg-promise] of how I check for a connectivity error:

////////////////////////////////////////////
// Identifies a general connectivity error.
function isConnectivityError(err) {
    const code = err && typeof err.code === 'string' && err.code;
    const cls = code && code.substr(0, 2); // Error Class
    return code === 'ECONNRESET' || cls === '08' || cls === '57';
    // Code 'ECONNRESET' - Connectivity issue handled by the driver.
    // Class 08 - Connection Exception.
    // Class 57 - Operator Intervention.
    //
    // ERROR CODES: https://www.postgresql.org/docs/9.6/static/errcodes-appendix.html
}

I can’t reproduce this with your test script; await client.query('ROLLBACK') throws as expected. (Were you testing on a version of Node that shows unhandled promise rejection warnings? There’s no .catch() on test().)

Nevermind, I was testing on the wrong branch – confirming that #1503 does fix this as well.

Confirmed. This now works.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

spollack picture spollack  Â·  4Comments

ClueLessEggHead picture ClueLessEggHead  Â·  3Comments

dindurthy picture dindurthy  Â·  4Comments

v1co1n picture v1co1n  Â·  4Comments

wrod7 picture wrod7  Â·  4Comments