When we execute a lengthy operation inside a transaction, and the connection is suddenly lost during that time, then any query we execute on the client after that become stuck, i.e. they report nether success no error:
'use strict';
const Pool = require('pg-pool');
const cn = {
database: 'newone',
port: 5433,
user: 'postgres'
};
const p = new Pool(cn);
p.on('error', error => {
// do nothing
});
p.connect((err, client, release) => {
if (err) {
console.log('Failed to connect:', err);
return;
}
const test = async function () {
try {
await client.query('BEGIN');
await client.query('SELECT pg_sleep(10)'); // connection breaks during this one
await client.query('COMMIT');
console.log('success!');
} catch (e) {
console.log('about to rollback...'); // this is reported
await client.query('ROLLBACK'); // this one is stuck, never reporting an error, if the error was due to lost connection
console.log('rollback finished'); // this is never reported, if we are here because of lost connection
} finally {
release();
p.end();
}
};
test();
});
I've tried both pg-pool directly and via node-postgres, versions 6.x and 7.x - all the same.
Expected Behavior
Executing any query against client at that point should immediately report an error that informs us of the lost connectivity.
Related (and closed for some reason?): #632, #718
One of those old bugs was kind-of more specific, even though the issue was a broad one, and the other one had too much code, not easy to see what the issue is.
My example is much easier to read and debug, and it is 100% reproducible, and quite generic, i.e. it is not that the ROLLBACK fails at that point, it is any query.
Hopefully, it will help us find and fix the issue this time 🤞
@charmander seems like #1322 is yet another dupe. This is one devious bug! 😈
That’s the one I was looking for! Thanks. Good thing it’s open.
Nice! Thanks for reporting this - I'll try to look at this this weekend.
@brianc any luck? Do you need help with it? 😉
Two week-ends out, still nothing?
Sorry for the delay: I broke a bone & haven't been @ the computer much, and when I am at the computer I have to work so I can pay my bills. One thing that would help is adding to your example above some code to manually kill the connection. Right now to simulate a connection drop I have to kill postgres, which doesn't work if I want to run the test in CI or automatically. Would it be possible to create a fully self-contained script that demonstrates the error without requiring manual intervention to kill the connection? That would be super helpful.
One thing that would help is adding to your example above some code to manually kill the connection
To temporarily kill all connections to your test database, execute the following SQL in pgAdmin:
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname='my-database-name';
You must provide the correct database name in the query, and make sure to execute it from a connection to a different database, or else you will be killing the current pgAdmin connection, which often causes it to crash the UI.
Get well soon! 🤧
gotcha! Thanks! It's a slow road but I'm gettin' better 😄
@brianc Are you still on sick? It's been almost 2 month since the last commit.
(pressed close instead of comment by mistake, as per freaking usual)
UPDATE
This issue appears to be even more severe, reported against solutions running on a thin WiFi connection, i.e. the issue keeps happening even with small transactions that execute longer due to the slow network.
@charmander @brianc Guys, any chance to see a progress for this some day? :smile: This is a P1 bug :wink:
While there is no fix for this, for now I am using the following work-around:
try {
await client.query('BEGIN');
await client.query('SELECT pg_sleep(10)'); // connection breaks during this one
await client.query('COMMIT');
} catch (e) {
if(!isConnectivityError(e)) {
await client.query('ROLLBACK');
}
}
And here's code from [pg-promise] of how I check for a connectivity error:
////////////////////////////////////////////
// Identifies a general connectivity error.
function isConnectivityError(err) {
const code = err && typeof err.code === 'string' && err.code;
const cls = code && code.substr(0, 2); // Error Class
return code === 'ECONNRESET' || cls === '08' || cls === '57';
// Code 'ECONNRESET' - Connectivity issue handled by the driver.
// Class 08 - Connection Exception.
// Class 57 - Operator Intervention.
//
// ERROR CODES: https://www.postgresql.org/docs/9.6/static/errcodes-appendix.html
}
I can’t reproduce this with your test script; await client.query('ROLLBACK') throws as expected. (Were you testing on a version of Node that shows unhandled promise rejection warnings? There’s no .catch() on test().)
Nevermind, I was testing on the wrong branch – confirming that #1503 does fix this as well.
Confirmed. This now works.
Most helpful comment
UPDATE
This issue appears to be even more severe, reported against solutions running on a thin WiFi connection, i.e. the issue keeps happening even with small transactions that execute longer due to the slow network.
@charmander @brianc Guys, any chance to see a progress for this some day? :smile: This is a P1 bug :wink: