Schema:
CREATE TABLE test (
  id SERIAL PRIMARY KEY,
  data bytea
);
CREATE UNIQUE INDEX test_pkey ON test(id int4_ops);
test.js:
import fs from 'fs';
import pg from 'pg';

const image = fs.readFileSync('./test.jpg');

const main = async () => {
  const client = new pg.native.Client();
  await client.connect();
  let index = 1000;
  while (index--) {
    await client.query('INSERT INTO "test" (data) VALUES ($1) RETURNING id', [
      image
    ]);
  }
  await client.end();
};

main();
Evaluation:
$ time node ./test.js
Result with pg:
node ./test.js  0.88s user 0.39s system 0% cpu 2:37.37 total
Result with pg-native:
node ./test.js  4.95s user 0.16s system 19% cpu 26.118 total
That is a huge difference. What is the overhead?
It's likely the additional overhead of copying the large block of memory for the image that you're inserting.
There are two types of async external APIs in node. The first is APIs that are truly async; this covers things like TCP sockets, where the OS provides real event notification. APIs that are not natively async pretend to be async by utilizing an internal thread pool. From the perspective of a running node app the operations happen asynchronously, but under the covers they're run on a background thread rather than using real event notification. That covers most external libraries integrated with node, as they're usually not natively async.
The pg-native module is built atop the libpq module, which exposes the libpq C library to node. It uses the internal thread pool rather than true async operation, as that's how the primary API for libpq itself works (i.e. blocking sockets): https://www.npmjs.com/package/libpq#connection-functions
One negative side effect of using the internal thread pool is that it causes extra copying of memory from the shared context of the node app by the thread that will execute in the thread pool. This is to prevent concurrent memory corruption as otherwise the shared buffer could be modified while the thread pool is operating on it. The bigger the data being copied (ex: a large bytea image), the more overhead that copy is going to impose.
FYI, I've noticed a similar drop in performance with the zlib module when performing gzip compression async vs. sync. The async operation uses the internal thread pool and results in an extra memory copy that's noticeable when compressing large Buffers.
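If you want to observe that zlib difference yourself, a minimal timing sketch along these lines works (assumes Node run as an ES module; absolute numbers will vary by machine, but the async path pays the thread-pool overhead):

```javascript
// Rough sketch: compare sync gzip against callback-based async gzip,
// which runs on the libuv thread pool and copies the input Buffer.
import zlib from 'zlib';
import crypto from 'crypto';
import { promisify } from 'util';

const gzipAsync = promisify(zlib.gzip);
const payload = crypto.randomBytes(8 * 1024 * 1024); // 8 MB of incompressible data

const syncStart = process.hrtime.bigint();
zlib.gzipSync(payload);
console.log('sync ms:', Number(process.hrtime.bigint() - syncStart) / 1e6);

const asyncStart = process.hrtime.bigint();
await gzipAsync(payload); // top-level await: run this file as an ES module
console.log('async ms:', Number(process.hrtime.bigint() - asyncStart) / 1e6);
```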
I don't think there's much you can do to work around this other than using the pure JS driver. I always stick with the pure JS driver as it works consistently on all platforms and has fewer dependencies (i.e. no need for libpq). Unless you have another reason to use the native driver, I'd suggest switching to the pure JS one.
I think you've misread the benchmark results:
pg-native takes 26 seconds; pg takes 2:37.
Oh duh! I was only looking at the "user" line being 5x as much which I bet includes the additional memcpys I was talking about.
What version of node and pg / pg-native are you using?
Impressive write up nonetheless. :-)
Thanks!
Tried to reproduce this, and I don't think your pg-native run is actually executing the commands, as the pg-native API doesn't use promises. Neither client.connect() nor client.query() returns a promise. That means await client.query(...) is awaiting a non-Promise result, which completes immediately.
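Awaiting a value that is not a Promise resolves immediately with that value unchanged, so the loop would race ahead of the callbacks. A minimal illustration (fakeQuery is a made-up stand-in, run as an ES module):

```javascript
// A callback-style function standing in for a hypothetical non-Promise query().
const fakeQuery = (sql, cb) => {
  setImmediate(() => cb(null, { rows: [] })); // completes later, out of band
};

// `await` wraps non-Promise values in Promise.resolve(), so this line
// "finishes" right away with undefined -- before the callback ever runs.
const result = await fakeQuery('SELECT 1', () => {});
console.log(result); // undefined
```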
Can you check if it's actually inserting anything in your test table?
The records will exist. Just not all of them.
It didn't even occur to me to check whether client.connect() and client.query() return promises. I just assumed it was compatible with the base API.
@brianc This should be mentioned in https://node-postgres.com/features/native.
Actually scratch on the API differences. I was mixing up the underlying libpq module with pg-native...
I do think there's some difference in Buffer handling though. Can you see if any actual data is being inserted in your data column (not just the id column)?
My repro may be incorrect, but it seems like the native driver is ignoring Buffer types as parameters and sending a zero-length bytea for them. Encoding them as hex strings and translating them on the DB side makes the data appear there, and the performance is much closer to the pure JS driver's.
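That hex workaround can be done on the client side without changing the SQL, since PostgreSQL accepts bytea input as a \x-prefixed hex string. A sketch (toPgHex is a made-up helper name):

```javascript
// Hypothetical helper: render a Buffer in PostgreSQL's hex bytea input
// format ("\x" followed by hex digits) so it survives as a text parameter.
const toPgHex = (buf) => '\\x' + buf.toString('hex');

console.log(toPgHex(Buffer.from('ab'))); // \x6162

// Usage sketch against the table from the original test:
// await client.query('INSERT INTO "test" (data) VALUES ($1)', [toPgHex(image)]);
```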
Sorry @brianc. @sehrope keeps throwing me off. client.connect() and client.query() _do_ return promises.
My bad on that one. That's what I get for splitting eyeballs between coding and world cup.
No worries.
You are right regarding the data not being written though.
applaudience=> SELECT id, octet_length(data) FROM test;
id | octet_length
-----+--------------
283 | 55680
284 | 55680
285 | 55680
286 | 55680
287 | 55680
288 | 55680
289 | 55680
290 | 55680
291 | 55680
292 | 55680
273 | 12
274 | 12
275 | 12
276 | 12
277 | 12
278 | 12
279 | 12
280 | 12
281 | 12
282 | 12
(20 rows)
The first 10 rows were written by pg-native and the last 10 by pg.
If anything, this requires opening a new issue.
pg should convert Buffer to a pg-native-friendly format.
Are you saying you did require('pg-native') instead of require('pg').native?
Yes, that was my mistake trying this out. Using the .native approach works fine for getting promise return values.
Real issue is that libpq expects a string[] for the query parameters and pg-native doesn't perform any type conversions. It blindly passes the params array to libpq, which somehow is interpreted as a zero-length string for the parameter value (so it inserts a zero-length value). The performance difference is because the native version isn't actually inserting anything, so it's not a valid comparison, just a bug in native.
This should either be documented as requiring string[] parameters (and throw an error otherwise), or the native driver should use the same type conversions as the pure JS driver.
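The latter option could look roughly like this (a hypothetical sketch of the kind of conversion, not the driver's actual code):

```javascript
// Hypothetical sketch of parameter normalization the native binding could
// perform before handing values to libpq, which expects string parameters.
const normalizeParams = (params) =>
  params.map((p) => {
    if (p === null || p === undefined) return null;
    if (Buffer.isBuffer(p)) return '\\x' + p.toString('hex'); // bytea hex input format
    if (typeof p === 'string') return p;
    if (typeof p === 'number' || typeof p === 'boolean') return String(p);
    throw new TypeError('Unsupported parameter type: ' + typeof p);
  });

console.log(normalizeParams([Buffer.from('ab'), 5, null, true]));
```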