Schema:
CREATE TABLE test (
  id SERIAL PRIMARY KEY,
  data bytea
);
CREATE UNIQUE INDEX test_pkey ON test(id int4_ops);
test.js:
import fs from 'fs';
import pg from 'pg';

const image = fs.readFileSync('./test.jpg');

const main = async () => {
  const client = new pg.native.Client();
  await client.connect();
  let index = 1000;
  while (index--) {
    await client.query('INSERT INTO "test" (data) VALUES ($1) RETURNING id', [
      image
    ]);
  }
  await client.end();
};

main();
Evaluation:
$ time node ./test.js
Result with pg:
node ./test.js  0.88s user 0.39s system 0% cpu 2:37.37 total
Result with pg-native:
node ./test.js  4.95s user 0.16s system 19% cpu 26.118 total
That is a huge difference. What is the overhead?
It's likely the additional overhead of copying the large block of memory for the image that you're inserting.
There are two types of async external APIs in node. The first is APIs that are truly async; this covers things like TCP sockets, where the OS provides real event notification. APIs that are not natively async pretend to be async by utilizing an internal thread pool. From the perspective of a running node app the operations happen asynchronously, but under the covers they're run on a background thread rather than using real event notification. That covers most external libraries integrated with node, as they're usually not natively async.
The pg-native module is built atop the libpq module, which exposes the libpq C library to node. It uses the internal thread pool rather than true async operation, as that's how the primary API for libpq itself works (i.e. blocking sockets): https://www.npmjs.com/package/libpq#connection-functions
One negative side effect of using the internal thread pool is that it causes extra copying of memory from the shared context of the node app by the thread that will execute in the thread pool. This is to prevent concurrent memory corruption as otherwise the shared buffer could be modified while the thread pool is operating on it. The bigger the data being copied (ex: a large bytea image), the more overhead that copy is going to impose.
FYI, I've noticed a similar drop in performance with the zlib module when performing gzip compression async vs. sync. The async operation uses the internal thread pool and results in an extra memory copy that's noticeable when compressing large Buffers.
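If you want to observe that zlib difference yourself, a minimal timing sketch along these lines works (assumes Node run as an ES module; absolute numbers will vary by machine, but the async path pays the thread-pool overhead):

```javascript
// Rough sketch: compare sync gzip against callback-based async gzip,
// which runs on the libuv thread pool and copies the input Buffer.
import zlib from 'zlib';
import crypto from 'crypto';
import { promisify } from 'util';

const gzipAsync = promisify(zlib.gzip);
const payload = crypto.randomBytes(8 * 1024 * 1024); // 8 MB of incompressible data

const syncStart = process.hrtime.bigint();
zlib.gzipSync(payload);
console.log('sync ms:', Number(process.hrtime.bigint() - syncStart) / 1e6);

const asyncStart = process.hrtime.bigint();
await gzipAsync(payload); // top-level await: run this file as an ES module
console.log('async ms:', Number(process.hrtime.bigint() - asyncStart) / 1e6);
```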
I don't think there's much you can do to work around this other than using the pure JS driver. I always stick with the pure JS driver as it works consistently on all platforms and has fewer dependencies (i.e. no need for libpq). Unless you have another reason to use the native driver, I'd suggest switching to the pure JS one.
I think you've misread the benchmark results:
pg-native takes 26 seconds; pg takes 2:37.
Oh duh! I was only looking at the "user" line being 5x as much which I bet includes the additional memcpys I was talking about.
What version of node and pg / pg-native are you using?
Impressive write up nonetheless. :-)
Thanks!
Tried to reproduce this, and I don't think your pg-native run is actually executing the commands, as the pg-native API doesn't use promises. Neither client.connect() nor client.query() returns a promise. That means await client.query(...) is awaiting a non-Promise result, which completes immediately.
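Awaiting a value that is not a Promise resolves immediately with that value unchanged, so the loop would race ahead of the callbacks. A minimal illustration (fakeQuery is a made-up stand-in, run as an ES module):

```javascript
// A callback-style function standing in for a hypothetical non-Promise query().
const fakeQuery = (sql, cb) => {
  setImmediate(() => cb(null, { rows: [] })); // completes later, out of band
};

// `await` wraps non-Promise values in Promise.resolve(), so this line
// "finishes" right away with undefined -- before the callback ever runs.
const result = await fakeQuery('SELECT 1', () => {});
console.log(result); // undefined
```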
Can you check if it's actually inserting anything in your test table?
The records will exist. Just not all of them.
It didn't even occur to me to check whether client.connect() and client.query() return promises. I just assumed it was compatible with the base API.
@brianc This should be mentioned in https://node-postgres.com/features/native.
Actually scratch on the API differences. I was mixing up the underlying libpq module with pg-native...
I do think there's some difference in Buffer handling though. Can you see if any actual data is being inserted in your data column (not just the id column)?
My repro may be incorrect, but it seems like the native driver is ignoring Buffer types as parameters and sending a zero-length bytea for them. Encoding them as hex strings and translating them on the DB side makes the data appear there, and the performance is much closer to the pure JS driver's.
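That hex workaround can be done on the client side without changing the SQL, since PostgreSQL accepts bytea input as a \x-prefixed hex string. A sketch (toPgHex is a made-up helper name):

```javascript
// Hypothetical helper: render a Buffer in PostgreSQL's hex bytea input
// format ("\x" followed by hex digits) so it survives as a text parameter.
const toPgHex = (buf) => '\\x' + buf.toString('hex');

console.log(toPgHex(Buffer.from('ab'))); // \x6162

// Usage sketch against the table from the original test:
// await client.query('INSERT INTO "test" (data) VALUES ($1)', [toPgHex(image)]);
```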
Sorry @brianc. @sehrope keeps throwing me off. client.connect() and client.query() _do_ return promises.
My bad on that one. That's what I get for splitting eyeballs between coding and world cup.
No worries.
You are right regarding the data not being written though.
applaudience=> SELECT id, octet_length(data) FROM test;
id | octet_length
-----+--------------
283 | 55680
284 | 55680
285 | 55680
286 | 55680
287 | 55680
288 | 55680
289 | 55680
290 | 55680
291 | 55680
292 | 55680
273 | 12
274 | 12
275 | 12
276 | 12
277 | 12
278 | 12
279 | 12
280 | 12
281 | 12
282 | 12
(20 rows)
The first 10 rows were written by pg-native and the last 10 by pg.
If anything, this requires opening a new issue.
pg should convert Buffer to a pg-native-friendly format.
Are you saying you did require('pg-native') instead of require('pg').native?
Yes, that was my mistake trying this out. Using the .native approach works fine for getting promise return values.
Real issue is that libpq expects a string[] for the query parameters and pg-native doesn't perform any type conversions. It blindly passes the params array to libpq, which somehow is interpreted as a zero-length string for the parameter value (so it inserts a zero-length value). The performance difference is because the native version isn't actually inserting anything, so it's not a valid comparison, just a bug in native.
This should either be documented as requiring string[] parameters (and throw an error otherwise), or the native driver should use the same type conversions as the pure JS driver.
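The latter option could look roughly like this (a hypothetical sketch of the kind of conversion, not the driver's actual code):

```javascript
// Hypothetical sketch of parameter normalization the native binding could
// perform before handing values to libpq, which expects string parameters.
const normalizeParams = (params) =>
  params.map((p) => {
    if (p === null || p === undefined) return null;
    if (Buffer.isBuffer(p)) return '\\x' + p.toString('hex'); // bytea hex input format
    if (typeof p === 'string') return p;
    if (typeof p === 'number' || typeof p === 'boolean') return String(p);
    throw new TypeError('Unsupported parameter type: ' + typeof p);
  });

console.log(normalizeParams([Buffer.from('ab'), 5, null, true]));
```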