Postgrest: Large memory usage for bulk inserts

Created on 8 Aug 2016 · 5Comments · Source: PostgREST/postgrest

I measured that PostgREST 0.3.2.0 uses memory proportional to 20x the size of a JSON payload during bulk inserts. This doesn't seem reasonable.

QOS enhancement perf

Source

begriffs

👍2

Most helpful comment

I have a proposal for avoiding the json payload processing in INSERT and clearing this out, I'll use the articles table as a reference.

Currently when receiving a payload like [{"id": 11, "body": "uno"}, {"id": 12, "body": "dos"}] we process this in Haskell to check if the json is an array/object to adequately use pg json_populate_recordset/json_populate_record functions.

This part can be done in pure SQL:

WITH payload AS(
-- This json would be replaced by $1
  SELECT '[{"id": 11, "body": "uno"}, {"id": 12, "body": "dos"}]'::json AS json_data 
),
vals AS(
  -- With this we make sure that the query works for json arrays or a single json object
  SELECT json_data AS val FROM payload WHERE json_typeof(json_data) = 'array'
  UNION ALL
  SELECT json_build_array(json_data) AS val FROM payload WHERE json_typeof(json_data) = 'object'
)
INSERT INTO articles(id, body) SELECT id, body from json_populate_recordset(null::articles, (select val from vals));

Now the other thing that the Haskell processing does is making sure the json has the same columns, not sure but I'm guessing the original purpose of this was to somehow protect against unintentional null values in a bulk insert? Anyway I think this would be better handled by the db not null contraints.

However this part also handles the <cols> specifying in ÌNSERT INTO table(<cols>) SELECT <cols> FROM ... and that cannot be dynamically specified in a SQL statement, this is also necessary for handling column privileges.

So, my proposal is to have an additional INSERT mode that will be triggered when the columns querystring parameter is specified like this:

POST /articles?columns=id,body
[{"id": 11, "body": "uno"}, {"id": 12, "body": "dos"}]

If there are additional columns in the json payload they will be ignored, this mode will guarantee minimum memory usage for big payloads from pgrst end.

For anyone that's interested in this being implemented, you can sponsor the development on Patreon or you can contact me directly(email in profile).

steve-chavez on 3 Apr 2018

👍3

All 5 comments

Note to self: try using weigh in tests to measure the space used.

begriffs on 15 Aug 2016

Is this still a thing?
I remember it being a thing in #360 but that thread says it was fixed in 3.0

ruslantalpa on 3 Aug 2017

I'm seeing similar problems with CSV uploads as well.

maximsch2 on 27 Sep 2017

@maximsch2 would you be interested in helping us track down the problem? The issue that Ruslan linked to includes information about how to run the binary with profiling enabled. If you're feeling brave you could do a similar thing and include a graph of memory allocation.

begriffs on 30 Sep 2017