Hi, there is a problem I can't resolve.
I'm trying to insert data from a text file into a Postgres DB.
The description and code are here: link.
The point is that when the file contains about 50 000 lines the code works fine, but when it contains more than 1 000 000 lines the problem appears.
I did some research with the debugger and found out that the queries are queued and only executed once the whole file has been read, so with a large file the memory overflows.
So is there a way to force the inserts in the queue to be executed every, let's say, 10 000 lines?
I found the answer (or a workaround) and posted it on Stack Overflow (link above).
We have to watch PgClient.queryQueue, and when there are a lot of elements in it, pause the file reading.
And here is the code of the changed section:
```js
lineReader.on('line', function (line) {
  n++;
  // "insert" is the row parsed from "line"; JSON is assumed here,
  // the original parsing may differ
  var insert = JSON.parse(line);
  PgClient.query('INSERT INTO HUMANS (firstname, lastname) VALUES ($1, $2)',
    [insert.firstname, insert.lastname], function (err, result) {
      if (err) console.log(err);
      // Throttle: pause the reader while the client's internal queue is long
      if (PgClient.queryQueue.length > 15000) {
        lineReader.pause();
      } else {
        lineReader.resume();
      }
    });
});
```
I think you should find another solution on your end. For example, why not put everything into a single query (or a couple of queries)? You don't seem to need multiple queries.
```sql
INSERT INTO HUMANS (firstname, lastname) VALUES
($1::text, $2::text), ($3::text, $4::text)
```
You could do this by building the query string with appends.
I am not sure this will work, but it is worth trying. Round trips are expensive, and the data needs to be transferred anyway.
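Roughly like this, as a sketch only: it reuses PgClient and lineReader from the snippet above, and the batch size and the comma-separated line format are assumptions for illustration.

```js
// Accumulate rows and flush them as one multi-row parameterized INSERT.
const BATCH_SIZE = 10000;
let batch = [];

function flushBatch(callback) {
  if (batch.length === 0) return callback();
  const params = [];
  const placeholders = batch.map(function (row, i) {
    params.push(row.firstname, row.lastname);
    return '($' + (2 * i + 1) + '::text, $' + (2 * i + 2) + '::text)';
  });
  const sql = 'INSERT INTO HUMANS (firstname, lastname) VALUES ' + placeholders.join(', ');
  batch = [];
  PgClient.query(sql, params, callback);
}

lineReader.on('line', function (line) {
  // Assumes "firstname,lastname" per line
  const parts = line.split(',');
  batch.push({ firstname: parts[0], lastname: parts[1] });
  if (batch.length >= BATCH_SIZE) {
    lineReader.pause();
    flushBatch(function (err) {
      if (err) console.log(err);
      lineReader.resume();
    });
  }
});

lineReader.on('close', function () {
  // Flush whatever is left after the last full batch
  flushBatch(function (err) {
    if (err) console.log(err);
  });
});
```

Keep in mind Postgres caps a single statement at 65535 bind parameters, so keep the batch size well below that for wide rows.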
I wrote code using a Node transform stream and a writable stream. The transform stream reads the file and passes JSON objects (in object mode) to the writable stream, which then writes them to the DB.
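Something along these lines, as a rough sketch (the comma-separated line format, the file name, and the HUMANS table are assumptions carried over from this thread):

```js
const { Transform, Writable } = require('stream');
const fs = require('fs');

// Transform stream: splits the incoming file into lines and pushes one
// object per line (assumes "firstname,lastname" per line).
const toObjects = new Transform({
  readableObjectMode: true,
  transform(chunk, encoding, callback) {
    this._buffer = (this._buffer || '') + chunk.toString();
    const lines = this._buffer.split('\n');
    this._buffer = lines.pop(); // keep the trailing partial line
    for (const line of lines) {
      const [firstname, lastname] = line.split(',');
      this.push({ firstname, lastname });
    }
    callback();
  },
  flush(callback) {
    if (this._buffer) {
      const [firstname, lastname] = this._buffer.split(',');
      this.push({ firstname, lastname });
    }
    callback();
  }
});

// Writable stream: inserts one row at a time; back-pressure means the
// next chunk is only read after the previous INSERT's callback fires,
// so memory stays flat.
const toDb = new Writable({
  objectMode: true,
  write(row, encoding, callback) {
    PgClient.query(
      'INSERT INTO HUMANS (firstname, lastname) VALUES ($1, $2)',
      [row.firstname, row.lastname],
      callback
    );
  }
});

fs.createReadStream('humans.txt').pipe(toObjects).pipe(toDb);
```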
Perhaps you should use something more specialised, like https://github.com/brianc/node-pg-copy-streams. But then, why not use a proper tool for the job? https://github.com/lukasmartinelli/pgfutter works amazingly well with CSV (we load millions of rows with it), and you can easily use it from your Node.js app via spawn or exec.
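For example, a minimal sketch of calling pgfutter through spawn (the CSV file name is a placeholder, and connection settings are left to pgfutter's own flags or environment):

```js
const { spawn } = require('child_process');

// Import the CSV with pgfutter; it infers the table from the file name
const pgfutter = spawn('pgfutter', ['csv', 'humans.csv']);

pgfutter.stdout.on('data', function (data) {
  console.log(data.toString());
});
pgfutter.stderr.on('data', function (data) {
  console.error(data.toString());
});
pgfutter.on('close', function (code) {
  console.log('pgfutter exited with code ' + code);
});
```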