Loopback: Bulk Insert/Create

Created on 3 Apr 2015 · 69 Comments · Source: strongloop/loopback

It'd be good if Loopback supported inserting many records at once. Many data sources would be able to insert records into their datasets far more efficiently; e.g. SQL databases can insert all the records in a single statement.

Efficiency improvements could also be realized in validations. For example, where a model checks for uniqueness, all the instances could be checked at once rather than individually.
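For instance, a batch uniqueness check could collect all candidate keys, fetch existing matches in a single query, and flag duplicates in one pass. A minimal plain-JS sketch (the function name is invented for illustration; `existingKeys` stands in for the result of one IN-clause lookup):

```javascript
// Hypothetical sketch: validate uniqueness for a whole batch in one pass
// instead of issuing one lookup per instance.
function findUniquenessViolations(instances, key, existingKeys) {
  const seen = new Set(existingKeys);
  const violations = [];
  for (const inst of instances) {
    if (seen.has(inst[key])) {
      violations.push(inst[key]); // duplicate within the batch or in the DB
    } else {
      seen.add(inst[key]);
    }
  }
  return violations;
}

// Example: 'b' already exists in the database, 'c' is duplicated in the batch.
const batch = [{email: 'a'}, {email: 'b'}, {email: 'c'}, {email: 'c'}];
console.log(findUniquenessViolations(batch, 'email', ['b'])); // → [ 'b', 'c' ]
```

This replaces N per-instance queries with a single query plus an in-memory pass.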

feature major p1 team-apex

Most helpful comment

Please stop posting +1 comments, they only spam the discussion. Click on the big yellow "thumbs up" button below the issue description instead, it will also allow us to filter issues by popularity.

You can also click on "Subscribe" button in the right menu-bar if you would like to be notified about updates. (I am not sure if up-voting an issue subscribes to notifications too.)

All 69 comments


I'm implementing a mixin #1435 that inserts lft and rgt values (a tree table) based on other records in the database. When I try to insert several records at once, for example, suppose that the database is empty:

Category.create([
  {name: 'category 1'},
  {name: 'category 2'}
], function (err, categories) {
  console.log(categories);
});

The expected values for lft and rgt values are

[
  { name: 'category 1', lft: 1, rgt: 2, id: 1 },
  { name: 'category 2', lft: 3, rgt: 4, id: 2 }
]

but the before save hook is triggered twice. The second time is for {name: 'category 2'}, and at that moment category 1 is not in the database yet, and I don't know that category 1 is going to be inserted, so I can't calculate the right lft and rgt values. What I get instead is

[
  { name: 'category 1', lft: 1, rgt: 2, id: 1 },
  { name: 'category 2', lft: 1, rgt: 2, id: 2 }
]

I can see several ways to solve this problem. I'm listing them below:

  • The before save hook could be called only once, with an array of instances
  • When before save is called for the second time, the first record should already be in the database
  • Everything remains as it is, but one more property is passed in ctx with all the instances to be inserted, so I know what is happening.

Note: if there is something wrong with my English, please correct me.
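As a sketch of the third option above, suppose a hypothetical ctx property carried the whole batch into the before save hook; the lft/rgt values for a flat list of sibling nodes could then be computed in one pass (plain JS, function name invented for illustration):

```javascript
// Sketch: with the full batch visible, assign nested-set lft/rgt values
// for flat sibling nodes inserted into an empty tree.
function assignNestedSetValues(instances, nextLft = 1) {
  let lft = nextLft;
  for (const inst of instances) {
    inst.lft = lft;     // left boundary of this node
    inst.rgt = lft + 1; // right boundary (no children in this sketch)
    lft += 2;
  }
  return instances;
}

const categories = [{name: 'category 1'}, {name: 'category 2'}];
console.log(assignNestedSetValues(categories));
// → [ { name: 'category 1', lft: 1, rgt: 2 },
//     { name: 'category 2', lft: 3, rgt: 4 } ]
```

This matches the expected output in the comment above; the per-instance hook cannot do this because it never sees the rest of the batch.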

I tried inserting 999 records with a medium-sized document, looping 100 times inside each to insert related documents. It took more than 30 minutes to complete.


Bulk create is possible by passing an array of JSON models, as shown below:

var corporatePark1 = {"name": "one", "address": "mumbai"};
var corporatePark2 = {"name": "two", "address": "pune"};
app.models.CorporatePark.create(
  [corporatePark1, corporatePark2],
  function (err, createdModels) {
    if (err) {
      console.error('error creating Model', err);
    } else {
      console.log('successfully created: ' + createdModels);
    }
  });

@princecharmx loopback iterates over each instance in the array and persists them into the database one by one.

oh, so @billinghamj was pointing at the behind-the-scenes processing, gotcha :)

Also specifically when you're using the REST API, it's very inefficient to open so many connections.


@pantaluna Let's move our discussion here:
@bajtos suggests using Promise.all() in https://github.com/strongloop/loopback/issues/2164#issuecomment-215645590
but when I run it I get this error:

jannyHous-MacBook-Pro:testPrimise jannyhou$ node .
/Users/jannyhou/lb/triage/testPrimise/node_modules/loopback-datasource-juggler/lib/dao.js:206
  var connectionPromise = stillConnecting(this.getDataSource(), this, arguments);
                                               ^

TypeError: this.getDataSource is not a function
    at DataAccessObject.create (/Users/jannyhou/lb/triage/testPrimise/node_modules/loopback-datasource-juggler/lib/dao.js:206:48)
    at Array.map (native)
    at Object.module.exports.app [as func] (/Users/jannyhou/lb/triage/testPrimise/server/boot/sample.js:22:20)
    at /Users/jannyhou/lb/triage/testPrimise/node_modules/loopback-boot/lib/executor.js:303:9
    at iterate (/Users/jannyhou/lb/triage/testPrimise/node_modules/loopback-boot/node_modules/async/lib/async.js:146:13)
    at /Users/jannyhou/lb/triage/testPrimise/node_modules/loopback-boot/node_modules/async/lib/async.js:157:25
    at /Users/jannyhou/lb/triage/testPrimise/node_modules/loopback-boot/lib/executor.js:305:7
    at iterate (/Users/jannyhou/lb/triage/testPrimise/node_modules/loopback-boot/node_modules/async/lib/async.js:146:13)
    at Object.async.eachSeries (/Users/jannyhou/lb/triage/testPrimise/node_modules/loopback-boot/node_modules/async/lib/async.js:162:9)
    at runScripts (/Users/jannyhou/lb/triage/testPrimise/node_modules/loopback-boot/lib/executor.js:293:9)

The whole sample script:

module.exports = function(app) {
  "use strict";
  let Job = app.models.myuser;

  let jobs = [{
   "username": "me",
   "password": "hah",
   "email": "[email protected]"
  }, {
   "username": "mee",
   "password": "hah",
   "email": "[email protected]"
  }];

  //let Job = app.models.Job;
  let jobsP = Job.create(jobs);

  // jobsP.then(jobs => {
  //   console.log('num jobs' + jobs.length);
  // });

  Promise.all(jobs.map(Job.create)).then(jobs => {
    console.log('num jobs' + jobs.length);
  });
};

I will take a look then.

@pantaluna I got Promise.all() to work, in a rather tedious way:

var mydata = jobs.map(function(job) {
    return Job.create(job).then(function(job) { return job; });
});
Promise.all(mydata).then(jobs => {
   console.log('num jobs' + jobs.length);
});

@jannyHou Nice. But the actions are not executed "in parallel" in your example (I think - note I'm not an expert on Bluebird).

The Bluebird .mapSeries(), which I'm currently using in my project, is more compact and also works fine as shown in my sandbox project but I was looking for a "parallel exec" solution:

    Promise.mapSeries(inputDataSet, function (data) {
        return cl.create(data);
      })
      .then(results => {
         console.info('num results:' + results.length);
      })
      .catch(function (error) {
        console.error('ERROR Promise.mapSeries cl.create() FAILED.');
      });

It still does not explain the original issue.

+1

Just to simplify @jannyHou's answer code a bit, it's also possible to do it this way:

Promise.all(
  jobs.map((job) => Job.create(job))
).then(jobsCreated => {
  console.log(jobsCreated)
})
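One caveat with the Promise.all pattern above: it fires every create() at once. If the array is large, a small chunking helper (plain Node, no Bluebird; `createInChunks` and `createFn` are invented names, with `createFn` standing in for Job.create) bounds how many inserts are in flight while still running each chunk in parallel:

```javascript
// Sketch: run creates in parallel, but at most `chunkSize` at a time.
async function createInChunks(items, createFn, chunkSize = 10) {
  const results = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    const chunk = items.slice(i, i + chunkSize);
    // Run one chunk in parallel, then move on to the next chunk.
    results.push(...await Promise.all(chunk.map(item => createFn(item))));
  }
  return results;
}

// Usage sketch with a stub createFn:
createInChunks([1, 2, 3, 4, 5], async n => n * 2, 2)
  .then(out => console.log(out)); // → [ 2, 4, 6, 8, 10 ]
```

Note this is still one database round trip per record; only a connector-level bulk insert removes that cost.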



Or even better, would someone like to submit a PR instead, since there are quite a few +1's here?

Any progress on this?

I want to do something like this:

Task.files.create(
  { id: task.id },
  filesObj,
  function(response) { /* success */ });

Bulk importing of items is a really important feature, IMO. You can already pass in a JSON collection to the POST method on a PersistedModel, so there shouldn't be any interface changes.

I just checked; it seems to work right now.

POST https://api.whatever.ch/students
[{
  "name": "refki"
}, {
  "name": "xhevdahire"
}, {
  "name": "peter fischli"
}]

and as a result, something like this is returned:

[
  { "id": "asdfasdfasdfsdf", "name": "refki" },
  { "id": "some_id", "name": "xhevdahire" },
  { "id": "some_id", "name": "peter fischli" }
]

So it seems to work. Maybe this issue should be closed and the docs updated, otherwise it leads to confusion. Luckily I tested it first; I was already preparing to implement my own remote method to cover this.

@iliraga which version of loopback are you using? @superkhau can this issue be closed if it's already merged?

@ahmetcetin Hello, this is what I have in my package.json:

{
    "loopback": "^2.22.0",
    "loopback-boot": "^2.6.5",
    "loopback-component-explorer": "^2.4.0",
    "loopback-connector-mongodb": "^1.15.2",
    "loopback-datasource-juggler": "^2.39.0"
}

So loopback version 2.22.0.

@iliraga Thanks, I'll give it a shot. I should finally update my loopback; this will be a good reason to update.

@iliraga If you read the thread, this is about what happens under the hood: when you pass an array of entities, loopback iterates and creates them one by one instead of doing a bulk insert. So the problem still remains.

We'll be deferring this to the next major version of LoopBack, which will support batch operations.

@kjdelisle Is there a roadmap we could check to see when that is planned to be released?

It's still very much in the planning phase as far as I'm aware. @raymondfeng Do we have a rough timeline for 4.x?

Would also love to see a roadmap of upcoming and future features/fixes

+1

Cross-posting https://github.com/strongloop/loopback/issues/2164#issuecomment-296859742

@BoLaMN commented 4 days ago:
I believe this should be handled at the connector level, as most drivers allow inserting an array in one database call, but instead the DAO layer inserts one by one.
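To illustrate the difference, here is a toy mock (the connector API below is invented purely for illustration, not LoopBack's actual connector interface) counting round trips for the current one-by-one DAO behavior versus a single insertMany-style bulk call:

```javascript
// Mock connector that counts database round trips.
class MockConnector {
  constructor() { this.roundTrips = 0; this.rows = []; }
  insertOne(row)   { this.roundTrips++; this.rows.push(row); }     // per-row call
  insertBulk(rows) { this.roundTrips++; this.rows.push(...rows); } // one bulk call
}

const docs = [{id: 1}, {id: 2}, {id: 3}];

const perRow = new MockConnector();
docs.forEach(d => perRow.insertOne(d)); // what the DAO does today: N round trips

const bulk = new MockConnector();
bulk.insertBulk(docs);                  // what most drivers support: 1 round trip

console.log(perRow.roundTrips, bulk.roundTrips); // → 3 1
```

Same rows end up stored either way; only the number of round trips (and hence latency for large batches) differs.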

FYI: https://github.com/strongloop/loopback-datasource-juggler/pull/1380 modified the create() function to return a Promise in batch mode too.

Is batch update currently possible in v4?

