Is there a workaround for this for now?
Using the underlying mongodb driver is the only workaround right now, unfortunately.
It would look like `Model.collection.initializeOrderedBulkOp();`, right? Having some trouble getting it to cooperate...
Yeah, there's a somewhat nasty corner case there: `initializeOrderedBulkOp()` is a sync operation, but mongoose queues operations while it's connecting, so you can issue operations without waiting for it to connect. The queueing assumes async, though, so you don't get the return value of `initializeOrderedBulkOp()`, which is why this commit has a `db.on('connected')` in the test. Other than that, I think it should work - what specific problems are you having?
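For reference, a minimal sketch of that workaround (the schema, model, and connection string here are made up):

```js
const mongoose = require('mongoose');
const Model = mongoose.model('Test', new mongoose.Schema({ name: String }));

mongoose.connect('mongodb://localhost/test');
mongoose.connection.on('connected', () => {
  // Safe to call now: the underlying collection exists, so the sync call
  // actually returns the bulk op instead of being swallowed by the queue.
  const bulk = Model.collection.initializeOrderedBulkOp();
  bulk.insert({ name: 'first' });
  bulk.insert({ name: 'second' });
  bulk.execute((err, result) => {
    // handle err / result
  });
});
```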
ahh I see, that should fix it, thanks :+1:
+1 for this feature
Bulk insert on the native mongo driver seems to be buggy; I can confirm getting `RangeError: Maximum call stack size exceeded`. See: http://stackoverflow.com/questions/24466366/mongoose-rangeerror-maximum-call-stack-size-exceeded
+1 for this feature
Since `initializeUnorderedBulkOp` and its ilk (the fluent bulk API) are deprecated, this isn't going to happen. More likely we're just going to implement the new CRUD API, which has more sane support for bulk inserts.
I am curious what the status of this is.
I was planning to use `initializeUnorderedBulkOp` for upserts, but if Mongo might remove it later, I would prefer to use a future-proof method, such as `updateMany` with `{upsert: true}`.
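For reference, roughly what I had in mind (model and field names are illustrative only):

```js
// One CRUD-style upsert per user: future-proof, but still one round trip
// per document, unlike the bulk API.
async function saveStats(Model, stats) {
  for (const stat of stats) {
    await Model.updateMany({ _id: stat.userId }, { $set: stat }, { upsert: true });
  }
}
```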
Is there any ETA on implementing the CRUD API?
https://github.com/Automattic/mongoose/issues/3997 for updateOne and updateMany. We added `insertMany()` in 4.4.0. No real ETA yet for 4.6, right now focusing on 4.5. What's your use case for updateOne/updateMany?
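For example, something like:

```js
// insertMany() casts and validates each doc, then issues a single bulk insert:
Model.insertMany([{ name: 'a' }, { name: 'b' }]).then(docs => {
  // docs are fully-fledged mongoose documents
});
```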
Thanks. Our use case is that we are calculating regular statistics for a large number of users, so we are doing it in batches. Each user gets a slightly different result. So I was thinking of moving to:
```js
// Calculate statistics for 1000 users
const bulk = Model.collection.initializeUnorderedBulkOp();
for (const stat of stats) {
  bulk.find({ _id: stat.userId }).upsert().updateOne(stat);
}
bulk.execute(...);
```
In theory the calculation should be the heavy bit, but in practice it seems the writes dominate!
If there is a danger that Mongo might remove `initializeUnorderedBulkOp` then I'll probably just stick with a slower method (1000 individual upserts) for now. But I could also pull this stuff into a reusable function, with a fallback if `initializeUnorderedBulkOp` is unavailable, and an upgrade to `upsertMany()` when it becomes available.
We could also consider keeping stats up-to-date in realtime, rather than performing scheduled calculations. Whether that is more or less efficient really depends on how many minor updates it would generate.
Yeah that's a pretty reasonable use case. Will implement.
It worked when I did this:

```js
Model.update({ /* query */ }, { /* update object */ }, { multi: true });
```
@ManeeshCh yeah multi update works well, the bulk API is more about sending multiple distinct (query, update) pairs in a single command to save network bandwidth.
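For example (ids, fields, and callback here are placeholders):

```js
// Multi update: ONE filter and ONE update, applied to every matching doc:
Model.update({ type: 'stat' }, { $set: { processed: true } }, { multi: true });

// Bulk API: many DIFFERENT filter/update pairs sent in a single command:
const bulk = Model.collection.initializeUnorderedBulkOp();
bulk.find({ _id: id1 }).updateOne({ $set: { score: 10 } });
bulk.find({ _id: id2 }).updateOne({ $set: { score: 99 } });
bulk.execute(callback);
```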
@vkarpov15 I have a use case that's very similar to the one described by @joeytwiddle. I am calculating stats on a large number of records and updating them in place one at a time (see code below). I've found that the network traffic between my server and the database is the biggest bottleneck in this operation, and therefore want to try running these updates in bulk with a single request.
slow code:
```js
// doc.stats object calculated elsewhere
async.each(docs, (doc, cb) => {
  const updateStats = { stats: doc.stats };
  Model.findByIdAndUpdate(doc._id, updateStats, cb);
}, done);
```
I see that `updateMany` was just added as part of #3997. But I don't understand how to perform a similar operation with it. It seems like it can only apply the same update to multiple docs? Is there a way to update different documents with different values?
What am I missing?
Yeah, `updateMany` only applies the same update to multiple docs. Mongoose doesn't have a good approach to do what you're trying to do right now, but you can use the mongodb driver's `bulkWrite` function: http://mongodb.github.io/node-mongodb-native/2.2/api/Collection.html#bulkWrite
```js
Model.collection.bulkWrite([
  { updateOne: { filter: { _id: doc._id }, update: { $set: { stats: doc.stats } } } }
]);
```
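Applied to your loop above, that would look something like this (untested sketch, reusing your `docs` and `done`):

```js
// Build one ops array for the whole batch: a single network round trip
// instead of one findByIdAndUpdate() per doc.
const ops = docs.map(doc => ({
  updateOne: {
    filter: { _id: doc._id },
    update: { $set: { stats: doc.stats } }
  }
}));
Model.collection.bulkWrite(ops, done);
```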
@vkarpov15 since last year, has there been any news on how best to perform bulk upserts?
My use case is that I need to process potentially large numbers of logs at once, checking whether a similar log already exists according to certain criteria, and updating it instead if so. I don't want to upsert them one by one, so I've been using bulk operations directly on the underlying collection.
There are many drawbacks to this approach, however:
1) It bypasses Mongoose's schema validation/defaults, causing Date fields and ObjectIds, for example, to be stored as plain strings, which requires a lot of manual pre-processing and massaging of the data that Mongoose normally takes care of.
2) It doesn't depopulate fields for you if you are passing populated data for refs.
3) The API doesn't return promises, so you have to wrap it in your own promises.
4) In this thread I am reading about the bulk API possibly being deprecated, so that does not bode well when you're trying to write future-proof implementations.
So from what I understand, `insertMany` uses bulk operations for inserting many docs, yet judging by this discussion and the docs, `updateMany` does not, and merely sets the `{multi: true}` option for you.
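If I understand correctly, that would make these two equivalent (field names made up):

```js
// Both send one update command that is applied to every matching doc:
Model.update({ read: false }, { $set: { read: true } }, { multi: true });
Model.updateMany({ read: false }, { $set: { read: true } });
```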
Is there some kind of `upsertMany` implementation, or could it be developed in Mongoose, to support bulk upserts _with_ schema validation?
My current implementation looks as follows:
```js
/**
 * Bulk upsert operation
 */
LogSchema.statics.upsertMany = async function(logs, matchFields) {
  // Create bulk operation
  const bulk = this.collection.initializeUnorderedBulkOp();
  logs
    .map(log => new this(log))
    .map(log => log.toObject({ depopulate: true }))
    .forEach(log => {
      // Extract match criteria
      const match = {};
      for (const field of matchFields) {
        match[field] = log[field];
      }
      // Create upsert
      bulk
        .find(match)
        .upsert()
        .replaceOne(log);
    });
  // Execute bulk operation
  return new Promise((resolve, reject) => {
    bulk.execute((error, result) => {
      if (error) {
        return reject(error);
      }
      resolve(result);
    });
  });
};
```
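Usage looks like this (the model name and match fields are just an example):

```js
// Inside an async function: upsert a batch of logs, matching existing
// docs on the given fields.
await Log.upsertMany(logs, ['source', 'hash']);
```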
I added some code that converts the logs to mongoose models first, and then back to objects, but I imagine that will not be a very efficient way of doing it, and it might lead to issues where mongoose creates `_id`s for me which will then clash in the bulk write op.
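One possible mitigation I'm considering (untested) is to only strip the auto-generated `_id` when the input didn't have one, i.e. a hypothetical tweak to the mapping step above:

```js
logs
  .map(raw => {
    const obj = new this(raw).toObject({ depopulate: true });
    if (!raw._id) {
      // Drop the _id mongoose auto-generated, so a replaceOne upsert that
      // matches an existing doc doesn't try to change its immutable _id.
      delete obj._id;
    }
    return obj;
  })
```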
Any thoughts?
I've poured this into a generic plugin for now, but hope it can make its way into an official Mongoose implementation.
@adamreisnz this issue was specifically related to the `initializeUnorderedBulkOp()` and `initializeOrderedBulkOp()` API, which is "soft deprecated" by mongodb. It has been replaced by the `bulkWrite()` API, which is part of MongoDB's core CRUD spec and thus is much more widely implemented. In particular, mongoose has a `Model.bulkWrite()` function as of 4.9.0 that has validation, casting, promises, and ref depopulating. Let me know if that works for you.
Thanks I will look into it. I must have missed it. Does it support upserts?
@adamreisnz yes, see mongodb driver docs for an example
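e.g. something along these lines (the filter fields are just an example):

```js
// Mongoose >= 4.9.0: casted and validated bulk upsert in a single command:
await Log.bulkWrite([{
  replaceOne: {
    filter: { source: log.source, hash: log.hash },
    replacement: log,
    upsert: true
  }
}]);
```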
The correct link to the Mongoose docs is now https://mongoosejs.com/docs/api.html#model_Model.bulkWrite