Some documentation suggests doing e.g. a findOne()
followed by modifications, and then a .save()
. Isn't this recommendation prone to breaking the atomicity that MongoDB otherwise brings via the update operations? Could it not mean e.g. two REST operations to read/update some document could give incorrect end results?
Am I missing something, or should the Mongoose docs not recommend this?
user.save()
will insert a new document if it's a new document, or update it if it's already on the database.
Yes, it is the update case I am talking about. When doing a const doc = new Document
followed by doc.save()
I find no problem. It is the query->update->save that is a problem. There are multiple locations throughout all documentation that shows example of the query->update->save way of handling this.
Why do you think that this is bad? You may need data from the document to determine the update value.
Maybe I am missing something, but isn't that a classical race condition issue?
findOne()
to get document X.findOne()
to get the same document X. Both A and B hold the same exact same document as the result of the identical queries. This is entirely possible in a REST API situation with many users of the API..save()
. Depending on if flow A or flow B finishes first, the final document stored in the database will look different.The proper way to avoid this would be a transaction I assume. Still, the examples in the docs showing the query->update->save works _isolated_ but not in a production context.
There can be race conditions, but it has nothing to do with .save()
, it can happen with updateOne(...)
as well because save()
uses updateOne(...)
internally.
Document#save()
does not send the whole document after modifying it to the database. Instead, mongoose tracks the changes and sends those changes as an updateOne(...)
to Mongo.
Run mongoose.set('debug', true);
to see what gets sent to MongoDB.
There can be race conditions, but it has nothing to do with .save(), it can happen with updateOne(...) as well because save() uses updateOne(...) internally.
I do not understand this sentence. How could it happen if I use only updateOne()
directly? In that instance the operation is atomic.
Thus it can, from my understanding, _not_ happen if using updateOne()
only.
The thing I am after is that only this flow is a potential race condition: query -> modify doc -> save()
. That flow is not atomic. Using updateOne()
is the better solution since it is atomic.
Am I missing something completely?
The next two snippets do exactly the same thing.
const userSchema = new Schema({
name: String,
paidAmount: Number,
isPremium: { type: Boolean, default: false }
});
const User = mongoose.model('User', userSchema);
await User.create({ name: 'Hafez', paidAmount: 500, isPremium: true });
// later
const user = await User.findOne();
if (user.isPremium) {
user.paidAmount += 100;
} else {
user.paidAmount += 50;
}
mongoose.set('debug', true);
// calls updateOne internally
await user.save();
users.updateOne({ _id: ObjectId("5ebe4a0a93864a5858c511c8") }, { '$set': { paidAmount: 600 } })
const user = await User.findOne();
const update = {};
if (user.isPremium) {
update.paidAmount = user.paidAmount + 100;
} else {
update.paidAmount = user.paidAmount + 50;
}
mongoose.set('debug', true);
await User.updateOne({ _id: user._id }, update);
users.updateOne({ _id: ObjectId("5ebe4a66209f4449b0ab55da") }, { '$set': { paidAmount: 600 } })
Yes, in that case. But not if you just plainly update a field (without using previous values). And in the second example, I assume you could achieve that with this: https://stackoverflow.com/a/56551655
Thus, making it atomic.
The documentation should thus preferably not recommend query->update->save() because it certainly has a high chance of having users run into race conditions that are very hard to discover.
I agree that in cases where we don't depend on data in the current document, and the example is not related to .save()
, we should use updateOne
to inform people that they can do that, it definitely has better performance.
As for the SO question regarding update aggregation pipelines, it's added recently to MongoDB, so we can't add it in the docs, let alone how verbose it is, most users will find it confusing.
Yet, I still don't see how query=>update=>save
can cause bugs, other from the performance side of it, I challenge you to come up with a script that demonstrates a bug with Document#save()
but not with Model#updateOne()
.
Document#save()
literally is Model#updateOne();
.
const mongoose = require('mongoose');
const db = mongoose.createConnection('mongodb://127.0.0.1:27017/database1', {
useNewUrlParser: true,
});
const test = new mongoose.Schema({
something: String,
});
const Test = db.model('Test', test);
(async () => {
await db.dropDatabase();
await Test.create({
something: 'initiated',
});
// Two different API endpoints where both users queries something at the same time
const test1 = await Test.findOne({});
let test2 = await Test.findOne({});
// User 1's API request updates something
test1.something = 'first update';
await test1.save();
// User 2's API request updates something, but it's for another route, where we do a check
if (test2.something === 'initiated') test2.something = 'second update';
await test2.save();
console.log(await Test.findOne({}));
// output: { _id: 5ec2b99d4768207ca9b6c34c, something: 'second update', __v: 0 }
await db.dropDatabase();
await Test.create({
something: 'initiated',
});
// We do the same thing again, but this time the first API endpoint is updated to
// use updateOne instead of query->update->save()
// Two different API endpoints where both users queries something at the same time
await Test.updateOne({ something: 'first update' }); // this update is now atomic
test2 = await Test.findOne({});
// User 2's API request updates something, but it's for another route, where we do a check
if (test2.something === 'initiated') test2.something = 'second update';
await test2.save();
console.log(await Test.findOne({}));
// output: { _id: 5ec2b9d202eb247d5dd2be50, something: 'first update', __v: 0 }
})();
Note that this is a contrived example, and it would have been much better for user 2 to also use updateOne()
as to not get into the race condition where user 2 does a findOne()
before user 1's updateOne()
.
In a distributed system, query->update->save causes hard-to-catch bugs.
In the second case, move test2 = await Test.findOne();
one line above, or move line 22=>27, then both cases would be equivalent, and will yield the same results,
@AbdelrahmanHafez The code should be read as the calls to the DB are happening in parallel, i.e. you cannot read the code literally. It should simulate two API endpoints accessing the same database. So it cannot be "moved" in a real life example as you state.
Regarding moving from 22=>27 I also specifically state in the end that it can be solved as well. But that is of smaller matter.
What the above proves is that query->modify->save does not work in a distributed, parallel system where API calls can come in any order.
@thernstig I know you can't read the code literally, I was demonstrating how updateOne/save can both cause a bug, and both can work correctly. This is not an issue with updateOne
vs save
.
Admittedly, though, updateOne
reduces (but doesn't eliminate) the risk of that race condition because users don't need data, therefore there's less chance of the second findOne
happening after that update.
Thoughts @vkarpov15?
I am not sure I agree, I feel there is an issue with updateOne
vs save
as demonstrated. https://github.com/Automattic/mongoose/issues/9001#issuecomment-631093037 was taking incorrect assumptions that code could be moved around in a way that is not possible when parallel requests come in.
updateOne
can reduce the risk of race conditions if you just write it properly in certain cases. I do not think this is an issue of covering all cases, but _most_ cases. Isn't that what is important, to give good guidance to users that _reduces_ the risk? Using updateOne
always instead would thus minimize a very real risk in larger, concurrent API request scenarios.
I'm not sure I have more value to add to this thread, I believe I have demonstrated how a problem can be minimized.
Do not take this for not being very happy with mongoose, I just started this thread to try to help users in something I believe is a hard-to-track issue happening in production systems.
Sorry I didn't keep up with this thread, I've been a bit behind on GitHub notifications.
@thernstig your concern in this comment is a non-issue for most use cases, at least in my experience. Whether A or B wins out in the end doesn't matter if the user double-clicked a "save" button to update their profile. And versioning handles the nasty cases that can happen when modifying arrays in parallel.
I wrote about the tradeoffs between different ways of updating documents in Mongoose here. I think this image sums it up:
Here's another option: instead of recommending updateOne()
for all cases, which is somewhat clunky, we can add support for a stricter form of versioning where save()
fails if the underlying document in the database has changed since it was loaded based on a timestamp. Like OCC. I'd prefer that to be opt-in, but we've had enough asks for officially supported OCC that we should consider it.
@vkarpov15 I think my concern is "it can happen", it can be another admin clicking a save button for a user. And those cases can be concerning, as they are hard to spot. Any way to eliminate such concerns is of value I think.
OCC seems like a feature worth having, but obviously an investment from your side. An idea would be a recommendation in docs that if a user has a more serious case where atomicity is a must, they could use transactions, at least if affecting multiple docs.
Thanks for a great package.
You're right that these cases can be concerning, but in practice they rarely are in my experience. One update wins out or the other, which is typically fine, unless the data ends up in an inconsistent state.
We'll add OCC support in a future release.