Mongoose: OCC support

Created on 14 May 2020 · 17Comments · Source: Automattic/mongoose

Some documentation suggests doing e.g. a findOne() followed by modifications, and then a .save(). Isn't this recommendation prone to breaking the atomicity that MongoDB otherwise brings via the update operations? Could it not mean e.g. two REST operations to read/update some document could give incorrect end results?

Am I missing something, or should the Mongoose docs not recommend this?

discussion

Source

thernstig

All 17 comments

user.save() will insert a new document if it's a new document, or update it if it's already on the database.

AbdelrahmanHafez on 14 May 2020

Yes, it is the update case I am talking about. When doing a const doc = new Document followed by doc.save() I find no problem. It is the query->update->save that is a problem. There are multiple locations throughout all documentation that shows example of the query->update->save way of handling this.

thernstig on 14 May 2020

Why do you think that this is bad? You may need data from the document to determine the update value.

AbdelrahmanHafez on 14 May 2020

👍1

Maybe I am missing something, but isn't that a classical race condition issue?

Flow A sends a findOne() to get document X.
Flow B sends a findOne() to get the same document X. Both A and B hold the same exact same document as the result of the identical queries. This is entirely possible in a REST API situation with many users of the API.
Flow A modifies some data in document X.
Flow B modifies the same data, but to some other value, in document X.
Both use .save(). Depending on if flow A or flow B finishes first, the final document stored in the database will look different.

The proper way to avoid this would be a transaction I assume. Still, the examples in the docs showing the query->update->save works _isolated_ but not in a production context.

thernstig on 14 May 2020

There can be race conditions, but it has nothing to do with .save(), it can happen with updateOne(...) as well because save() uses updateOne(...) internally.

Document#save() does not send the whole document after modifying it to the database. Instead, mongoose tracks the changes and sends those changes as an updateOne(...) to Mongo.

Run mongoose.set('debug', true); to see what gets sent to MongoDB.

AbdelrahmanHafez on 15 May 2020

There can be race conditions, but it has nothing to do with .save(), it can happen with updateOne(...) as well because save() uses updateOne(...) internally.

I do not understand this sentence. How could it happen if I use only updateOne() directly? In that instance the operation is atomic.

Thus it can, from my understanding, _not_ happen if using updateOne() only.

The thing I am after is that only this flow is a potential race condition: query -> modify doc -> save(). That flow is not atomic. Using updateOne() is the better solution since it is atomic.

Am I missing something completely?

thernstig on 15 May 2020

The next two snippets do exactly the same thing.

With save

  const userSchema = new Schema({
    name: String,
    paidAmount: Number,
    isPremium: { type: Boolean, default: false }
  });

  const User = mongoose.model('User', userSchema);

  await User.create({ name: 'Hafez', paidAmount: 500, isPremium: true });

  // later
  const user = await User.findOne();
  if (user.isPremium) {
    user.paidAmount += 100;
  } else {
    user.paidAmount += 50;
  }

  mongoose.set('debug', true);
  // calls updateOne internally
  await user.save();

Output

users.updateOne({ _id: ObjectId("5ebe4a0a93864a5858c511c8") }, { '$set': { paidAmount: 600 } })

With update

  const user = await User.findOne();
  const update = {};
  if (user.isPremium) {
    update.paidAmount = user.paidAmount + 100;
  } else {
    update.paidAmount = user.paidAmount + 50;
  }

  mongoose.set('debug', true);

  await User.updateOne({ _id: user._id }, update);

Output (same)

users.updateOne({ _id: ObjectId("5ebe4a66209f4449b0ab55da") }, { '$set': { paidAmount: 600 } })

AbdelrahmanHafez on 15 May 2020

Yes, in that case. But not if you just plainly update a field (without using previous values). And in the second example, I assume you could achieve that with this: https://stackoverflow.com/a/56551655

Thus, making it atomic.

The documentation should thus preferably not recommend query->update->save() because it certainly has a high chance of having users run into race conditions that are very hard to discover.

thernstig on 15 May 2020

I agree that in cases where we don't depend on data in the current document, and the example is not related to .save(), we should use updateOne to inform people that they can do that, it definitely has better performance.

As for the SO question regarding update aggregation pipelines, it's added recently to MongoDB, so we can't add it in the docs, let alone how verbose it is, most users will find it confusing.

Yet, I still don't see how query=>update=>save can cause bugs, other from the performance side of it, I challenge you to come up with a script that demonstrates a bug with Document#save() but not with Model#updateOne().

Document#save() literally is Model#updateOne();.

AbdelrahmanHafez on 16 May 2020

const mongoose = require('mongoose');

const db = mongoose.createConnection('mongodb://127.0.0.1:27017/database1', {
  useNewUrlParser: true,
});

const test = new mongoose.Schema({
  something: String,
});

const Test = db.model('Test', test);

(async () => {
  await db.dropDatabase();

  await Test.create({
    something: 'initiated',
  });

  // Two different API endpoints where both users queries something at the same time
  const test1 = await Test.findOne({});
  let test2 = await Test.findOne({});

  // User 1's API request updates something
  test1.something = 'first update';
  await test1.save();

  // User 2's API request updates something, but it's for another route, where we do a check
  if (test2.something === 'initiated') test2.something = 'second update';
  await test2.save();
  console.log(await Test.findOne({}));

  // output: { _id: 5ec2b99d4768207ca9b6c34c, something: 'second update', __v: 0 }

  await db.dropDatabase();

  await Test.create({
    something: 'initiated',
  });

  // We do the same thing again, but this time the first API endpoint is updated to
  // use updateOne instead of query->update->save()

  // Two different API endpoints where both users queries something at the same time
  await Test.updateOne({ something: 'first update' }); // this update is now atomic
  test2 = await Test.findOne({});

  // User 2's API request updates something, but it's for another route, where we do a check
  if (test2.something === 'initiated') test2.something = 'second update';
  await test2.save();
  console.log(await Test.findOne({}));
  // output: { _id: 5ec2b9d202eb247d5dd2be50, something: 'first update', __v: 0 }
})();

Note that this is a contrived example, and it would have been much better for user 2 to also use updateOne() as to not get into the race condition where user 2 does a findOne() before user 1's updateOne().

In a distributed system, query->update->save causes hard-to-catch bugs.

thernstig on 18 May 2020

In the second case, move test2 = await Test.findOne(); one line above, or move line 22=>27, then both cases would be equivalent, and will yield the same results,

AbdelrahmanHafez on 19 May 2020

@AbdelrahmanHafez The code should be read as the calls to the DB are happening in parallel, i.e. you cannot read the code literally. It should simulate two API endpoints accessing the same database. So it cannot be "moved" in a real life example as you state.

Regarding moving from 22=>27 I also specifically state in the end that it can be solved as well. But that is of smaller matter.

What the above proves is that query->modify->save does not work in a distributed, parallel system where API calls can come in any order.

thernstig on 21 May 2020

@thernstig I know you can't read the code literally, I was demonstrating how updateOne/save can both cause a bug, and both can work correctly. This is not an issue with updateOne vs save.

Admittedly, though, updateOne reduces (but doesn't eliminate) the risk of that race condition because users don't need data, therefore there's less chance of the second findOne happening after that update.

Thoughts @vkarpov15?

AbdelrahmanHafez on 25 May 2020

I am not sure I agree, I feel there is an issue with updateOne vs save as demonstrated. https://github.com/Automattic/mongoose/issues/9001#issuecomment-631093037 was taking incorrect assumptions that code could be moved around in a way that is not possible when parallel requests come in.

updateOne can reduce the risk of race conditions if you just write it properly in certain cases. I do not think this is an issue of covering all cases, but _most_ cases. Isn't that what is important, to give good guidance to users that _reduces_ the risk? Using updateOne always instead would thus minimize a very real risk in larger, concurrent API request scenarios.

I'm not sure I have more value to add to this thread, I believe I have demonstrated how a problem can be minimized.

Do not take this for not being very happy with mongoose, I just started this thread to try to help users in something I believe is a hard-to-track issue happening in production systems.

thernstig on 25 May 2020

👍1

Sorry I didn't keep up with this thread, I've been a bit behind on GitHub notifications.

@thernstig your concern in this comment is a non-issue for most use cases, at least in my experience. Whether A or B wins out in the end doesn't matter if the user double-clicked a "save" button to update their profile. And versioning handles the nasty cases that can happen when modifying arrays in parallel.

I wrote about the tradeoffs between different ways of updating documents in Mongoose here. I think this image sums it up:

Here's another option: instead of recommending updateOne() for all cases, which is somewhat clunky, we can add support for a stricter form of versioning where save() fails if the underlying document in the database has changed since it was loaded based on a timestamp. Like OCC. I'd prefer that to be opt-in, but we've had enough asks for officially supported OCC that we should consider it.

vkarpov15 on 2 Jun 2020

@vkarpov15 I think my concern is "it can happen", it can be another admin clicking a save button for a user. And those cases can be concerning, as they are hard to spot. Any way to eliminate such concerns is of value I think.

OCC seems like a feature worth having, but obviously an investment from your side. An idea would be a recommendation in docs that if a user has a more serious case where atomicity is a must, they could use transactions, at least if affecting multiple docs.

Thanks for a great package.

thernstig on 4 Jun 2020

You're right that these cases can be concerning, but in practice they rarely are in my experience. One update wins out or the other, which is typically fine, unless the data ends up in an inconsistent state.

We'll add OCC support in a future release.

vkarpov15 on 11 Jun 2020

🎉1

Was this page helpful?

0 / 5 - 0 ratings