Mongoose: Convert populate to use $lookup

Created on 17 Dec 2015 · 31 comments · Source: Automattic/mongoose

Before:

lib.db.model('Stock').find(stockQuery).populate('product') // ...

After:

            lib.db.model('Stock').collection.aggregate([
              { $match: stockQuery },
              {
                $lookup: {
                  from: 'products',
                  localField: 'product',
                  foreignField: '_id',
                  as: 'product'
                }
              },
              {
                $project: {
                  product: {
                    $arrayElemAt: [ '$product', 0 ]
                  }
                }
              },
              { $sort: sort },
              { $limit: req.query.limit },
              { $skip: req.offset }
            ], callback);

Can we normalize this functionality for population using aggregate and $lookup into a rewrite of the populate function (so it would still function the same, but it'd use $lookup instead)?

discussion


All 31 comments

By default $lookup returns an array of results, which is why, for a single-document DBRef, I've added the $arrayElemAt operator.

Note that $lookup is only available in MongoDB 3.2

It's worth considering, but not for the foreseeable future, because I'd imagine the majority of mongoose users don't use MongoDB 3.2. Also, the only rationale I can think of for doing this would be "it improves performance considerably." While that might be the case, we'd have to run a benchmark to see.

+1 for using $lookup internally.

Also, how about implementing this conditionally? This way mongoose users who work with MongoDB 3.2+ could already benefit from the (suspected) performance improvements.

Possible, but it's not really a high priority atm. The difficult part is that $lookup only works with the aggregation framework, so .find().populate() needs to get converted into an aggregation query, which seems unwieldy.
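To make the difficulty concrete, the conversion could look roughly like this pure helper. This is a sketch, not mongoose internals: `populateToPipeline` is a hypothetical name, it assumes a single non-array ref path, and the caller must supply the foreign collection name (which mongoose would derive from the ref model).

```javascript
// Hypothetical sketch: translate a find() filter plus one populate() path
// into an equivalent aggregation pipeline. Not a mongoose API.
function populateToPipeline (filter, path, foreignCollection) {
  return [
    { $match: filter },
    {
      $lookup: {
        from: foreignCollection, // collection name of the ref'd model
        localField: path,
        foreignField: '_id',
        as: path
      }
    },
    // $lookup always produces an array; unwrap it for a single ref
    { $addFields: { [path]: { $arrayElemAt: ['$' + path, 0] } } }
  ]
}
```

Even this simple case shows the unwieldiness: sort, skip, limit, and field selection from the original query would all need to be appended as additional stages in the right order.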

+1 for $lookup.

Are there real benefits to using $lookup instead of populate? That's the point; however, MongoDB 3.2 is still too new for the majority of users.

Performance in some cases - populate() requires 2 distinct queries to mongodb and thus 2 sequential round trips over the network, $lookup only takes 1. However, there's ways to structure your schema so populate-like operations can be done in parallel, so $lookup doesn't help as much as you might think.
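One way to read "structure your schema so populate-like operations can be done in parallel": if both collections can be queried off a key you already hold (rather than needing the first query's results to build the second), the two round trips overlap. A minimal sketch, assuming a hypothetical shared `warehouse` key on both collections; `findStocks`/`findProducts` stand in for model queries:

```javascript
// Sketch: both queries are keyed off data we already have (warehouseId),
// so they run concurrently instead of sequentially like populate().
async function populateInParallel (findStocks, findProducts, warehouseId) {
  const [stocks, products] = await Promise.all([
    findStocks({ warehouse: warehouseId }),
    findProducts({ warehouse: warehouseId })
  ])
  // join in memory, like populate() does after its second query
  const byId = new Map(products.map(p => [String(p._id), p]))
  return stocks.map(s => ({ ...s, product: byId.get(String(s.product)) }))
}
```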

Another thing to consider about $lookup within arrays

http://stackoverflow.com/questions/34967482/lookup-on-objectids-in-an-array

Ah yeah that would be tricky to implement - we'd need to be smart enough to build an aggregation pipeline that includes $unwind for every nested array.
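The array-field workaround from that Stack Overflow thread looks roughly like the stages below. Collection and field names here are hypothetical; the point is the $unwind/$group sandwich around the $lookup:

```javascript
// Sketch: $lookup on an array of ObjectIds requires unwinding first,
// then regrouping to reassemble the populated array.
const arrayLookupPipeline = [
  { $unwind: '$items' },            // one doc per array element
  {
    $lookup: {
      from: 'products',
      localField: 'items',
      foreignField: '_id',
      as: 'items'
    }
  },
  { $unwind: '$items' },            // unwrap each 1-element lookup result
  { $group: { _id: '$_id', items: { $push: '$items' } } } // rebuild the array
]
```

A populate rewrite would have to emit this pattern automatically for every array path, including nested ones.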

+1

+1 on principle, but does anyone have any good benchmarking data?
Good news, though: it looks like the issue with $lookup on arrays will be resolved as of mongo 3.3.4: https://jira.mongodb.org/browse/SERVER-22881

+1

populate works across multiple databases. How would $lookup work then?

@sidgate that's a great point, it actually would not work at all and that's a pretty core feature for mongoose. I'm gonna close this issue because neither mongodb 3.2 nor 3.4 are going to support cross-db $lookup...

For anyone interested in an ES7 version of this, here you go!

// get paginated result of job postings using $lookup
// <https://github.com/Automattic/mongoose/issues/3683>
let [ positions, itemCount ] = await Promise.all([
  Positions.aggregate([
    { $match },
    {
      $lookup: {
        from: 'users',
        localField: 'user',
        foreignField: '_id',
        as: 'user'
      }
    },
    {
      $project: {
        user: {
          $arrayElemAt: [ '$user', 0 ]
        }
      }
    },
    { $sort },
    { $limit: ctx.query.limit },
    { $skip: ctx.paginate.skip }
  ]),
  Positions.count($match).exec()
]);

Per https://github.com/crocodilejs/crocodile-node-mvc-framework

I'm also using koa-ctx-paginate with this at https://github.com/koajs/ctx-paginate

Good stuff @niftylettuce but async/await is not ES6 :p

Typo :laughing: :rocket:

Hopefully it'll make it into ES2017 :rocket:

This doesn't change the complications of implementing this, but just a comment on how $lookup could "improve performance considerably":

Using populate on a cursor tends to be unusably slow due to the amount of single queries being performed.

One way around this could be to build a batched cursor, populate per batch, and then unpack each document in that batch and provide it as a readable stream.
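A minimal sketch of that batched-cursor idea. Everything here is hypothetical naming: `populateInBatches` collects cursor docs into fixed-size batches and `fetchRefs` stands in for the one $in query per batch that populate() would issue:

```javascript
// Populate one batch with a single query, like populate() does,
// then merge the refs back onto the docs in memory.
async function populateBatch (batch, fetchRefs) {
  const refs = await fetchRefs(batch.map(d => d.ref))
  const byId = new Map(refs.map(r => [String(r._id), r]))
  return batch.map(d => ({ ...d, ref: byId.get(String(d.ref)) }))
}

// Drain a cursor (any async iterable) in batches, populating each batch
// with one secondary query, and yield populated docs as a stream.
async function* populateInBatches (cursor, batchSize, fetchRefs) {
  let batch = []
  for await (const doc of cursor) {
    batch.push(doc)
    if (batch.length === batchSize) {
      yield* await populateBatch(batch, fetchRefs)
      batch = []
    }
  }
  if (batch.length) yield* await populateBatch(batch, fetchRefs)
}
```

This caps memory at one batch of docs while keeping populate's one-query-per-batch behavior.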

That could work. Unfortunately, $lookup still executes a separate query for each individual doc, so you're saving a round trip by using $lookup but using approximately the same db processing time and hogging the connection. I like the cursor batch idea though :+1:

@vkarpov15 What is the state of allowing population by $lookup? Can we use it whenever possible and fall back to normal population whenever necessary?

@lykmapipo not planning on doing this currently because $lookup doesn't support lookups between different databases, so this would break the populate API. Yeah we could fall back to doing conventional populate in cross-db cases but that would add a lot of code and be more work than it's really worth IMO.

Can we revisit this? Why do we need to do cross database lookups?

Seems like closing this issue due to cross-db issues is a cheap excuse, the feature would be excellent.

Per my chat with @vkarpov15 on Slack, can someone in the community create a performance test with these requirements to compare $lookup vs populate?

6 concurrent $lookups with 2 fast findOne() queries VS. 6 populates with 2 fast findOne() queries
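A harness for that kind of test could be as simple as the sketch below (a hypothetical helper, matching the "median of N replicates" reporting used in the benchmark that follows): run `concurrency` copies of an async operation at once and take the median wall time.

```javascript
// Micro-benchmark sketch: median latency (ms) of `replicates` runs,
// each run firing `concurrency` copies of `op` concurrently.
async function benchMedian (op, concurrency, replicates) {
  const samples = []
  for (let r = 0; r < replicates; ++r) {
    const start = process.hrtime.bigint()
    await Promise.all(Array.from({ length: concurrency }, () => op()))
    samples.push(Number(process.hrtime.bigint() - start) / 1e6)
  }
  samples.sort((a, b) => a - b)
  return samples[Math.floor(samples.length / 2)]
}
```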

Here's a handful of benchmarks with MongoDB 3.4 (v3.2 was similar). Numbers are the median of 15 replicates.

findOne

| query | 1x concurrency | 6x | 15x |
| --- | --- | --- | --- |
| primary query | 0.911 ms | 9.226 ms | 19.55 ms |
| primary query, then 2nd query for relationship (~.populate()) | 1.201 ms | 5.782 ms* | 11.44 ms |
| $lookup | 0.598 ms | 1.763 ms | 4.575 ms |

*Note: not sure why this is faster than first row; might be cached at some layer, but it's consistent.

find returning array of 140 docs

| query | 1x concurrency |
| --- | --- |
| primary query | 3.401 ms |
| primary query, then 2nd query for relationship (~.populate()) | 6.872 ms |
| $lookup | 8.705 ms |

Code: https://gist.github.com/zbjornson/746a477ee715685fb60963d9e3c5d2d1. The foreignField was indexed, and I verified that the index was used. The collection had about 60k documents in it. I used the mongodb driver directly for the benchmark to show the best possible case.

Looks like it helps with latency for individual documents, but is detrimental for arrays (low-throughput). In the MongoDB profiler it looks like each $lookup goes out as a separate query, so that makes sense.

There are things I found that work with populate but not with $lookup. I think a model that looks like the following doesn't work well with $lookup (I can't find the $lookup I wrote for it, though).

Schema({
  nestedObject: {
    item1: { name: 'tuna' },
    list: [{ type: ObjectId, ref: 'AnotherModel' }]
  }
});

I will try to find a more concrete example, but that was my recollection of my attempts at replacing a nested populate.

@tunatoksoz yeah if you want to use $lookup with an array you need to add a $unwind.

@zbjornson yeah that doesn't surprise me too much. $lookup does end up as a separate query for each doc because of how aggregation works, so the performance impact isn't quite as cut-and-dried. I'd imagine the concurrency impact is even bigger on .find()...

Seems like closing this issue due to cross-db issues is a cheap excuse, the feature would be excellent.

Just wanted to weigh in that the cross-db feature has been a time saver for us, and I think it's one of those things that really makes mongoose an ODM: an extra layer on top of the base driver with additional features.

For future reference, here's an example of where $lookup gets super slow:

  const db = await mongodb.MongoClient
    .connect('mongodb://localhost:27017/test', {
      useNewUrlParser: true,
      poolSize: 1 // Only 1 operation can run at a time
    })
    .then(client => client.db())

  await db.dropDatabase()

  for (let i = 0; i < 5000; ++i) {
    await db.collection('Foo').insertOne({ _id: i })
  }
  console.log('Inserted foo docs')

  for (let i = 0; i < 5000; ++i) {
    await db.collection('Bar').insertOne({ _id: i, fooId: i })
  }
  console.log('Inserted bar docs')

  // This aggregation takes 150ms on my laptop. That's because it executes one full
  // collection scan of 'Bar' for every document in 'Foo': 5000 collection scans in total
  const promise = db.collection('Foo').aggregate([
    { $lookup: { from: 'Bar', localField: '_id', foreignField: 'fooId', as: 'bars' } }
  ]).toArray()

Versus what populate does:

  let startTime = Date.now()
  const foos = await db.collection('Foo').find().toArray()
  // "Found foos in 28ms"
  console.log(`Found foos in ${Date.now() - startTime}ms`)

  startTime = Date.now()
  const bars = await db.collection('Bar').find({ fooId: { $in: foos.map(f => f._id) } }).toArray()
  // "Found bars in 15ms"
  console.log(`Found bars in ${Date.now() - startTime}ms`)
  const barMap = bars.reduce((map, b) => map.set(b.fooId.toString(), b), new Map())

  for (const foo of foos) {
    foo.bar = barMap.get(foo._id.toString())
  }

$lookup compounds index misses; populate() minimizes their impact because it executes only one query per populate(), regardless of the number of docs being populated.
