Before:
lib.db.model('Stock').find(stockQuery).populate('product') // ...
After:
lib.db.model('Stock').collection.aggregate([
  { $match: stockQuery },
  {
    $lookup: {
      from: 'products',
      localField: 'product',
      foreignField: '_id',
      as: 'product'
    }
  },
  {
    // $lookup always produces an array; unwrap it for a single-document ref
    $project: {
      product: { $arrayElemAt: [ '$product', 0 ] }
    }
  },
  { $sort: sort },
  { $skip: req.offset }, // skip must come before limit, or pages past the first come back short
  { $limit: parseInt(req.query.limit, 10) } // $limit needs a number, not a raw query-string value
], callback);
Can we normalize this functionality for population using aggregate and $lookup into a rewrite of the populate function, so it would still behave the same but use $lookup under the hood? By default $lookup returns an array of results, which is why I've added the $arrayElemAt operator for the single-document ref.
Note that $lookup is only available as of MongoDB 3.2.
It's worth considering, but not for the foreseeable future, because I'd imagine the majority of mongoose users don't use MongoDB 3.2. Also, the only rationale I can think of for doing this would be "it improves performance considerably." While that might be the case, we'd have to run a benchmark to see.
+1 for using $lookup internally.
Also, how about implementing this conditionally? This way mongoose users who work with MongoDB 3.2+ could already benefit from the (suspected) performance improvements.
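A rough sketch of what that conditional switch might look like from userland (not mongoose's actual API): findWithPopulateOrLookup and its arguments are hypothetical, semver is an assumed dependency, and the server version comes from the driver's standard serverInfo admin command.

const semver = require('semver')

// Hypothetical helper: join via $lookup on MongoDB 3.2+, else fall back to populate().
async function findWithPopulateOrLookup (model, query, path, foreignModel) {
  const { version } = await model.db.db.admin().serverInfo()
  if (semver.gte(version, '3.2.0')) {
    // Server supports $lookup: do the join server-side in one round trip.
    // Note: like the $project in the snippet above, this keeps only the joined path.
    return model.aggregate([
      { $match: query },
      {
        $lookup: {
          from: foreignModel.collection.name,
          localField: path,
          foreignField: '_id',
          as: path
        }
      },
      { $project: { [path]: { $arrayElemAt: [ '$' + path, 0 ] } } }
    ])
  }
  // Older server: fall back to the conventional two-query populate.
  return model.find(query).populate(path).exec()
}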
Possible, but it's not really a high priority atm. The difficult part is that $lookup only works with the aggregation framework, so .find().populate() needs to get converted into an aggregation query, which seems unwieldy.
+1 for $lookup.
Are there real benefits to using $lookup instead of populate? That's the point; still, MongoDB 3.2 is too new for the majority of users.
Performance in some cases - populate() requires 2 distinct queries to mongodb and thus 2 sequential round trips over the network; $lookup only takes 1. However, there are ways to structure your schema so populate-like operations can be done in parallel, so $lookup doesn't help as much as you might think.
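For instance (hypothetical Post/Comment models): if the child documents store a key the caller already knows, such as the parent's slug, neither query depends on the other's result, so both can be issued at once:

const slug = 'intro-to-lookup'

// Both queries use a value known up front, so they run concurrently:
// total latency is one round trip instead of two sequential ones.
const [ post, comments ] = await Promise.all([
  Post.findOne({ slug }),
  Comment.find({ postSlug: slug })
])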
Another thing to consider about $lookup within arrays: http://stackoverflow.com/questions/34967482/lookup-on-objectids-in-an-array
Ah yeah, that would be tricky to implement - we'd need to be smart enough to build an aggregation pipeline that includes $unwind for every nested array.
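Roughly, a pipeline for the array case from that question would have to look something like this (the Order/products names are illustrative):

// Populating an array of ObjectIds via $lookup: flatten the array,
// join per element, then rebuild the array. Note the final $group
// only keeps fields that are explicitly accumulated.
Order.aggregate([
  { $unwind: '$products' },
  {
    $lookup: {
      from: 'products',
      localField: 'products',
      foreignField: '_id',
      as: 'products'
    }
  },
  { $unwind: '$products' }, // $lookup output is itself an array
  { $group: { _id: '$_id', products: { $push: '$products' } } }
])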
+1
+1 on principle, but does anyone have any good benchmarking data?
Good news, though: it looks like the issue with $lookup on arrays will be resolved as of mongo 3.3.4: https://jira.mongodb.org/browse/SERVER-22881
+1
populate works across multiple databases. How would $lookup work then?
@sidgate that's a great point, it actually would not work at all, and that's a pretty core feature for mongoose. I'm gonna close this issue because neither MongoDB 3.2 nor 3.4 is going to support cross-db $lookup...
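For context, this is the kind of thing populate handles that $lookup fundamentally can't: the referenced model can live on a different connection and database entirely. A minimal sketch (connection strings and model names are illustrative; the model option to populate is documented mongoose API):

const mongoose = require('mongoose')

// Two separate databases. $lookup cannot join across these, but
// populate can, because it just issues a second query on the other db.
const mainDb = mongoose.createConnection('mongodb://localhost:27017/app')
const usersDb = mongoose.createConnection('mongodb://localhost:27017/accounts')

const Post = mainDb.model('Post', new mongoose.Schema({
  title: String,
  author: mongoose.Schema.Types.ObjectId
}))
const User = usersDb.model('User', new mongoose.Schema({ name: String }))

// Tell populate explicitly which model (and therefore which db) to query.
const posts = await Post.find().populate({ path: 'author', model: User })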
For anyone interested in an ES7 version of this, here you go!
// get paginated result of job postings using $lookup
// <https://github.com/Automattic/mongoose/issues/3683>
// assumes $match and $sort are variables holding the filter and sort spec,
// spread into the pipeline via ES6 shorthand property names
let [ positions, itemCount ] = await Promise.all([
  Positions.aggregate([
    { $match },
    {
      $lookup: {
        from: 'users',
        localField: 'user',
        foreignField: '_id',
        as: 'user'
      }
    },
    {
      $project: {
        user: { $arrayElemAt: [ '$user', 0 ] }
      }
    },
    { $sort },
    { $skip: ctx.paginate.skip }, // skip before limit for correct pagination
    { $limit: ctx.query.limit }
  ]),
  Positions.count($match).exec()
]);
Per https://github.com/crocodilejs/crocodile-node-mvc-framework - I'm also using koa-ctx-paginate with this, at https://github.com/koajs/ctx-paginate.
Good stuff @niftylettuce but async/await is not ES6 :p
Typo 😅 🚀
Hopefully it'll make it into ES2017 :rocket:
This doesn't change the complications of implementing this, but just a comment on how $lookup could "improve performance considerably": using populate on a cursor tends to be unusably slow due to the number of individual queries being performed.
One way around this could be to build a batched cursor, populate per batch, and then unpack each document in that batch and provide it as a readable stream.
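A sketch of that batching idea against the raw driver (collection and field names are illustrative; assumes a driver recent enough that cursors are async-iterable):

// Populate a cursor in batches: one extra $in query per batch
// instead of one query per document.
async function * populateInBatches (cursor, userColl, batchSize = 100) {
  let batch = []
  async function * flush () {
    const ids = batch.map(doc => doc.user)
    const users = await userColl.find({ _id: { $in: ids } }).toArray()
    const byId = new Map(users.map(u => [ String(u._id), u ]))
    for (const doc of batch) {
      doc.user = byId.get(String(doc.user)) // attach the populated doc
      yield doc
    }
    batch = []
  }
  for await (const doc of cursor) {
    batch.push(doc)
    if (batch.length >= batchSize) yield * flush()
  }
  if (batch.length > 0) yield * flush()
}

The result can be consumed with for await...of, or wrapped into a readable stream via stream.Readable.from().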
That could work. Unfortunately, $lookup still executes a separate query for each individual doc, so you're saving a round trip by using $lookup but using approximately the same db processing time and hogging the connection. I like the cursor batch idea though :+1:
@vkarpov15 What is the state of allowing population via $lookup? Can we use it whenever possible and fall back to normal population whenever necessary?
@lykmapipo not planning on doing this currently because $lookup doesn't support lookups between different databases, so this would break the populate API. Yeah, we could fall back to doing conventional populate in cross-db cases, but that would add a lot of code and be more work than it's really worth IMO.
Can we revisit this? Why do we need to do cross-database lookups?
Seems like closing this issue due to cross-db issues is a cheap excuse; the feature would be excellent.
Per my chat with @vkarpov15 on Slack, can someone in the community create a performance test with these requirements to compare $lookup vs populate? 6 concurrent $lookups with 2 fast findOne() queries vs. 6 populates with 2 fast findOne() queries.
Here's a handful of benchmarks with MongoDB 3.4 (v3.2 was similar). Numbers are the median of 15 replicates.
findOne

| query | 1x concurrency | 6x | 15x |
| --- | --- | --- | --- |
| primary query | 0.911 ms | 9.226 | 19.55 |
| primary query, then 2nd query for relationship (~.populate()) | 1.201 ms | 5.782* | 11.44 |
| $lookup | 0.598 ms | 1.763 | 4.575 |
*Note: not sure why this is faster than first row; might be cached at some layer, but it's consistent.
find returning array of 140 docs

| query | 1x concurrency |
| --- | --- |
| primary query | 3.401 ms |
| primary query, then 2nd query for relationship (~.populate()) | 6.872 ms |
| $lookup | 8.705 ms |
Code: https://gist.github.com/zbjornson/746a477ee715685fb60963d9e3c5d2d1. The foreignField was indexed, and I verified that the index was used. The collection had about 60k documents in it. I used the mongodb driver directly for the benchmark to show the best possible case.
Looks like it helps with latency for individual documents, but is detrimental for arrays (low throughput). In the MongoDB profiler it looks like each $lookup goes out as a separate query, so that makes sense.
There are things I found that work with populate but not with $lookup. I think a model that looks like the following doesn't work well with $lookup (I can't find the $lookup I wrote for it, though).
Schema({
  nestedObject: {
    item1: { name: String }, // e.g. { name: 'tuna' }
    list: [{ type: ObjectId, ref: 'AnotherModel' }]
  }
});
I will try to find a more concrete example, but that was my recollection of my attempts at replacing a nested populate.
@tunatoksoz yeah, if you want to use $lookup with an array you need to add a $unwind.
@zbjornson yeah, that doesn't surprise me too much. $lookup does end up as a separate query for each doc because of how aggregation works, so the performance impact isn't quite as cut-and-dried. I'd imagine the concurrency impact is even bigger on .find()...
> Seems like closing this issue due to cross-db issues is a cheap excuse; the feature would be excellent.
Just wanted to weigh in that the cross-db feature has been a time saver for us, and I think it's one of those things that really makes mongoose an ODM, an extra layer of complexity with additional features from the base driver.
For future reference, here's an example of where $lookup gets super slow:
const db = await mongodb.MongoClient
  .connect('mongodb://localhost:27017/test', {
    useNewUrlParser: true,
    poolSize: 1 // Only 1 operation can run at a time
  })
  .then(client => client.db())

await db.dropDatabase()

for (let i = 0; i < 5000; ++i) {
  await db.collection('Foo').insertOne({ _id: i })
}
console.log('Inserted foo docs')

for (let i = 0; i < 5000; ++i) {
  await db.collection('Bar').insertOne({ _id: i, fooId: i })
}
console.log('Inserted bar docs')

// This aggregation takes ~150ms on my laptop. That's because 'Bar' has no
// index on fooId, so this query executes one full collection scan of 'Bar'
// for every document in 'Foo' - 5000 collection scans.
const promise = db.collection('Foo').aggregate([
  { $lookup: { from: 'Bar', localField: '_id', foreignField: 'fooId', as: 'bars' } }
]).toArray()
Versus what populate does:
let startTime = Date.now()

const foos = await db.collection('Foo').find().toArray()
// "Found foos in 28ms"
console.log(`Found foos in ${Date.now() - startTime}ms`)

startTime = Date.now()
const bars = await db.collection('Bar').find({ fooId: { $in: foos.map(f => f._id) } }).toArray()
// "Found bars in 15ms"
console.log(`Found bars in ${Date.now() - startTime}ms`)

// Join in memory: one pass to index bars by fooId, one pass to attach them
const barMap = bars.reduce((map, b) => map.set(b.fooId.toString(), b), new Map())
for (const foo of foos) {
  foo.bar = barMap.get(foo._id.toString())
}
$lookup compounds index misses; populate() minimizes their impact because it only executes one query for each populate(), regardless of the number of docs being populated.
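Worth noting: in the slow example above, Bar has no index on fooId, so every per-document lookup is a full collection scan. A single index turns each of those scans into an index lookup and removes most of that overhead, though populate still needs only one extra query in total:

// One-time setup: index the foreignField that $lookup matches against.
await db.collection('Bar').createIndex({ fooId: 1 })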