I'm finding eager loading works via a second query with an IN clause, is there some reason joins are not used?
This prevents me from e.g aggregating an eagerly loaded value like so
Product.
query().
eager('reviews')
avg('reviews.rating').
as('avgRating').
groupBy('column1', 'column2', ...)
Thanks
Implementing the whole eager query system with joins would be _alot_ more complex than it is with separate queries. So there is no reason other than the complexity of implementation.
You probably know that you can use the join method to do what you want just as you would with knex:
Product.
query().
join('reviews', 'products.id', 'reviews.product_id')
avg('reviews.rating').
as('avgRating').
groupBy('column1', 'column2', ...)
Of course, this won't create a reviews array for the products. You need to add the eager('reviews').
Understood. Is this on the roadmap at all? Would certainly be nice to have.
I ended up plugging the values from Product.relationMappings.reviews in to the .join() to keep things DRY.
If you're interested resulting code was
Product.
query().
join(ProductReview.tableName,
Product.relationMappings.reviews.join.from,
Product.relationMappings.reviews.join.to).
avg(ProductReview.tableName + '.rating').
as('avgScore').
groupBy(_.map(Object.keys(Product.columnInfo), function(columnName) {
return Product.tableName + '.' + columnName;
}))
Note .columnInfo is a property I set after globbing up all my model files - the result of knexQuery().columnInfo(). This saves me from typing out all the column names when grouping by all columns.
Reimplementing the eager stuff using joins is not going to happen any time soon. How would you feel about a method like this:
Product.
query().
joinRelation('reviews')
avg('reviews.rating').
as('avgScore').
groupBy(_.map(Object.keys(Product.columnInfo), function(columnName) {
return Product.tableName + '.' + columnName;
}))
joinRelation method would only take a relation name and not a relation expression. It would be really easy to implement and I think it would be a useful addition.
That would be great, with the relation name being used to alias the joined table.
I suppose the only difference between this and a fuller implementation would be that joined columns wouldn't be nested & the source model would be repeated for every joined row?
Also would this support many to many relations through a joining table?
Yes, exactly. This is such an easy and obvious feature that I will put it on my todo list. I will add a version for all relevant join types. I'll post here when I have something ready.
Excellent, will keep an eye out for it.
Thanks
I'll soon release the 0.5.0-alpha.0 version that has the new joinRelation method. You can install it using npm install objection@next
Great work, thanks!
There's a very cool project called Treeize that I have been using for a few months to do similar eager-loading using joins. It works very well and integrating it into Objection should cut down some complexity of implementing eager-loading using joins.
That's interesting @ArsalanDotMe . I will be trying that out in combination with the new joinRelation method when I get a chance. @koskimas if that works out well would you be open to plugging that in? I think it would be a powerful enhancement at (speculating) relatively low cost
This seems to interest people even though it is closed so I'm reopening it. I'm drunk right now so I'll answer this later 馃榾
Eagerly awaiting further discussion on this :)
Finally sober 馃槈
To make the eager stuff use joins while keeping the current features we would basically have to rewrite objection. As an example, let's consider this simple eager query:
A.query().eager('[b.c, d]')
We would need to generate a query like this:
select
A.id as A_id,
A.p1 as A_p1,
A.p2 as A_p2,
A_b.id as A_b_id,
A_b.p3 as A_b_p3,
A_b.p4 as A_b_p4,
A_b_c.id as A_b_c_id,
A_b_c.p5 as A_b_c_p5,
A_b_c.p6 as A_b_c_p6,
A_d.id as A_d_id,
A_d.p7 as A_d_p7,
A_d.p8 as A_d_p8,
from A
inner join B as A_b on A_b.aid = A.id
inner join C as A_b_c on A_b_c.bid = A_b.id
inner join D as A_d on A_d.aid = A.id
And assuming that each relation yields just 2 rows the result object would be something like this:
[
{ A_id: 1, A_p1: 1, A_p2: 1, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
{ A_id: 2, A_p2: 2, A_p2: 2, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
{ A_id: 1, A_p1: 1, A_p2: 1, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
{ A_id: 2, A_p1: 2, A_p2: 2, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
{ A_id: 1, A_p1: 1, A_p2: 1, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
{ A_id: 2, A_p1: 2, A_p2: 2, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
{ A_id: 1, A_p1: 1, A_p2: 1, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
{ A_id: 2, A_p1: 2, A_p2: 2, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
{ A_id: 1, A_p1: 1, A_p2: 1, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
{ A_id: 2, A_p1: 2, A_p2: 2, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
{ A_id: 1, A_p1: 1, A_p2: 1, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
{ A_id: 2, A_p1: 2, A_p2: 2, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 }
]
Now don't care about the errors in the SQL or the actual result column values, I just wrote this down without testing it.
Two main points to notice here:
This example only had a simple eager expression with tables that had only two columns and all relations had just 2 results. And even with this tiny example the result object is big. Now try to create this example in your minds with eager expression like [a.b.c.d.e.[f, g.h], i.j, l.m.n, o.[p, q, r]] with realistic tables that have 6 or more columns and with result sets with hundreds of rows. This is not a far fetched example, I write code like this _all the time_.
jsonSchema required or add some other mechanism for finding out the columns..filterEager('a.b.d', builder => builder.where('p1', 10)) We would need to map the property p1 into A_a_c_d.p1 behind the scenes and because of the lack of reflection features in knex, it is not easily doable.So replacing the current query system with joins is just not going to happen. We can however consider adding another mechanism for building nested joins. Or possibly with a lot of work, we could provide an option to choose which method to use for the eager queries, but really this would mean rewriting everything.
I don't think that the performance benefits are that big. Please send any benchmarks you have that show that using separate queries is significantly slower than using joins for the common cases.
I'm going to add join based eager queries as an option to the current where in queries. The loading algorithm can be selected per-query, per-model and per-project. I'll keep you posted.
Currently this SQL is generated for query:
Model1
.query()
.where("Model1.id", 1)
.eager('[model1Relation1, model1Relation2.model2Relation1]')
.filterEager('model1Relation2.model2Relation1', {
builder => builder.where('id', '>', 5)
})
select "Model1"."id" as "Model1:id",
"Model1"."model1Id" as "Model1:model1Id",
"Model1"."model1Prop1" as "Model1:model1Prop1",
"Model1"."model1Prop2" as "Model1:model1Prop2",
"_t1"."id" as "_t1:id",
"_t1"."model1Id" as "_t1:model1Id",
"_t1"."model1Prop1" as "_t1:model1Prop1",
"_t1"."model1Prop2" as "_t1:model1Prop2",
"_t3"."id_col" as "_t3:id_col",
"_t3"."model_1_id" as "_t3:model_1_id",
"_t3"."model_2_prop_1" as "_t3:model_2_prop_1",
"_t3"."model_2_prop_2" as "_t3:model_2_prop_2",
"_t5"."id" as "_t5:id",
"_t5"."model1Id" as "_t5:model1Id",
"_t5"."model1Prop1" as "_t5:model1Prop1",
"_t5"."model1Prop2" as "_t5:model1Prop2",
"_t6"."extra3" as "_t5:extra3"
from "Model1" as "Model1"
left join (select * from "Model1") as "_t1" on "_t1"."id" = "Model1"."model1Id"
left join (select * from "model_2") as "_t3" on "_t3"."model_1_id" = "Model1"."id"
left join "Model1Model2" as "_t6" on "_t6"."model2Id" = "_t3"."id_col"
left join (select * from "Model1" where "id" > 5) as "_t5" on "_t6"."model1Id" = "_t5"."id"
where "Model1"."id" = 1
The table aliases need to be minified because at least postgres has a 63 char limit for identifiers. Therefore we cannot alias them like this Model1.model1Relation1.model1Relation2.id etc. I'll add an option to turn off the minification, but longer eager expressions will fail with that option on.
Joins is done against a sub query so that we can filter that query using filterEager. This is a workaround because knex doesn't provide a full query builder for the ON clause. At least postgres produces identical queries in both cases.
Thoughts/ideas at this point?
Looks good! Will it be required to add the base table name in the top-level where clause? How does it compare performance wise?
Regarding problem 2 you mentioned above, it would be nice if it was possible to enforce some default columns in queries to eliminate "select *" queries. It might make sense to create a "defaultColumns" property that is read and applied automatically (also for each join). Using .select() should override it and the idColumn and foreign keys should always be present.
With the query above, you need to use the full TableName.columnName format in the where clause or you get "ambiguous identifier id" error. There may be a way to avoid this by aliasing the top-level columns with their own name like select Model.id as id.
I'm using knex's columnInfo to fetch the columns and selecting everything by default at the moment. You can use filterEager and select only the columns you want. defaultColumns could be a good idea!
I've only ran tests with tiny datasets, but the join version seems to be just as fast/slow as the "where in" version. I'll write some tests later with realistic-ish data.
First draft of the join based eager queries is now released with 0.6.0-rc.4. You can use the join algorithm for a query like this:
return Model1
.query()
.where('Model1.model1Prop2', '<', 100)
.eagerAlgorithm(Model1.JoinEagerAlgorithm)
.eager('[model1Relation1, model1Relation2.model2Relation1]')
and it will produce SQL like this by default:
select
"Model1"."id" as "Model1:id",
"Model1"."model1Id" as "Model1:model1Id",
"Model1"."model1Prop1" as "Model1:model1Prop1",
"Model1"."model1Prop2" as "Model1:model1Prop2",
"_t1"."id" as "_t1:id",
"_t1"."model1Id" as "_t1:model1Id",
"_t1"."model1Prop1" as "_t1:model1Prop1",
"_t1"."model1Prop2" as "_t1:model1Prop2",
"_t3"."id_col" as "_t3:id_col",
"_t3"."model_1_id" as "_t3:model_1_id",
"_t3"."model_2_prop_1" as "_t3:model_2_prop_1",
"_t3"."model_2_prop_2" as "_t3:model_2_prop_2",
"_t5"."id" as "_t5:id",
"_t5"."model1Id" as "_t5:model1Id",
"_t5"."model1Prop1" as "_t5:model1Prop1",
"_t5"."model1Prop2" as "_t5:model1Prop2",
"_t6"."extra3" as "_t5:extra3"
from "Model1" as "Model1"
left join (select * from "Model1") as "_t1" on "_t1"."id" = "Model1"."model1Id"
left join (select * from "model_2") as "_t3" on "_t3"."model_1_id" = "Model1"."id"
left join "Model1Model2" as "_t6" on "_t6"."model2Id" = "_t3"."id_col"
left join (select * from "Model1") as "_t5" on "_t6"."model1Id" = "_t5"."id"
where "Model1"."model1Prop2" < 100
As I mentioned earlier, the _t3, _t4 etc. aliases are generated to avoid the 63 character identifier limit. You can disable the minimization like this:
return Model1
.query()
.where('Model1.model1Prop2', '<', 100)
.eagerAlgorithm(Model1.JoinEagerAlgorithm)
.eagerOptions({minimize: false})
.eager('[model1Relation1, model1Relation2.model2Relation1]')
And now the generated sql is this:
select
"Model1"."id" as "Model1:id",
"Model1"."model1Id" as "Model1:model1Id",
"Model1"."model1Prop1" as "Model1:model1Prop1",
"Model1"."model1Prop2" as "Model1:model1Prop2",
"Model1:model1Relation1"."id" as "Model1:model1Relation1:id",
"Model1:model1Relation1"."model1Id" as "Model1:model1Relation1:model1Id",
"Model1:model1Relation1"."model1Prop1" as "Model1:model1Relation1:model1Prop1",
"Model1:model1Relation1"."model1Prop2" as "Model1:model1Relation1:model1Prop2",
"Model1:model1Relation2"."id_col" as "Model1:model1Relation2:id_col",
"Model1:model1Relation2"."model_1_id" as "Model1:model1Relation2:model_1_id",
"Model1:model1Relation2"."model_2_prop_1" as "Model1:model1Relation2:model_2_prop_1",
"Model1:model1Relation2"."model_2_prop_2" as "Model1:model1Relation2:model_2_prop_2",
"Model1:model1Relation2:model2Relation1"."id" as "Model1:model1Relation2:model2Relation1:id",
"Model1:model1Relation2:model2Relation1"."model1Id" as "Model1:model1Relation2:model2Relation1:model1Id",
"Model1:model1Relation2:model2Relation1"."model1Prop1" as "Model1:model1Relation2:model2Relation1:model1Prop1",
"Model1:model1Relation2:model2Relation1"."model1Prop2" as "Model1:model1Relation2:model2Relation1:model1Prop2",
"Model1:model1Relation2:model2Relation1_join"."extra3" as "Model1:model1Relation2:model2Relation1:extra3"
from "Model1" as "Model1"
left join (select * from "Model1") as "Model1:model1Relation1" on "Model1:model1Relation1"."id" = "Model1"."model1Id"
left join (select * from "model_2") as "Model1:model1Relation2" on "Model1:model1Relation2"."model_1_id" = "Model1"."id"
left join "Model1Model2" as "Model1:model1Relation2:model2Relation1_join" on "Model1:model1Relation2:model2Relation1_join"."model2Id" = "Model1:model1Relation2"."id_col"
left join (select * from "Model1") as "Model1:model1Relation2:model2Relation1" on "Model1:model1Relation2:model2Relation1_join"."model1Id" = "Model1:model1Relation2:model2Relation1"."id"
where "Model1"."model1Prop2" < 100
With the non-minimized format it is cleaner to filter using the joined relations:
...
.query()
.where('Model1:model1Relation2.id', 10)
...
The relation names can be aliased for shorter identifiers like this:
...
.query()
.eagerOptions({
minimize: false,
aliases: { model1Relation2: 'm1r2', model2Relation1: 'm2r1' }
})
.where('Model1:m1r2:m2r1.id', 10)
...
Maybe we need to add a some other way to specify the aliases so that they don't have to be given for each query. Maybe in relationMappings?
Also I will probably lose the useless root table name from the table aliases (The Model1: prefix in the examples above).
Please try this out and give me some feedback! Everything can still be changed, since this a new feature in an RC release and no-one is likely to start using this yet. So if a method name, the built SQL or anything else could be done better, please let me know.
As for the performance, the join algorithm seems to be a bit slower no matter what size the eager expression or the result set as I predicted. There is a huge overhead in parsing the flattened rows (you can check out the code, I'm using every last bit of my V8 knowledge to optimize it). The queries I tested were simple though and I ran the db on localhost so the query time and roundtrip time were small compared to the javascript overhead.
In rc5 the root model prefix is removed. The genereated non-minified SQL for the example above is now this:
select "Model1"."id" as "id",
"Model1"."model1Id" as "model1Id",
"Model1"."model1Prop1" as "model1Prop1",
"Model1"."model1Prop2" as "model1Prop2",
"model1Relation1"."id" as "model1Relation1:id",
"model1Relation1"."model1Id" as "model1Relation1:model1Id",
"model1Relation1"."model1Prop1" as "model1Relation1:model1Prop1",
"model1Relation1"."model1Prop2" as "model1Relation1:model1Prop2",
"model1Relation2"."id_col" as "model1Relation2:id_col",
"model1Relation2"."model_1_id" as "model1Relation2:model_1_id",
"model1Relation2"."model_2_prop_1" as "model1Relation2:model_2_prop_1",
"model1Relation2"."model_2_prop_2" as "model1Relation2:model_2_prop_2",
"model1Relation2:model2Relation1"."id" as "model1Relation2:model2Relation1:id",
"model1Relation2:model2Relation1"."model1Id" as "model1Relation2:model2Relation1:model1Id",
"model1Relation2:model2Relation1"."model1Prop1" as "model1Relation2:model2Relation1:model1Prop1",
"model1Relation2:model2Relation1"."model1Prop2" as "model1Relation2:model2Relation1:model1Prop2",
"model1Relation2:model2Relation1_join"."extra3" as "model1Relation2:model2Relation1:extra3"
from "Model1" as "Model1"
left join (select * from "Model1") as "model1Relation1" on "model1Relation1"."id" = "Model1"."model1Id"
left join (select * from "model_2") as "model1Relation2" on "model1Relation2"."model_1_id" = "Model1"."id"
left join "Model1Model2" as "model1Relation2:model2Relation1_join" on "model1Relation2:model2Relation1_join"."model2Id" = "model1Relation2"."id_col"
left join (select * from "Model1") as "model1Relation2:model2Relation1" on "model1Relation2:model2Relation1_join"."model1Id" = "model1Relation2:model2Relation1"."id"
where "Model1"."model1Prop2" < 100
in rc6 the minimization is disabled by default.
Most helpful comment
Finally sober 馃槈
To make the eager stuff use joins while keeping the current features we would basically have to rewrite objection. As an example, let's consider this simple eager query:
We would need to generate a query like this:
And assuming that each relation yields just 2 rows the result object would be something like this:
Now don't care about the errors in the SQL or the actual result column values, I just wrote this down without testing it.
Two main points to notice here:
This example only had a simple eager expression with tables that had only two columns and all relations had just 2 results. And even with this tiny example the result object is big. Now try to create this example in your minds with eager expression like
[a.b.c.d.e.[f, g.h], i.j, l.m.n, o.[p, q, r]]with realistic tables that have 6 or more columns and with result sets with hundreds of rows. This is not a far fetched example, I write code like this _all the time_.The problems:
jsonSchemarequired or add some other mechanism for finding out the columns..filterEager('a.b.d', builder => builder.where('p1', 10))We would need to map the propertyp1intoA_a_c_d.p1behind the scenes and because of the lack of reflection features in knex, it is not easily doable.Conclusions
So replacing the current query system with joins is just not going to happen. We can however consider adding another mechanism for building nested joins. Or possibly with a lot of work, we could provide an option to choose which method to use for the eager queries, but really this would mean rewriting everything.
I don't think that the performance benefits are that big. Please send any benchmarks you have that show that using separate queries is significantly slower than using joins for the common cases.