When doing HasOne and BelongsTo Preloads (and maybe others also), GORM is quite slow when the there are many query results and many referenced results.
Example scenario:
I've looked at the code, and found that handleBelongsToPreload and handleHasOnePreload in callback_query_preload.go are inefficient when assigning results of relation queries.
Currently, ForeignField values from one set of results and AssociatedForeignField values from the other set are compared as strings in a nested for loop. This is quadratic complexity, which is quite bad if there is large number of rows (more than a few thousand).
This can be improved by storing field values of one set of results in a map with a string key, iterating through the field values/objects of the other set and looking for the same values in the map.
I found queries like this:
SELECT * FROM "gateway" WHERE ("id" IN ('8','8','8','8','8','8','8','8','8','8'))
Would that be reduced to the following?
SELECT * FROM "gateway" WHERE ("id" IN ('8'))
The issue and speed improvement I mentioned happens after the SELECT query, when the foreign field values returned by query are actually assigned to objects.
Duplicate keys in queries should also be optimized potentially, although I don't think impact on preload speed is as big as the assignment of field values to objects. I believe sql is smart enough to remove duplicates when executing the query. However, removing duplicates before would help to reduce data transmitted over network and might also help with speed in some cases.
The more reasonable way to tackle a performance problem is not to try to guess where the bottleneck is, but use a profiler and actually see what is causing slowness.
A quadratic-time loop is kind of a low hanging fruit for immediate speed improvements in this case though. How about a PR?
For me I just noticed the redundancy inside my query while debugging. I do not face actual performance issues. I was just wondering if this issue could be related and posted here instead of opening a separate issue.
I did not intended to "guess" where the bottleneck is, if that's what you mean ;)
Is this still an issue ? am closing
cos imho, it a chat about things that not relevant.. atmo
BUT please repoen if missing the point.. thanks all
I believe the issue from my OP still exists.
Since I needed this in my project immediately, I forked the repo and did a quick fix of one case. You can see the change here.
Unfortunately, I don鈥檛 have time to make a proper PR here...
I also have a need for this improvement, since the Preloads take several tens of seconds for fairly average resultsets (30000 simple records).
For now I am manually querying the Preload-objects and then looping the main resultset in order to populate the missing artifacts (which only takes a couple of milliseconds for the same resultset). It isn't a complex task, shouldn't gorm be doing it in the Preloads out-of-the-box?
I'm also having this problem, Preloading is very slow, and with @vanjapt 's patch it works much faster.
It would be great if this improvement can be merged.
Most helpful comment
I believe the issue from my OP still exists.
Since I needed this in my project immediately, I forked the repo and did a quick fix of one case. You can see the change here.
Unfortunately, I don鈥檛 have time to make a proper PR here...