Gorm: Improve preload speed

Created on 8 Nov 2017 · 8Comments · Source: go-gorm/gorm

When doing HasOne and BelongsTo Preloads (and maybe others also), GORM is quite slow when the there are many query results and many referenced results.

Example scenario:

Table User and table Profile, where User "belongs to" Profile (i.e. User has a ProfileID FK referencing a column ID in Profile)
Query all users in a table with Preload("Profile")
If there is large number of users (and large number of profiles), GORM is slow

I've looked at the code, and found that handleBelongsToPreload and handleHasOnePreload in callback_query_preload.go are inefficient when assigning results of relation queries.
Currently, ForeignField values from one set of results and AssociatedForeignField values from the other set are compared as strings in a nested for loop. This is quadratic complexity, which is quite bad if there is large number of rows (more than a few thousand).

This can be improved by storing field values of one set of results in a map with a string key, iterating through the field values/objects of the other set and looking for the same values in the map.

Source

vanjapt

👍5

Most helpful comment

I believe the issue from my OP still exists.

Since I needed this in my project immediately, I forked the repo and did a quick fix of one case. You can see the change here.

Unfortunately, I don’t have time to make a proper PR here...

vanjapt on 14 Dec 2017

👍5

All 8 comments

I found queries like this:

SELECT * FROM "gateway"  WHERE ("id" IN ('8','8','8','8','8','8','8','8','8','8'))

Would that be reduced to the following?
SELECT * FROM "gateway" WHERE ("id" IN ('8'))

Niondir on 9 Nov 2017

The issue and speed improvement I mentioned happens after the SELECT query, when the foreign field values returned by query are actually assigned to objects.

Duplicate keys in queries should also be optimized potentially, although I don't think impact on preload speed is as big as the assignment of field values to objects. I believe sql is smart enough to remove duplicates when executing the query. However, removing duplicates before would help to reduce data transmitted over network and might also help with speed in some cases.

vanjapt on 9 Nov 2017

The more reasonable way to tackle a performance problem is not to try to guess where the bottleneck is, but use a profiler and actually see what is causing slowness.

A quadratic-time loop is kind of a low hanging fruit for immediate speed improvements in this case though. How about a PR?

ezk84 on 17 Nov 2017

For me I just noticed the redundancy inside my query while debugging. I do not face actual performance issues. I was just wondering if this issue could be related and posted here instead of opening a separate issue.

I did not intended to "guess" where the bottleneck is, if that's what you mean ;)

Niondir on 22 Nov 2017

Is this still an issue ? am closing
cos imho, it a chat about things that not relevant.. atmo
BUT please repoen if missing the point.. thanks all

pedromorgan on 14 Dec 2017

I believe the issue from my OP still exists.

Since I needed this in my project immediately, I forked the repo and did a quick fix of one case. You can see the change here.

Unfortunately, I don’t have time to make a proper PR here...

vanjapt on 14 Dec 2017

👍5

I also have a need for this improvement, since the Preloads take several tens of seconds for fairly average resultsets (30000 simple records).

For now I am manually querying the Preload-objects and then looping the main resultset in order to populate the missing artifacts (which only takes a couple of milliseconds for the same resultset). It isn't a complex task, shouldn't gorm be doing it in the Preloads out-of-the-box?