Realm-java: Iterating result very slow

Created on 6 Oct 2017 · 31 Comments · Source: realm/realm-java

Hi,

Here are my java classes :

public class PublicArticleEntity extends RealmObject {

    /**
     * For fast import no @PrimaryKey defined
     */
    private String id;

    private String type;

    @Index
    private String manufacturerId;

    private Boolean sterilizable;

    private Boolean active;

    private Date importModified;

    private RealmList<PublicArticlePartNumberEntity> partNumbers = new RealmList<>();
    private RealmList<PublicArticleCodificationEntity> codifications = new RealmList<>();
}

public class PublicArticlePartNumberEntity extends RealmObject {

    @Index
    private String partNumber;

    @Index
    private String normalizedPartNumber;

    private PublicArticleEntity publicArticle;
}

public class PublicArticleCodificationEntity extends RealmObject {

    @Index
    private String type;

    @Index
    private String value;

    private PublicArticleEntity publicArticle;
}

And here is the piece of code where I have a problem. The query returns 24 results and I use the limit to get the first 21 elements. It took 2000 ms to do the iteration :o

RealmResults<PublicArticlePartNumberEntity> entities = realm.where(PublicArticlePartNumberEntity.class)
                .equalTo("publicArticle.manufacturerId", manufacturerId.toString())
                .beginGroup()
                .contains("partNumber", search, Case.INSENSITIVE)
                .or()
                .contains("normalizedPartNumber", search, Case.INSENSITIVE)
                .endGroup()
                .findAllSorted("partNumber", Sort.ASCENDING);

        Set<String> items = new HashSet<>();
        int listSize = entities.size();
        for (int i = 0; i < listSize && i < limit; i++) {
            items.add(entities.get(i).getPartNumber());
        }

Here is the state of each:
[screenshot: Oct 06 15-04-13]

Because Realm uses lazy loading, the query call itself is very fast, but the for loop takes 2 s.
Did I miss something ?
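
A side note on the loop above, separate from the timing question: a HashSet does not keep insertion order, so the ascending sort from findAllSorted is lost once the part numbers are copied into it. A minimal sketch, assuming the order matters to the caller; `FirstItems` and `firstItems` are hypothetical names, and a plain `List<String>` stands in for the RealmResults:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FirstItems {

    /**
     * Copies at most `limit` leading elements of `sorted` into a set.
     * LinkedHashSet drops duplicates like HashSet but keeps encounter order.
     */
    public static Set<String> firstItems(List<String> sorted, int limit) {
        Set<String> items = new LinkedHashSet<>();
        int listSize = sorted.size();
        for (int i = 0; i < listSize && i < limit; i++) {
            items.add(sorted.get(i));
        }
        return items;
    }

    public static void main(String[] args) {
        // Hypothetical, already sorted part numbers with one duplicate.
        List<String> sorted = Arrays.asList("A-1", "A-1", "B-2", "C-3");
        System.out.println(firstItems(sorted, 3));
    }
}
```

As in the original loop, duplicates among the first `limit` rows collapse, so the set may end up with fewer than `limit` entries.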

T-Bug

All 31 comments

Did I miss something ?

Reading from 900000 elements and putting them all into a Set is not what I call "lazy-loaded" 😄

However you could use a RealmResults<PublicArticlePartNumberEntity> and result.get(i).getPartNumber() on a query constructed as

3.7.2 and below:

    RealmResults<PublicArticlePartNumberEntity> entities = realm.where(PublicArticlePartNumberEntity.class)
                .equalTo("publicArticle.manufacturerId", manufacturerId.toString())
                .beginGroup()
                .contains("partNumber", search, Case.INSENSITIVE)
                .or()
                .contains("normalizedPartNumber", search, Case.INSENSITIVE)
                .endGroup()
                .distinct("partNumber");

4.0.0-RC1 and above:

    RealmResults<PublicArticlePartNumberEntity> entities = realm.where(PublicArticlePartNumberEntity.class)
                .equalTo("publicArticle.manufacturerId", manufacturerId.toString())
                .beginGroup()
                .contains("partNumber", search, Case.INSENSITIVE)
                .or()
                .contains("normalizedPartNumber", search, Case.INSENSITIVE)
                .endGroup()
                .findAllSorted("partNumber", Sort.ASCENDING)
                .where()
                .distinct("partNumber");

Edit : Updated my post, the query returns only 24 results, not 900k ;)

24 results takes 2000 ms?!

And I guess if you do method profiling, then you'll find that the culprit is nativeGet()? Can you verify this?

This might be something only the Realm people can help with ( @beeender @cmelchior ) then.

Here it is, sorted by exclusive time.

[screenshot: Oct 06 17-36-12]

I wonder if this is related in any way to https://github.com/realm/realm-java/issues/5328 but the query is different (it also involves a link)

Just out of curiosity, can you try benchmarking the first time you touch the RealmResults, i.e. entities.size()? I have a suspicion, that our lazy-loading might be a bit too lazy.
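
One way to measure the first touch separately from the rest of the loop is to time each step on its own. A minimal sketch in plain Java; `StepTimer`, `Timed` and `time` are hypothetical helpers, not part of Realm:

```java
import java.util.function.Supplier;

public class StepTimer {

    /** Value produced by one step plus the wall-clock time it took. */
    public static final class Timed<T> {
        public final T value;
        public final long elapsedNanos;

        Timed(T value, long elapsedNanos) {
            this.value = value;
            this.elapsedNanos = elapsedNanos;
        }

        public double elapsedMillis() {
            return elapsedNanos / 1_000_000.0;
        }
    }

    /** Runs `step` once and records how long it took. */
    public static <T> Timed<T> time(Supplier<T> step) {
        long start = System.nanoTime();
        T value = step.get();
        return new Timed<>(value, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Stand-in workload; in the thread this would be entities.size() or entities.get(i).
        Timed<Integer> first = time(() -> "PArtNumber4232".length());
        System.out.println("value=" + first.value + " took " + first.elapsedMillis() + " ms");
    }
}
```

With the code from the original post, this could be used as `StepTimer.time(() -> entities.size())` followed by `StepTimer.time(() -> entities.get(0).getPartNumber())`, which would show whether the cost sits in the first access or in every access.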

This may also be related to https://github.com/realm/realm-java/issues/5391 since the OP is reporting delays when using getDefaultInstance on his model

So here are some benchmarks done on my phone (OnePlus 3T), which is way faster than the devices we use in production. I ran the same benchmark as before, then one with the first touch as @cmelchior asked.
These are not automated benchmarks :)
Is that good enough for you?

[screenshot: realm_one_plus_total]

[screenshot: realm_one_plus_first_touch]

Got to say that we use Realm in prod, so this is pretty important for us :/

@mgohin I did some testing for this.

  1. Because of lazy loading, the query has not been executed yet when the RealmResults is created. By design, it should be executed once, when the query results are first accessed. In this case, the query should actually be executed when int listSize = entities.size(); is called.

  2. The data set is relatively big, so the query takes a fairly long time. In my testing, for 9k data, it takes roughly 250+ ms on a Mi5 phone. (Please note that my test data set is different from yours and the query condition is not the same either.)

  3. I did see a problem: the query is executed again the first time items.add(entities.get(i).getPartNumber()); is called. This looks like a bug on the Realm side.

So I guess that fixing bug 3 could make your case 50% faster.
To verify my assumption, would you please add this code before int listSize = entities.size() is called?

        try {
            entities.get(0).getPublicArticle();
        } catch (IndexOutOfBoundsException e) {
        }

If my assumption is correct for your case, adding the code above should make it around 50% faster.
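
The lazy behaviour described in points 1 and 3 can be modelled in plain Java. This is only a sketch of the intended design (run the query once, on first access, then serve everything from the cached result); `LazyResults` is a hypothetical class and says nothing about how Realm implements RealmResults internally:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Plain-Java model of a lazily evaluated result list; NOT Realm's implementation. */
public class LazyResultsDemo {

    /** Hypothetical stand-in for RealmResults: runs the query on first access only. */
    public static final class LazyResults<T> {
        private final Supplier<List<T>> query;
        private List<T> cached;      // null until the query has run
        private int executions = 0;  // how many times the query actually ran

        public LazyResults(Supplier<List<T>> query) {
            this.query = query;
        }

        private List<T> load() {
            if (cached == null) {
                cached = query.get();
                executions++;
            }
            return cached;
        }

        public int size() { return load().size(); }
        public T get(int index) { return load().get(index); }
        public int executions() { return executions; }
    }

    /** Builds results over a simulated substring scan of n rows. */
    public static LazyResults<String> buildResults(int n) {
        return new LazyResults<>(() -> {
            List<String> matches = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                String partNumber = "PArtNumber" + i;
                if (partNumber.contains("4232")) {
                    matches.add(partNumber);
                }
            }
            return matches;
        });
    }

    public static void main(String[] args) {
        LazyResults<String> results = buildResults(100_000);
        System.out.println("after creation: " + results.executions() + " executions");
        int size = results.size();  // the scan happens here
        results.get(0);             // served from the cache, no second scan
        System.out.println("size=" + size + ", executions=" + results.executions());
    }
}
```

In this model, creating the results costs nothing, the first call to size() or get() pays for the whole scan, and later calls are cheap. The bug described in point 3 would correspond to load() running the query a second time.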

PS.: my testing code

public class MainActivity extends AppCompatActivity {

    private Realm realm;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        Realm.init(this);
        setContentView(R.layout.activity_main);

        realm = Realm.getDefaultInstance();
        if (realm.isEmpty()) {
            populateData();
        }
        long start = System.currentTimeMillis();

        RealmResults<PublicArticlePartNumberEntity> entities = realm.where(PublicArticlePartNumberEntity.class)
                .equalTo("publicArticle.manufacturerId", "ID4232")
                .beginGroup()
                .contains("partNumber", "PArtNumber4232", Case.INSENSITIVE)
                .or()
                .contains("normalizedPartNumber", "SER4232", Case.INSENSITIVE)
                .endGroup()
                .findAll();
        /*
        // By this, it will magically make it faster.
        try {
            entities.get(0).getPublicArticle();
        } catch (IndexOutOfBoundsException e) {
        }
        */
        long size = entities.size();
        entities.get(0).getPublicArticle();

        long end = System.currentTimeMillis();
        long time = end - start;
        Log.e("TTT", "Query size is " + size + " takes " + time + "ms");
    }

    @Override
    protected void onDestroy() {
        super.onDestroy();
        realm.close();
    }

    private void populateData() {
        Log.e("TTT", "start populating");
        realm.beginTransaction();
        for (int i = 0; i < 932423; i++) {
            PublicArticleEntity publicArticleEntity = realm.createObject(PublicArticleEntity.class);
            publicArticleEntity.setManufacturerId("ID" + i);

            PublicArticlePartNumberEntity publicArticlePartNumberEntity = realm.createObject(PublicArticlePartNumberEntity.class);
            publicArticlePartNumberEntity.setPartNumber("PArtNumber" + i);
            publicArticlePartNumberEntity.setNormalizedPartNumber("SER" + i);
            publicArticlePartNumberEntity.setPublicArticle(publicArticleEntity);
        }
        realm.commitTransaction();
        Log.e("TTT", "populating done");
    }
}

If the query has already been executed, then why can nativeGet() still take a lot of time?

So here is the test as @beeender requested.
I'm in debug mode, so it takes more time, and I'm on a production device that is waaaaaay slower than my OnePlus 3T, but it's a good way to see the differences.

First test with the current code (unchanged):
[screenshot: first]

Test with this code added:

entities.get(0).getPartNumber();
int listSize = entities.size();

[screenshot: Oct 11 12-38-29]

It reduces the time, but it's still way too long :p

@mgohin

  1. Is the Realm encrypted? Encryption slows down queries, especially on low-end devices.
  2. There are three query conditions and one sort condition in your case. Would you please try them individually and let us know which part is the bottleneck?
  3. Is it possible for you to share this Realm file with 900K+ records with us? Send it to [email protected] if you want to share it privately.

Also, would it make a difference if you change the query to this:

RealmResults<PublicArticlePartNumberEntity> entities = realm.where(PublicArticlePartNumberEntity.class)
                .beginGroup()
                .contains("partNumber", search, Case.INSENSITIVE)
                .or()
                .contains("normalizedPartNumber", search, Case.INSENSITIVE)
                .endGroup()
                .equalTo("publicArticle.manufacturerId", manufacturerId.toString())
                .findAllSorted("partNumber", Sort.ASCENDING);

  1. The Realm isn't encrypted.
  2. and 3. May I send you the file and let you bench everything you want? @beeender

@mgohin sure, but you might need to wait a bit longer. there are some other tasks waiting for me as well :)

@beeender sure, if it's not in 7 days :p

Looking at the query, I see that it has two contains conditions, which are probably the most expensive conditions you can make in a query (as they have to scan the entirety of each string for a possible substring match). Would it be possible to change them to equalTo or beginsWith, or does the format of the strings in the values preclude that?
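
To illustrate why the kind of string condition matters, here is a plain-Java sketch of the two case-insensitive checks. It is not Realm's matching code; `MatchCostDemo` and its methods are hypothetical names. The point is only that a prefix test looks at no more than `search.length()` characters, while a substring test may have to try every start position in the value:

```java
public class MatchCostDemo {

    /** Case-insensitive substring test: may try every start position in `value`. */
    public static boolean containsIgnoreCase(String value, String search) {
        int lastStart = value.length() - search.length();
        for (int start = 0; start <= lastStart; start++) {
            if (value.regionMatches(true, start, search, 0, search.length())) {
                return true;
            }
        }
        return false;
    }

    /** Case-insensitive prefix test: compares at most search.length() characters. */
    public static boolean beginsWithIgnoreCase(String value, String search) {
        return value.regionMatches(true, 0, search, 0, search.length());
    }

    public static void main(String[] args) {
        String partNumber = "PArtNumber4232";
        System.out.println(containsIgnoreCase(partNumber, "number42"));
        System.out.println(beginsWithIgnoreCase(partNumber, "partnum"));
    }
}
```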

Searching through 900K elements in 2 s is about 2.2 microseconds per element.

How large are the members which are touched by the query?
(part number, normalized part number and manufacturer id)

Is the condition on manufacturer id something that typically constrains the result set a lot?
If so, perhaps investigate the performance effect of setting an index on it.

@beeender file sent.

@astigsen : no I need to search with contains :/

@finnschiermer: the article has some codifications and partNumbers, but not that many.
The condition on manufacturerId is mandatory because it constrains the result set a lot.

One potential change could be to encode the manufacturerID directly in the objects of the class you search instead of going through an indirection. That would allow you to set an index on it, which would speed up the search drastically. Obviously, it leaves you with the problem of keeping the manufacturerID fields consistent. May be a no-go.. or may be straightforward, depending on the rest of the application.
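
A plain-Java sketch of this idea, under the assumption that copying the manufacturer id onto each part-number row is acceptable. A `HashMap` plays the role of the index here, and `PartNumberRow`, `buildIndex` and `search` are hypothetical names rather than Realm API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class DenormalizedIndexDemo {

    /** Hypothetical flattened row: the manufacturer id is copied next to the part number. */
    public static final class PartNumberRow {
        public final String manufacturerId;
        public final String partNumber;

        public PartNumberRow(String manufacturerId, String partNumber) {
            this.manufacturerId = manufacturerId;
            this.partNumber = partNumber;
        }
    }

    /** Builds a manufacturerId -> rows lookup, playing the role of an index. */
    public static Map<String, List<PartNumberRow>> buildIndex(List<PartNumberRow> rows) {
        Map<String, List<PartNumberRow>> index = new HashMap<>();
        for (PartNumberRow row : rows) {
            index.computeIfAbsent(row.manufacturerId, k -> new ArrayList<>()).add(row);
        }
        return index;
    }

    /** Case-insensitive substring search restricted to one manufacturer's rows. */
    public static List<String> search(Map<String, List<PartNumberRow>> index,
                                      String manufacturerId, String search) {
        String needle = search.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        List<PartNumberRow> bucket = index.getOrDefault(manufacturerId, Collections.emptyList());
        for (PartNumberRow row : bucket) {
            if (row.partNumber.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(row.partNumber);
            }
        }
        return hits;
    }
}
```

The expensive substring test now runs only over one manufacturer's rows instead of the whole table. The cost, as noted above, is keeping the copied id consistent with the article it came from.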

I'm trying all of your remarks, I'll let you know soon ;)

Another option could be to keep the classes as they are, but first do a search on the articles to get the articles with the right manufacturerID, then do a constrained search only among those part number entities which link to the right articles. This should be possible at Core level, but I don't know how it would be represented at Java level. Perhaps @beeender or @cmelchior can translate my idea?

note: these ideas all rely heavily on the assumption that the manufacturerID is an important constraint in the sense that it will filter out a lot of results. If this is not the case, these suggestions will not help.
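
The two-step idea above can be sketched in plain Java as follows. This is a model of the approach, not Core or Realm Java code: `Article` is a hypothetical stand-in for PublicArticleEntity with its partNumbers list, and the linear pass in step 1 is where an index on manufacturerId would be used in a real database:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class TwoStepSearchDemo {

    /** Hypothetical stand-in for an article and the part numbers it links to. */
    public static final class Article {
        public final String manufacturerId;
        public final List<String> partNumbers;

        public Article(String manufacturerId, List<String> partNumbers) {
            this.manufacturerId = manufacturerId;
            this.partNumbers = partNumbers;
        }
    }

    public static List<String> search(List<Article> articles, String manufacturerId, String search) {
        String needle = search.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Article article : articles) {
            // Step 1: cheap equality test; a database would answer this from an index.
            if (!article.manufacturerId.equals(manufacturerId)) {
                continue;
            }
            // Step 2: expensive substring test, only on this article's part numbers.
            for (String partNumber : article.partNumbers) {
                if (partNumber.toLowerCase(Locale.ROOT).contains(needle)) {
                    hits.add(partNumber);
                }
            }
        }
        Collections.sort(hits); // natural String order, standing in for the ascending sort
        return hits;
    }
}
```

As the note above says, this only pays off if the manufacturer filter removes most of the rows before the substring test runs.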

@astigsen @finnschiermer I think something might be up with sorted result access, based on my issue at https://github.com/realm/realm-java/issues/5328#issuecomment-332219195 , but I should really check which version introduced it.

btw what you said can probably be done as

    RealmResults<PublicArticlePartNumberEntity> entities = realm.where(PublicArticlePartNumberEntity.class)
                .equalTo("publicArticle.manufacturerId", manufacturerId.toString())
                .findAll()
                .where()
                .beginGroup()
                .contains("partNumber", search, Case.INSENSITIVE)
                .or()
                .contains("normalizedPartNumber", search, Case.INSENSITIVE)
                .endGroup()
                .findAllSorted("partNumber", Sort.ASCENDING);

If the query we're discussing here only results in ~30 items, then sorting cannot be expensive, so I doubt it can be linked to performance problems with sort.

@finnschiermer yeah you're most likely right about the sort, I was tackling a very similar issue, and came to the conclusion that the performance was decreased heavily when object store results were integrated, so this means that results.cpp's get() can do too many things for each index (compared to what it used to do before), but I'm not sure what it is.

Some good news about my search. I added the manufacturerId field to PublicArticlePartNumberEntity.java with an @Index and changed the search to

RealmResults<PublicArticlePartNumberEntity> entities = realm.where(PublicArticlePartNumberEntity.class)
                .equalTo("manufacturerId", manufacturerId.toString())
                .beginsWith("normalizedPartNumber", normalizePartNumber)
                .findAllSorted("partNumber", Sort.ASCENDING);

and now it works as expected, very fast !

I don't know if we need to keep the subject open ?

I am wondering whether supporting something like this would be useful:

public class PublicArticlePartNumberEntity extends RealmObject {
    @Index("manufacturerId")
    private PublicArticleEntity publicArticle;
}

Not sure if this is possible/easy for core? @finnschiermer

If I use contains, it's a bit too slow for UX. I'll stick with beginsWith for now.
