Apollo-android: [Performance] Move normalized ApolloStore write transactions to background thread.

Created on 26 Jun 2020 · 11Comments · Source: apollographql/apollo-android

Question.

We noticed a significant performance regression (particularly on large responses) after the Apollo 2.x upgrade, and the switch androidx/sqldelight database while using the normalized sql cache.

Part of this can be mitigated by running the database actions in a transaction like the old sql cache, but even with that we noticed a perf hit with the new db.

A more complete solution is to make the ApolloCacheInterceptor perform the response saving on a new thread - there may be a minute risk that a new request that is executed after a response of a previous request will read before the first request has dispatched it's save, but that seems quite unlikely, and the worst case is that you get a cache miss.

Overall it seems pretty safe as:

1) The Store is already thread safe

2) All side effects of saving a response are already dispatched on a new thread

Does anyone see any issues with making this change?

Question

Source

BenSchwab

All 11 comments

We should be able to reach the same performance without adding new threads, right?

Or is there a more fundamental difference between SQLDelight and the Android DB? I'm always a bit reluctant to adding threads if not 100% necessary as it's another source of race conditions and sneaky bugs.

martinbonnin on 26 Jun 2020

I know the lack of the transaction is one reason of the slowdown, I need to do a bit more research into what else changed. Agree that we should optimize performance of the database regardless.

However, from a design perspective, it doesn't seem correct that a database write blocks an already parsed response from being used. We have been searching for a while why migrating to GraphQL generally causes performance to regress on a page, and I am pretty sure this is the cause -- I will pull some profiles to give a bit more context.

BenSchwab on 26 Jun 2020

I've made a small benchmark that writes a query to the cache. The query is a list of Github repositories containing "lorem ipsum" in the description.

Results are (run on a Pixel 3 XL):

~* 1.4.5 -> 12417ms to write 100M queries~
~* 2.2.1 -> 13544ms to write 100M queries~

Edit: these numbers were completely wrong because I forgot to execute the actual write. In reality, it's closer to:

1.4.5 -> 19735ms to write 100 queries
2.2.1 -> 21400ms to write 100 queries

Edit2: on top of that, the data needs to be randomized. See the comment from Ben below for more details.

That's ~10% slower if I do the maths right. Is that the same level of degradation you observe on your end?

martinbonnin on 30 Jun 2020

We saw a ~10x increase across all percentiles of parsing time. On P95 this was about 4 seconds

BenSchwab on 30 Jun 2020

@martinbonnin So I dug a bit deeper into the issue - before the 2.1.0 upgrade, we actually had our normalized cache configured wrong, and were only getting the in memory cache. So the upgrade was more akin to re-enabling the sql cache.

We have some queries with 2000 records, which means we are performing 2000 read/write operations on a sql database. It does make sense that re-enabling a transaction will help a bit and could account for the 10% profile change you see (particularly on the write path), but that volume of operations is definitely going to result in a perf hit compared to not using the sql cache.

Blocking a response by up to 4 seconds by enabling sql cache seems like pretty compelling support for moving forward with making the apollo store operations non-blocking (or at least a configurable option).

BenSchwab on 30 Jun 2020

Will explore more if we can improve the efficiency of the sql cache in general. A 4 second operation on a non-blocking thread isn't fantastic either.

BenSchwab on 30 Jun 2020

A couple things about the performance example need to be updated:

1) execute is never called on the write operation, so it isn't doing anything
2) The dispatcher needs to be set to be blocking to simulate the api which is used while writing a response, as the write operation api uses a dispatcher
3) The cache is optimized to avoid writes if the data has not changed, so generateData should have some random seed and be invoked on each iteration

After making those changes, and dropping the iterations down to 100 I get (At cache record size of ~1000, pixel 4):

1.4.5 -> 31738 millis or 317ms per operation
2.2.1 -> 114298 millis or 1142 ms per operation

Without the change to add randomness the two versions performed similarly

BenSchwab on 1 Jul 2020

Adding back the transaction block brought the performance to a comparable 429 ms per operation. I'll go ahead an put up a PR to fix that.

BenSchwab on 1 Jul 2020

👍1

That benchmark was pretty wrong indeed, Thanks for fixing it! I pushed a new commit to master with most, if not all, of your changes. Thanks!

Regarding making the writes non-blocking, we should keep the guarantee that a read after a write returns the updated value:

 // returns as soon as the network response is received and writes the cache in the background
val response1 = client.query(SomeQuery()).responseFetcher(ApolloResponseFetchers.NETWORK_ONLY).toDeferred().await()
// should always succeed, no matter the timings
val response2 = client.query(SomeQuery()).responseFetcher(ApolloResponseFetchers.CACHE_ONLY).toDeferred().await()

Maybe it can work with some kind of "memory-barrier" that waits until all writes are performed before a read can be done?

martinbonnin on 1 Jul 2020

So long as the dispatcher that is used with Apollo executes tasks in the order they are submitted, I think we can achieve the ordering requirement pretty easily if we change ApolloInterceptor to use enqueue vs execute when it reads from cache.

Edit: actually I think we are safe to just move write to use the dispatcher as the entire interceptor is already enqueued in the dispatcher.

This would mean that a cache write is guaranteed to be enqueued before any subsequent cache read so long as we enqueue the write before returning the data.

BenSchwab on 1 Jul 2020

Other less pressing ideas of how to improve performance:

batch all the record reads in the normalized cache (unverified, but I imagine reading a row one at a time in a loop is slower than selecting the entire set)
have a more robust read/write lock system, that allows reads while a disjoint write is occurring. I believe that even if we offload this expensive work to a new thread, multiple operations executed in close succession face a risk where the expensive write blocks the read, essentially delaying the read by the original cost of the write we see in this thread

BenSchwab on 1 Jul 2020

👍1

Was this page helpful?

0 / 5 - 0 ratings