Question.
We noticed a significant performance regression (particularly on large responses) after the Apollo 2.x upgrade, and the switch androidx/sqldelight database while using the normalized sql cache.
Part of this can be mitigated by running the database actions in a transaction like the old sql cache, but even with that we noticed a perf hit with the new db.
A more complete solution is to make the ApolloCacheInterceptor perform the response saving on a new thread - there may be a minute risk that a new request that is executed after a response of a previous request will read before the first request has dispatched it's save, but that seems quite unlikely, and the worst case is that you get a cache miss.
Overall it seems pretty safe as:
1) The Store is already thread safe
2) All side effects of saving a response are already dispatched on a new thread
Does anyone see any issues with making this change?
We should be able to reach the same performance without adding new threads, right?
Or is there a more fundamental difference between SQLDelight and the Android DB? I'm always a bit reluctant to adding threads if not 100% necessary as it's another source of race conditions and sneaky bugs.
I know the lack of the transaction is one reason of the slowdown, I need to do a bit more research into what else changed. Agree that we should optimize performance of the database regardless.
However, from a design perspective, it doesn't seem correct that a database write blocks an already parsed response from being used. We have been searching for a while why migrating to GraphQL generally causes performance to regress on a page, and I am pretty sure this is the cause -- I will pull some profiles to give a bit more context.
I've made a small benchmark that writes a query to the cache. The query is a list of Github repositories containing "lorem ipsum" in the description.
Results are (run on a Pixel 3 XL):
~* 1.4.5 -> 12417ms to write 100M queries~
~* 2.2.1 -> 13544ms to write 100M queries~
Edit: these numbers were completely wrong because I forgot to execute the actual write. In reality, it's closer to:
Edit2: on top of that, the data needs to be randomized. See the comment from Ben below for more details.
That's ~10% slower if I do the maths right. Is that the same level of degradation you observe on your end?
We saw a ~10x increase across all percentiles of parsing time. On P95 this was about 4 seconds
@martinbonnin So I dug a bit deeper into the issue - before the 2.1.0 upgrade, we actually had our normalized cache configured wrong, and were only getting the in memory cache. So the upgrade was more akin to re-enabling the sql cache.
We have some queries with 2000 records, which means we are performing 2000 read/write operations on a sql database. It does make sense that re-enabling a transaction will help a bit and could account for the 10% profile change you see (particularly on the write path), but that volume of operations is definitely going to result in a perf hit compared to not using the sql cache.
Blocking a response by up to 4 seconds by enabling sql cache seems like pretty compelling support for moving forward with making the apollo store operations non-blocking (or at least a configurable option).
Will explore more if we can improve the efficiency of the sql cache in general. A 4 second operation on a non-blocking thread isn't fantastic either.
A couple things about the performance example need to be updated:
1) execute is never called on the write operation, so it isn't doing anything
2) The dispatcher needs to be set to be blocking to simulate the api which is used while writing a response, as the write operation api uses a dispatcher
3) The cache is optimized to avoid writes if the data has not changed, so generateData should have some random seed and be invoked on each iteration
After making those changes, and dropping the iterations down to 100 I get (At cache record size of ~1000, pixel 4):
Without the change to add randomness the two versions performed similarly
Adding back the transaction block brought the performance to a comparable 429 ms per operation. I'll go ahead an put up a PR to fix that.
That benchmark was pretty wrong indeed, Thanks for fixing it! I pushed a new commit to master with most, if not all, of your changes. Thanks!
Regarding making the writes non-blocking, we should keep the guarantee that a read after a write returns the updated value:
// returns as soon as the network response is received and writes the cache in the background
val response1 = client.query(SomeQuery()).responseFetcher(ApolloResponseFetchers.NETWORK_ONLY).toDeferred().await()
// should always succeed, no matter the timings
val response2 = client.query(SomeQuery()).responseFetcher(ApolloResponseFetchers.CACHE_ONLY).toDeferred().await()
Maybe it can work with some kind of "memory-barrier" that waits until all writes are performed before a read can be done?
So long as the dispatcher that is used with Apollo executes tasks in the order they are submitted, I think we can achieve the ordering requirement pretty easily if we change ApolloInterceptor to use enqueue vs execute when it reads from cache.
Edit: actually I think we are safe to just move write to use the dispatcher as the entire interceptor is already enqueued in the dispatcher.
This would mean that a cache write is guaranteed to be enqueued before any subsequent cache read so long as we enqueue the write before returning the data.
Other less pressing ideas of how to improve performance: