Firebase-js-sdk: FR: limit cost for reading unchanged data

Created on 16 Sep 2020 · 6 Comments · Source: firebase/firebase-js-sdk

Hello,

Scenarios where data does not change much are common, and Firebase is very expensive in these situations.
I understand that an "out of the box" solution is very difficult to implement (from your answer here: https://github.com/firebase/firebase-js-sdk/issues/3422).

In May, I requested the addition of a "source" parameter for onSnapshot that would give devs the opportunity to optimize the requests themselves: https://github.com/firebase/firebase-js-sdk/issues/3040

Do you have more insights on this?

Thanks

firestore feature request


Pitouli

👍1

Most helpful comment

I talked to my team about this today. While we can't make a firm commitment, if we do have some spare cycles this is something we would like to work on during the remainder of the year. This is hard for me to translate into firm release timelines, but we will post updates here if we can narrow down when this might be released.

schmidt-sebastian on 29 Sep 2020

👍3

All 6 comments

@Pitouli Thanks for chiming in.

I will bring #3040 back up with the team again, but it would still help us to know if there is more user demand. Are you aware of other developers that are running into these billing limitations? We certainly don't need an online petition, but it would be good to have some more feedback before we spend a couple weeks of engineering resources on implementing this feature across all of our platforms.

You are likely aware that you can achieve some of the cost savings you envision by querying documents that have changed since a timestamp, which would ensure that documents that haven't changed are excluded from transfer even if the query hasn't been listened to for more than 30 minutes. Since this doesn't handle all use cases (e.g. deletes or collections with a high write rate), this is not something we can do on behalf of all of our users, but it is something that may work on a case-by-case basis.
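As a rough illustration of the "changed since a timestamp" idea, here is a minimal in-memory sketch. It assumes each document carries an `updatedAt` field that the app maintains on every write; the `Item` shape and the `changedSince` helper are hypothetical and stand in for a Firestore query along the lines of `where("updatedAt", ">", lastSync)`. It also shows why deletes are not covered: a deleted document simply never appears in the result.

```typescript
// Hypothetical document shape: `updatedAt` is a field the app itself
// must set on every write (an assumption, not something Firestore adds).
interface Item {
  id: string;
  updatedAt: number; // milliseconds since epoch
  title: string;
}

// In-memory stand-in for the server-side filter. With Firestore, the same
// condition would be expressed as a query such as
// where("updatedAt", ">", lastSync).
function changedSince(serverDocs: Item[], lastSync: number): Item[] {
  return serverDocs.filter((doc) => doc.updatedAt > lastSync);
}

const server: Item[] = [
  { id: "a", updatedAt: 100, title: "unchanged since last sync" },
  { id: "b", updatedAt: 250, title: "edited after last sync" },
];

// A document "c" that was deleted after the last sync is no longer on the
// server, so no query result can report its removal.
const delta = changedSince(server, 200);
console.log(delta.map((d) => d.id));
```

Only the documents returned by the filter would be transferred; everything else has to come from the local cache.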

schmidt-sebastian on 18 Sep 2020

Hello @schmidt-sebastian, thanks for your answer :)

> Are you aware of other developers that are running into these billing limitations?

Concerning the user demand for methods to reduce the read cost of cached data, it's hard for me to say since no one is asking me, but I can find some examples on the internet:

https://medium.com/firebase-tips-tricks/how-to-drastically-reduce-the-number-of-reads-when-no-documents-are-changed-in-firestore-8760e2f25e9e -> 462 claps + 222 claps on a repost (not mine, of course)

Other tickets: https://github.com/firebase/firebase-js-sdk/issues/471 https://github.com/firebase/firebase-js-sdk/issues/3422

Questions about how to reuse the cache to limit cost:
https://stackoverflow.com/questions/52700648/how-to-avoid-unnecessary-firestore-reads-with-cache
And a few others here: https://stackoverflow.com/search?page=1&tab=Relevance&q=firestore%20read%20cache

Here we have people who are asking for the same thing as me (onSnapshot from cache):
https://stackoverflow.com/questions/63712571/force-stream-based-on-firestore-snapshots-to-read-cache
https://stackoverflow.com/questions/60374413/reduce-firebase-cloud-firestore-reads-by-using-cache

> it would be good to have some more feedback before we spend a couple weeks of engineering resources on implementing this feature across all of our platforms

I can completely understand this, especially if it involves a lot of rework.

From my perspective -- which obviously is very "naive" -- my request is "only" to make available something that I believe you use internally (for offline case) and therefore harmonize get() and onSnapshot().

By contrast, I understand that a perfect out-of-the-box "onSnapshot" that intelligently and automatically merges cache and server data without re-reading data already there is highly complicated (if not impossible) to make work in all cases, for all the reasons you shared (it would require adding metadata, handling deletions, etc.).

Of course, I could be underestimating the difficulty of my "simple" request, and perhaps it also raises complicated questions I have not anticipated.

> You are likely aware that you can achieve some of the cost savings you envision by querying documents that have changed since a timestamp

In my case, each user has a "private" list of items (approximately 10 to 1000 items) that only that user can modify. Given that I have a timestamp that tells when the last modification occurred, I see 3 solutions to avoid re-reading cached data:

1. I get from the cache if my local "last modification timestamp" is the same as the server one. Otherwise, I resync everything from the server.
- Pros: the initial load is easy
- Cons: when the user adds, updates, or removes items in the app, the list is no longer reactive, since a "get" is not a "listener". So I have to re-get from the cache at every change. Performance-wise, I think this can be quite costly, and it means a big rework to detect the changes and redo the get.
2. I get all my items from the cache and find the most recent "update_timestamp" to determine when my last sync was. Then I do an "onSnapshot" of all the items modified since.
- Pros: the most cost-effective solution (I could run almost indefinitely without resyncing)
- Cons: I have to take care of intelligently merging the two lists by replacing the correct item when I receive an update (and I cannot count on the "index" metadata). I cannot delete items; I have to update them with a "deleted" flag instead, so that they trigger the listener and I can remove them from the merged list. So occasionally I must run a batch to remove all items flagged "deleted" and re-sync the whole cache with the server (for example by saving the date of the last "batch deletion").
3. I call an onSnapshot sourced from the cache if my local "last modification timestamp" is the same as the server one. Otherwise, I resync everything from the server.
- Pros: easy to implement, and it works exactly like offline mode (contrary to solution 1, updates are automatically taken care of by the no-latency feature)
- Cons: like solution 1, if I detect that an update has been made from another device, I have to re-sync everything.
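To make the merging cost of the second solution more concrete, here is a minimal sketch of the bookkeeping it implies. Everything in it is an assumption for illustration: the `Item` shape, the `deleted` tombstone flag, and the helper names `lastSyncTime` and `mergeChanges`. It uses plain arrays instead of real snapshots, so it only models the merge logic, not the Firestore calls.

```typescript
// Hypothetical item shape. `deleted` is the tombstone flag described above:
// items are never removed directly, only flagged, so the listener sees them.
interface Item {
  id: string;
  updatedAt: number; // the "update_timestamp", in milliseconds since epoch
  deleted?: boolean;
  title: string;
}

// The most recent update_timestamp in the cache marks the last sync.
function lastSyncTime(cached: Item[]): number {
  return cached.reduce((max, item) => Math.max(max, item.updatedAt), 0);
}

// Merge the changes received from the "modified since" listener into the
// cached list, matching by id rather than by position.
function mergeChanges(cached: Item[], changes: Item[]): Item[] {
  const byId = new Map<string, Item>();
  for (const item of cached) {
    byId.set(item.id, item);
  }
  for (const change of changes) {
    if (change.deleted) {
      byId.delete(change.id); // tombstone: drop the item from the local list
    } else {
      byId.set(change.id, change); // new or updated item replaces the old one
    }
  }
  return Array.from(byId.values());
}

const cached: Item[] = [
  { id: "a", updatedAt: 100, title: "first" },
  { id: "b", updatedAt: 150, title: "second" },
  { id: "c", updatedAt: 120, title: "third" },
];
const changes: Item[] = [
  { id: "b", updatedAt: 300, title: "second (edited)" },
  { id: "c", updatedAt: 310, deleted: true, title: "third" },
  { id: "d", updatedAt: 320, title: "fourth" },
];

const since = lastSyncTime(cached);
const merged = mergeChanges(cached, changes);
console.log(since, merged.map((item) => item.id));
```

The periodic batch that purges tombstones is not shown; after such a purge, clients that last synced before it would need a full re-sync, as noted in the cons.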

The 3rd solution is, in my opinion, a good balance between cost efficiency and ease of implementation. But it requires the "listen from the cache" functionality.
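The decision at the heart of solutions 1 and 3 is small enough to sketch. The function below is hypothetical (`chooseSource` is not an SDK call), and it assumes the server-side "last modification timestamp" can be fetched cheaply, for example from a single per-user document. The returned value would then be passed to whichever read is used: a cache-sourced get for solution 1, or the cache-sourced listener requested here for solution 3.

```typescript
type Source = "cache" | "server";

// Hypothetical helper: compare the locally stored "last modification
// timestamp" with the one read from the server and pick where to read from.
// `null` means nothing has been synced on this device yet.
function chooseSource(
  localLastModified: number | null,
  serverLastModified: number
): Source {
  if (localLastModified !== null && localLastModified === serverLastModified) {
    return "cache"; // nothing changed elsewhere: cached items are up to date
  }
  return "server"; // first load, or another device wrote: resync everything
}

console.log(chooseSource(1600000000000, 1600000000000));
console.log(chooseSource(1600000000000, 1600000500000));
console.log(chooseSource(null, 1600000500000));
```

With this approach, an unchanged list costs one read for the timestamp check instead of one read per item.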

Pitouli on 18 Sep 2020

@Pitouli Thank you for this very thorough and reasonable response. I will talk to my team about this and get back to you either this or next week.

schmidt-sebastian on 22 Sep 2020

👍2

schmidt-sebastian on 29 Sep 2020

👍3

Just found this thread; this would also be very valuable for us. We're using a lot of workarounds to prevent Firestore costs from exploding. If we could just sync changed data and pay only for reads of the changed data, Firestore would become way more flexible and easy to use.

Benny739 on 29 Sep 2020

@Benny739 note that this is a discussion of how to create snapshot listeners that only read from cached data, avoiding reads from the server altogether. Perfect incremental sync is not possible today and this proposed API change doesn't change that.