Firebase-tools: Add Firestore indexes to local emulator

Created on 14 Mar 2020 · 22 Comments · Source: firebase/firebase-tools

[REQUIRED] Environment info


firebase-tools:
7.15.0

Platform:
Ubuntu 18.04.4

[REQUIRED] Test case


I'm trying to test our Firestore indexes, but they seem to always be working (regardless of the indexing settings included in firestore.indexes.json). Below is the test case in question:

// assertSucceeds() comes from the Firebase testing SDK (assumed here to be
// the '@firebase/testing' package).
const firebase = require('@firebase/testing');

// TUTOR, PUPIL, authedApp() and combinations() are defined elsewhere in our
// test suite.
const FILTERS = {
    'grade': ['==', TUTOR.grade],
    'gender': ['==', TUTOR.gender],
    'location': ['==', TUTOR.location],
    'type': ['==', TUTOR.type],
    'payments.type': ['==', TUTOR.payments.type],
    'config.showProfile': ['==', true],
};
const SORTERS = [
    'avgRating',
    'numRatings',
];

describe('Tutorbook\'s Database Indexing', () => {
    const db = authedApp({
        uid: PUPIL.uid,
        email: PUPIL.email,
        access: PUPIL.access,
    });
    const filterCombos = combinations(Object.keys(FILTERS));
    // Register one test per combination of filters.
    filterCombos.forEach(filters => it('lets users filter ' +
        'profiles by ' + filters.join(', '), () => {
            let query = db.collection('users')
                .where('access', 'array-contains-any', PUPIL.access);
            filters.forEach(filter => {
                query = query.where(filter, FILTERS[filter][0],
                    FILTERS[filter][1]);
            });
            // Each filter combination is tried with every sort order.
            return Promise.all(SORTERS.map(sorter =>
                firebase.assertSucceeds(query.orderBy(sorter).get())));
        }));
});

[REQUIRED] Steps to reproduce


Just try testing any type of composite or collection group query that would (normally) require an index. It will work on the emulator suite regardless of the existence of the required index (but it won't work in production without the index).

[REQUIRED] Expected behavior


These tests should fail unless there are Firestore indexes (specified in firestore.indexes.json) that support the composite indexes tested.

[REQUIRED] Actual behavior


All the tests succeed (even though they shouldn't).

emulator-suite firestore

All 22 comments

Just FYI the ultimate goal here is to test each possible combination of the different filters (as defined in FILTERS and SORTERS) available in our app's search screen.
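The test case calls a combinations() helper that isn't shown in the issue. A minimal sketch of what such a helper might look like, assuming it returns every non-empty subset of the given keys (this is a guess at its behavior, not the OP's actual code):

```javascript
// Hypothetical helper: returns every non-empty subset of `items`.
// Assumption: this mirrors what combinations() does in the test case.
function combinations(items) {
    const result = [];
    // Each bitmask from 1 to 2^n - 1 selects one non-empty subset.
    for (let mask = 1; mask < (1 << items.length); mask++) {
        result.push(items.filter((_, i) => mask & (1 << i)));
    }
    return result;
}

const combos = combinations(['grade', 'gender', 'location']);
console.log(combos.length); // 7
```

Under that assumption, the six FILTERS keys yield 63 combinations, so with the two SORTERS a full run issues 126 queries.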

@nicholaschiang thanks for the feedback! Right now you're correct that the Firestore emulator will execute any query even if the index has not been "created".

@yuchenshi do you know if we will be able to support stricter indexing in the Firestore emulator?

Strict indexing in the Firestore Emulator is not a planned feature (yet) but I'll definitely discuss this with the team.

On the other hand, we do plan to support dumping all indexes automatically created (say during the test), which you can compare against firestore.indexes.json. I believe this will address OP's use case as well.

@yuchenshi That would be a useful (although not exactly what I'm asking for) feature; those "tests" would ensure that my firestore.indexes.json supported the "tested" queries. Just make sure that feature accounts for Firestore's ability to merge indexes and doesn't just create a composite index for each query type.
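To make the "compare against firestore.indexes.json" idea concrete, here is a rough sketch. It assumes a hypothetical `required` list in the same shape as the "indexes" array of firestore.indexes.json. According to this thread the emulator does not produce such a dump, so that input is purely illustrative. The comparison is an exact match only and does not model index merging, which is exactly the caveat raised above.

```javascript
// Sketch only: `required` stands in for a hypothetical list of indexes
// dumped by the emulator; no such dump exists according to this thread.

// Canonical string key for one composite index definition.
function indexKey(index) {
    const fields = index.fields.map(f =>
        `${f.fieldPath}:${f.order || f.arrayConfig}`);
    return `${index.collectionGroup}|${index.queryScope}|${fields.join(',')}`;
}

// Returns the entries of `required` with no exact match in `declared`.
// Note: exact matching only; it does NOT model Firestore's index merging.
function missingIndexes(required, declared) {
    const declaredKeys = new Set(declared.map(indexKey));
    return required.filter(index => !declaredKeys.has(indexKey(index)));
}

// Same shape as the "indexes" array in firestore.indexes.json.
const declared = [{
    collectionGroup: 'users',
    queryScope: 'COLLECTION',
    fields: [
        { fieldPath: 'grade', order: 'ASCENDING' },
        { fieldPath: 'avgRating', order: 'ASCENDING' },
    ],
}];
const required = [
    declared[0],
    {
        collectionGroup: 'users',
        queryScope: 'COLLECTION',
        fields: [
            { fieldPath: 'gender', order: 'ASCENDING' },
            { fieldPath: 'numRatings', order: 'ASCENDING' },
        ],
    },
];
console.log(missingIndexes(required, declared).length); // 1
```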

Oh, and @samtstern you should have this specification (that the Firestore emulator will execute any query even if there isn't an index to support it) added to the differences between emulator and production webpage.

It's unfortunate that the Firestore emulator doesn't support index rules.
@yuchenshi I see this as a useful feature, provided that index merging is handled. That would be a masterpiece!

What are the chances of this feature being considered?

We're considering both features of dumping and tracking index rules and we'd appreciate input on the use cases.

As a developer, how do you plan on using emulators to prototype, test, and gain more confidence about your index rules? Shoot us some workflows just like what @nicholaschiang did and we'll figure out the best way to make that happen.

Great!
Our use case is pretty similar to the one the OP mentioned.
We have REST APIs with optional query parameters. To check whether each request already has the required indexes in place, we use the Jest framework to fire requests with all possible combinations. If a request throws an error saying an index is missing, we can create the required index.

So for 4 optional query params with multiple values allowed in each param, we need to test more than 200 possibilities. Creating indexes for that many combinations manually is error-prone.

If there were a provision for the Firestore emulator to create the list of required indexes, we could just run all possible cases generated through code and use the indexes created by the emulator. This would save a lot of time.

@yuchenshi Can you please let us know if anything is planned related to this feature?

@yuchenshi we just got bitten by this.

We deployed a quick fix, which actually broke later in production due to the missing index. Note that that part of the code was tested, so it'd be great if the emulator had the same behavior as production.

@manwithsteelnerves While we don't share timelines and cannot commit to any plans, I do want to let you know that we hear you, and real pain points (including the latest one shared by @sk- ) help a lot in driving our internal discussion. Getting the index semantics to match production is certainly a technically challenging and involved feature, but that doesn't mean we're not considering / doing it.

Since this may take more time, I ended up manually creating 139 indexes for my project to cover all possible test cases. Hopefully, there will be a better solution through an emulator workflow!

Out of the two options discussed, "match production" and "dumping all indexes automatically created [during some set of tests]", if we could only have one then I prefer the second one. For me, the point of writing a bunch of automated tests is so that later, when I add some feature or fix some bug, I don't break a bunch of other things. With that in mind, if the emulator could dump the list of indexes that I need that aren't already in my firestore.indexes.json, that seems pretty helpful.

If I ran a test set and nothing came out then I'd know that some tests I added (to verify some bug) weren't increasing the indexes I need. Or if I prototype some feature, it would be nice if the emulator made it clear "hey buddy, you're going to need another 100 indexes to support this" by dumping a list of those hundred indexes and I can realize I had a stupid idea before getting too far into it.

Matching production by giving me an error is kinda useful, but just giving the list of additional indexes needed is going to help me make progress faster I think. And @nicholaschiang 's comment is critical "Just make sure that feature accounts for Firestore's ability to merge indexes and doesn't just create a composite index for each query type." Might even be nice if it also said "hey, you've got this set of indexes [X,Y,Z] in your firestore.indexes.json that were not necessary during this test run".

I'd be in favor of the former (match production) because the latter works only if you have a single application talking to the database. What if you have dozens of services in separate repositories talking to the same database?

"the latter works only if you have a single application talking to the database"

Wouldn't you just reset the emulator, run whatever tests you have on your dozens of services in separate repositories, and then end up with the final set of indexes that you need? I guess I don't see how it's any different than a single application. Just as a single application is using one Firebase project, presumably your dozens of services are as well?

I guess in the case where you had multiple Firebase projects being served by one emulator, you'd want your indexes grouped by project. But otherwise I don't see a difference, though I could easily be missing something :-)

I was referring to

"if the emulator could dump the list of indexes that I need that aren't already in my firestore.indexes.json"

I assumed you'd like to run your tests and then deploy the indexes in the CI. That wouldn't work in my case, where I have dozens of separate repositories (services) and indexes are stored in a separate repository, shared by all of the services.

So, "if we could only have one", for my use case I'd prefer to load my manually crafted indexes (where you also have optimizations for de-duplication of queries) in all of the services and run tests against them. If something fails, I'd add the missing indexes to my indexes repository.

Hope that explains my reasoning better 🙂

"I assumed you'd like to run your tests and then deploy the indexes in the CI. That wouldn't work in my case, where I have dozens of separate repositories (services) and indexes are stored in a separate repository, shared by all of the services."

To understand better:

If you have a single Firestore database used by multiple services, isn't it possible to run the tests of all the different services and update the index file dumped by the emulator?
Or
Run tests per project, merge each dumped index file, and push the result to your main Firestore?

Not really, no. We have dozens of services interacting with the same database. Each service is developed in a separate repository, by independent teams, deployed individually. We also follow a true to the bone continuous delivery process, so we have multiple (tens/hundreds) of production deployments _daily_.

With that in mind, what you're asking is whether, for these hundreds of daily deployments, we can run the tests of all our services instead of just the one service for which the change is made. Let's explore it with an example:

Let's assume 50 services, 300 deployments daily (total across all services), 10 minutes of CI time to run tests for each service. That consumes 300*10=3000 build minutes daily. If we'd run that for all services each time, it's 50 times more, so the cost increase is very significant.

On top of that, what's even worse, some services already run tests on 32-CPU machines (fortunately the emulator supports multithreading). We obviously can't afford a test run that takes 50*10 minutes, so we'd have to run it in parallel. But then we'd need a VM that has 50*32=1600 CPU cores, and such a thing doesn't exist. We could of course run Firestore emulator on a separate VM and execute all tests against it, but the emulator is not very well optimized and it consumes very significant resources for tests of each service, e.g. on a 32 core machine we usually dedicate ~16 cores to the emulator.

For your second point, we could theoretically do that, but that sounds a little complex. I imagine that for each CI run we'd have to pull current Firestore indexes from a central repository, run tests for a service and then validate that all indexes exist. But then we'd be implementing ourselves what we defined here as the first option ("match production"): for the emulator itself to fail when indexes don't exist. And as it was previously mentioned in this very thread, it's not as easy as it might look like, specifically due to features such as index merging.

So for me it boils down to having "match production" implemented in the Emulator or implementing it myself, and I'd obviously prefer it to be implemented in the Emulator.

I also understand your point, I see usefulness of it for small applications, but I believe that with the growth of any application "match production" will eventually become a preference for everyone.
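The arithmetic in the comment above can be spelled out as follows. All figures are the illustrative assumptions from that comment (50 services, 300 deployments a day, 10 minutes and 32 CPUs per service test run), not measurements.

```javascript
// Figures taken from the comment above; all are illustrative assumptions.
const services = 50;
const deploymentsPerDay = 300;
const minutesPerServiceRun = 10;
const cpusPerServiceRun = 32;

// Today: each deployment tests only the service that changed.
const currentMinutes = deploymentsPerDay * minutesPerServiceRun;

// Testing every service on every deployment multiplies that by 50.
const allServicesMinutes = currentMinutes * services;

// Running all service suites one after another, per deployment.
const serialRunMinutes = services * minutesPerServiceRun;

// Running all service suites in parallel on a single machine.
const parallelCpus = services * cpusPerServiceRun;

console.log(currentMinutes);     // 3000
console.log(allServicesMinutes); // 150000
console.log(serialRunMinutes);   // 500
console.log(parallelCpus);       // 1600
```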

OK, that's a really big scale. It's great to see that you're able to manage that many projects, while I'm struggling to fit Firestore into one because of its inconsistent latencies and limitations.

So dumping the indexes that are missing compared to production would solve all of these problems, right? It would be great to get a dump that accounts for index merging, though. These indexes would then be pushed to production manually.

This would work for us too: initially we will have nothing in production, so the emulator would show all of the dumped indexes, which covers our case as well.

TL;DR - I agree with the argument for "match production", it would help very large projects a lot. Would still be nice if there was also a way to generate an optimized list of indexes after running a suite of tests, both for new projects and for large long running projects that may have built up a list of non-optimized indexes.

Loooong answer:
Yes, you clearly have a well oiled machine going there. The only reason I'm commenting is since they asked us for use cases, I'm trying to help hash it out so they'll have a clear picture (one way or the other) to implement this thing we all need :-)

I was coming at this from the view of having a project with a firestore.indexes.json file that you update with whatever indexes you need, and do a firestore deploy and it uploads your indexes, your rules, your functions, etc. But you aren't doing that @merlinnot , are you? Your devs are writing their own projects, making changes, the CI is running tests for that project, and if the tests pass then it deploys that project's artifacts to production Firestore. But the CI is not updating any rules or indexes, is it? This is where the pain is now- the tests can pass but then in production there's a missing index so there's a failure so then someone goes into the Firestore console (or you have your own client that assists you with your flow) and adds that index. This is why you said:

"I imagine that for each CI run we'd have to pull current Firestore indexes from a central repository, run tests for a service and then validate that all indexes exist"

So I think it helps your use case to be clear that when you say "match production", you do NOT mean that some central repository's firestore.indexes.json and firestore.rules is loaded to the emulator. You mean that the emulator downloads the indexes and rules from production, and proceeds to "match production" as fully as possible. You need it to behave this way because due to index merging, it's possible that you'll waste indexes by only loading a firestore.indexes.json for a single project (unless they all have completely separated collections and there's no collectionGroups across them). Is that correct?

That would be pretty useful in your large case because then the CI would fail, and it would fail specifically on the test that was lacking the hypothetical "new needed index". That would go right back to the developer that wrote it without needing to test the entire set of projects.

For newly developing projects, it's nice to have the other way, where you just get a list of needed indexes after running all your tests, update your firestore.indexes.json file, and then feel comfortable deploying to production. Especially if you could get an optimized list that takes advantage of merging. But as @merlinnot points out, with the benefit of wisdom from hard knocks I'm sure, projects will begin to prefer "match production" as they grow. I can't disagree with that. The pluses are large in the true CI case with many projects: you get the failure right where it belongs. These large projects are likely where Firebase should optimize (at least for financial benefit). The downside is you can end up with a very non-optimized set of indexes because you added each as it was needed.

Hope this helps Firebase devs! Thanks @merlinnot for your very well thought out and detailed description of your use case.

@jpangburn @merlinnot I just want to say this conversation is great and we're listening to both of you!

@jpangburn By "match production" I mean loading indexes manually myself, not downloading them automatically. I'd prefer that as we strive to keep our development environments fully offline, both for performance and for the ability to develop under any conditions, such as on an aircraft or simply when one's internet breaks down. We'd probably end up fetching them from a central place and caching them, but definitely loading them into the emulator ourselves, the same as is currently done with Firestore Rules. That also allows people to test changes to indexes; we'd definitely have tests in the "Firestore indexes" repository.

At the same time I fully agree that auto-generating indexes would be very helpful for small/medium size applications, would speed up development a lot.

So I guess we can agree that both features would be very useful. Maybe they could be somehow combined, e.g. when some indexes are missing, print helpful messages:

  • "A new index is required for this query, add to your indexes.json file."
  • "Extending an existing index is required for this query, add to your index."

We could also have some flag that would result in generating these indexes automatically instead of failing queries.
