Firebase-js-sdk: Firestore: Batch operations hang in Node.js

Created on 20 Dec 2018 · 7 comments · Source: firebase/firebase-js-sdk


Describe your environment

  • Operating System version: Windows 10 Pro 1803 (but same problem on Arch Linux)
  • Browser version: Node v. 10.14.2 (same problem on Node 8.x)
  • Firebase SDK version: 5.7.0
  • Firebase Product: firestore

Describe the problem

When writing Firestore data using batch.set and batch.commit, Firestore stops responding after some number of commit calls, as if stuck awaiting a response.

This happens when storing a lot of data. I have prepared reproduction code. The issue seems sensitive to the amount of data actually written in each commit: making the batches smaller, or making each record smaller, mitigates it. I also tried this in the Firebase issue sandbox in Chrome and the problem did not appear, so it seems to be Node.js-specific.

Steps to reproduce:

1) Run the following code
2) Writing of data freezes at some point and does not resume nor finish

In my runs, this happens when saving batch #326. Output tail:

Saved batch #291, 14550-14599
Saving batch #321, 16050-16099
Saved batch #292, 14600-14649
Saving batch #322, 16100-16149
Saved batch #293, 14650-14699
Saving batch #323, 16150-16199
Saved batch #294, 14700-14749
Saving batch #324, 16200-16249
Saved batch #295, 14750-14799
Saving batch #325, 16250-16299
Saved batch #296, 14800-14849
Saving batch #326, 16300-16349

Then nothing happens (there should be 2000 batches written in total).

Relevant Code:

    const firebase = require("firebase");
    const { mergeMap, bufferCount } = require("rxjs/operators");
    const { range } = require("rxjs");
    firebase.initializeApp({
        apiKey: "AIzaSyBNHCyZ-bpv-WA-HpXTmigJm2aq3z1kaH8",
        authDomain: "jscore-sandbox-141b5.firebaseapp.com",
        databaseURL: "https://jscore-sandbox-141b5.firebaseio.com",
        projectId: "jscore-sandbox-141b5",
        storageBucket: "jscore-sandbox-141b5.appspot.com",
        messagingSenderId: "280127633210"
    });

    const firestore = firebase.firestore();
    const settings = { timestampsInSnapshots: true };
    firestore.settings(settings);
    firebase.firestore.setLogLevel("error");

    const data = {
        "name": "reporter",
        "version": "1.0.0",
        "description": "",
        "main": "dist/index.js",
        "scripts": {
          "start": "ts-node -r tsconfig-paths/register ."
        },
        "author": "",
        "license": "ISC",
        "dependencies": {
          "@types/auth0": "2.9.7",
          "@types/colors": "^1.1.3",
          "@types/node": "^7.0.29",
          "@types/node-fetch": "^1.6.7",
          "atob": "^2.0.3",
          "auth0": "2.14.0",
          "chalk": "^2.4.1",
          "colors": "^1.1.2",
          "firebase": "^5.7.0",
          "global-tunnel-ng": "^2.7.1",
          "https-proxy-agent": "^2.2.1",
          "node-fetch": "^1.7.1",
          "node-gzip": "^1.1.2",
          "promise.prototype.finally": "^3.1.0",
          "prompt": "^1.0.0",
          "ts-node": "^7.0.1",
          "tsconfig-paths": "^3.7.0",
          "typescript": "^3.2.2"
        }
      }

    range(0, 100000).pipe(
        bufferCount(50),
        mergeMap(async (ids, index) => {
            const batch = firestore.batch();
            ids.forEach(id => {
                const path = "test/" + id;
                batch.set(firestore.doc(path), data);
            });
            console.log(`Saving batch #${index}, ${ids[0]}-${ids[ids.length-1]} `);
            await batch.commit();
            console.log(`Saved batch #${index}, ${ids[0]}-${ids[ids.length-1]} `);
            return ids;
        }, 30)
    ).subscribe(null, null, async () => await firebase.app().delete());


All 7 comments

I found a few problems with this issue:

  • I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.
  • This issue does not seem to follow the issue template. Make sure you provide all the required information.

Output with firestore debug level:

[2018-12-20T13:56:06.854Z]  @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: Acknowledge batch
[2018-12-20T13:56:06.869Z]  @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: notifyLocalViewChanges
Saved batch #295, 14750-14799 
Saving batch #325, 16250-16299 
[2018-12-20T13:56:06.883Z]  @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: Get next mutation batch
[2018-12-20T13:56:06.899Z]  @firebase/firestore: Firestore (5.5.9) [Connection]: GRPC stream sending: { streamToken: <Buffer 10 a8 02 19 10 68 42 02 81 f4 b5 b5>,
  writes: 
   [ ... ]
[2018-12-20T13:56:06.939Z]  @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: Locally write mutations
[2018-12-20T13:56:06.941Z]  @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: notifyLocalViewChanges
[2018-12-20T13:56:07.117Z]  @firebase/firestore: Firestore (5.5.9) [Connection]: GRPC stream received: { streamId: '',
  streamToken: <Buffer 10 a9 02 19 10 68 42 02 81 f4 b5 b5>,
  writeResults: 
   [ ... ]
[2018-12-20T13:56:07.117Z]  @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: Acknowledge batch
[2018-12-20T13:56:07.121Z]  @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: notifyLocalViewChanges
Saved batch #296, 14800-14849 
Saving batch #326, 16300-16349 
[2018-12-20T13:56:07.123Z]  @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: Get next mutation batch
[2018-12-20T13:56:07.128Z]  @firebase/firestore: Firestore (5.5.9) [Connection]: GRPC stream sending: { streamToken: <Buffer 10 a9 02 19 10 68 42 02 81 f4 b5 b5>,
  writes: 
   [ ... ]
[2018-12-20T13:56:07.159Z]  @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: Locally write mutations
[2018-12-20T13:56:07.160Z]  @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: notifyLocalViewChanges

Thanks for the report and for including repro code. I've been able to reproduce it and will let you know what I can figure out.

Just to update: my preliminary investigation shows that our stream to the backend is silently going dead... After ~300 batches, we stop receiving responses from the backend. The SDK has builtin throttling to never have more than 10 outstanding batches at a time, so it ends up waiting indefinitely for the next server response, but it never arrives.

Normally if there's any sort of failure (a write fails, the network connection is dropped, etc.), the stream should be closed with an error that is either reported to the user or automatically retried, but in this case the SDK isn't seeing any sort of error and so it just waits indefinitely, thinking the stream is still healthy.

I'll need to dig in deeper to figure out where the failure is happening. Perhaps gRPC (our underlying network transport) is encountering an error but not surfacing it to the SDK as an error. Or perhaps the backend is encountering some error that causes it to stop processing requests, but it's not properly terminating the stream.

I apologize but with the US holidays coming up, I likely won't be able to get to the bottom of this until January.
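Since the SDK sees no error when the stream silently dies, a client-side timeout around each commit can at least surface the hang instead of waiting forever. The following is an illustrative sketch only; `withTimeout` is a generic helper, not a Firestore API.

```javascript
// Wrap a promise so it rejects if it does not settle within `ms` milliseconds.
// Useful for detecting the "commit never resolves" symptom described above.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

In the repro code above, one could hypothetically replace `await batch.commit()` with `await withTimeout(batch.commit(), 30000, `batch #${index}`)` so the process fails loudly rather than hanging.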

@mikelehen Thanks for looking into this. We've created a workaround - we're running the code in a browser instead, so this is not a blocker for us anymore. Happy holidays. 🎄

I believe we've found the root cause of this as an issue in the backend which is tracked internally by b/124060178. It could be a few weeks before we get a fix rolled out. In the meantime, the best workaround is to call disableNetwork() followed by enableNetwork() periodically if you are sending a large amount of writes in a short timeframe. Sorry for the trouble!
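The suggested workaround could be sketched as follows. This is a hedged illustration, not an official recipe: `commitBatch` stands in for the `batch.commit()` call from the repro, and `cycleNetwork` stands in for `await firestore.disableNetwork(); await firestore.enableNetwork();`. The function name and the `every = 100` interval are assumptions chosen to stay well under the ~300-batch point where the stream goes dead.

```javascript
// Commit batches sequentially, periodically cycling the network connection
// to reset the write stream before it can silently go dead.
async function writeWithNetworkCycling(batches, commitBatch, cycleNetwork, every = 100) {
  for (let i = 0; i < batches.length; i++) {
    await commitBatch(batches[i]);
    // Every `every` batches, tear down and re-establish the connection.
    if ((i + 1) % every === 0) {
      await cycleNetwork();
    }
  }
}
```

Note this sketch commits batches one at a time; with concurrent commits (as in the `mergeMap(..., 30)` repro) you would need to drain in-flight commits before cycling the network.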

For anybody coming across this in the future, the backend issue has been fixed and released, so this issue should no longer occur. If you see something similar, please open a new issue.
