When writing Firestore data using batch.set and batch.commit, after some number of commit calls Firestore stops responding, as if stuck waiting on a promise that never resolves.
This happens when storing a lot of data. I have prepared reproduction code below. The problem seems to be sensitive to the amount of data actually written in each commit: making the batches smaller, or making each record smaller, mitigates the issue. I also tried this in the Firebase issue sandbox in Chrome and the problem did not appear, so it seems to be Node.js specific.
1) Run the following code
2) Writing freezes at some point and neither resumes nor finishes
In my case this happens while saving batch #326. Output tail:
Saved batch #291, 14550-14599
Saving batch #321, 16050-16099
Saved batch #292, 14600-14649
Saving batch #322, 16100-16149
Saved batch #293, 14650-14699
Saving batch #323, 16150-16199
Saved batch #294, 14700-14749
Saving batch #324, 16200-16249
Saved batch #295, 14750-14799
Saving batch #325, 16250-16299
Saved batch #296, 14800-14849
Saving batch #326, 16300-16349
Then nothing happens (there should be 2000 batches written in total).
const firebase = require("firebase");
const { mergeMap, bufferCount } = require("rxjs/operators");
const { range } = require("rxjs");

firebase.initializeApp({
  apiKey: "AIzaSyBNHCyZ-bpv-WA-HpXTmigJm2aq3z1kaH8",
  authDomain: "jscore-sandbox-141b5.firebaseapp.com",
  databaseURL: "https://jscore-sandbox-141b5.firebaseio.com",
  projectId: "jscore-sandbox-141b5",
  storageBucket: "jscore-sandbox-141b5.appspot.com",
  messagingSenderId: "280127633210"
});

const firestore = firebase.firestore();
const settings = { timestampsInSnapshots: true };
firestore.settings(settings);
firebase.firestore.setLogLevel("error");

// Sample payload written to every document (an arbitrary package.json-like object).
const data = {
  "name": "reporter",
  "version": "1.0.0",
  "description": "",
  "main": "dist/index.js",
  "scripts": {
    "start": "ts-node -r tsconfig-paths/register ."
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "@types/auth0": "2.9.7",
    "@types/colors": "^1.1.3",
    "@types/node": "^7.0.29",
    "@types/node-fetch": "^1.6.7",
    "atob": "^2.0.3",
    "auth0": "2.14.0",
    "chalk": "^2.4.1",
    "colors": "^1.1.2",
    "firebase": "^5.7.0",
    "global-tunnel-ng": "^2.7.1",
    "https-proxy-agent": "^2.2.1",
    "node-fetch": "^1.7.1",
    "node-gzip": "^1.1.2",
    "promise.prototype.finally": "^3.1.0",
    "prompt": "^1.0.0",
    "ts-node": "^7.0.1",
    "tsconfig-paths": "^3.7.0",
    "typescript": "^3.2.2"
  }
};

// Write 100,000 documents in batches of 50, with up to 30 commits in flight at once.
range(0, 100000).pipe(
  bufferCount(50),
  mergeMap(async (ids, index) => {
    const batch = firestore.batch();
    ids.forEach(id => {
      const path = "test/" + id;
      batch.set(firestore.doc(path), data);
    });
    console.log(`Saving batch #${index}, ${ids[0]}-${ids[ids.length - 1]}`);
    await batch.commit();
    console.log(`Saved batch #${index}, ${ids[0]}-${ids[ids.length - 1]}`);
    return ids;
  }, 30)
).subscribe(null, null, async () => await firebase.app().delete());
Output with the Firestore debug log level enabled:
[2018-12-20T13:56:06.854Z] @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: Acknowledge batch
[2018-12-20T13:56:06.869Z] @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: notifyLocalViewChanges
Saved batch #295, 14750-14799
Saving batch #325, 16250-16299
[2018-12-20T13:56:06.883Z] @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: Get next mutation batch
[2018-12-20T13:56:06.899Z] @firebase/firestore: Firestore (5.5.9) [Connection]: GRPC stream sending: { streamToken: <Buffer 10 a8 02 19 10 68 42 02 81 f4 b5 b5>,
writes:
[ ... ]
[2018-12-20T13:56:06.939Z] @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: Locally write mutations
[2018-12-20T13:56:06.941Z] @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: notifyLocalViewChanges
[2018-12-20T13:56:07.117Z] @firebase/firestore: Firestore (5.5.9) [Connection]: GRPC stream received: { streamId: '',
streamToken: <Buffer 10 a9 02 19 10 68 42 02 81 f4 b5 b5>,
writeResults:
[ ... ]
[2018-12-20T13:56:07.117Z] @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: Acknowledge batch
[2018-12-20T13:56:07.121Z] @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: notifyLocalViewChanges
Saved batch #296, 14800-14849
Saving batch #326, 16300-16349
[2018-12-20T13:56:07.123Z] @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: Get next mutation batch
[2018-12-20T13:56:07.128Z] @firebase/firestore: Firestore (5.5.9) [Connection]: GRPC stream sending: { streamToken: <Buffer 10 a9 02 19 10 68 42 02 81 f4 b5 b5>,
writes:
[ ... ]
[2018-12-20T13:56:07.159Z] @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: Locally write mutations
[2018-12-20T13:56:07.160Z] @firebase/firestore: Firestore (5.5.9) [MemoryPersistence]: Starting transaction: notifyLocalViewChanges
Thanks for the report and for including repro code. I've been able to reproduce it and will let you know what I can figure out.
Just to update: my preliminary investigation shows that our stream to the backend is silently going dead. After ~300 batches, we stop receiving responses from the backend. The SDK has built-in throttling so that it never has more than 10 outstanding batches at a time, so it ends up waiting indefinitely for the next server response, which never arrives.
Normally if there's any sort of failure (a write fails, the network connection is dropped, etc.), the stream should be closed with an error that is either reported to the user or automatically retried, but in this case the SDK isn't seeing any sort of error and so it just waits indefinitely, thinking the stream is still healthy.
I'll need to dig in deeper to figure out where the failure is happening. Perhaps gRPC (our underlying network transport) is encountering an error but not surfacing it to the SDK as an error. Or perhaps the backend is encountering some error that causes it to stop processing requests, but it's not properly terminating the stream.
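As a side note for anyone hitting this: the SDK offers no commit timeout, but a caller can at least make the stall visible by racing each commit against a timer, so the hang surfaces as an error instead of the process waiting silently. This is only a sketch; the withTimeout helper and the 60-second value below are made up for illustration and are not part of the Firestore API.

// Hedged sketch: race a promise against a timeout so a silent stall surfaces as an error.
// `withTimeout` and the 60000 ms value are illustrative, not part of the Firestore SDK.
function withTimeout(promise, timeoutMs, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage inside the repro's mergeMap:
//   await withTimeout(batch.commit(), 60000, `batch #${index}`);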
I apologize but with the US holidays coming up, I likely won't be able to get to the bottom of this until January.
@mikelehen Thanks for the update. We've worked around it by running the code in a browser instead, so this is no longer a blocker for us. Happy holidays! 🎄
I believe we've found the root cause: it is a backend issue, tracked internally as b/124060178. It could be a few weeks before a fix is rolled out. In the meantime, the best workaround is to periodically call disableNetwork() followed by enableNetwork() if you are sending a large number of writes in a short timeframe. Sorry for the trouble!
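For reference, here is one way that workaround could be wired into the repro above. disableNetwork() and enableNetwork() are the actual SDK calls; everything else (the helper name, cycling every 100 batches) is an illustrative assumption, not a recommendation.

// Hedged sketch of the suggested workaround, using the `firestore` instance from the repro above.
// Replace `await batch.commit();` in the mergeMap with `await commitAndMaybeCycle(batch, index);`.
async function commitAndMaybeCycle(batch, index) {
  await batch.commit();
  // Arbitrary illustrative interval: re-establish the connection every 100 batches.
  if (index > 0 && index % 100 === 0) {
    await firestore.disableNetwork();
    await firestore.enableNetwork();
  }
}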
For anybody coming across this in the future, the backend issue has been fixed and released, so this issue should no longer occur. If you see something similar, please open a new issue.