Client
Firestore
Environment
Emulator running in the way described here: https://firebase.google.com/docs/rules/emulator-setup
firebase emulators:start --only firestore
Go Environment
$ go version
go version go1.14.4 linux/amd64
$ go env
O111MODULE=""
GOARCH="amd64"
GOBIN="/home/robert/go/bin"
GOCACHE="/home/robert/.cache/go-build"
GOENV="/home/robert/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/robert/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build613925690=/tmp/go-build -gno-record-gcc-switches"
FIRESTORE_EMULATOR_HOST=localhost:8080
Code
e.g.
package main
import (
"context"
"fmt"
"sync"
"time"
"cloud.google.com/go/firestore"
)
func main() {
// it can be even 1 minute, it's still working in the same way
timeout := time.Second * 1
ctx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()
workersCount := 2 // two workers are enough to reproduce
workersDone := sync.WaitGroup{}
workersDone.Add(workersCount)
docID := fmt.Sprintf("%d", time.Now().UnixNano())
startWorkers := make(chan struct{})
// it doesn't matter if I'm re-using the client and using separate one
client := newFirestoreClient()
for worker := 0; worker < workersCount; worker++ {
workerNum := worker
go func() {
defer workersDone.Done()
<-startWorkers
err := client.RunTransaction(ctx, func(ctx context.Context, transaction *firestore.Transaction) error {
docRef := client.Collection("test-coll").Doc(docID)
// without that function, it's working fine
_, _ = transaction.Get(docRef)
if err := transaction.Set(docRef, map[string]string{"foo": "bar"}); err != nil {
return err
}
return nil
})
fmt.Printf("err from worker %d: %+v\n", workerNum, err)
}()
}
close(startWorkers)
workersDone.Wait()
}
func newFirestoreClient() *firestore.Client {
firebaseClient, err := firestore.NewClient(context.Background(), "project-id")
if err != nil {
panic(err)
}
return firebaseClient
}
Expected behavior
No deadlock in RunTransaction.
Actual behavior
After running the code, RunTransaction is blocked until the context is canceled because of timeout.
err from worker 0: rpc error: code = DeadlineExceeded desc = context deadline exceeded
err from worker 1: rpc error: code = DeadlineExceeded desc = context deadline exceeded
When we will get rid of _, _ = transaction.Get(docRef) everything is working fine.
Additional context
I'm not sure that it's not just a problem with the newest version of the emulator or client - we have this issue for a couple of months.
It's also hard to say if it's the emulator problem or Go client. Anyway, I tried to find any similar issues in other clients and I didn't found anything useful.
If you think that it may be useful, I can also create issue for the emulator and link it here :)
Thanks for filing the issue. I don't see where in your code or environment you are enabling the emulator-- could you clarify on this?
@benwhitehead do we support this functionality through the emulator?
Hello @tritone, I added the info :)
Sounds good-- just wanted to confirm that you are also using FIRESTORE_EMULATOR_HOST to indicate that the client library should call the emulator?
Yes, I also added that for clarification.
As I said, commenting _, _ = transaction.Get(docRef) out is fixing the issue, so I'm sure that the problem is not with connection timeout, but with transaction.
Great; thanks very much for the quick response. I'll make sure this use case is supported and we can look into filing a bug in the emulator if the problem is there.
Do you have any news about that @tritone?
I'm asking, because I'm writing an article about testing repositories which includes Firestore, MySQL, and in-memory implementation.
Unfortunately, in Firestore implementation I need to explicitly skip some tests because they are failing because of that bug :(
I didn't see any obvious reason that this shouldn't be supported in the emulator. @benwhitehead is taking a look-- he's going to see if he can repro using the Java client as well. We will keep you posted.
I've spent some time attempting to create a repro in Java. Some mixed news I'm afraid. I was able to create what appeared to be a hang when attempting to have multiple transactions modify the same document at the same time, however, the hang resulted in a transaction timeout and each of the failed transactions were retried and then succeed. The noticeable hang I was able to observe in my testing was ~30 seconds while the transaction was timing out.
A few more questions to try and help narrow things down:
INFO: operation failed: transaction timeout?hello @BenWhitehead :)
I increased timeout to 60s on the client level, and it's correct - it's timeouts after 30s
2020/07/30 08:45:12 create firestore
2020/07/30 08:45:12 starting workers
2020/07/30 08:45:12 running tx
2020/07/30 08:45:12 running tx
2020/07/30 08:45:42 err from worker 0: rpc error: caode = Unknown desc =
2020/07/30 08:45:42 err from worker 1: rpc error: code = Unknown desc =
When I'm setting 60s timeout on the client level
Jul 30, 2020 8:45:12 AM io.gapi.emulators.netty.HttpVersionRoutingHandler channelRead
INFO: Detected HTTP/2 connection.
Jul 30, 2020 8:45:31 AM com.google.cloud.datastore.emulator.impl.util.WrappedStreamObserver onError
INFO: operation failed: transaction timeout
Jul 30, 2020 8:45:31 AM com.google.cloud.datastore.emulator.impl.util.WrappedStreamObserver onError
INFO: operation failed: transaction timeout
Jul 30, 2020 8:45:42 AM com.google.cloud.datastore.emulator.impl.util.WrappedStreamObserver onError
INFO: operation failed: transaction timeout
Jul 30, 2020 8:45:42 AM com.google.cloud.datastore.emulator.impl.util.WrappedStreamObserver onError
INFO: operation failed: transaction timeout
With smaller 1s timeout on the consumer level, it actually also return an error. But what is interesting it's after the client is already dead (I assume that it's not handling the situation when the client connection is closed?)
Jul 30, 2020 8:48:22 AM io.gapi.emulators.netty.HttpVersionRoutingHandler channelRead
INFO: Detected HTTP/2 connection.
Jul 30, 2020 8:48:52 AM com.google.cloud.datastore.emulator.impl.util.WrappedStreamObserver onError
INFO: operation failed: transaction timeout
Jul 30, 2020 8:48:52 AM com.google.cloud.datastore.emulator.impl.util.WrappedStreamObserver onError
INFO: operation failed: transaction timeout
It's interesting, but on the first run the error is different. But anyway, the problem is the same (deadlock and error after client or server timeout, in that case it's on the client side):
# first run
2020/07/30 08:51:21 err from worker 0: rpc error: code = Canceled desc = stream terminated by RST_STREAM with error code: CANCEL
2020/07/30 08:51:21 err from worker 1: rpc error: code = Canceled desc = stream terminated by RST_STREAM with error code: CANCEL
# second run
2020/07/30 08:51:37 err from worker 0: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2020/07/30 08:51:37 err from worker 1: rpc error: code = DeadlineExceeded desc = context deadline exceeded
# third run
2020/07/30 08:51:46 err from worker 1: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2020/07/30 08:51:46 err from worker 0: rpc error: code = DeadlineExceeded desc = context deadline exceeded
UPDATE:
I also tried to run fresh emulator and 60s timeout, in that case server didn't timeout after 30s as usual, but client timeout worked after 60s
2020/07/30 08:53:23 create firestore
2020/07/30 08:53:23 starting workers
2020/07/30 08:53:23 running tx
2020/07/30 08:53:23 running tx
2020/07/30 08:54:23 err from worker 1: context deadline exceeded
2020/07/30 08:54:23 err from worker 0: context deadline exceeded
Thanks for the additional info @roblaszczak. This sounds like it is something that will need some deeper investigation into the internals for the Go Firestore client to determine the best way to address the issue.
There is an integration tests in the Java Firestore client that specifically tests this type of scenario and ensure that after the contention, there is a re-attempt. We probably need to ensure that something similar is added to the test suite here.
95% certain this is related to https://github.com/firebase/firebase-tools/issues/2452
Any progress on this issue?
Most helpful comment
Great; thanks very much for the quick response. I'll make sure this use case is supported and we can look into filing a bug in the emulator if the problem is there.