Client
PubSub
Environment
Ubuntu 18.04.4 on a GCP Compute VM
Go Environment
$ go version
go version go1.14.4 linux/amd64
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/wstorey_maxmind_com/.cache/go-build"
GOENV="/home/wstorey_maxmind_com/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/wstorey_maxmind_com/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build654305621=/tmp/go-build -gno-record-gcc-switches"
Code
https://gist.github.com/horgh/3e095519da9082fe26b1104cbe5909da
Expected behavior
When the context.Context passed to subscription.Receive() gets canceled, subscription.Receive() returns promptly. Afterwards, when I do another Receive() call, messages get received at the previous rate.
Actual behavior
subscription.Receive() either never returns (after waiting minutes) or eventually returns. After eventually returning, messages are received at a much slower rate (for example, 500 at a time, when previously I'd see 10,000 at a time).
Additional context
I'm using cloud.google.com/go/pubsub v1.4.0.
For my test, I have around 800k messages in the subscription (enough that the worker won't run out of messages to work on). To see the issue, I run the test program and wait several minutes. The behaviour seems non deterministic.
The test program is structured to match a production program I have that exhibited the behaviour. The production program processes messages somewhat more slowly than it can receive them, so it periodically cancels the Receive() call and waits until more messages can be worked on downstream before receiving again.
From what I can tell, usually canceling the context causes the Receive() to end quickly. However after a few minutes of canceling and receiving anew it seems to get stuck. I end up seeing many of these messages logged and I never see this message logged. Essentially the function passed to Receive keeps getting called with a canceled context, long after the context was canceled.
I haven't looked at the library code to try to understand this, but given it's non deterministic, perhaps there is a race of some kind?
If there's any more info I can provide, please let me know.
Thank you!
I noticed #2217 which may be about the same thing. Sorry about the duplicate if so!
As an update, I'm still currently trying to find stable reproduction in order to diagnose the root cause.
In the meantime, could you try switching your code to use Synchronous Pull? This can be done by adding sub.ReceiveSettings.Synchronous = true before calling sub.Receive. This will help me narrow down the problem to the StreamingPull section of the code if the problem goes away when using Synchronous Pull.
Thanks for the update! I'm going to try your change. However I'm trying to reproduce the original way first and today I'm having trouble seeing it 馃槥. Strange because I definitely saw it repeatedly the day I opened this issue. I'll try again tomorrow!
Unfortunately I haven't been able to reproduce this again. I've tried a few times. I'm not sure what's up! I do believe there is an issue but I can't explain why I haven't reproduced it since the day I opened the issue 馃槙.
I'll close in favor of #2217. If you continue to have issues in the future, please comment on that issue instead. Thanks!