Google-cloud-ruby: gRPC "2:Stream removed" error in pubsub listen

Created on 22 Jan 2020  路  12Comments  路  Source: googleapis/google-cloud-ruby

I am getting a very occasional error message "2:Stream removed" from a pubsub subscription on_error callback. This subscription does not have very much traffic, and the few times the error has happened it does not seem to be related to the processing of an active message.

It has not seemed to affect the processing of messages, however I am somewhat alarmed because the API dashboard is now showing 96% error rate (see chart at bottom), even though no pubsub subscription errors are appearing in the log besides these very occasional (once every few days) "2:Stream removed" messages. I've restarted the subscription processes and the error rate seems to have dropped.

Apologies if this is related to another open issue, I was not sure.

Environment details

  • OS: Windows 2012 R2
  • Ruby version: Ruby 2.6-x64
  • Gem name and version:
    google-cloud-pubsub (1.2.0)
      concurrent-ruby (~> 1.1)
      google-cloud-core (~> 1.2)
      google-gax (~> 1.8)
      googleapis-common-protos (>= 1.3.9, < 2.0)
      googleapis-common-protos-types (>= 1.0.4, < 2.0)
      grpc-google-iam-v1 (~> 0.6.9)

Steps to reproduce

See code below

Code example

    client = Google::Cloud::PubSub.new(credentials: opts.auth_file)
    sub = client.subscription opts.queue

    @logger.info "Listening for tasks for worker #{@worker} on #{opts.queue}" 

    # Note: streams in this case is 3. But it has happened in another process when streams = 1.
    subscriber = sub.listen(streams: opts.concurrency) do |msg|
      @logger.info "Processing task #{msg.message_id} with worker #{@worker} on #{opts.queue}"
      @worker.new(opts.env, @logger).call(msg)        
      @logger.info "Processed task #{msg.message_id} with worker #{@worker} on #{opts.queue}"
    end

    subscriber.on_error do |e|
      @logger.error({     # note: this is where the error occurs
        "$type" => "Error",
        "error_class" => e.class.to_s,
        "message" => e.to_s,
        "backtrace" => e.backtrace
      })
    end

    subscriber.start

Error

Error class: GRPC::Unknown
Error message: "2:Stream removed"

Full backtrace

   0: "workers/.gems/ruby/2.6.0/gems/grpc-1.26.0-x64-mingw32/src/ruby/lib/grpc/generic/active_call.rb:31:in `check_status'"
   1: "workers/.gems/ruby/2.6.0/gems/grpc-1.26.0-x64-mingw32/src/ruby/lib/grpc/generic/bidi_call.rb:209:in `block in read_loop'"    
   2: "workers/.gems/ruby/2.6.0/gems/grpc-1.26.0-x64-mingw32/src/ruby/lib/grpc/generic/bidi_call.rb:195:in `loop'"    
   3: "workers/.gems/ruby/2.6.0/gems/grpc-1.26.0-x64-mingw32/src/ruby/lib/grpc/generic/bidi_call.rb:195:in `read_loop'"    
   4: "bin/generate:1:in `each'" 

pubsub-errors

p2 bug

All 12 comments

@ericgj Thank you for opening this issue!

I only just took a quick initial look at it, but I'm having trouble matching the backtrace with the code example. Do you have any more output you can add?

The only other clue is the error class is GRPC::Unknown (updated issue above). It is definitely happening in that subscription on_error callback, since that's the only place that particular log type is written.

oh, and these are more detailed metrics:

image

Thanks for the updates. I might not have a quick answer for you, but I'll look into this.

Just an update - the error rate on StreamingPull methods has not gone down after restarting the subscription. It is still near 100%. And the "Stream removed" errors have not happened again yet. So I don't know how the two are related, if at all. Meanwhile, as I said -- the processing of actual messages seems to be happening without error.

@ericgj Thank you for this update! Anything else you find is also very welcome.

About the "2:Stream removed" errors

I have not much more info about these, unfortunately. It seems to happen within a day of starting up the subscriber process, doesn't coincide or affect the processing of actual messages. I've noticed it seems to affect all open streams at the same time. So for example one of my subscribers has 3 streams. When the error happens, it happens three times almost simultaneously. Actual messages continue to be processed after this.

About the high error rate on the StreamingPull, which does not result in logged errors:

  • According to this chart the errors are 503's (service unavailable?)
    image

  • There are 33,988 StreamingPull requests made, but only 285 Publish requests to date. Is this because StreamingPull makes some kind of heartbeat requests? Is that what may be returning 503 responses?

  • I tried logging GRPC requests using the exact instructions here. But I get no output, even with debug-level logging.

  • Should I open a separate issue about these errors, since it's unclear what the relationship is between the "2:Stream removed" errors and these errors?

Thanks for the update. I was just looking at the error handling in pubsub/subscriber/stream.rb this morning, but couldn't find anything helpful.

Should I open a separate issue about these errors, since it's unclear what the relationship is between the "2:Stream removed" errors and these errors?

I'd say wait on that for now. We can split the issue once discussion moves forward on one or the other of these behaviors.

The official API docs discuss why StreamingPull has a 100% error rate (this is to be expected).

Does this explanation address what you're seeing?

Closing since I believe this behavior is normal and expected (although certainly undesirable for users) per the docs linked above.

Thanks - sorry for not responding earlier - that does seem to explain it.

There has been some discussion about the client suppressing these "normal" errors, so that the callback only receives actionable errors. If that goes anywhere I'll post it to this issue.

Was this page helpful?
0 / 5 - 0 ratings