Google-cloud-ruby: Datastore: does not work inside Puma webserver

Created on 12 May 2016  Â·  41Comments  Â·  Source: googleapis/google-cloud-ruby

Would be good to receive independent confirmation, but from what I can tell is that Datastore queries hang if run inside Puma webserver. Same query runs fine from rails console process. Worker processes running other queries/inserts also look to be running ok. All environment is the same.

Web process just hangs when calling dataset.run(query)

datastore grpc question

Most helpful comment

In summary, this is what we discovered when testing with Ruby web servers configured to fork processes (Puma, Passenger, Unicorn):

  • the grpc library is initialized with require 'gcloud/datastore'
  • the way that the gRPC library within gcloud initializes does not persist properly across forks
  • if the grpc library is initialized prior to the web server forking, the sub-processes don't have correct initialization. For example, require 'gcloud/datastore executed in a Rails initializer or in any code that is eager loaded (such as within the Rails app/ directory)
  • if you fork the process first and then initialize the grpc library in each worker, everything works correctly

Here is an example of a Rails CloudDatastore initializer that is currently working with Puma:

module CloudDatastore
  if Rails.env.development?
    ENV['DATASTORE_EMULATOR_HOST'] = 'localhost:8180'
    ENV['GCLOUD_PROJECT'] = 'local-datastore'
  elsif Rails.env.test?
    ENV['DATASTORE_EMULATOR_HOST'] = 'localhost:8181'
    ENV['GCLOUD_PROJECT'] = 'test-datastore'
  else
    ENV['GCLOUD_KEYFILE_JSON'] = '{"private_key": "' + ENV['SERVICE_ACCOUNT_PRIVATE_KEY'] + '",
      "client_email": "' + ENV['SERVICE_ACCOUNT_CLIENT_EMAIL'] + '"}'
  end

  def self.dataset
    require 'gcloud/datastore'
    Thread.current[:dataset] ||= Gcloud.datastore(ENV['GCLOUD_PROJECT'])
  end

  def self.reset_dataset
    Thread.list.each do |thread|
      thread[:dataset] = nil if thread.key?(:dataset)
    end
  end
end

@blowmage Unrelated to the grpc initialization problem, we are also trying out your one Gcloud grpc client instance per web server thread concept.

All 41 comments

Actually same Puma server runs fine on localhost. The problem only manifests on Heroku, in web Dyno, and only using v1beta3 API. Weird

@timanovsky I don't know what this is about. I'll investigate. What size are your dynos? How many threads are you running on each one? How many dynos are you running?

@timanovsky Is there anything interesting in the logs before the process hangs? Feel free to email me a private gist if you don't want to share publicly.

@blowmage Unfortunately I can reproduce this as well on 1 standard professional dyno.

Puma config:
threads_count = 5
workers = 2
preload_app!

Heroku logs:

2016-05-12T18:04:40.011516+00:00 heroku[router]: at=error code=H12 desc="Request timeout" method=GET path="/" request_id=ff9a7f0a-2202-49b3-8f56-376be41623f1 fwd="198.169.134.185" dyno=web.1 connect=20ms service=30000ms status=503 bytes=0
2016-05-12T18:08:10.052634+00:00 app[web.1]: D0512 18:08:10.052580224      37 chttp2_transport.c:698]     got goaway [0]: 73 65 73 73 69 6f 6e 5f 74 69 6d 65 64 5f 6f 75 74 'session_timed_out'

@murgatroid99 @nicolasnoble Sorry to pull you guys in on this, but do you have any insight on why the GRPC client would hang on Heroku?

I don't recognize this specific problem. It would help to have more information. If you run with the environment variable GRPC_TRACE=all, it will output a bunch of log information that might contain something useful (make sure to remove private information).

The problem could be a bug in the gRPC client, or it could be that the server is not responding properly.

@murgatroid99 I added the environment variable GRPC_TRACE=all and the output is captured here.

What I see happening in that log is: grpc connects to the server, completes a TLS handshake, and starts initializing an HTTP2 stream. And then it suddenly stops, and ~20 seconds later, Heroku kills it. Unfortunately, I don't recognize that failure case.

Can you open a bug against grpc (https://github.com/grpc/grpc) including that trace and a description of how you encountered the error?

Thanks for your help @murgatroid99!

OK, will do. @blowmage the gcloud-ruby gcloud.gemspec lists "grpc", "~> 0.13.1". It looks like the current version is 0.14.1. Any chance this has already been fixed?

@bmclean We'd like to bump the dependency, but we have an open issue that is blocking us from doing that.

@timanovsky Are you able to connect to an actual datastore while running Puma locally? If I run Puma localhost and connect to the datastore emulator it is fine but if I run Puma localhost and connect to my actual datastore it hangs with:
D0512 21:17:47.508334000 123145307635712 chttp2_transport.c:698] got goaway [0]: 73 65 73 73 69 6f 6e 5f 74 69 6d 65 64 5f 6f 75 74 'session_timed_out'

Yes, from local puma with same settings it works. This is Linux VM though.
On 13 May 2016 6:11 am, "Bryce McLean" [email protected] wrote:

@timanovsky https://github.com/timanovsky Are you able to connect to an
actual datastore while running Puma locally?

—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/GoogleCloudPlatform/gcloud-ruby/issues/691#issuecomment-218941956

Btw those got goaway log messages are pretty frequent even in normally working processes (local, or non-puma).

I am also able to reproduce the hang using the phusion passenger web server on Heroku. However, deploying to Heroku using WEBrick works. Obviously this isn't a production solution but perhaps it can give us a little more information.

Issue has been created against grpc/grpc#6577

@bmclean Thanks for your tests, glad problem confirmed and is easily reproducible.

So I've been experimenting with running the GRPC client in a threaded environment. My guess is before the threaded webserver hangs (Puma, Passenger, etc.) you see the following message printed to stdout:

D0518 22:47:08.134601000 123145312415744 completion_queue.c:429] Too many outstanding grpc_completion_queue_pluck calls: maximum is 6

I've looked into this and it basically boils down to the GRPC client not handling more than 6 threads. My guess is that you are using more than 6 threads in your production environment. This doesn't mean that you can't use GRPC in your production environment, just that you need to treat the GRPC client the same way you treat a database client, namely that you have one instance per thread.

This is pretty easy to do. Instead of creating one datastore object for all threads, create one on each webserver thread. Here is some example code that might work for you:

##
# Use this constant to access the Google Cloud Datastore
module Datastore
  def self.dataset
    Thread.current[:dataset] ||= Gcloud.datastore
  end
end

# Now whenever you need to call the datastore, you use that method...

class PostController < ActionController::Base
  def index
    @posts = Datastore.dataset.run(Datastore.dataset.query("Post"))
  end
end

I've run several long running experiments with up to 32 threads without seeing the above message, or seen a thread hang.

Does not look like my case. I'm using the same connection pool code
(connection_pool gem) on both worker dynos which work fine and web dynos
which hangs. Each thread gets its own connection. Also I couldn't find D0518
message in the logs.

Best regards,
Alexey Timanovsky.

On 19 May 2016 at 07:58, Mike Moore [email protected] wrote:

So I've been experimenting with running the GRPC client in a threaded
environment. My guess is before the threaded webserver hangs (Puma,
Passenger, etc.) you see the following message printed to stdout:

D0518 22:47:08.134601000 123145312415744 completion_queue.c:429] Too many outstanding grpc_completion_queue_pluck calls: maximum is 6

I've looked into this and it basically boils down to the GRPC client not
handling more than 6 threads. My guess is that you are using more than 6
threads in your production environment. This doesn't mean that you can't
use GRPC in your production environment, just that you need to treat the
GRPC client the same way you treat a database client, namely that you have
one instance per thread.

This is pretty easy to do. Instead of creating one datastore object for
all threads, create one on each webserver thread. Here is some example code
that might work for you:

Use this constant to access the Google Cloud Datastoremodule Datastore

def self.dataset
Thread.current[:dataset] ||= Gcloud.datastore
endend

Now whenever you need to call the datastore, you use that method...

class PostController < ActionController::Base
def index
@posts = Datastore.dataset.run(Datastore.dataset.query("Post"))
endend

I've run several long running experiments with up to 32 threads without
seeing the above message, or seen a thread hang.

—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/GoogleCloudPlatform/gcloud-ruby/issues/691#issuecomment-220227847

How many threads are you running on your worker dynos and your web dynos?

That Too many outstanding grpc_completion_queue_pluck calls message isn't related to the number of threads. That's related to a bug that should have been fixed in the most recent release that's related to the internal structure of the library.

If you look at the most recent updates on that grpc issue linked above, it looks like the root cause of this problem is eagerly loading the grpc library before forking worker subprocesses. As far as I know, in the most recent release, we have no problems using a single client in an arbitrary number of threads.

I was running configurations with up to 20 threads, but not sure what
number of connection did they checkout out of the pool at any given time.
May be less than 6. And also web dynow which do have problems, they are
single thread on Heroku (but multiple forked processes)

Best regards,
Alexey Timanovsky.

On 19 May 2016 at 08:35, Mike Moore [email protected] wrote:

How many threads are you running on your worker dynos and your web dynos?

—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/GoogleCloudPlatform/gcloud-ruby/issues/691#issuecomment-220231815

@murgatroid99 Unfortunately we are blocked by grpc/grpc#6571 for depending on the most recent release. I can reliably reproduce that message using only threads, no forked processes.

In summary, this is what we discovered when testing with Ruby web servers configured to fork processes (Puma, Passenger, Unicorn):

  • the grpc library is initialized with require 'gcloud/datastore'
  • the way that the gRPC library within gcloud initializes does not persist properly across forks
  • if the grpc library is initialized prior to the web server forking, the sub-processes don't have correct initialization. For example, require 'gcloud/datastore executed in a Rails initializer or in any code that is eager loaded (such as within the Rails app/ directory)
  • if you fork the process first and then initialize the grpc library in each worker, everything works correctly

Here is an example of a Rails CloudDatastore initializer that is currently working with Puma:

module CloudDatastore
  if Rails.env.development?
    ENV['DATASTORE_EMULATOR_HOST'] = 'localhost:8180'
    ENV['GCLOUD_PROJECT'] = 'local-datastore'
  elsif Rails.env.test?
    ENV['DATASTORE_EMULATOR_HOST'] = 'localhost:8181'
    ENV['GCLOUD_PROJECT'] = 'test-datastore'
  else
    ENV['GCLOUD_KEYFILE_JSON'] = '{"private_key": "' + ENV['SERVICE_ACCOUNT_PRIVATE_KEY'] + '",
      "client_email": "' + ENV['SERVICE_ACCOUNT_CLIENT_EMAIL'] + '"}'
  end

  def self.dataset
    require 'gcloud/datastore'
    Thread.current[:dataset] ||= Gcloud.datastore(ENV['GCLOUD_PROJECT'])
  end

  def self.reset_dataset
    Thread.list.each do |thread|
      thread[:dataset] = nil if thread.key?(:dataset)
    end
  end
end

@blowmage Unrelated to the grpc initialization problem, we are also trying out your one Gcloud grpc client instance per web server thread concept.

@bmclean Thank you to you and your team for the effort you've put into this! It is an incredible help.

@bmclean Thanks for such a great writeup! This is incredibly useful!

If you (or anyone reading this) are bothered by the fact that you require the library each time you access the dataset object, you can change the implementation a bit to run the require only when you need to create the object:

  def self.dataset
    Thread.current[:dataset] ||= begin
      require "gcloud/datastore"
      Gcloud.datastore ENV["GCLOUD_PROJECT"]
    end
  end

That is awesome, thanks @blowmage!

@timanovsky @bmclean Could you fellas try out a branch that uses the recently released grpc 0.15.0 gem? So far my tests have been successful. I'm curious if the latest gem solves some of the issues you've had running in puma.

https://github.com/blowmage/gcloud-ruby/tree/grpc-0.15

@blowmage Unfortunately Puma still hangs. I tried your branch as follows:
gem 'gcloud', :git => 'https://github.com/blowmage/gcloud-ruby.git', :branch => 'grpc-0.15'
I ran bundle install and verified that I was now using grpc (0.15.0 universal-darwin).
I moved the require 'gcloud/datastore' back to the top of the Rails CloudDatastore initializer.
I ran my app with rails s -e production connected to an actual cloud datastore.
It still hangs.

Hello, any updates on this? Unfortunately I have to require "gcloud/datastore" at the top, as I have several functions which do not need dataset object, but work with objects likeGcloud::Datastore::Key. It would be a mess to change all such functions to require "gcloud/datastore"inside.

Update workaround for those with similar problem:

require 'gcloud'
module ::Gcloud
  def self.const_missing(name)
    if name.to_s == 'Datastore'
      require 'gcloud/datastore'
      const_get(name)
    end || super
  end
end

Does the 0.20.0 release using grpc 1.0 change this at all?

@blowmage No, the problem still exists with 0.20.0.

Can you take a look at #875? I think it might help remove the need for const_missing hacks. It should even allow you to create a Google::Cloud::Datastore::Dataset object in an initializer as long as you don't make any API calls with it before Puma forks the sub-processes.

@blowmage Sure, I will give it a try.

@blowmage I tried your branch:
:git => 'https://github.com/blowmage/gcloud-ruby.git', :branch => 'grpc-delay-require'
Using an initializer like this:


require 'google/cloud/datastore'

module CloudDatastore
  def self.dataset
    @dataset ||= Google::Cloud.datastore(ENV['GCLOUD_PROJECT'])
  end
end

and Puma works fine configured with 2 workers.

Great. Thank you!

Does the 0.20.1 release (with PR #875) resolve this issue?

Yes.

@timanovsky Can you confirm 0.20.1 release (with PR #875) as a fix?

Seems to work, thanks for fixing this!

Was this page helpful?
0 / 5 - 0 ratings