During a recent bundle update of our app, the grpc version changed from 1.2.0 to 1.2.2. With the new version I am seeing runaway memory use. Here is a screenshot from a test I ran on Heroku. The initial flat section is the Rails app running happily after deploy, and the start of the large ramp was caused by a single datastore query. I then left the app alone and made no more requests. Memory use continued to climb until 8:10 am, when Heroku killed the dyno. The app restarted and once again ran happily until I made another datastore request at 11:30 am, and the ramp started over. The deploy of v223 contained a single change, adding gem 'grpc', '1.2.0' to the Gemfile, and the app has been running happily since.
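The thread does not say how grpc was constrained before the pin. One plausible but unconfirmed explanation for a plain bundle update jumping from 1.2.0 to 1.2.2 is a range constraint such as the hypothetical '~> 1.2' below. RubyGems' Gem::Requirement makes it easy to check what a range allows compared with the exact pin added in v223:

```ruby
require 'rubygems'

# Hypothetical range constraint; the thread does not say what was declared before.
LOOSE_CONSTRAINT = Gem::Requirement.new('~> 1.2')              # >= 1.2, < 2.0
# What gem 'grpc', '1.2.0' in the Gemfile means: exactly 1.2.0.
EXACT_PIN        = Gem::Requirement.new('1.2.0')
# A range that skips one specific release.
EXCLUDE_BAD      = Gem::Requirement.new('>= 1.2', '!= 1.2.2')

def allows?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

%w[1.2.0 1.2.2 1.2.5 1.3.4].each do |v|
  puts format('%-6s loose=%-5s pin=%-5s exclude=%s', v,
              allows?(LOOSE_CONSTRAINT, v), allows?(EXACT_PIN, v),
              allows?(EXCLUDE_BAD, v))
end
```

Bundler accepts the same multi-part syntax in a Gemfile, e.g. gem 'grpc', '>= 1.2', '!= 1.2.2'; whether other 1.2.x releases are also affected is not established in this thread.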

I can also reproduce this locally using the derailed gem. Has anything changed with version 1.2.2 of the grpc gem that needs a change to the datastore code? Or should this issue be moved to the grpc repo?
@bmclean Thanks for opening the issue.
@swcloud Can we get someone from the grpc team to look at this?
Might be related: https://github.com/grpc/grpc/issues/10658
The require 'grpc' would happen when the first datastore query occurs (shown above).
Adding @apolcyn
I do think this is running into the same issue as in https://github.com/grpc/grpc/issues/10658.
If there is any fork after a require 'grpc' with grpc 1.2.2, then unfortunately I'd expect these issues right now. Version 1.2.2 runs a background thread involved in grpc channel lifecycles; after a fork that thread is not present in the child process, and garbage collection of grpc channels will start to fail.
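The fork hazard rests on a general Ruby rule: after Process.fork, only the thread that called fork keeps running in the child, and every other thread is dead there. A minimal sketch, assuming a platform with fork (Linux or macOS) and using a plain Ruby thread as a hypothetical stand-in for grpc's channel-lifecycle thread (this is not grpc code):

```ruby
# Returns whether a background thread started in the parent is still alive
# inside a forked child. `poller` is a hypothetical stand-in for grpc 1.2.2's
# background channel-lifecycle thread.
def background_thread_alive_after_fork?
  poller = Thread.new { loop { sleep 0.05 } }
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    # Only the forking thread survives in the child, so poller is dead here.
    writer.write(poller.alive? ? 'alive' : 'dead')
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end

  writer.close
  report = reader.read
  reader.close
  Process.wait(pid)
  poller.kill
  report == 'alive'
end

puts "parent's thread alive in child? #{background_thread_alive_after_fork?}"
```

In the parent the thread keeps running; only the child loses it, which is why code that depends on such a thread breaks after a fork.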
The comment above is still a guess about what the app here is doing, though; I'm not certain this is the same issue as grpc/grpc#10658.
I am not certain either. The Rails app used in the Heroku test was running Puma configured with 1 worker and 5 threads. Puma forks that worker at boot, but the ruby client delays loading grpc until it is used (see here), so there should have been no fork after grpc was loaded. Also, the memory starts ramping up immediately after the first datastore query (when grpc is loaded).
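Deferring the require to the first call means the library is loaded inside a request rather than at boot. A minimal sketch of that pattern, not the google-cloud client's actual code: LazyClient is hypothetical, and the stdlib digest library stands in for grpc so the example runs without extra gems.

```ruby
# Hypothetical client that defers an expensive require until first use.
class LazyClient
  attr_reader :load_count

  def initialize
    @load_count = 0
    @loaded = false
    @mutex = Mutex.new
  end

  # The first call pays the cost of the require; later calls skip it.
  def fingerprint(payload)
    ensure_loaded
    Digest::SHA256.hexdigest(payload)
  end

  private

  # Load once, even if several Puma threads race on the first request.
  def ensure_loaded
    @mutex.synchronize do
      unless @loaded
        require 'digest' # stand-in for require 'grpc'
        @load_count += 1
        @loaded = true
      end
    end
  end
end

client = LazyClient.new
3.times { client.fingerprint('users') }
puts client.load_count
```

With this shape, a Puma worker forked at boot has not loaded the library yet, which is why the fork explanation seemed unlikely for this app.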
I have created a simplified Rails 5 app that uses the derailed gem to show the memory usage. It uses an actual Cloud Datastore instance (so it needs a project ID), authenticated locally through gcloud auth login. Clone the grpc-1.2.2-memory branch from here. Note that when derailed profiles memory usage, no web server is involved: it calls the app through Rack directly. The Rails app is set to use grpc 1.2.0 initially.
cd test/support/datastore_example_rails_app
bundle
Check that the app can connect to Datastore by running:
RAILS_ENV=production GCLOUD_PROJECT=project-id-goes-here rails server
Navigate to localhost:3000 with a browser and create a few users.
Execute the following command to run the derailed benchmark (substitute your datastore project ID).
DERAILED_SKIP_ACTIVE_RECORD=true PATH_TO_HIT=/users GCLOUD_PROJECT=project-id-goes-here TEST_COUNT=5000 bundle exec derailed exec perf:mem_over_time
The profile will take about 15 minutes.
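perf:mem_over_time hits PATH_TO_HIT repeatedly and records the process's memory as it goes. A rough, hypothetical approximation of that loop (not derailed's implementation) reads RSS from /proc/self/status on Linux and falls back to ps elsewhere; the block passed in stands in for one request:

```ruby
# Resident set size of the current process in MiB.
# Linux: the VmRSS line of /proc/self/status (reported in kB).
# Elsewhere: `ps -o rss=`, which also reports KiB on Linux and macOS.
def rss_mib
  status = '/proc/self/status'
  kib = if File.readable?(status)
          File.read(status)[/^VmRSS:\s+(\d+)/, 1].to_i
        else
          `ps -o rss= -p #{Process.pid}`.to_i
        end
  kib / 1024.0
end

# Run the block `iterations` times and sample RSS every `every` iterations.
# The block is a hypothetical stand-in for one request to PATH_TO_HIT.
def mem_over_time(iterations:, every:)
  samples = []
  iterations.times do |i|
    yield
    samples << [i + 1, rss_mib.round(1)] if ((i + 1) % every).zero?
  end
  samples
end

samples = mem_over_time(iterations: 1_000, every: 250) { Array.new(1_000) { |n| n.to_s } }
samples.each { |count, mib| puts "#{count}\t#{mib} MiB" }
```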
Then change the version of grpc to 1.2.2 in the Gemfile.
bundle update
Run the derailed command again to compare.
You should end up with results that, when graphed, look something like this:

Not conclusive yet, but I'm looking into this and seem to be getting similar results. Thanks for the repro, this is really helpful!
@bmclean, to be certain about the initial problem: you noticed runaway memory usage after only one datastore call? I'm not sure exactly which query it was; do we know if it translates to only one grpc call?
I'm seeing similar results with the benchmark memory comparisons in the graph above, but I'm not sure these differences necessarily hit the bug in the original issue (trying to reproduce that).
@apolcyn Correct. An ancestor query with an equality filter on one property:
```ruby
query = CloudDatastore.dataset.query 'User'
query.ancestor(ancestor_key)
query.where('disabled', '=', false)
entities = CloudDatastore.dataset.run query
```
You have a valid point that the memory benchmark isn't exactly the same: it performs the query repeatedly. I didn't see a way for derailed to hit the URL only once and keep monitoring the memory.
So I have added a display of the process's memory usage to the example app. The index page of the app does not perform any queries. Start at the index page, click through to view the users (which performs a query), and then go back to the index page. Refresh the index page every few minutes: with 1.2.0 the memory eventually stays constant, but with 1.2.2 it keeps climbing.
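Telling "eventually stays constant" apart from "keeps climbing" by eye can be made a little more objective with a least-squares slope over the most recent readings. A minimal sketch; the 1.0 MiB-per-sample threshold and the readings below are made up for illustration, not measurements from this issue:

```ruby
# Least-squares slope of evenly spaced samples (units per sample).
def slope(samples)
  n = samples.size
  raise ArgumentError, 'need at least two samples' if n < 2

  mean_x = (n - 1) / 2.0
  mean_y = samples.sum / n.to_f
  num = samples.each_with_index.sum { |y, x| (x - mean_x) * (y - mean_y) }
  den = (0...n).sum { |x| (x - mean_x)**2 }
  num / den
end

# Classify the trend of the last `window` readings (MiB).
# The default threshold is an arbitrary, hypothetical choice.
def memory_trend(readings, window: 10, threshold: 1.0)
  slope(readings.last(window)) > threshold ? :climbing : :steady
end

# Synthetic readings for illustration only.
steady = [180.0, 195.0, 205.0] + [210.0] * 10
ramp   = (0...13).map { |i| 180.0 + 6.0 * i }
puts memory_trend(steady)
puts memory_trend(ramp)
```

Looking only at the last window ignores the warm-up growth that both versions show right after the first query.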
You can start the Rails server locally with:
RAILS_ENV=production GCLOUD_PROJECT=project-id-goes-here rails server
So it looks like this is definitely a pure grpc problem: it can be reproduced easily with the grpc greeter example client tweaked as follows:
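```ruby
require 'grpc'
# Generated from helloworld.proto as in grpc's examples/ruby; assumes its lib dir is on $LOAD_PATH.
require 'helloworld_services_pb'

def main
  stub = Helloworld::Greeter::Stub.new('localhost:50051', :this_channel_is_insecure)
  user = ARGV.size > 0 ? ARGV[0] : 'world'
  message = stub.say_hello(Helloworld::HelloRequest.new(name: user)).message
  p "Greeting: #{message}"
  # Make no further calls; just report resident memory every 30 seconds.
  loop do
    sleep 30
    # ps reports RSS in KiB; convert to MiB.
    p "#{(`ps -o rss= -p #{Process.pid}`.to_i * 1024).to_f / 2**20}"
  end
  stub.inspect # never reached (the loop runs forever); only references stub
end

main
```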
It looks like the background thread that was added in v1.2.x to fix connectivity-related failures is accumulating a lot of memory. It makes a check on a timer, and memory accumulates in proportion to how often the timer fires; e.g., setting https://github.com/grpc/grpc/blob/master/src/ruby/ext/grpc/rb_channel.c#L425 to a lower value can cause almost all heap allocations to come from within grpc_channel_watch_connectivity_state.
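The proportionality is easy to see in a toy model: if a timer-driven check leaks a little on every tick, a shorter interval means more ticks and so faster growth. This is not grpc's C code; the poller and its retained array are hypothetical, chosen only to show the shape of the effect:

```ruby
# Toy model: a background thread wakes every `interval` seconds and retains
# one ~1 KiB string per tick, like per-check bookkeeping that is never freed.
def retained_after(duration:, interval:)
  retained = []
  lock = Mutex.new
  poller = Thread.new do
    loop do
      sleep interval
      lock.synchronize { retained << ('x' * 1024) }
    end
  end
  sleep duration
  poller.kill
  poller.join
  lock.synchronize { retained.size }
end

slow = retained_after(duration: 0.5, interval: 0.1)
fast = retained_after(duration: 0.5, interval: 0.01)
puts "slow timer retained #{slow} objects, fast timer retained #{fast}"
```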
Thanks @apolcyn!
A heads-up: a fix for this is in progress but isn't immediate.
In the meantime, going to release a 1.2.5 gem that reverts the connectivity fix (in https://github.com/grpc/grpc/pull/9986), which is causing this issue.
@apolcyn I deployed grpc version 1.3.4 to our staging environment today and so far everything looks fine.
24 hours later and memory is still holding steady. Nice work @apolcyn! Are we ok to close this issue?
Great news @bmclean! Feel free to close this issue if you feel it is resolved.
Thanks for the updates @bmclean, glad to hear that you're seeing the issue fixed!