While looping through rows on either a very large table or over a low-bandwidth connection, I get:
Traceback (most recent call last):
21: from migrate.rb:145:in `<main>'
20: from /Users/untoldone/.rvm/gems/ruby-2.5.0@spanner-to-cockroach/gems/google-cloud-spanner-1.6.0/lib/google/cloud/spanner/client.rb:905:in `snapshot'
19: from /Users/untoldone/.rvm/gems/ruby-2.5.0@spanner-to-cockroach/gems/google-cloud-spanner-1.6.0/lib/google/cloud/spanner/pool.rb:57:in `with_session'
18: from /Users/untoldone/.rvm/gems/ruby-2.5.0@spanner-to-cockroach/gems/google-cloud-spanner-1.6.0/lib/google/cloud/spanner/client.rb:913:in `block in snapshot'
17: from migrate.rb:151:in `block in <main>'
16: from migrate.rb:151:in `each'
15: from /Users/untoldone/.rvm/gems/ruby-2.5.0@spanner-to-cockroach/gems/google-cloud-spanner-1.6.0/lib/google/cloud/spanner/results.rb:177:in `rows'
14: from /Users/untoldone/.rvm/gems/ruby-2.5.0@spanner-to-cockroach/gems/google-cloud-spanner-1.6.0/lib/google/cloud/spanner/results.rb:177:in `each'
13: from /Users/untoldone/.rvm/gems/ruby-2.5.0@spanner-to-cockroach/gems/google-cloud-spanner-1.6.0/lib/google/cloud/spanner/results.rb:185:in `block in rows'
12: from /Users/untoldone/.rvm/gems/ruby-2.5.0@spanner-to-cockroach/gems/google-cloud-spanner-1.6.0/lib/google/cloud/spanner/results.rb:185:in `each_slice'
11: from /Users/untoldone/.rvm/gems/ruby-2.5.0@spanner-to-cockroach/gems/google-cloud-spanner-1.6.0/lib/google/cloud/spanner/results.rb:185:in `each'
10: from /Users/untoldone/.rvm/gems/ruby-2.5.0@spanner-to-cockroach/gems/google-cloud-spanner-1.6.0/lib/google/cloud/spanner/results.rb:186:in `block (2 levels) in rows'
9: from migrate.rb:153:in `block (2 levels) in <main>'
8: from migrate.rb:103:in `backup_table'
7: from /Users/untoldone/.rvm/rubies/ruby-2.5.0/lib/ruby/2.5.0/csv.rb:1289:in `open'
6: from migrate.rb:113:in `block in backup_table'
5: from migrate.rb:113:in `to_a'
4: from migrate.rb:113:in `each'
3: from /Users/untoldone/.rvm/gems/ruby-2.5.0@spanner-to-cockroach/gems/google-cloud-spanner-1.6.0/lib/google/cloud/spanner/results.rb:119:in `rows'
2: from /Users/untoldone/.rvm/gems/ruby-2.5.0@spanner-to-cockroach/gems/google-cloud-spanner-1.6.0/lib/google/cloud/spanner/results.rb:119:in `loop'
1: from /Users/untoldone/.rvm/gems/ruby-2.5.0@spanner-to-cockroach/gems/google-cloud-spanner-1.6.0/lib/google/cloud/spanner/results.rb:120:in `block in rows'
/Users/untoldone/.rvm/gems/ruby-2.5.0@spanner-to-cockroach/gems/google-cloud-spanner-1.6.0/lib/google/cloud/spanner/results.rb:170:in `rescue in block in rows': 4:Deadline Exceeded (Google::Cloud::DeadlineExceededError)
Code is pretty straightforward:
spanner = Google::Cloud::Spanner.new project: GOOGLE_PROJECT
client = spanner.client SPANNER_INSTANCE_ID, SPANNER_DATABASE_ID
client.snapshot do |snapshot|
  results = snapshot.execute("SELECT * FROM table_name_here")
  results.rows.each do |row|
    # Do some stuff
  end
end
I couldn't find any way, documented or undocumented, to easily extend the deadline and prevent this. I would have expected the library either to handle or prevent this error or, at a minimum, to let me override the default deadline.
Hi @untoldone, thanks for opening the issue about this. Can you tell us how long your code runs until you get a DeadlineExceededError? Do you know how many rows have been processed by the time you get an error? How much work is being performed in the # Do some stuff block?
I had a query running over a slow broadband internet connection (maybe 20 Mbps) that was slow but consistent in speed and reliability. It would get through about 110,000 rows of a 260,000-row table. I think it ran for about 2 minutes before throwing (I don't have an exact time, as I only wrote times out every 10k rows).
"Do some stuff" was just writing rows to a CSV without any processing.
Rerunning on a faster internet connection resolved the issue for me but it also finished its query in far under 2 minutes.
Let me know if anything else would be helpful here!
FWIW, the only similar issue I can find is GoogleCloudPlatform/python-docs-samples#975.
@snehashah16 Can you weigh in on this? Currently, streaming executes and reads rescue only on UNAVAILABLE, and not on DEADLINE_EXCEEDED. Should they? I notice that idempotent API endpoints are currently configured for both.
Is there anything you can share on the expectations the API has for how quickly streaming results are to be pulled off the stream? Are low-bandwidth connections or slower clients supported on streaming API endpoints? If so, what would need to be changed to support this better?
@snehashah16 Ping.
@snehashah16 Ping.
hey @untoldone -
I would one of the following
@snehashah16 Thanks for weighing in. When the Spanner client was first implemented the design guidance we were given was very specific that broken streams should be retried for only UNAVAILABLE errors. But it seems that a stream might fail for other reasons that can be recovered from, such as the situation offered in this issue.
So let me rephrase my question:
Can you please confirm that UNAVAILABLE is the only error that streams should recover from?
To support my assertion that we may want to rethink how Spanner recovers from broken streams, here are the errors that Firestore is currently recovering from:
rescue GRPC::Cancelled, GRPC::DeadlineExceeded, GRPC::Internal,
GRPC::ResourceExhausted, GRPC::Unauthenticated,
GRPC::Unavailable, GRPC::Core::CallError
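To illustrate what recovering from a broken stream could look like, here is a minimal sketch, not the library's actual implementation. It assumes a stream can be reopened from a known position; in Spanner that role is played by a resume token, which this sketch replaces with a plain row index. The error classes, `each_row_with_resume`, and the flaky stream are all hypothetical stand-ins so the example runs without the grpc gem.

```ruby
# Hypothetical stand-ins for gRPC error classes (not the real GRPC constants).
class UnavailableError < StandardError; end
class DeadlineExceededError < StandardError; end

RETRIABLE_ERRORS = [UnavailableError, DeadlineExceededError].freeze

# Yields every row once, reopening the stream after a retriable error.
# `open_stream` is a hypothetical callable: given the index of the next row
# wanted, it returns an Enumerator over the remaining rows.
def each_row_with_resume(open_stream, max_retries: 3)
  next_index = 0
  retries = 0
  begin
    open_stream.call(next_index).each do |row|
      yield row
      next_index += 1
      retries = 0 # progress was made, so reset the retry budget
    end
  rescue *RETRIABLE_ERRORS
    retries += 1
    raise if retries > max_retries
    retry # reopen the stream at next_index
  end
end

# Simulated stream that fails twice partway through.
ROWS = (1..10).to_a
failures = 0
flaky_stream = lambda do |start|
  Enumerator.new do |yielder|
    ROWS[start..-1].each_with_index do |row, offset|
      if failures < 2 && offset == 3
        failures += 1
        raise DeadlineExceededError, "simulated deadline"
      end
      yielder << row
    end
  end
end

collected = []
each_row_with_resume(flaky_stream) { |row| collected << row }
```

The design question raised in this thread is which error classes belong in the retriable list; the sketch only shows the mechanics of resuming without re-yielding rows the caller has already seen.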
@blowmage - Spanner Java Client retries for the following:
https://github.com/GoogleCloudPlatform/google-cloud-java/blob/e8cff3f90645c38723d08a8fecfacb8635fe64ea/google-cloud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/SpannerExceptionFactory.java#L123
TKIM, Aborts should also be retried w/ backoff sleep duration from the response:
https://github.com/GoogleCloudPlatform/google-cloud-java/blob/e8cff3f90645c38723d08a8fecfacb8635fe64ea/google-cloud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/SpannerImpl.java#L246
@snehashah16 @blowmage I tried switching to the batch client and still have the same issue. At this point, the queries take long enough, even from a GCP box to Spanner, that this query no longer runs successfully.
Short term -- is there a specific place in code this timeout is set? If yes, I can override it / monkeypatch for now just to up the timeout. Long term this clearly isn't a fix, but I'll at least be set for a little while.
@blowmage - was my response (from Sep 17) useful and did u have a chance to repro this scenario ? Can the customer set a custom timeout ?
@untoldone You can pass in connection configuration values using the client_config argument on the Google::Cloud::Spanner.new method. I believe this structure is the same as the values for the "google.spanner.v1.Spanner" key in the spanner_client_config.json file.
Before you get too far configuring your client, can you update your code to use a version of the gem defined in a branch on my repo and see if it has the same problems? You can do this by placing the following in your Gemfile:
gem "google-cloud-spanner",
github: "blowmage/google-cloud-ruby",
branch: "spanner-stream-rescue"
I'm curious to know if the code in the branch recovers from the errors and allows the query to complete.
Oh, there is also a timeout argument on the Google::Cloud::Spanner.new method, which is probably useful as well. :)
Tried your repo and got
Traceback (most recent call last):
12: from /app/run.rb:76:in `<main>'
11: from /app/spanner_processor.rb:41:in `backup'
10: from /usr/local/bundle/bundler/gems/google-cloud-ruby-10238b9efc4a/google-cloud-storage/lib/google/cloud/storage/project.rb:215:in `bucket'
9: from /usr/local/bundle/bundler/gems/google-cloud-ruby-10238b9efc4a/google-cloud-storage/lib/google/cloud/storage/service.rb:84:in `get_bucket'
8: from /usr/local/bundle/bundler/gems/google-cloud-ruby-10238b9efc4a/google-cloud-storage/lib/google/cloud/storage/service.rb:568:in `execute'
7: from /usr/local/bundle/bundler/gems/google-cloud-ruby-10238b9efc4a/google-cloud-storage/lib/google/cloud/storage/service.rb:85:in `block in get_bucket'
6: from /usr/local/bundle/gems/google-api-client-0.23.6/generated/google/apis/storage_v1/service.rb:379:in `get_bucket'
5: from /usr/local/bundle/gems/google-api-client-0.23.6/lib/google/apis/core/base_service.rb:360:in `execute_or_queue_command'
4: from /usr/local/bundle/gems/google-api-client-0.23.6/lib/google/apis/core/http_command.rb:93:in `execute'
3: from /usr/local/bundle/gems/retriable-3.1.2/lib/retriable.rb:56:in `retriable'
2: from /usr/local/bundle/gems/retriable-3.1.2/lib/retriable.rb:56:in `times'
1: from /usr/local/bundle/gems/retriable-3.1.2/lib/retriable.rb:61:in `block in retriable'
/usr/local/bundle/gems/google-api-client-0.23.6/lib/google/apis/core/http_command.rb:102:in `block in execute': uninitialized constant Signet::RemoteServerError (NameError)
Not sure if I didn't install it correctly / run my script correctly or something else.
Also, I'm not sure I'm doing the config right. I tried increasing a bunch of the timeouts but can't seem to get it to work. I'm just guessing at which values to edit, since I'm not sure which timeouts affect this issue. For example:
spanner = Google::Cloud::Spanner.new project: @project_id, client_config: {
  "interfaces" => {
    "google.spanner.v1.Spanner" => {
      "retry_params" => {
        "default" => {
          "initial_rpc_timeout_millis" => 200000,
          "max_rpc_timeout_millis" => 200000
        }
      },
      "methods" => {
        "ExecuteSql" => {
          "timeout_millis" => 200000
        }
      }
    }
  }
}
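For anyone experimenting with similar overrides, the nested hash above can be built by a small helper so that one timeout value is applied consistently. This is only a sketch: `spanner_timeout_config` is a hypothetical helper, not part of the gem, and listing `ExecuteStreamingSql` alongside `ExecuteSql` is an assumption about which method entry a streaming query uses. Note that in this thread the overrides did not resolve the error on their own.

```ruby
# Hypothetical helper: builds a client_config override hash that applies one
# timeout (in milliseconds) to the default retry params and the named methods.
def spanner_timeout_config(timeout_ms, methods: ["ExecuteSql"])
  method_overrides = methods.each_with_object({}) do |name, overrides|
    overrides[name] = { "timeout_millis" => timeout_ms }
  end

  {
    "interfaces" => {
      "google.spanner.v1.Spanner" => {
        "retry_params" => {
          "default" => {
            "initial_rpc_timeout_millis" => timeout_ms,
            "max_rpc_timeout_millis" => timeout_ms
          }
        },
        "methods" => method_overrides
      }
    }
  }
end

# Assumption: the streaming call may be configured under ExecuteStreamingSql.
config = spanner_timeout_config(200_000, methods: ["ExecuteSql", "ExecuteStreamingSql"])

# The result would then be passed as the client_config argument, e.g.:
#   Google::Cloud::Spanner.new project: @project_id, client_config: config
```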
You'll need to update your other dependencies to use the latest version of the signet gem.
Cool -- the queries now complete with your branch while removing the client config overrides.
FYI: had to remove signet from my Gemfile.lock as no constraints pushed it to the correct version.
Great news! I will create a PR to merge this change and release it.
Unfortunately, the exact version of google-api-client you are using had the issue and it was fixed in the next version, released a day later.
This is fixed in Spanner 1.7.2. Thank you for your help! If you continue to have problems like this please open a new issue and reference this one.
Hi, I am having the same problem: even though I set the timeout parameter, I get the same error. Which version is the latest one? I am using Spanner 1.9.0. @blowmage
@kizilipek can you open a new issue linking to this one? I am no longer an active contributor to these libraries.