Google-cloud-ruby: Broken pipe when uploading files (cloud storage)

Created on 11 Oct 2018  路  18Comments  路  Source: googleapis/google-cloud-ruby

We use GCS to store backups and for the last two days multiple servers have thrown a 'Broken pipe' exception when performing their backups.

I've tried uploading a similar size file on my machine locally but it throws the same exception so I don't think it's a network related issue.

Strangely, I can upload a 100MB file but trying to upload a 1.29GB file or larger causes the broken pipe.

Here's a test script I'm using locally:

#!/usr/bin/ruby

require 'logger'
require 'google/cloud/storage'

logger = Logger.new($stderr)
logger.level = Logger::DEBUG

Google::Apis.logger = logger

storage = Google::Cloud::Storage.new(
    timeout: 600,
    project: 'nialto-services',
    keyfile: File.expand_path('~/gcs/nialto-services-a6dbccf5fe51.json')
)

bucket = storage.bucket('nialto-testing')

local_path  = File.expand_path('~/gcs/20181010080000_example.com.1.dar')
remote_path = File.join('files', 'example.com', File.basename(local_path))

p bucket.create_file(local_path, remote_path)

And here's the output of uploading a 1.29GB file:

D, [2018-10-11T11:45:46.610028 #8992] DEBUG -- : Sending HTTP get https://www.googleapis.com/storage/v1/b/bucket-name?
D, [2018-10-11T11:45:46.984530 #8992] DEBUG -- : 200
D, [2018-10-11T11:45:46.984900 #8992] DEBUG -- : #<HTTP::Message:0x00007fb8122e0190 @http_header=#<HTTP::Message::Headers:0x00007fb8122e0168 @http_version="1.1", @body_size=0, @chunked=false, @request_method="GET", @request_uri=#<Addressable::URI:0x3fdc09170ce4 URI:https://www.googleapis.com/storage/v1/b/bucket-name?>, @request_query=nil, @request_absolute_uri=nil, @status_code=200, @reason_phrase="OK", @body_type=nil, @body_charset=nil, @body_date=nil, @body_encoding=#<Encoding:UTF-8>, @is_request=false, @header_item=[["X-GUploader-UploadID", "xxxx"], ["ETag", "CAI="], ["Vary", "Origin"], ["Vary", "X-Origin"], ["Content-Type", "application/json; charset=UTF-8"], ["Expires", "Thu, 11 Oct 2018 10:45:46 GMT"], ["Date", "Thu, 11 Oct 2018 10:45:46 GMT"], ["Cache-Control", "private, max-age=0, must-revalidate, no-transform"], ["Content-Length", "370"], ["Server", "UploadServer"], ["Alt-Svc", "quic=\":443\"; ma=2592000; v=\"44,43,39,35\""]], @dumped=false>, @peer_cert=#<OpenSSL::X509::Certificate: subject=#<OpenSSL::X509::Name CN=*.googleapis.com,O=Google LLC,L=Mountain View,ST=California,C=US>, issuer=#<OpenSSL::X509::Name CN=Google Internet Authority G3,O=Google Trust Services,C=US>, serial=#<OpenSSL::BN:0x00007fb810a92e30>, not_before=2018-09-18 12:34:00 UTC, not_after=2018-12-11 12:34:00 UTC>, @http_body=#<HTTP::Message::Body:0x00007fb81233fcd0 @body="{\n \"kind\": \"storage#bucket\",\n \"id\": \"bucket-name\",\n \"selfLink\": \"https://www.googleapis.com/storage/v1/b/bucket-name\",\n \"projectNumber\": \"000000000000\",\n \"name\": \"bucket-name\",\n \"timeCreated\": \"2018-10-11T10:01:50.330Z\",\n \"updated\": \"2018-10-11T10:05:04.537Z\",\n \"metageneration\": \"2\",\n \"location\": \"EUROPE-WEST1\",\n \"storageClass\": \"REGIONAL\",\n \"etag\": \"CAI=\"\n}\n", @size=0, @positions=nil, @chunk_size=nil>, @previous=nil>
D, [2018-10-11T11:45:46.986936 #8992] DEBUG -- : Success - #<Google::Apis::StorageV1::Bucket:0x00007fb810a926b0
 @etag="CAI=",
 @id="bucket-name",
 @kind="storage#bucket",
 @location="EUROPE-WEST1",
 @metageneration=2,
 @name="bucket-name",
 @project_number=000000000000,
 @self_link="https://www.googleapis.com/storage/v1/b/bucket-name",
 @storage_class="REGIONAL",
 @time_created=
  #<DateTime: 2018-10-11T10:01:50+00:00 ((2458403j,36110s,330000000n),+0s,2299161j)>,
 @updated=
  #<DateTime: 2018-10-11T10:05:04+00:00 ((2458403j,36304s,537000000n),+0s,2299161j)>>

D, [2018-10-11T11:45:47.006798 #8992] DEBUG -- : Sending upload start command to https://www.googleapis.com/upload/storage/v1/b/bucket-name/o?name=files%2Fexample.com%2F20181010080000_example.com.1.dar
D, [2018-10-11T11:45:47.089064 #8992] DEBUG -- : Upload status active
D, [2018-10-11T11:45:47.089145 #8992] DEBUG -- : Sending upload command to https://www.googleapis.com/upload/storage/v1/b/bucket-name/o?name=files%2Fexample.com%2F20181010080000_example.com.1.dar&upload_id=xxxx&upload_protocol=resumable
D, [2018-10-11T11:45:52.167122 #8992] DEBUG -- : Error - #<HTTPClient::KeepAliveDisconnected: HTTPClient::KeepAliveDisconnected: Broken pipe>

Traceback (most recent call last):
    28: from Untitled.rb:22:in `<main>'
    27: from /usr/local/lib/ruby/gems/2.5.0/gems/google-cloud-storage-1.15.0/lib/google/cloud/storage/bucket.rb:1142:in `create_file'
    26: from /usr/local/lib/ruby/gems/2.5.0/gems/google-cloud-storage-1.15.0/lib/google/cloud/storage/service.rb:309:in `insert_file'
    25: from /usr/local/lib/ruby/gems/2.5.0/gems/google-cloud-storage-1.15.0/lib/google/cloud/storage/service.rb:568:in `execute'
    24: from /usr/local/lib/ruby/gems/2.5.0/gems/google-cloud-storage-1.15.0/lib/google/cloud/storage/service.rb:310:in `block in insert_file'
    23: from /usr/local/lib/ruby/gems/2.5.0/gems/google-api-client-0.24.3/generated/google/apis/storage_v1/service.rb:1898:in `insert_object'
    22: from /usr/local/lib/ruby/gems/2.5.0/gems/google-api-client-0.24.3/lib/google/apis/core/base_service.rb:360:in `execute_or_queue_command'
    21: from /usr/local/lib/ruby/gems/2.5.0/gems/google-api-client-0.24.3/lib/google/apis/core/http_command.rb:93:in `execute'
    20: from /usr/local/lib/ruby/gems/2.5.0/gems/retriable-3.1.2/lib/retriable.rb:56:in `retriable'
    19: from /usr/local/lib/ruby/gems/2.5.0/gems/retriable-3.1.2/lib/retriable.rb:56:in `times'
    18: from /usr/local/lib/ruby/gems/2.5.0/gems/retriable-3.1.2/lib/retriable.rb:61:in `block in retriable'
    17: from /usr/local/lib/ruby/gems/2.5.0/gems/google-api-client-0.24.3/lib/google/apis/core/http_command.rb:101:in `block in execute'
    16: from /usr/local/lib/ruby/gems/2.5.0/gems/retriable-3.1.2/lib/retriable.rb:56:in `retriable'
    15: from /usr/local/lib/ruby/gems/2.5.0/gems/retriable-3.1.2/lib/retriable.rb:56:in `times'
    14: from /usr/local/lib/ruby/gems/2.5.0/gems/retriable-3.1.2/lib/retriable.rb:61:in `block in retriable'
    13: from /usr/local/lib/ruby/gems/2.5.0/gems/google-api-client-0.24.3/lib/google/apis/core/http_command.rb:104:in `block (2 levels) in execute'
    12: from /usr/local/lib/ruby/gems/2.5.0/gems/google-api-client-0.24.3/lib/google/apis/core/upload.rb:254:in `execute_once'
    11: from /usr/local/lib/ruby/gems/2.5.0/gems/google-api-client-0.24.3/lib/google/apis/core/upload.rb:228:in `send_upload_command'
    10: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:765:in `post'
     9: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:854:in `request'
     8: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1104:in `follow_redirect'
     7: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1014:in `do_request'
     6: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1131:in `protect_keep_alive_disconnected'
     5: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1138:in `rescue in protect_keep_alive_disconnected'
     4: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1019:in `block in do_request'
     3: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1242:in `do_get_block'
     2: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient/session.rb:177:in `query'
     1: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient/session.rb:514:in `query'
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient/session.rb:524:in `rescue in query': HTTPClient::KeepAliveDisconnected: Broken pipe (HTTPClient::KeepAliveDisconnected)

Any ideas what's happening here?

Thanks 馃榾

storage p1 bug

Most helpful comment

We heard from the GCS team last night. They identified a possible culprit and deployed a fix. My repro script that was raising is now working as well.

We also received confirmation that large uploads of a known size should be sent all at once, and not in chunks. So the current behavior is correct.

It looks like this has been fixed so I will close this issue now. A huge thanks to everyone who participated. Let us know if you have any questions about this.

All 18 comments

@tobyaherbert Thanks for the issue. The log shows the broken pipe after the upload has been active for ~5 seconds. Is that 5 seconds constant? Or does it vary?

I've just run a few tests and it looks like it's always around the 5 second mark - it varies by about 0.4 seconds max.

We're experiencing this too.

Uploads with the same hosts/files/buckets via gsutil succeed.

@dkkerr Is there a file size that this starts happening for you? Or is it all file sizes?

And do you get the error ~5 seconds after the upload starts?

Does seem to primarily affect larger files. The behavior is random though so it's hard to get a good fix on it. Seems files less than 1GB generally work, 1.5GB and larger generally fail.

I have attached output of a test harness for various file sizes.

gcs-keepalive-1.txt

Enabling HTTPClient debugging shows an initial POST that appears to be terminated without a response by the Cloud Storage endpoint in roughly a couple of seconds, then a retry, then the exception.

Success rates at various upload sizes after 100 attempts of each:

900m 100%
1000m 98%
1010m 87%
1500m 0%

Looks like the GC Python client bundles JSON resumable uploads in 100MB chunks to be consistent with gsutil and based on advice of the GCS team. I wonder if GCS endpoints now limit these transactions.

Apologies for being so quiet, I've been actively looking at this and don't have much to share just yet. I have been able to reproduce this, however. Last time I actively worked on uploads we successfully uploaded large files, 5GB+, so this does seem to be a recent change in behavior. I don't know if this is intentional or not.

The upload implementation has changed a couple of times previously, and we are now wholly reliant on google-api-client for performing uploads. That means my understanding of how uploads are to be managed is out of date. Here is what I can say so far: I am seeing GCS return the X-Goog-Upload-Chunk-Granularity header now, and I don't remember it doing that before. It is possible its been doing that for a long time however, as I've not been working on uploads for a while. I don't know what this header is supposed to indicate to the client, if anything at all, but I _think_ it means that the client should chunk the upload. I don't see chunking implemented for Resumable Uploads in the google-api-client code. I've got a spike that chunks uploads by calculating the multiple of the provided chunk granularity closest to 1GB. I've been able to upload a 1.9GB file using those changes. So that seems to be a viable option to fix this, depending on what we hear from folks at Google.

Thanks for your patience. Hopefully we will get this resolved soon.

Here is my repro script using google-api-client instead of google-cloud-storage:

#!/usr/bin/ruby

def print_error e
  p "!"*42
  p e
  p e.instance_variables.map { |attribute| { attribute => e.instance_variable_get(attribute) } }.inspect
  p "!"*42

  print_error e.cause unless e.cause.nil?
end

require "logger"

logger = Logger.new($stderr)
logger.level = Logger::DEBUG

require "google/apis"

Google::Apis.logger = logger

require "googleauth"

project = ENV["GOOGLE_CLOUD_PROJECT"] || "<project-id>"
keyfile = ENV["GOOGLE_CLOUD_KEYFILE"] || "<keyfile-json-path>"

scope = "https://www.googleapis.com/auth/devstorage.full_control"

authorizer = Google::Auth::ServiceAccountCredentials.make_creds(
  json_key_io: File.open(keyfile),
  scope: scope
)
authorizer.fetch_access_token!

require "google/apis/storage_v1"

storage = Google::Apis::StorageV1::StorageService.new
storage.client_options.open_timeout_sec = 600
storage.client_options.read_timeout_sec = 600
storage.client_options.send_timeout_sec = 600
storage.authorization = authorizer

bucket_name = "<bucket-name>"

# local_path  = File.expand_path "~/Downloads/alpine-standard-3.8.1-x86_64.iso"
local_path  = File.expand_path "~/Downloads/ubuntu-18.04.1-desktop-amd64.iso"
remote_path = File.join "files", "example.org", File.basename(local_path)

begin
  uploaded_file = storage.insert_object bucket_name, upload_source: local_path, name: remote_path
  p uploaded_file
rescue => e
  print_error e
end

@blowmage Great, thanks!

Much appreciated - thanks Mike. Does appear to be a recent change; we were operating successfully until a day or two ago.

We also started seeing this today.

We maintain a minimal, internal GCS client for EOLed versions of Ruby. It鈥檚 based on Faraday and Net::HTTP and performs resumable uploads via the publicly documented JSON API. Using our custom client, I successfully uploaded a file that triggers this error in the official client.

@georgeclaghorn Not sure I understand. Are you saying your internal client is seeing the same behavior? Where connections for uploading larger files are being closed by the server? Or are you saying that your internal client can upload larger files, while google-cloud-storage cannot?

Our internal client doesn鈥檛 see the same behavior.

@georgeclaghorn Okay, thanks for the clarification. Does your internal client upload in chunks?

No, we send as much data as we can at once, like the official client does now.

Hi,
Our large uploads didn't work from October 12th to October 15th. I was about to apply the patch you submitted but it seems the issue has been fixed today.

Yup, seems to be fixed here too - all the large uploads are working again.

Sounds like it's been fixed server side.

We heard from the GCS team last night. They identified a possible culprit and deployed a fix. My repro script that was raising is now working as well.

We also received confirmation that large uploads of a known size should be sent all at once, and not in chunks. So the current behavior is correct.

It looks like this has been fixed so I will close this issue now. A huge thanks to everyone who participated. Let us know if you have any questions about this.

Was this page helpful?
0 / 5 - 0 ratings