This is a cross post originally detailed at https://issuetracker.google.com/issues/113672049
Essentially, the problem is that in a Google Cloud Functions Python endpoint the google-cloud-storage API intermittently throws a ProtocolError and ConnectionResetError when getting a blob.
Linking possibly relevant Golang issue: GoogleCloudPlatform/google-cloud-go#108
/cc @frankyn
I'm not sure what's happening here, acking for now in GCS library weekly.
I did not get additional input in the weekly meeting.
@brianmhunt could you tell me more about your use-case so I can try to reproduce?
Also what is the size of a file you're trying to read?
@frankyn Of course, thanks for the follow-up.
Our function is given an array of slices of PDFs and converts that to a single PDF. An oversimplified version is something like this:
def consolidatePdf(pdfSlices):
    """Create a PDF from the given slices.

    pdfSlices is an iterable of dicts of this form: { url: "gs://", range: [start, end] }
    """
    newPdf = PdfWriter()
    for slice in pdfSlices:
        reader = PdfReader(getPdfFromGS(slice['url']))
        start, end = slice['range']
        newPdf.append(reader.getPages(start, end))
    return newPdf.getBytes()
Where the getPdfFromGS performs the blob storage read. If you think it'll help I'm happy to share the code if you email me at brianmhunt at gmail.com.
I've only seen the failure occur on the very first file being read (but that doesn't mean that the problem is limited to the first file being read).
The files failing are fairly small, in the 150k range.
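A minimal sketch of what a getPdfFromGS helper along these lines might look like (an illustration only, assuming a google-cloud-storage client and a URL-to-(bucket, path) helper, not the actual code):

from io import BytesIO
from google.cloud import storage

client = storage.Client()

def getPdfFromGS(url):
    # Hypothetical helper: split the gs:// URL into bucket and object path,
    # download the bytes, and hand them to the PDF reader as a file-like object.
    bucket_name, path = urlToBucketPath(url)  # assumed URL-parsing helper
    blob = client.bucket(bucket_name).blob(path)
    return BytesIO(blob.download_as_string())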
Thanks, I'd like to keep this discussion as much as possible through Github. So if someone else hits a similar issue they can find it later.
Is PdfReader wrapping around the google-cloud-storage package? Could you share a portion of that code as well as PdfWriter?
Thanks, here's the salient bit of code that's throwing.
def generateUrlPdfBytesMap(urls):
    upm = dict()
    for url in urls:
        bucket, path = urlToBucketPath(url)
        account = path.split('/')[1]
        blob = storage.bucket(bucket).get_blob(path)  # 🔥
        upm[url] = blob.download_as_string()
    return upm
Where urlToBucketPath converts a gs:// or https:// Firebase url to the bucket, path pair per https://stackoverflow.com/questions/52064868.
It really is as simple as one could possibly imagine. I was going to include the PDF reader code, but it looks like it never gets called because the slurp (generateUrlPdfBytesMap) happens first.
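For illustration, a rough sketch of how that conversion could look, assuming gs:// URLs and Firebase download URLs of the usual shape (this is an assumption, not the code from the Stack Overflow answer):

from urllib.parse import urlparse, unquote

def urlToBucketPath(url):
    """Return (bucket, path) for a gs:// or Firebase https:// download URL."""
    parsed = urlparse(url)
    if parsed.scheme == 'gs':
        # gs://bucket/some/object/path
        return parsed.netloc, parsed.path.lstrip('/')
    # Firebase download URLs look like:
    # https://firebasestorage.googleapis.com/v0/b/<bucket>/o/<url-encoded object path>?alt=media&token=...
    parts = parsed.path.split('/')
    return parts[3], unquote(parts[5])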
Related, according to [email protected], it appears that a newer version of google-cloud-storage is available (i.e. I was using 1.10.0; version 1.11.0 is out).
I will switch to the new version and report any occurrences.
This continues to occur with google-cloud-storage version 1.11.0.
This issue occurs for me as well, but in my case it throws a connection error when I'm trying to get a bucket before signing a URL.
Code:
bucket = storage.get_bucket(bucket)
Traceback:
File "/code/xxx/yyy/models.py", line 106, in generate_signed_url
storage_bucket = STORAGE_CLIENT.get_bucket(GCS_BUCKET)
File "/usr/lib/python3.6/site-packages/google/cloud/storage/client.py", line 225, in get_bucket
bucket.reload(client=self)
File "/usr/lib/python3.6/site-packages/google/cloud/storage/_helpers.py", line 108, in reload
_target_object=self)
File "/usr/lib/python3.6/site-packages/google/cloud/_http.py", line 290, in api_request
headers=headers, target_object=_target_object)
File "/usr/lib/python3.6/site-packages/google/cloud/_http.py", line 183, in _make_request
return self._do_request(method, url, headers, data, target_object)
File "/usr/lib/python3.6/site-packages/google/cloud/_http.py", line 212, in _do_request
url=url, method=method, headers=headers, data=data)
File "/usr/lib/python3.6/site-packages/google/auth/transport/requests.py", line 201, in request
method, url, data=data, headers=request_headers, **kwargs)
File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 512, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 622, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 495, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
Same here. My context: I'm in a Pub/Sub worker retrieving a URL to download and store into a bucket. I've spent an ungodly amount of time trying to debug the retrieval method I was running, only to realise it's actually the Cloud Storage HTTPS machinery that is failing.
I get the same error as OP @brianmhunt and @arvindnrbt.
Connection reset by peer
It happens during:
storage.bucket(bucket).get_blob(path)
bigquery_client.insert_rows(table, rows_to_insert)
This is running on Google Cloud Functions with Python 3.7 and google-cloud-storage==1.11.0.
Not all the time, about 10% failure rate. Function deployed to us-east1 (I also tried us-central1, about the same).
@anyone-watching, is this still occurring? We were considering migrating from AWS Lambda, but this may hold us up.
I'm heads down on another issue. @tseaver could you take a look?
@frankyn This is a request for the same feature as [Python] Storage: automatic retry behavior for transient server failures (exponential backoff + jitter) in our feature backlog (we would just need to ensure that ProtocolError and ConnectionError are tracked as transient errors for that feature).
Would like to add that I'm experiencing same error in GCF environment using google-cloud-storage==1.12.0
I see that 1.13.0 has been released and I may try that to see if problem persists.
Hi,
I'm also having this issue. There is a scenario for which this occurs for me every time.
I send a dataset to a non-Google ML provider and we then wait for the prediction to finish. When it does, I take the .CSV file and deposit it on Google Storage. The wait can be up to 10 min (normally 6 min). I get the ConnectionResetError. If this is useful, I'm happy to share more details.
g-c-s==1.13.0
runtime python37
Remember that CF times out after 9 min, so the error could be "normal".
I would check the Stackdriver log to see what exact HTTP code is returned and raise the issue on the issue tracker, not here, as it sounds related to the core API.
If, as @yiga2 suggests, the CF run is being terminated due to a time limit, there is nothing we can do in google-cloud-storage to address it: the Python process itself is being killed in that case.
Not to disregard @stephenk289's issue, but lots of us are having issues when GCS throws 5xx errors that are not related to timeouts.
Some retry semantics built-in would be really useful.
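Until something built-in lands, a minimal hand-rolled retry with exponential backoff can be put around the failing call; this is a sketch only, and the wrapped call at the end is just an example:

import time
from requests.exceptions import ConnectionError as RequestsConnectionError
from urllib3.exceptions import ProtocolError

def with_retries(fn, attempts=4, base_delay=1.0):
    # Retry fn() on connection resets, backing off exponentially between attempts.
    for attempt in range(attempts):
        try:
            return fn()
        except (ConnectionResetError, RequestsConnectionError, ProtocolError):
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# e.g.: blob = with_retries(lambda: client.bucket(bucket_name).get_blob(path))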
Thanks very much for the answers / feedback @tseaver and @yiga2.
If you'll forgive me a follow-up -
I tried to respond to the error by recreating my client object [client = storage.Client()], as this was the closest I could find to re-establishing a connection so that I could interact afresh with Google Storage under decaying retry logic, but it didn't seem to work. Given the advice that there is a ~9 min time limit, I will therefore risk losing connections during long waits. Any steer most welcome.
I'm also hitting this error from Cloud Function trying to read from Cloud Storage.
The majority of invocations work, but I get this every so often, and it makes a lot of noise even if I retry.
My CFs finish in under 2 seconds in most cases, and I've followed the recommendations on https://cloud.google.com/functions/docs/bestpractices/networking to avoid re-establishing connections, which I suspect is also part of the problem.
Could it be that the same connection is (as intended) re-used across multiple invocations of the CF, and eventually the remote server (GCS) drops it right when I try to use it?
After a lot of research and interactions with G support, it turns out that indeed a connection may either time out (our case: resumable upload) or just disconnect sporadically; this happens with other cloud storage providers too.
No big deal, as ConnectionResetError (104) is a retriable error, but this must be handled on the client side.
Unfortunately google-cloud-python, like many others, delegates exponential backoff to lower-level dependencies (signalled by the deprecated num_retries), which means less control over when retries should trigger. And these dependencies do not treat 104 as retriable.
If you search out there, you will see there is a lot of debate (resistance?) about where and how best to address this: in the googleapis libs, requests, or urllib3 (I even read this may be Py3 specific), ...
For me, this is as simple as adding 104 to the list of transient errors (the 500-505 range or so), but I may be oversimplifying.
Despite the numerous posts, many of them recent, there is no real resolution.
If you can't wait for a resolution (or don't want to patch your own fork), you can look at gcsfs, which we use for streaming from/to GCS (from GCF). The ConnectionResetError is retried there; it does log an exception (which looks like an Error in Stackdriver logging), but the retries do happen and the function does not end abruptly.
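For reference, a minimal gcsfs usage sketch (the project, bucket, and path below are placeholders):

import gcsfs

fs = gcsfs.GCSFileSystem(project='my-project')  # placeholder project id

# Read an object; gcsfs retries transient errors such as ConnectionResetError internally.
with fs.open('my-bucket/path/to/file.pdf', 'rb') as f:
    data = f.read()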
@yiga2 Can you elaborate on how "ConnectionResetError" is retriable in gcsfs? Is there an argument I need to pass to the constructor to enable this? My Cloud Function is crashing right after the error is thrown.
I had found that that exception is simply a generic unhandled exception. Be sure to wrap all of your connections, i.e. API calls, GCS file transfers, etc., with try/except error handling. You will see that one of your connections may be failing, but it could be for any number of reasons.
@dustinfarris @jbugbee126 you don't need to wrap a connection call in gcsfs for the 104 error - although a good practice overall for any (new) uncaught error.
See https://github.com/dask/gcsfs/issues/12 and the related commit; gcsfs just logs the retriable 104 error at debug level (which shows as such in Stackdriver Logging), and your GCF should not break on it.
(Thanks @martindurant, the main contributor and author of gcsfs!)
CHEERS!
Lots of stuff to learn from the comments themselves.
So, I was trying to do a similar thing: accessing a file from GCS and then updating it. The same error still exists, so other than adding a try/except statement and retrieving the file again, is there any other solution? That approach also might be unreliable.
@crwilcox @frankyn @jkwlui @tseaver (Is there a storage team alias?)
Would you mind looking into the internal issue? It's been closed as 'Won't fix - not reproducible', but folks have commented here and over on the issue since then.
@busunkim96 what's the internal tracking issue?
@brianmhunt ConnectionResetError seems like the kind of error one might see when the VM is being torn down. Can you tell whether your function is failing due to a time limit? If so, there isn't really much we can do in google-cloud-storage to mitigate the issue.
@tseaver Thanks. The problem was not the time limit; it can fail in the first ~5 seconds.
@brianmhunt OK, good to know. Here is a workaround:
from urllib3.exceptions import ProtocolError
from google.api_core import retry

predicate = retry.if_exception_type(
    ConnectionResetError, ProtocolError)

reset_retry = retry.Retry(predicate)

def generateUrlPdfBytesMap(urls):
    upm = dict()
    for url in urls:
        bucket_name, path = urlToBucketPath(url)
        account = path.split('/')[1]
        bucket = storage.bucket(bucket_name)
        blob = bucket.get_blob(path)  # Note: makes API call
        upm[url] = reset_retry(blob.download_as_string)()
    return upm
@tseaver awesome, have passed this on to our devs. Some GCP client libraries seem to expose a retry option, some don't. This will be very handy, thanks!
Thanks @tseaver. IIUC, updating the retry strategy for blob download and upload can help with GCF time limits when the VM is unscheduled.
How are retry defaults defined in the GCS Python library? Do they mainly come from api_core, and do you have examples of best practices for modifying the retry strategy in other Python manual libraries?
@tritone is a new GCS DPE who will be taking a look at fixing it.
team alias: @googleapis/storage
We are working to correct the issues with the Python libraries' retry strategy and will continue to post updates on the ongoing work in https://github.com/googleapis/google-cloud-python/issues/9298.
As stated by @tseaver, the workaround is the following:

from urllib3.exceptions import ProtocolError
from google.api_core import retry

predicate = retry.if_exception_type(
    ConnectionResetError, ProtocolError)

reset_retry = retry.Retry(predicate)

def generateUrlPdfBytesMap(urls):
    upm = dict()
    for url in urls:
        bucket_name, path = urlToBucketPath(url)
        account = path.split('/')[1]
        bucket = storage.bucket(bucket_name)
        blob = bucket.get_blob(path)  # Note: makes API call
        upm[url] = reset_retry(blob.download_as_string)()
    return upm
Thank you for your patience.
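As a follow-up on tuning the retry strategy: google.api_core.retry.Retry also accepts backoff parameters, so the workaround above can be adjusted. The values below are illustrative assumptions, not recommended defaults:

from urllib3.exceptions import ProtocolError
from google.api_core import retry

reset_retry = retry.Retry(
    predicate=retry.if_exception_type(ConnectionResetError, ProtocolError),
    initial=1.0,     # first delay between attempts, in seconds
    maximum=30.0,    # cap on any single delay
    multiplier=2.0,  # exponential backoff factor
    deadline=120.0,  # give up after two minutes overall
)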
Two weeks ago I started getting ConnectionResetError on about 10% of the calls to blob.download_to_filename() from a GCE instance.
I've tried wrapping download_to_filename with Retry, but I'm still getting the same error.
Is there something wrong with my code?
How can I verify that ConnectionResetError was caught and the download failed several times?
from google.api_core import retry
from google.api_core.exceptions import InternalServerError
from google.api_core.exceptions import TooManyRequests
predicate = retry.if_exception_type(ConnectionResetError, InternalServerError, TooManyRequests)
r = retry.Retry(predicate=predicate)
r(blob.download_to_filename)(filename)
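One way to make the retries visible is the on_error hook on google.api_core.retry.Retry; a sketch, assuming the same predicate as above:

import logging
from google.api_core import retry
from google.api_core.exceptions import InternalServerError, TooManyRequests

def log_retry(exc):
    # Called by Retry each time a retriable exception is caught before the next attempt.
    logging.warning("download failed, will retry: %r", exc)

predicate = retry.if_exception_type(ConnectionResetError, InternalServerError, TooManyRequests)
r = retry.Retry(predicate=predicate, on_error=log_retry)
r(blob.download_to_filename)(filename)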
I've wrapped the call to blob.download_to_filename with try/except, and e.__class__.__name__ on the exception gives me ChunkedEncodingError.
Why does the error say ConnectionResetError while the exception is ChunkedEncodingError?
Shouldn't ChunkedEncodingError be added to google.api_core.retry.if_transient_error?
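requests typically re-raises the underlying urllib3 ProtocolError as ChunkedEncodingError during chunked downloads, so the reset may never reach the predicate under its original name. A sketch of widening the predicate to cover the wrapping exceptions as well:

from requests.exceptions import ChunkedEncodingError, ConnectionError as RequestsConnectionError
from urllib3.exceptions import ProtocolError
from google.api_core import retry

predicate = retry.if_exception_type(
    ConnectionResetError, RequestsConnectionError, ProtocolError, ChunkedEncodingError)
r = retry.Retry(predicate=predicate)
r(blob.download_to_filename)(filename)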