Google-cloud-python: Storage: File upload fails from file with some characters

Created on 12 Feb 2019 · 2 Comments · Source: googleapis/google-cloud-python

Environment details

  1. Storage
  2. Ubuntu 18.04
  3. Python 3.6
  4. google-cloud-core==0.29.1; google-cloud-storage==1.14.0

Steps to reproduce

  1. Open a file in text mode ('w+')
  2. Write a special character
  3. Try to upload this file object to a blob

Code example

import tempfile
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('my-bucket')
blob = bucket.blob('test_ascii')
fd = tempfile.TemporaryFile('w+')  # text mode, not binary
fd.write('\u0090')                 # a non-ASCII character
fd.seek(0)                         # rewind before uploading
blob.upload_from_file(fd)          # fails with 400 Bad Request

Stack trace

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/google/cloud/storage/blob.py", line 1085, in upload_from_file
    client, file_obj, content_type, size, num_retries, predefined_acl
  File "/usr/local/lib/python3.6/dist-packages/google/cloud/storage/blob.py", line 995, in _do_upload
    client, stream, content_type, size, num_retries, predefined_acl
  File "/usr/local/lib/python3.6/dist-packages/google/cloud/storage/blob.py", line 942, in _do_resumable_upload
    response = upload.transmit_next_chunk(transport)
  File "/usr/local/lib/python3.6/dist-packages/google/resumable_media/requests/upload.py", line 396, in transmit_next_chunk
    self._process_response(result, len(payload))
  File "/usr/local/lib/python3.6/dist-packages/google/resumable_media/_upload.py", line 574, in _process_response
    self._get_status_code, callback=self._make_invalid)
  File "/usr/local/lib/python3.6/dist-packages/google/resumable_media/_helpers.py", line 93, in require_status_code
    status_code, u'Expected one of', *status_codes)
google.resumable_media.common.InvalidResponse: ('Request failed with status code', 400, 'Expected one of', <HTTPStatus.OK: 200>, 308)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/google/cloud/storage/blob.py", line 1089, in upload_from_file
    _raise_from_invalid_response(exc)
  File "/usr/local/lib/python3.6/dist-packages/google/cloud/storage/blob.py", line 1960, in _raise_from_invalid_response
    raise exceptions.from_http_status(response.status_code, message, response=response)
google.api_core.exceptions.BadRequest: 400 PUT https://www.googleapis.com/upload/storage/v1/b/my-bucket/o?uploadType=resumable&upload_id=AEnB2Uo0YkAqrWxqv4zVpm7bsO1mbUCGNIjxQPrQa4OV5HPad6kQatXYUF0UWVc8rWTMGEoYRIKH-QBUGmd35-u6FLRw04c4-A: ('Request failed with status code', 400, 'Expected one of', <HTTPStatus.OK: 200>, 308)

When I print some extra information from the exception, I get the following:
b'Invalid request. There were 3 byte(s) in the request body. There should have been 6 byte(s) (starting at offset 0 and ending at offset 5) according to the Content-Range header.'

When I open the file in binary mode and encode the string before writing it, the upload works, but I thought this was not necessary on Linux?

Thanks!

question storage

All 2 comments

@Alexis-Jacob The short answer to your question is that Blob objects always want bytes: Blob.upload_from_string does, as a convenience, encode text values to UTF-8, but Blob.upload_from_file doesn't have any way to check the mode of an already-opened file. So, either open your file in binary mode and write bytes to it, or else use NamedTemporaryFile and read from the name, e.g.:

import tempfile
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('my-bucket')
blob = bucket.blob('test_ascii')
with tempfile.NamedTemporaryFile('w+') as fd:
    fd.write('\u0090')
    fd.flush()
    blob.upload_from_filename(fd.name)
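
The first option mentioned above (open the file in binary mode and write bytes to it) could look roughly like the sketch below. The helper name upload_text_via_binary_file and its encoding parameter are hypothetical and only for illustration; blob is assumed to be a google.cloud.storage.Blob such as the one created in the snippet above.

```python
import tempfile


def upload_text_via_binary_file(blob, text, encoding="utf-8"):
    """Encode text explicitly and upload it through a binary-mode file.

    Hypothetical helper, not part of google-cloud-storage. `blob` is
    assumed to expose upload_from_file(), as storage.Blob does.
    Returns the number of bytes written to the temporary file.
    """
    payload = text.encode(encoding)  # str -> bytes, done by the caller
    with tempfile.TemporaryFile("w+b") as fd:  # binary mode
        fd.write(payload)
        fd.seek(0)  # rewind so the upload starts at the first byte
        blob.upload_from_file(fd)
    return len(payload)
```

With the blob from the example above, the call would be upload_text_via_binary_file(blob, '\u0090'). The stream handed to upload_from_file then yields bytes rather than str, which is what the reply says Blob objects expect.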

@tseaver maybe the documentation could be updated to explain this requirement?

Also, maybe Blob.upload_from_file could check whether 'b' is in file_obj.mode? I believe that an early error or warning would be better than an unexpected error later.
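
Until something like that exists in the library, a caller-side guard along these lines might catch the mistake early. This is only a sketch: ensure_binary_stream is a hypothetical helper, not part of google-cloud-storage, and it assumes that a text stream is either an io.TextIOBase instance or exposes a string mode attribute without 'b'.

```python
import io


def ensure_binary_stream(file_obj):
    """Return file_obj unchanged, or raise ValueError if it looks text-mode.

    Hypothetical helper, not part of google-cloud-storage.
    """
    # Text streams such as io.StringIO or open(path, "w+") derive
    # from io.TextIOBase.
    if isinstance(file_obj, io.TextIOBase):
        raise ValueError("expected a binary stream, got a text stream")
    # Some wrappers are not TextIOBase instances but still expose `mode`;
    # objects without a string `mode` (e.g. io.BytesIO) are let through.
    mode = getattr(file_obj, "mode", None)
    if isinstance(mode, str) and "b" not in mode:
        raise ValueError("expected binary mode, got mode %r" % (mode,))
    return file_obj
```

It could then be used as blob.upload_from_file(ensure_binary_stream(fd)), so that a file opened with 'w+' fails immediately with a ValueError instead of a 400 response from the server.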

The scenario does work with more common non-ASCII Unicode code points such as U+00E9 (é) (UTF-8: C3 A9); this delays the moment when the bad code is found in production.
