I'm on Python 3.5.2 with google.cloud.storage.__version__ = '0.23.0'.
I'm attempting to upload objects to a bucket so that they support decompressive gzip transcoding. After searching the documentation and source code and reviewing existing issues, I haven't been able to figure out how to accomplish this. My most promising attempt was setting the blob.content_encoding property, which seems like it should work but doesn't. See below for an example.
Does/can the API support this?
import google.cloud.storage
import gzip
import os
import requests
import datetime
import io
BUCKET_NAME = ...
GOOGLE_APPLICATION_CREDENTIALS_PATH = ...
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = GOOGLE_APPLICATION_CREDENTIALS_PATH
client = google.cloud.storage.Client()
bucket = client.get_bucket(BUCKET_NAME)
blob = bucket.blob('plaintext')
blob.content_type = 'text/plain'
with io.BytesIO() as f:
    f.write(b' '.join(100 * (b'plaintext',)))
    blob.upload_from_file(f, size=f.tell(), rewind=True)
url = blob.generate_signed_url(datetime.datetime.max)
"""
This prints:
b'plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext plaintext' None
"""
response = requests.get(url)
print(response.content, response.headers.get('Content-Encoding', None))
blob = bucket.blob('compressed')
blob.content_type = 'text/plain'
blob.content_encoding = 'gzip'
with io.BytesIO() as f:
    with gzip.GzipFile(fileobj=f, mode='wb', compresslevel=9) as fgz:
        fgz.write(b' '.join(100 * (b'compressed',)))
    # upload after the GzipFile is closed so the gzip trailer has been flushed
    blob.upload_from_file(f, size=f.tell(), rewind=True)
url = blob.generate_signed_url(datetime.datetime.max)
"""
This prints:
b'\x1f\x8b\x08\x00\xac}\xbbX\x02\xffK\xce\xcf-(J-.NMQH\x1ee\x8e2G\x99\xa3L2\x99\x00/\x80\x15\xa7K\x04\x00\x00' None
"""
response = requests.get(url)
print(response.content, response.headers.get('Content-Encoding', None))
"""
If I manually set the content-encoding header through the metadata option on this object in the console, I get the appropriate response:
b'compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed compressed' gzip
"""
response = requests.get(url)
print(response.content, response.headers.get('Content-Encoding', None))
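As an aside, the gzip payload itself can be built with the stdlib `gzip.compress` (Python 3.2+) instead of wrapping a `GzipFile` around a `BytesIO`; a minimal local round trip, with no GCS involved, to sanity-check the bytes before uploading:

```python
import gzip

data = b' '.join(100 * (b'compressed',))

# gzip.compress produces a complete gzip stream (header + deflate + trailer)
gz = gzip.compress(data, compresslevel=9)

# verify the payload decodes locally before handing it to Cloud Storage
assert gzip.decompress(gz) == data
print(len(data), '->', len(gz))
```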
Hi @brianjpetersen,
Thanks for raising this, and sorry it took a couple days for us to say anything in response.
Let me summarize to make sure I understand the problem. Basically there seems to be no obvious valid way to set the Content-Encoding in the metadata to gzip and have it stick in storage. Is that correct?
That's right. Setting the content_encoding attribute to 'gzip' on the object before uploading doesn't actually result in proper transcoding on subsequent GETs to Cloud Storage. Furthermore, the metadata on the uploaded object (as viewed in the web console) doesn't reflect that Content-Encoding was set to 'gzip' (see below).

Thanks. We will look into it.
This is a duplicate of several bugs, summed up in this comment, which also has a work-around.
Many thanks @pdknsk.
This workaround fixes the problem, although, as noted in #754, the property update isn't atomic, which has all sorts of nasty implications. As another commenter noted in a linked thread, this unfortunately prevents me from using gcloud-python (and Google Cloud Platform) at this time.
I remembered a patch I had once used, which I've updated now. An alternative is to use the API directly, which is more complex.
This unfortunately didn't seem to do the trick for the content_encoding property.
Works for me.
>>> compressobj = zlib.compressobj(9, zlib.DEFLATED, 31) # 31 = gzip
>>> text_gzip = compressobj.compress('text') + compressobj.flush()
>>> len(text_gzip)
24
>>> text = bucket.blob('file.txt')
>>> text.cache_control = 'no-cache'
>>> text.content_encoding = 'gzip'
>>> text.upload_from_string(text_gzip)
>>> text.reload()
>>> text.size
24
>>> req = requests.get(text.public_url)
>>> req.content
'text'
>>> req.headers.get('Content-Encoding')
'gzip'
In the browser too.
Apologies @pdknsk, pip and I weren't getting along last night. This does indeed address my need. You've been super helpful - thanks.
Although this contrived example works, now I'm getting a gzip-decoding error from requests with larger payloads. Using your example (slightly modified for Python 3):
>>> compressobj = zlib.compressobj(9, zlib.DEFLATED, 31) # 31 = gzip
>>> text_gzip = compressobj.compress(100*b'text') + compressobj.flush()
>>> text = bucket.blob('file.txt')
>>> text.cache_control = 'no-cache'
>>> text.content_encoding = 'gzip'
>>> text.upload_from_string(text_gzip)
>>> req = requests.get(text.public_url)
Traceback (most recent call last):
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 192, in _decode
    data = self._decoder.decompress(data)
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 58, in decompress
    return self._obj.decompress(data)
zlib.error: Error -3 while decompressing data: invalid distance too far back

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/models.py", line 664, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 349, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 503, in read_chunked
    flush_decoder=False)
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 197, in _decode
    "failed to decode it." % content_encoding, e)
requests.packages.urllib3.exceptions.DecodeError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: invalid distance too far back',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 47, in <module>
    response = requests.get(url)
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/api.py", line 71, in get
    return request('get', url, params=params, **kwargs)
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/api.py", line 57, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/sessions.py", line 617, in send
    r.content
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/models.py", line 741, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
  File "/Users/brianjpetersen/Anaconda/python3/anaconda/lib/python3.5/site-packages/requests/models.py", line 669, in generate
    raise ContentDecodingError(e)
requests.exceptions.ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: invalid distance too far back',))
Is this possibly related to #1724?
@brianjpetersen googling that error for requests got me to this SO question, which led me to the following issue.
See: https://bugs.python.org/issue27164
It sounds like there's an issue with Python 3.5.2. If you upgrade do you still have the same issue?
That seems to be it. It's working on my 2.7 binary. Thanks.
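For anyone hitting this later: the decode failure is in the affected interpreter, not the uploaded stream. A quick local check (on an unaffected interpreter) that a `wbits=31` stream from the earlier snippet is itself valid gzip:

```python
import zlib

payload = 100 * b'text'

# wbits=31 (16 + 15) wraps the deflate stream in a gzip header/trailer
co = zlib.compressobj(9, zlib.DEFLATED, 31)
gz = co.compress(payload) + co.flush()

# wbits=47 (32 + 15) auto-detects zlib vs. gzip framing on decompression
assert zlib.decompressobj(47).decompress(gz) == payload
print('round trip OK,', len(payload), '->', len(gz))
```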
OK great! I'm going to close this then.
Using Python 2.7 and the latest (as of this date) google-cloud module, this problem still occurs when using upload_from_string.
To make it even stranger, I get this intermittently on Python 3.7.1 and the latest google-cloud-python.
